Re: [R] R code helps needed!
Hi Jim,

I added more code on top of your original version. I bet there is a simpler way to do this, but this is the best I can think of. Any feedback from you and others will be highly appreciated.

Thanks a lot!

Steve

result<-read.table(text=
"intercept decision expected.decision
1 reject reject
2 reject reject
3 reject reject
0 pass pass
3 reject skip
0 pass skip
3 reject skip
5 reject skip
0 pass skip
0 pass pass
3 reject skip
1 reject skip
0 pass skip
0 pass skip
2 reject skip
1 reject reject
0 pass pass
3 reject skip
0 pass skip
2 reject skip
0 pass skip
1 reject skip
2 reject reject
2 reject reject
",
header=TRUE,stringsAsFactors=FALSE)

int <- result$intercept
int
# [1] 1 2 3 0 3 0 3 5 0 0 3 1 0 0 2 1 0 3 0 2 0 1 2 2

# rows that would pass if every row were sampled
pass.theo <- which(int==0)
pass.theo
#[1]  4  6  9 10 13 14 17 19 21

lv1 <- int==0
lv1
# [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE
#[13]  TRUE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE

pass.1st <- min(which(lv1==TRUE))
pass.1st
#[1] 4

# windows of 6 rows (1 pass + 5 skips), starting at the first pass
m <- c(0:100)
interval <- 6*m + pass.1st
interval
# [1]   4  10  16  22  28  34  40  46  52  58  64  70  76  82  88  94 100 106
#[19] 112 118 124 130 136 142 148 154 160 166 172 178 184 190 196 202 208 214
#[37] 220 226 232 238 244 250 256 262 268 274 280 286 292 298 304 310 316 322
#[55] 328 334 340 346 352 358 364 370 376 382 388 394 400 406 412 418 424 430
#[73] 436 442 448 454 460 466 472 478 484 490 496 502 508 514 520 526 532 538
#[91] 544 550 556 562 568 574 580 586 592 598 604

interval2 <- c(interval[interval<=length(int)], length(int))
interval2
#[1]  4 10 16 22 24

pass.theo
#[1]  4  6  9 10 13 14 17 19 21

# first pass inside each window
res <- as.list(NULL)
for(i in 1:(length(interval2)-1)){
  res[[i]] <- min(pass.theo[pass.theo >= interval2[i] & pass.theo < interval2[i+1]])
}
#Warning message:
#In min(pass.theo[pass.theo >= interval2[i] & pass.theo < interval2[i + :
#  no non-missing arguments to min; returning Inf

res
#[[1]]
#[1] 4
#[[2]]
#[1] 10
#[[3]]
#[1] 17
#[[4]]
#[1] Inf

res <- unlist(res)
passes <- res[is.finite(res)]
passes
#[1]  4 10 17

skips<-as.vector(sapply(passes,function(x) return(x+1:5)))
skips2 <- skips[skips<=length(int)]
new.decision <- result$decision
new.decision[skips2] <- 'skip'
new.decision
# [1] "reject" "reject" "reject" "pass"   "skip"   "skip"   "skip"   "skip"
# [9] "skip"   "pass"   "skip"   "skip"   "skip"   "skip"   "skip"   "reject"
#[17] "pass"   "skip"   "skip"   "skip"   "skip"   "skip"   "reject" "reject"

cbind(result, new.decision)
#   intercept decision expected.decision new.decision
#1          1   reject            reject       reject
#2          2   reject            reject       reject
#3          3   reject            reject       reject
#4          0     pass              pass         pass
#5          3   reject              skip         skip
#6          0     pass              skip         skip
#7          3   reject              skip         skip
#8          5   reject              skip         skip
#9          0     pass              skip         skip
#10         0     pass              pass         pass
#11         3   reject              skip         skip
#12         1   reject              skip         skip
#13         0     pass              skip         skip
#14         0     pass              skip         skip
#15         2   reject              skip         skip
#16         1   reject            reject       reject
#17         0     pass              pass         pass
#18         3   reject              skip         skip
#19         0     pass              skip         skip
#20         2   reject              skip         skip
#21         0     pass              skip         skip
#22         1   reject              skip         skip
#23         2   reject            reject       reject
#24         2   reject            reject       reject

On Fri, Mar 3, 2017 at 8:00 AM, SH <empti...@gmail.com> wrote:
> Hi Jim,
>
> Thank you very much for replying back.
>
> I think the data I presented have not many 'pass' than I thought. The
> purpose of the code is to skip sampling for 5 consecutive rows when a
> previous row is found as 'pass'. Thus, because the fourth row is
> 'pass', sampling will be skipped next five rows (i.e., from 5th to 9th
> rows). Therefore any 'pass' within next 5 rows after first 'pass' should
> not affect 'skip'. Could you try this? Based on your code, I
> guess 'return' function may be one I should search. I haven't used it
> before so I am not familiar with the function. I made a new data set with
> 'expected.decision' column. In the data set, once a 'pass' is found, the
> next sampling starts 5 rows after. For example, since the forth row is
> 'pas
Re: [R] R code helps needed!
Hi Jim,

Thank you very much for your reply.

I think the data I presented had fewer 'pass' rows than I thought. The purpose of the code is to skip sampling for 5 consecutive rows once a row is found to be 'pass'. Because the fourth row is 'pass', sampling is skipped for the next five rows (i.e., rows 5 to 9). Any 'pass' within those five rows should therefore be labeled 'skip' and should not affect the skip period. Could you try this? Based on your code, I guess the 'return' function may be the one I should look up. I haven't used it before, so I am not familiar with it.

I made a new data set with an 'expected.decision' column. In this data set, once a 'pass' is found, the next sampling takes place after the 5 skipped rows. For example, since the fourth row is 'pass', the next sampling is at the 10th row. Although the 6th row would be 'pass', I want to label it 'skip' since no sampling is made there.

The objective of the study is to investigate how many 'reject' rows get 'skip' under a given sampling scheme, that is, the rate at which rows that should be 'reject' get through because of skip sampling.

Could you also try this data and give me your feedback? Thanks again for your help!

Steve

result<-read.table(text=
"intercept decision expected.decision
1 reject reject
2 reject reject
3 reject reject
0 pass pass
3 reject skip
0 pass skip
3 reject skip
5 reject skip
0 pass skip
0 pass pass
3 reject skip
1 reject skip
0 pass skip
0 pass skip
2 reject skip
1 reject reject
0 pass pass
3 reject skip
0 pass skip
2 reject skip
0 pass skip
1 reject skip
2 reject reject
2 reject reject
",
header=TRUE,stringsAsFactors=FALSE)
passes<-which(result$intercept == 0)
skips<-as.vector(sapply(passes,function(x) return(x+1:5)))
result$decision[skips]<-"skip"
result

On Thu, Mar 2, 2017 at 5:42 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> Hi Steve,
> Try this:
>
> result<-read.table(text=
> "intercept decision
> 1 reject
> 2 reject
> 3 reject
> 0 pass
> 3 reject
> 2 reject
> 3 reject
> 5 reject
> 3 reject
> 1 reject
> 1 reject
> 2 reject
> 2 reject
> 0 pass
> 3 reject
> 3 reject
> 2 reject
> 2 reject
> 1 reject
> 1 reject
> 2 reject
> 2 reject",
> header=TRUE,stringsAsFactors=FALSE)
> passes<-which(result$intercept == 0)
> skips<-as.vector(sapply(passes,function(x) return(x+1:5)))
> result$decision[skips]<-"skip"
>
> Note that result$decision must be a character variable for this to
> work. If it is a factor, convert it to character.
>
> Jim
>
>
> On Thu, Mar 2, 2017 at 11:54 PM, SH <empti...@gmail.com> wrote:
> > Hi
> >
> > Although I posted this in stackoverflow yesterday, I am asking here to get
> > helps as soon as quickly.
> >
> > I need help make code for mocking sampling environment. Here is my code
> > below:
> >
> > First, I generated mock units with 1000 groups of 100 units. Each row is
> > considered as independent sample space.
> >
> > unit <- 100 # Total units
> > bad.unit.rate <- .05 # Proportion of bad units
> > bad.unit.num <- ceiling(unit*bad.unit.rate) # Bad units
> > n.sim=1000
> > unit.group <- matrix(0, nrow=n.sim, ncol=unit)
> > for(i in 1:n.sim){
> >   unit.group[i, ] <- sample(rep(0:1, c(unit-bad.unit.num, bad.unit.num)))}
> > dim(unit.group)
> >
> > It gives 1000 by 100 groups
> >
> > ss <- 44 # Selected sample size
> >
> > 44 out of 100 units will be selected and decision (pass or reject) will be
> > made based on sampling.
> >
> > This below is decision code:
> >
> > intercept <- rep(0, nrow(unit.group))
> > decision <- rep(0, nrow(unit.group))
> > set.seed(2017)
> > for(i in 1:nrow(unit.group)){
> >   selected.unit <- sample(1:unit, ss)
> >   intercept[i] <- sum(unit.group[i,][selected.unit])
> >   decision[i] <- ifelse(intercept[i]==0, 'pass', 'reject')
> >   result <- cbind(intercept, decision)
> >   result}
> > dim(result)
> > head(result, 30)
> >
> >> head(result, 30)
> >       intercept decision
> >  [1,] "1"       "reject"
> >  [2,] "2"       "reject"
> >  [3,] "3"       "reject"
> >  [4,] "0"       "pass"
> >  [5,] "3"       "reject"
> >  [6,] "2"       "reject"
> >  [7,] "3"       "reject"
> >  [8,] "5"       "reject"
> >  [9,] "3&q
[R] R code helps needed!
Hi,

Although I posted this on Stack Overflow yesterday, I am asking here as well to get help as quickly as possible.

I need help writing code that mocks a sampling environment. Here is my code so far.

First, I generated mock units: 1000 groups of 100 units. Each row is considered an independent sample space.

unit <- 100 # Total units
bad.unit.rate <- .05 # Proportion of bad units
bad.unit.num <- ceiling(unit*bad.unit.rate) # Bad units
n.sim=1000
unit.group <- matrix(0, nrow=n.sim, ncol=unit)
for(i in 1:n.sim){
  # each row: a random permutation of good (0) and bad (1) units
  unit.group[i, ] <- sample(rep(0:1, c(unit-bad.unit.num, bad.unit.num)))
}
dim(unit.group)

This gives a 1000 by 100 matrix of groups.

ss <- 44 # Selected sample size

44 out of the 100 units are selected, and a decision (pass or reject) is made based on the sample.

This is the decision code:

intercept <- rep(0, nrow(unit.group))
decision <- rep(0, nrow(unit.group))
set.seed(2017)
for(i in 1:nrow(unit.group)){
  selected.unit <- sample(1:unit, ss)
  # number of bad units caught in the sample
  intercept[i] <- sum(unit.group[i,][selected.unit])
  decision[i] <- ifelse(intercept[i]==0, 'pass', 'reject')
}
# bind once after the loop; cbind() coerces both columns to character
result <- cbind(intercept, decision)
dim(result)
head(result, 30)

> head(result, 30)
      intercept decision
 [1,] "1"       "reject"
 [2,] "2"       "reject"
 [3,] "3"       "reject"
 [4,] "0"       "pass"
 [5,] "3"       "reject"
 [6,] "2"       "reject"
 [7,] "3"       "reject"
 [8,] "5"       "reject"
 [9,] "3"       "reject"
[10,] "1"       "reject"
[11,] "1"       "reject"
[12,] "2"       "reject"
[13,] "2"       "reject"
[14,] "0"       "pass"
[15,] "3"       "reject"
[16,] "3"       "reject"
[17,] "2"       "reject"
[18,] "2"       "reject"
[19,] "1"       "reject"
[20,] "1"       "reject"
[21,] "2"       "reject"
[22,] "2"       "reject"

I was able to make a decision for each of the 1000 rows based on sampling, as above. Now I want to write code for a "second" decision option, as follows. Assuming the row number reflects the order in time or sequence, if 'intercept' is 0 (i.e., 'decision' is 'pass') in a row such as row 4 above, I want to skip any decision for the following 5 rows (or some other number) and label them 'skip', not 'reject'. In the example above, rows 5 to 9 will be 'skip' rather than 'reject'. Also, rows 15 to 19 should be 'skip' instead of 'reject'.

Although I tried to write preliminary code for my post, I have no idea where to start. Could anyone help me write the code? Any feedback will be greatly appreciated.

Thank you very much in advance!

Steve

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
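
The skip rule described in this thread can be sketched with a plain loop. This is only a minimal sketch, assuming that a 'pass' counts only on a row that is actually sampled, so a zero intercept inside a skip period does not start a new skip period. The function name `label_skips` and its `n.skip` argument are made up here for illustration; the input is the 24-row example data set from this thread, and `expected` is its 'expected.decision' column.

```r
# Hypothetical helper (not from the thread): walk the rows in order and
# jump over the n.skip rows that follow every sampled 'pass'.
label_skips <- function(intercept, n.skip = 5) {
  n <- length(intercept)
  decision <- character(n)
  i <- 1
  while (i <= n) {
    if (intercept[i] == 0) {
      decision[i] <- "pass"
      if (n.skip > 0 && i < n) {
        # rows that are not sampled at all after this pass
        decision[(i + 1):min(i + n.skip, n)] <- "skip"
      }
      i <- i + n.skip + 1   # next row that is actually sampled
    } else {
      decision[i] <- "reject"
      i <- i + 1
    }
  }
  decision
}

int <- c(1, 2, 3, 0, 3, 0, 3, 5, 0, 0, 3, 1, 0, 0, 2, 1, 0, 3, 0, 2, 0, 1, 2, 2)
expected <- c(rep("reject", 3), "pass", rep("skip", 5),
              "pass", rep("skip", 5), "reject",
              "pass", rep("skip", 5), "reject", "reject")

new.decision <- label_skips(int)
identical(new.decision, expected)
```

A loop is used because each decision depends on where the previous skip period ended, which a single vectorized `which(intercept == 0)` cannot express.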
Re: [R] R code help!
Hi Jean,

Thank you so much!

Steve

On Sat, Sep 19, 2015 at 1:02 PM, Adams, Jean <jvad...@usgs.gov> wrote:
> Here's one way to save your results, using a list of lists and a for()
> loop.
>
> nsim <- 100
> outputs <- vector("list", nsim)
> for(i in 1:nsim) {
>   outputs[[i]] <- sim.f(p.s=.05, N=1000, sample.size=69, n.sim=500)
> }
>
> Jean
>
> On Fri, Sep 18, 2015 at 2:27 PM, SH <empti...@gmail.com> wrote:
>
>> Dear R users,
>>
>> I am trying to simulate surveys and the survey result will be used to
>> determine the population to be "accepted" or "rejected". With the results,
>> I would like to calculate cumulative means and plot them to see if a
>> converged value is as expected. Below is R-code I generated. I need a
>> help to repeat this simulation code as many times (e.g., 100) and keep the
>> results as list format if possible. Could you give me any insight?
>>
>> Thanks a lot in advance,
>>
>> Steve
>>
>> sim.f <- function(p.s, N, sample.size, n.sim) {
>>   pop = sampled.pop = decision = decisionB = cum.mn = as.list(NULL)
>>   for(i in 1:n.sim) {
>>     p <- c(rep(1, p.s*N), pop2 <- rep(0, N*(1-p.s))) # Generate sample space
>>     pop[[i]] <- sample(p) # Randomization sample space
>>     sampled.pop[[i]] <- sample(pop[[i]], sample.size) # Random sampling
>>     decision[i] <- ifelse(sum(sampled.pop[[i]])>=1, 'Reject','Pass') # Decision for each group of n.sim
>>     decisionB <- ifelse(decision == 'Reject', 1, 0) # Convert to binary
>>     cum.mn <- cumsum(decisionB) / seq_along(decisionB) # Cummulative mean of n.sim group decisions
>>   }
>>   result = list(population=pop,
>>                 pop_sub = sampled.pop,
>>                 decision = decision,
>>                 decisionB = decisionB,
>>                 cum.mn = cum.mn)
>> }
>> sim.out <- sim.f(p.s=.05, N=1000, sample.size=69, n.sim=500)
>> # I want to repeat this simulation function for example 100 times or and also
>> # keep the data so that I can explore later. If it is not possible to keep all
>> # outputs, at least I would like to have cum.mn outputs.
>>
>> summary(sim.out)
>> sim.out$population
>> sim.out$pop_sub
>> sim.out$decision
>> sim.out$decisionB
>> y1 <- sim.out$cum.mn
>> #plot(y1, type='l')
>> lines(y2, type='l')
>> ...
>> lines(y100, type='l')
>> abline(h=.95, col='red')
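
Once the runs are stored in a list as in the reply above, the `cum.mn` components can be pulled into one matrix and drawn in a single call instead of writing `lines(y2)` through `lines(y100)` by hand. The following is a sketch only: `sim.f.demo` is a hypothetical, slimmed-down stand-in for the poster's `sim.f()` that returns just the two components needed here, and `cum.mat` is an illustrative name.

```r
# Hypothetical stand-in for sim.f(): one run of n.sim accept/reject decisions
sim.f.demo <- function(p.s, N, sample.size, n.sim) {
  n.bad <- round(p.s * N)
  pop <- rep(c(1, 0), c(n.bad, N - n.bad))
  # 1 = 'Reject' (at least one marked unit in the sample), 0 = 'Pass'
  decisionB <- replicate(n.sim, as.numeric(sum(sample(pop, sample.size)) >= 1))
  list(decisionB = decisionB,
       cum.mn = cumsum(decisionB) / seq_along(decisionB))
}

set.seed(1)
nsim <- 100
outputs <- vector("list", nsim)
for (i in seq_len(nsim)) {
  outputs[[i]] <- sim.f.demo(p.s = .05, N = 1000, sample.size = 69, n.sim = 500)
}

# one column per run, one row per simulated group
cum.mat <- sapply(outputs, function(o) o$cum.mn)
dim(cum.mat)

matplot(cum.mat, type = "l", lty = 1, col = "grey40",
        xlab = "Simulated group", ylab = "Cumulative mean of rejections")
abline(h = .95, col = "red")
```

Keeping the full `outputs` list preserves every component for later exploration, while `cum.mat` is the compact form that plotting needs.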
[R] R code help!
Dear R users,

I am trying to simulate surveys whose results are used to decide whether the population is "accepted" or "rejected". With the results, I would like to calculate cumulative means and plot them to see whether they converge to the expected value. Below is the R code I wrote. I need help repeating this simulation many times (e.g., 100) and keeping the results in list format if possible. Could you give me any insight?

Thanks a lot in advance,

Steve

sim.f <- function(p.s, N, sample.size, n.sim) {
  pop = sampled.pop = decision = decisionB = cum.mn = as.list(NULL)
  for(i in 1:n.sim) {
    p <- c(rep(1, p.s*N), pop2 <- rep(0, N*(1-p.s))) # Generate sample space
    pop[[i]] <- sample(p) # Randomization sample space
    sampled.pop[[i]] <- sample(pop[[i]], sample.size) # Random sampling
    decision[i] <- ifelse(sum(sampled.pop[[i]])>=1, 'Reject','Pass') # Decision for each group of n.sim
    decisionB <- ifelse(decision == 'Reject', 1, 0) # Convert to binary
    cum.mn <- cumsum(decisionB) / seq_along(decisionB) # Cumulative mean of n.sim group decisions
  }
  result = list(population=pop,
                pop_sub = sampled.pop,
                decision = decision,
                decisionB = decisionB,
                cum.mn = cum.mn)
}
sim.out <- sim.f(p.s=.05, N=1000, sample.size=69, n.sim=500)
# I want to repeat this simulation function, for example 100 times, and also
# keep the data so that I can explore it later. If it is not possible to keep all
# outputs, I would at least like to have the cum.mn outputs.

summary(sim.out)
sim.out$population
sim.out$pop_sub
sim.out$decision
sim.out$decisionB
y1 <- sim.out$cum.mn
#plot(y1, type='l')
lines(y2, type='l')
...
lines(y100, type='l')
abline(h=.95, col='red')
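
As a cross-check on where the cumulative mean should settle, the probability that a sample contains at least one marked unit can be computed directly from the hypergeometric distribution. This is a sketch under the same settings as the call above (p.s = .05, N = 1000, sample.size = 69, n.sim = 500); the names `n.bad`, `p.pass` and `p.reject` are illustrative, and drawing the counts with `rhyper()` is a shortcut swapped in for the explicit `sample()` calls in the post.

```r
p.s <- .05; N <- 1000; sample.size <- 69; n.sim <- 500
n.bad <- round(p.s * N)          # marked units in the population

# P(sample contains no marked unit) = choose(N - n.bad, k) / choose(N, k)
p.pass   <- dhyper(0, m = n.bad, n = N - n.bad, k = sample.size)
p.reject <- 1 - p.pass           # theoretical long-run rejection rate

# One simulated run: number of marked units found in each of n.sim samples
set.seed(1)
hits      <- rhyper(n.sim, m = n.bad, n = N - n.bad, k = sample.size)
decisionB <- as.numeric(hits >= 1)
cum.mn    <- cumsum(decisionB) / seq_along(decisionB)

plot(cum.mn, type = "l", ylim = c(0.8, 1),
     xlab = "Simulated group", ylab = "Cumulative mean of rejections")
abline(h = p.reject, col = "red")   # value the curve should approach
```

Comparing `p.reject` with the fixed reference line at .95 shows whether that line is the right target for these settings.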
[R] grid with random or clustered distribution
Hi R-users,

I hope this is not a redundant question. I searched for similar threads relevant to my question but could not find any. Any input would be greatly appreciated.

I want to generate an n1 by n2 grid (e.g., 100 by 100 or 200 by 500) of binary values (1 or 0) with given proportions of 1 and 0 (e.g., 1, 5, or 10% of 1s in a 100 by 100 grid). For a clustered grid, I hope to be able to define the cluster size if possible. Is there a simple way to generate random/clustered grids of 1 and 0 values with a pre-defined proportion?

So far, I have found that the "CompRandFld" package can generate a clustered grid of 1s and 0s. In particular, example #4 in the "EVariogram" function description is close to what I want. Below is slightly modified code from that example. However, this code cannot control the proportion of 1 and 0 values and is complicated, or at least I have no idea how to do that with it. I believe there may be easier ways to generate random/clustered grids with a given proportion of 1 and 0 values.

Thank you very much in advance,

Steve

library(CompRandFld)
library(RandomFields)

x0 <- seq(1, 50, length.out=50)
y0 <- seq(1, 60, length.out=60)
d <- expand.grid(x=x0, y=y0)
dim(d)
head(d)
x <- d$x
y <- d$y
# Set the model's parameters:
corrmodel <- 'exponential'
mean <- 0
sill <- 1
nugget <- 0
scale <- 3
set.seed(1221)
# Simulation of the Binary-Gaussian random field:
data <- RFsim(x, y, corrmodel="exponential", model="BinaryGauss",
              param=list(mean=mean,sill=sill,scale=scale,nugget=nugget),
              threshold=0)$data
# Empirical lorelogram estimation:
fit <- EVariogram(data, x, y, numbins=20, maxdist=7, type="lorelogram")
# Results:
plot(fit$centers, fit$variograms, xlab='Distance', ylab="Lorelogram",
     ylim=c(min(fit$variograms), max(fit$variograms)),
     xlim=c(0, max(fit$centers)), pch=20, main="Spatial Lorelogram")
# Plotting
plot(d, type='n')
text(d, label=data)
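
For the question of fixing the proportion of 1s, a base-R sketch is possible without any spatial package. It assumes that "clustered" can be approximated by smoothing white noise with a moving-average window and marking the cells with the highest smoothed values; this is a technique swapped in for illustration, not the CompRandFld method from the post. The helper `smooth_field` and its half-width `k` (a rough cluster-size knob) are hypothetical names, and the window wraps around the grid edges.

```r
set.seed(1)
n1 <- 100; n2 <- 100            # grid dimensions
p  <- 0.05                      # target proportion of 1s
n_ones <- round(p * n1 * n2)

# Random (unclustered) grid: shuffle exactly n_ones ones among the cells
random_grid <- matrix(sample(rep(c(1L, 0L), c(n_ones, n1 * n2 - n_ones))),
                      nrow = n1, ncol = n2)

# Hypothetical helper: sum z over a (2k+1) x (2k+1) window around each cell,
# wrapping at the edges
smooth_field <- function(z, k) {
  out <- matrix(0, nrow(z), ncol(z))
  for (dx in -k:k) for (dy in -k:k) {
    rows <- ((seq_len(nrow(z)) - 1 + dx) %% nrow(z)) + 1
    cols <- ((seq_len(ncol(z)) - 1 + dy) %% ncol(z)) + 1
    out <- out + z[rows, cols]
  }
  out
}

# Clustered grid: the n_ones cells with the largest smoothed values become 1
field <- smooth_field(matrix(rnorm(n1 * n2), n1, n2), k = 3)
clustered_grid <- matrix(0L, n1, n2)
clustered_grid[order(field, decreasing = TRUE)[seq_len(n_ones)]] <- 1L

c(random = mean(random_grid), clustered = mean(clustered_grid))

par(mfrow = c(1, 2))
image(random_grid, main = "Random", col = c("lightgray", "blue"))
image(clustered_grid, main = "Clustered (k = 3)", col = c("lightgray", "blue"))
```

Selecting cells by rank rather than by a fixed threshold is what makes the proportion exact; larger `k` should give larger, smoother patches.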
Re: [R] grid with random or clustered distribution
Hi Sarah, Thanks for your prompt responding. The methodology in the publication is very similar to what I plan to do. Yes, could you be willing to share the code if you don't mind? Thanks a lot again, Steve On Wed, Sep 9, 2015 at 9:11 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote: > You can use gstat, as in: > > https://www.researchgate.net/publication/43279659_Behavior_of_Vegetation_Sampling_Methods_in_the_Presence_of_Spatial_Autocorrelation > > If you need more detail, I can dig up the code. > > Sarah > > On Wed, Sep 9, 2015 at 8:49 AM, SH <empti...@gmail.com> wrote: > > Hi R-users, > > > > I hope this is not redundant questions. I tried to search similar > threads > > relevant to my questions but could not find. Any input would be greatly > > appreciated. > > > > I want to generate grid with binary values (1 or 0) in n1 by n2 (e.g., > 100 > > by 100 or 200 by 500, etc.) given proportions of 1 and 0 values (e.g., 1, > > 5, or 10% of 1 from 100 by 100 grid). For clustered distributed grid, I > > hope to be able to define cluster size if possible. Is there a simple > way > > to generate random/clustered grids with 1 and 0 values with a > > pre-defined proportion? > > > > So far, the function "EVariogram" in the "CompRandFld" package generates > > clustered grid with 1 and 0. Especially, the example #4 in the > > "EVariogram" function description is a kind of what I want. Below is the > > slightly modified code from the original one. However, the code below > > can't control proportion of 1 and 0 values and complicated or I have no > > idea how to do it. I believe there may be easies ways to > > generate random/clustered grids with proportional 1 and 0 values. 
> > > > Thank you very much in advance, > > > > Steve > > > > > > library(CompRandFld) > > library(RandomFields) > > > > x0 <- seq(1, 50, length.out=50) > > y0 <- seq(1, 60, length.out=60) > > d <- expand.grid(x=x0, y=y0) > > dim(d) > > head(d) > > x <- d$x > > y <- d$y > > # Set the model's parameters: > > corrmodel <- 'exponential' > > mean <- 0 > > sill <- 1 > > nugget <- 0 > > scale <- 3 > > set.seed(1221) > > # Simulation of the Binary-Gaussian random field: > > data <- RFsim(x, y, corrmodel="exponential", model="BinaryGauss", > > param=list(mean=mean,sill=sill,scale=scale,nugget=nugget), > > threshold=0)$data > > # Empirical lorelogram estimation: > > fit <- EVariogram(data, x, y, numbins=20, maxdist=7, type="lorelogram") > > # Results: > > plot(fit$centers, fit$variograms, xlab='Distance', ylab="Lorelogram", > > ylim=c(min(fit$variograms), max(fit$variograms)), > > xlim=c(0, max(fit$centers)), pch=20, main="Spatial Lorelogram") > > # Plotting > > plot(d, type='n') > > text(d, label=data) > > > > > -- > Sarah Goslee > http://www.functionaldiversity.org > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] grid with random or clustered distribution
Thank you so much! I will try it!

On Wed, Sep 9, 2015 at 3:27 PM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
>
> ### simulate landscapes with spatial autocorrelation ###
> ### Sarah Goslee 2015-09-09 ###
> ### Goslee 2006 PLANT ECOLOGY 187(2):203-212 ###
>
>
> library(gstat)
>
>
> ## parameters
> abun <- 0.2
> dim1 <- 20
> dim2 <- 50
>
>
> ## setup
> xy <- expand.grid(seq_len(dim1), seq_len(dim2))
> names(xy) <- c("x","y")
>
>
> ## three sample simulations
>
> # no spatial autocorrelation
> g.dummy <- gstat(formula = z~x+y, locations = ~x+y, dummy = TRUE,
>     beta = 0, model = vgm(1,"Nug", 0), nmax = 50)
> sim <- predict(g.dummy, newdata = xy, nsim = 1)
> random.landscape.000 <- predict(g.dummy, newdata = xy, nsim = 1)
> random.landscape.000[,3] <- ifelse(random.landscape.000[,3] >
>     quantile(random.landscape.000[,3], abun), 0, 1)
>
> # little spatial autocorrelation
> g.dummy <- gstat(formula = z~x+y, locations = ~x+y, dummy = TRUE,
>     beta = 0, model = vgm(1,"Exp", 5), nmax = 50)
> random.landscape.005 <- predict(g.dummy, newdata = xy, nsim = 1)
> random.landscape.005[,3] <- ifelse(random.landscape.005[,3] >
>     quantile(random.landscape.005[,3], abun), 0, 1)
>
> # much spatial autocorrelation
> g.dummy <- gstat(formula = z~x+y, locations = ~x+y, dummy = TRUE,
>     beta = 0, model = vgm(1,"Exp", 250), nmax = 50)
> sim <- predict(g.dummy, newdata = xy, nsim = 1)
> random.landscape.250 <- predict(g.dummy, newdata = xy, nsim = 1)
> random.landscape.250[,3] <- ifelse(random.landscape.250[,3] >
>     quantile(random.landscape.250[,3], abun), 0, 1)
>
>
> # plot the simulated landscapes
> par(mfrow=c(1,3))
> image(random.landscape.000, main="Null", xaxt="n", yaxt="n", bty="n",
>     xlim=c(0,dim1), ylim=c(0, dim2), col=c("lightgray", "darkgray"))
> image(random.landscape.005, main="5", xaxt="n", yaxt="n", bty="n",
>     xlim=c(0,dim1), ylim=c(0, dim2), col=c("lightgray", "blue"))
> image(random.landscape.250, main="250", sub=paste("abun =", abun),
>     xaxt="n", yaxt="n", bty="n", xlim=c(0,dim1), ylim=c(0, dim2),
>     col=c("lightgray", "darkblue"))
>
>
> ### end ###
>
>
> On Wed, Sep 9, 2015 at 9:27 AM, SH <empti...@gmail.com> wrote:
> > Hi Sarah,
> >
> > Thanks for your prompt responding. The methodology in the publication is
> > very similar to what I plan to do. Yes, could you be willing to share the
> > code if you don't mind?
> >
> > Thanks a lot again,
> >
> > Steve
> >
> > On Wed, Sep 9, 2015 at 9:11 AM, Sarah Goslee <sarah.gos...@gmail.com> wrote:
> >>
> >> You can use gstat, as in:
> >>
> >> https://www.researchgate.net/publication/43279659_Behavior_of_Vegetation_Sampling_Methods_in_the_Presence_of_Spatial_Autocorrelation
> >>
> >> If you need more detail, I can dig up the code.
> >>
> >> Sarah
> >>
> >> On Wed, Sep 9, 2015 at 8:49 AM, SH <empti...@gmail.com> wrote:
> >> > Hi R-users,
> >> >
> >> > I hope this is not redundant questions. I tried to search similar
> >> > threads relevant to my questions but could not find. Any input would be
> >> > greatly appreciated.
> >> >
> >> > I want to generate grid with binary values (1 or 0) in n1 by n2 (e.g.,
> >> > 100 by 100 or 200 by 500, etc.) given proportions of 1 and 0 values
> >> > (e.g., 1, 5, or 10% of 1 from 100 by 100 grid). For clustered
> >> > distributed grid, I hope to be able to define cluster size if possible.
> >> > Is there a simple way to generate random/clustered grids with 1 and 0
> >> > values with a pre-defined proportion?
> >> >
> >> > So far, the function "EVariogram" in the "CompRandFld" package generates
> >> > clustered grid with 1 and 0. Especially, the example #4 in the
> >> > "EVariogram" function description is a kind of what I want. Below is the
> >> >
Re: [R] Extract values from multiple lists
Dear Dennis, David, Jeff, and Denes,

Thanks for your help and comments. The simple one seems good enough for my work.

Best,

Steve

On Wed, Dec 17, 2014 at 5:46 AM, Dénes Tóth toth.de...@ttk.mta.hu wrote:

Dear Jeff,

On 12/17/2014 01:46 AM, Jeff Newmiller wrote:

You are chasing ghosts of performance past, Denes.

In terms of memory efficiency, yes. In terms of CPU time, there can be significant difference, see below.

The data.frame function causes no problems, and if it is used then the OP would not need to presume they know the internal structure of the data frame. See below. (I am using R3.1.2.)

a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# amount of memory allocated
gc(reset=TRUE)

# Explicitly call data frame
out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )

# No copying.
gc()

# Your suggested retreival method
out3a <- lapply( lapply( out_names, get ), "[[", "x" )
names( out3a ) <- out_names

# The obvious way to finish the job works fine.
out3 <- do.call( data.frame, out3a )

BTW, the even more obvious as.data.frame() produces the same with an even more intuitive interface.

However, for lists with a larger number of elements the transformation to a data.frame can be pretty slow. In the toy example, we created only a three-element list. Let's increase it a little bit.

# ---
# this is not even that large
datlen <- 1e2
listlen <- 1e5

# create a toy list
mylist <- matrix(seq_len(datlen * listlen),
                 nrow = datlen, ncol = listlen)
mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
names(mylist) <- paste0("V", seq_len(listlen))

# define the more efficient function ---
# note that I put class(x) first so that setattr does not
# modify the attributes of the original input (see ?setattr,
# you have to be careful)
setAttrib <- function(x) {
    class(x) <- "data.frame"
    data.table::setattr(x, "row.names", seq_along(x[[1]]))
    x
}

# benchmarking
# (we do not need microbenchmark here, the differences are
# extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
gc(reset=TRUE)
system.time(df1 <- do.call(data.frame, mylist))
gc()
system.time(df2 <- as.data.frame(mylist))
gc()
system.time(df3 <- setAttrib(mylist))
gc()

# check results
identical(df1, df2)
identical(df1, df3)

Of course for small datasets, one should use the built-in and safe functions (either do.call or as.data.frame). BTW, for the original three-element list, these are even faster than the workaround.

All the best,

Denes

# No copying... well, you do end up with a new list in out3, but the data itself doesn't get copied.
gc()

On Tue, 16 Dec 2014, Dénes Tóth wrote:

On 12/16/2014 06:06 PM, SH wrote:

Dear List,

I hope this posting is not redundant. I have several list outputs with the same components. I ran a function with three different scenarios below (e.g., scen1, scen2, and scen3,...,scenN). I would like to extract the same components and group them as a data frame. For example,

pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN <- scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)

My final output would be 'new.df'. Could you help me how I can do that efficiently?

If efficiency is of concern, do not use data.frame() but create a list and add the required attributes with data.table::setattr (the setattr function of the data.table package). (You can also consider creating a data.table instead of a data.frame.)

# some largish lists
a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# amount of memory allocated
gc(reset=TRUE)

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# create a list
out <- lapply(lapply(out_names, get), "[[", "x")

# note that no copying occured
gc()

# decorate the list
data.table::setattr(out, "names", out_names)
data.table::setattr(out, "row.names", seq_along(out[[1]]))
class(out) <- "data.frame"

# still no copy
gc()

# output
head(out)

HTH,

Denes

Thanks in advance,

Steve

P.S.: Below are some examples of summary outputs.

summary(scen1)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r
[R] Extract values from multiple lists
Dear List,

I hope this posting is not redundant. I have several list outputs with the same components. I ran a function under several different scenarios (e.g., scen1, scen2, scen3, ..., scenN). I would like to extract the same component from each and group them into a data frame. For example,

pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN <- scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3,...,pop.inf.rN)

My final output would be 'new.df'. Could you help me do that efficiently?

Thanks in advance,

Steve

P.S.: Below are some examples of summary outputs.

summary(scen1)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list

summary(scen2)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list

summary(scen3)
                Length Class  Mode
aql                1   -none- numeric
rql                1   -none- numeric
alpha              1   -none- numeric
beta               1   -none- numeric
n.sim              1   -none- numeric
N                  1   -none- numeric
n.sample           1   -none- numeric
n.acc              1   -none- numeric
lot.inf.r          1   -none- numeric
pop.inf.n       2000   -none- list
pop.inf.r       2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n        2000   -none- list
sp.inf.r        2000   -none- list
sp.decision     2000   -none- list
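
The extraction asked about above can be sketched by putting the scenario objects into one list and applying the same extractor to each. This is a minimal sketch, assuming each `pop.inf.r` is a list of 2000 single numbers, as the summaries suggest; `make_scen` is a hypothetical stand-in that fabricates objects of that shape only so the example runs, and its values are random placeholders, not results.

```r
# Hypothetical stand-in for one scenario output (shape only, placeholder values)
make_scen <- function(n) {
  list(n.sim = n, pop.inf.r = as.list(runif(n)))
}

set.seed(1)
scen1 <- make_scen(2000)
scen2 <- make_scen(2000)
scen3 <- make_scen(2000)

# Collect the scenarios in a named list
# (with many objects, mget(paste0("scen", 1:N)) builds the same list)
scens <- list(scen1 = scen1, scen2 = scen2, scen3 = scen3)

# Pull the same component out of every scenario and flatten it to a vector
cols <- lapply(scens, function(s) unlist(s[["pop.inf.r"]]))
names(cols) <- paste0("pop.inf.r", seq_along(cols))

new.df <- as.data.frame(cols)
str(new.df)
```

The component name can be turned into a function argument if several components (for example `sp.inf.r`) need the same treatment.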
[R] data import: strange experience
Dear List:

I had a strange experience importing data. I wonder whether any of you has had the same problem before, and I would greatly appreciate your suggestions.

The original data set is in Excel format. Here is a brief summary of what I did:

1. I saved the original Excel data in csv and txt formats, separately.

2. I imported the two files using the following code. There were no error messages.

dftxt = read.table('df.txt',header=T, sep='\t')
dfcsv = read.csv('df.csv',header=T, sep=',')

3. When I checked the data with 'str', I found that the factor levels of one variable differed between the two data frames. dftxt had fewer levels than dfcsv (48 vs 52).

4. So I checked the 'df.txt' file and found that the missing levels were still there, i.e., there is no problem in the text file.

I suspect that something happened when I imported it into R. Since there were no errors when importing the file, I have no idea where to start fixing it. Do you have any suggestions?

Thank you very much in advance,

SH
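
One thing worth checking in a case like this, although nothing in the thread confirms it is the cause here, is that `read.table()` and `read.csv()` have different defaults: `read.table()` treats `#` as a comment character and both `'` and `"` as quote characters, while `read.csv()` uses `comment.char = ""` and `quote = "\""`. The sketch below uses a made-up three-row file (the column names and values are invented for illustration) to show how a `#` inside a value can silently merge factor levels.

```r
# Made-up tab-separated file: two of the values contain a '#'
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("id\tsite", "1\tA", "2\tB#1", "3\tB#2"), tmp_txt)

# read.table() defaults: everything after '#' on a line is dropped, no error
d_default <- read.table(tmp_txt, header = TRUE, sep = "\t",
                        stringsAsFactors = TRUE)

# Same call with the quote/comment settings that read.csv() uses
d_safe <- read.table(tmp_txt, header = TRUE, sep = "\t",
                     quote = "\"", comment.char = "",
                     stringsAsFactors = TRUE)

nlevels(d_default$site)   # levels after '#' truncation
nlevels(d_safe$site)      # levels as written in the file

# Which levels went missing in the default import?
setdiff(levels(d_safe$site), levels(d_default$site))
```

`read.delim()` has the same quote and comment defaults as `read.csv()` with a tab separator, so it is usually the closer counterpart for a tab-delimited export. The same `setdiff()` comparison on the real `dftxt` and `dfcsv` columns would show which levels differ.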
Re: [R] data import: strange experience
Hi Sarah,

Thanks for the prompt feedback. I knew it would be very vague without an example. However, I only used two commands to import the data and had no 'apparent' errors. The original data have about 19000 obs, and I was able to reduce that to about 3200. I wonder if I can attach the data file (size: 109K) to my email.

Best,
Steve

On Wed, Aug 21, 2013 at 10:46 AM, Sarah Goslee sarah.gos...@gmail.com wrote:

Hi,

We don't know anything about your data or your file, so it's utterly impossible to offer useful suggestions. The very best thing you can do is condense your problem into a reproducible example, with fake data if necessary. Otherwise you're limited by the ability of the list to guess what you're looking at, and our track record with that is spotty.

Sarah

On Wed, Aug 21, 2013 at 10:35 AM, SH empti...@gmail.com wrote:

[...]

--
Sarah Goslee
http://www.functionaldiversity.org
Re: [R] data import: strange experience
Thanks Peter. It works with read.delim.

David: Thanks for your comments. To answer your questions: I don't have 'NA', and the data are all balanced. The number of missing levels was 4, and it happened only to those four levels. Yes, there are embedded commas and some other characters (e.g., '-', spaces, some weird characters in the middle of names, etc.). I can send you sample data if you are willing to take a look. Even though 'read.delim' works, I am still curious about what caused the problem and what potential problems I may be missing.

Thanks again,
SH

On Wed, Aug 21, 2013 at 10:58 AM, David Carlson dcarl...@tamu.edu wrote:

This is not really enough information to diagnose the problem. What are the missing factor levels? Were the missing levels combined with another level, or do you have missing values (NA) for those observations? Do the extra factor levels include embedded commas? There are differences between read.table and read.csv in the default quote= and comment.char= arguments.

-
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77840-4352

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of SH
Sent: Wednesday, August 21, 2013 9:36 AM
To: r-help@r-project.org
Subject: [R] data import: strange experience

[...]
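The difference in default quote= and comment.char= arguments mentioned in this thread can be illustrated with a small sketch. The thread never identifies which characters caused the four lost levels, so the file below is entirely hypothetical: it assumes that some labels contain '#', which read.table() treats as the start of a comment by default, whereas read.delim() and read.csv() do not. An apostrophe inside a label is another candidate, since read.table() also treats ' as a quote character by default.

```r
# Hypothetical tab-delimited file: the 'site' labels contain '#', which
# read.table() treats as a comment character by default.
f <- tempfile(fileext = ".txt")
writeLines(c("id\tsite",
             "1\tA#1",
             "2\tA#2",
             "3\tB",
             "4\tC"), f)

# read.table() defaults: quote = "\"'", comment.char = "#"
dftxt <- read.table(f, header = TRUE, sep = "\t", stringsAsFactors = TRUE)

# read.delim() defaults: quote = "\"", comment.char = ""
dfdel <- read.delim(f, stringsAsFactors = TRUE)

# Making read.table() behave like read.delim() for these two arguments
dffix <- read.table(f, header = TRUE, sep = "\t", quote = "\"",
                    comment.char = "", stringsAsFactors = TRUE)

print(nlevels(dftxt$site))
print(nlevels(dfdel$site))
print(nlevels(dffix$site))
unlink(f)
```

With the read.table() defaults, everything after '#' on a line is dropped, so the labels A#1 and A#2 both become A and the factor silently loses a level without any error message. Comparing setdiff(levels(dfcsv$var), levels(dftxt$var)) on the real data (with 'var' standing in for the affected column) would show which labels are involved.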
[R] Extract letters from a column
Dear list:

I would like to extract three letters from each of the first and second elements in one column and make a new column. For example:

    tempdf = read.table("clipboard", header=T, sep='\t')
    tempdf
                  name var1 var2    abb
    1      Tom Cruiser    1    6 TomCru
    2       Bread Pett    2    5 BrePet
    3 Arnold Schwiezer    3    7 ArnSch

    (p1 = substr(tempdf$name, 1, 3))
    [1] "Tom" "Bre" "Arn"

I was able to extract three letters from the first name; however, I don't know how to extract three letters from the last name (i.e., 'Cru', 'Pet', and 'Sch'). Can anyone give me a suggestion?

Many thanks in advance.

Best,
Steve
Re: [R] Extract letters from a column
Dear Jorge,

It gave me the result below, since it takes the fourth through sixth characters of each name.

    substr(tempdf$name, 4, 6)
    [1] " Cr" "ad " "old"

I would like to have letters from both the first and second elements, if possible.

Thanks for replying,
Steve

On Wed, Mar 13, 2013 at 10:10 AM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Dear SH,

Hmmm... what about

    substr(tempdf$name, 4, 6)

?

HTH,
Jorge.-

On Thu, Mar 14, 2013 at 1:06 AM, SH empti...@gmail.com wrote:

[...]
Re: [R] Extract letters from a column
What I want to do is extract three letters from each of the first and last names and combine them to make another column, 'abb'. The column 'abb' is to be my final product. I can make column 'abb' using the 'paste' function once I have the two parts from the first column, 'name'.

Thanks,
Steve

On Wed, Mar 13, 2013 at 10:17 AM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Try

    substr(tempdf$abb, 4, 6)

--JIV

On Thu, Mar 14, 2013 at 1:15 AM, SH empti...@gmail.com wrote:

[...]
Re: [R] Extract letters from a column
Thank you so much, Jorge and arun!!! Both work well.

Steve

On Wed, Mar 13, 2013 at 10:26 AM, Jorge I Velez jorgeivanve...@gmail.com wrote:

Try

    x <- c("Tom Cruiser", "Bread Pett", "Arnold Schwiezer")
    sapply(strsplit(x, " "), function(r) paste0(substr(r[1], 1, 3), substr(r[2], 1, 3)))
    [1] "TomCru" "BrePet" "ArnSch"

HTH,
Jorge.-

On Thu, Mar 14, 2013 at 1:21 AM, SH empti...@gmail.com wrote:

[...]
Re: [R] Extract letters from a column
mmm... great! Thanks a lot to all of you for your help!!!

Steve

On Wed, Mar 13, 2013 at 10:44 AM, Marc Schwartz marc_schwa...@me.com wrote:

This could be done in a single step using gsub() with back references in the regex.

    gsub("^(.{3}).* (.{3}).*$", "\\1\\2", "Tom Cruise")
    [1] "TomCru"

Regards,
Marc Schwartz

On Mar 13, 2013, at 9:21 AM, SH empti...@gmail.com wrote:

[...]
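For completeness, here is a minimal sketch of how the two approaches suggested in this thread (the back-reference regex and the strsplit() route) might be applied to a whole data frame column to build 'abb'. The data frame is retyped from the example in the original post; the name abb2 is introduced here only for comparison.

```r
# Sample data retyped from the thread; stringsAsFactors = FALSE keeps 'name'
# as character regardless of R version.
tempdf <- data.frame(
  name = c("Tom Cruiser", "Bread Pett", "Arnold Schwiezer"),
  stringsAsFactors = FALSE
)

# Capture the first three characters of the string and the first three
# characters after the last space, then paste the two groups together.
tempdf$abb <- sub("^(.{3}).* (.{3}).*$", "\\1\\2", tempdf$name)

# Same result by splitting on the space and taking three letters per part.
parts <- strsplit(tempdf$name, " ", fixed = TRUE)
abb2 <- vapply(parts,
               function(p) paste0(substr(p[1], 1, 3), substr(p[2], 1, 3)),
               character(1))

print(tempdf)
```

One design point worth keeping in mind: when a name does not match the pattern (for instance, no space, or fewer than three letters before it), sub() returns the string unchanged rather than failing, so such rows are worth checking separately.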
[R] simple linear regression with proportion data
Dear list:

Can I use simple linear regression when I have proportion data for both the dependent and independent variables? Or should I use beta regression? Any other suggestions are welcome.

Thanks!
SH
Re: [R] simple linear regression with proportion data
Very useful comment and helpful website! Many thanks to you!!!

SH

On Fri, Nov 16, 2012 at 5:16 PM, Ben Bolker bbol...@gmail.com wrote:

SH emptican at gmail.com writes:

Dear list: Can I use simple linear regression when I have proportion data for both dependent and independent variables? Or, should I use beta regression analysis? Or any suggestion?

The distribution of the independent variable is irrelevant (in some circumstances it matters whether it is measured without error or not). Depending on what you want to do, and how close the proportion data come to 0 or 1, you might choose to use linear regression, or linear regression on arcsine-square-root transformed data, or beta regression. It really depends what you want to do with the answers and what your audience expects. You might try this on http://stats.stackexchange.com with a bit more context.

Ben Bolker
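As a rough illustration of the first two options mentioned in the reply, the sketch below fits a plain linear regression and a linear regression on an arcsine-square-root transformed response. The data are simulated purely for illustration and do not come from the thread; all variable names are made up. Beta regression is not shown because it needs an add-on package (betareg is one such package on CRAN).

```r
# Simulated proportions, purely illustrative (not data from the thread).
set.seed(1)
n <- 100
x <- runif(n, 0.1, 0.9)                       # predictor proportion
y <- plogis(qlogis(x) + rnorm(n, sd = 0.3))   # response proportion in (0, 1)

# Option 1: ordinary linear regression on the raw proportions.
fit_raw <- lm(y ~ x)

# Option 2: linear regression on the arcsine-square-root transformed response.
y_asin <- asin(sqrt(y))
fit_asin <- lm(y_asin ~ x)

print(coef(fit_raw))
print(coef(fit_asin))
```

Which of these is appropriate depends, as the reply says, on how close the proportions come to 0 or 1 and on what the results will be used for; checking the residual plots of each fit (plot(fit_raw), plot(fit_asin)) is one way to compare them.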