[R] Comparing dates in two large data frames
Dear all,

I have two data frames (df1 and df2), and for each timepoint in df1 I want to know: is it within any of the timespans in df2? The result (e.g. "no"/"yes" or 0/1) should be shown in a new column of df1.

Here is the code to create the two data frames (their size is approximately the same as in my original data frames):

# create data frame df1
ti1 <- seq.POSIXt(from=as.POSIXct("2020/01/01", tz="UTC"),
                  to=as.POSIXct("2020/06/01", tz="UTC"), by="10 min")
df1 <- data.frame(Time=ti1)

# create data frame df2 with random timespans, i.e. start and end dates
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"),
                         as.POSIXct("2020/06/01", tz="UTC"), by="1 mins"), 5000))
end <- start + 120
df2 <- data.frame(start=start, end=end)

Everything I tried (ifelse combined with sapply, or for-loops) has been very slow, so I am looking for a reasonably fast solution.

Thanks a lot in advance for any hint!

Cheers,
Thomas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
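One reasonably fast approach, sketched below under two assumptions that hold for the example data: df2$start is sorted and every span has the same length (120 s). Then findInterval() can locate, for each timepoint, the last span starting at or before it, and the point lies inside a span exactly when that span's end has not yet passed. For arbitrary overlapping spans of unequal length, data.table::foverlaps() would be the more general tool.

```r
# Vectorized interval lookup via findInterval(); assumes df2$start is
# sorted and all spans are equally long (both hold for the example data).
ti1 <- seq.POSIXt(as.POSIXct("2020/01/01", tz="UTC"),
                  as.POSIXct("2020/06/01", tz="UTC"), by="10 min")
df1 <- data.frame(Time=ti1)
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"),
                         as.POSIXct("2020/06/01", tz="UTC"), by="1 min"), 5000))
df2 <- data.frame(start=start, end=start + 120)

# index of the last span starting at or before each timepoint (0 = none)
idx <- findInterval(df1$Time, df2$start)
# inside a span iff that span's end is still >= the timepoint
df1$in_span <- idx > 0 & df1$Time <= df2$end[pmax(idx, 1)]
```

Because all spans have equal length, only the span with the latest qualifying start can contain a given timepoint, so a single findInterval() lookup per point suffices.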
[R] package 'gradientForest' and 'extendedForest'
Dear experts,

I have 5 environmental predictors and abundance data (300 samples, 60 species; transformation: log(x + min(x,x 0)) and use the function 'gradientForest' to estimate (R²-weighted) predictor importance (regression trees). The resulting predictor importance, in decreasing order, is: pred1, pred2, pred3, pred4, pred5.

The three species with the highest R² (goodness-of-fit; output value 'result' of the function 'gradientForest') are species 1 (R²=0.76), species 2 (R²=0.74), and species 3 (R²=0.72). To my understanding this means that the model (i.e. the predictor importance ranking) fits best to species 1, 2, and 3, in decreasing order.

In a further step I want to know which predictors are most important for selected species. Thus, I ran separate forests using the 'extendedForest' function with the same parameter settings (and the same set.seed()) as in the function call of 'gradientForest' for species 1, 2, and 3 (and others). The resulting predictor importance is now (in decreasing order):

species1: pred1, pred2, pred4, pred3, pred5
species2: pred1, pred4, pred2, pred5, pred3
species3: pred2, pred4, pred5, pred1, pred3

This seems strange to me, because I believed that 'extendedForest' should give predictor importance rankings similar to the 'gradientForest' ranking for the species with the highest R² values obtained by 'gradientForest'.

I'd be grateful for any help. Thanks a lot in advance.

Best regards
Thomas
[R] Calculate depth from regular xyz grid for any coordinate within the grid
Dear R-experts,

I have a regular grid data frame (here: the first 50 rows):

# data frame (regular grid) with x, y (UTM coordinates) and z (depth)
# x = UTM coordinates (easting, zone 32)
# y = UTM coordinates (northing, zone 32)
# z = river depth (meters)
df <- data.frame(
  x=c(3454240, 3454240, 3454240, 3454240, 3454240, 3454250, 3454250,
      3454250, 3454250, 3454250, 3454250, 3454250, 3454250, 3454250,
      3454250, 3454250, 3454250, 3454250, 3454260, 3454260, 3454260,
      3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
      3454260, 3454260, 3454260, 3454260, 3454260, 3454260, 3454260,
      3454260, 3454260, 3454260, 3454260, 3454260, 3454270, 3454270,
      3454270, 3454270, 3454270, 3454270, 3454270, 3454270, 3454270,
      3454270),
  y=c(5970610, 5970620, 5970630, 5970640, 5970650, 5970610, 5970620,
      5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
      5970700, 5970710, 5970720, 5970730, 5970610, 5970620, 5970630,
      5970640, 5970650, 5970660, 5970670, 5970680, 5970690, 5970700,
      5970710, 5970720, 5970730, 5970740, 5970750, 5970760, 5970770,
      5970780, 5970790, 5970800, 5970810, 5970820, 5970610, 5970620,
      5970630, 5970640, 5970650, 5970660, 5970670, 5970680, 5970690,
      5970700),
  z=c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247, -1.5704, -1.5840,
      -1.5976, -1.6113, -1.6249, -1.6385, -1.6521, -1.6658, -1.6794,
      -1.6930, -1.7067, -1.7216, -1.7384, -1.5786, -1.5922, -1.6059,
      -1.6195, -1.6331, -1.6468, -1.6604, -1.6740, -1.6877, -1.7013,
      -1.7149, -1.7285, -1.7422, -1.7558, -1.7694, -1.7831, -1.7967,
      -1.8103, -1.8239, -1.8376, -1.8522, -1.8690, -1.5869, -1.6005,
      -1.6141, -1.6278, -1.6414, -1.6550, -1.6686, -1.6823, -1.6959,
      -1.7095))
head(df)
plot(df[,1:2], las=3)  # to show that it's a regular grid

My question: is there a function to calculate the depth for any coordinate pair (e.g. x=3454263, y=5970687) within the grid, e.g. by bilinear interpolation or any other meaningful method?
Thanks a lot in advance for your help.

Best wishes
Thomas
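A minimal base-R sketch of bilinear interpolation on such a grid, assuming a constant 10 m node spacing with node coordinates that are multiples of 10 (as in the data above); the function name bilinear_depth is made up for illustration. Ready-made alternatives include fields::interp.surface() and the akima package.

```r
# Bilinear interpolation on a regular grid stored as long-format x/y/z.
# The four grid nodes surrounding the query point are looked up and their
# depths blended by the point's relative position inside the cell.
# Returns NA if a surrounding node is missing (ragged grid edge).
bilinear_depth <- function(df, x0, y0, dx = 10, dy = 10) {
  xl <- floor(x0 / dx) * dx; xu <- xl + dx   # bracketing grid x-coordinates
  yl <- floor(y0 / dy) * dy; yu <- yl + dy   # bracketing grid y-coordinates
  z <- function(xi, yi) {                    # depth at one grid node
    v <- df$z[df$x == xi & df$y == yi]
    if (length(v) == 1) v else NA_real_
  }
  tx <- (x0 - xl) / dx; ty <- (y0 - yl) / dy # relative position in the cell
  (1 - tx) * (1 - ty) * z(xl, yl) + tx * (1 - ty) * z(xu, yl) +
  (1 - tx) * ty       * z(xl, yu) + tx * ty       * z(xu, yu)
}
```

With the grid above, bilinear_depth(df, 3454263, 5970687) blends the four nodes around (3454260/3454270, 5970680/5970690) and gives a depth of about -1.686.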
[R] Creating a R-package in R-Studio
Dear R-help community,

I am creating my first R package in RStudio and wanted to add datasets to the package. I added an .RData file containing a data frame to the 'data' folder of the package and could load the data as usual by typing data(xyz) in the console. Then I added an .RData file containing a list and could not load the data.

Can anybody explain how I can make data available in my package that are saved as a list?

Thanks a lot in advance.

Best regards
Thomas
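For what it's worth, data() is not restricted to data frames: any object stored under a name with save() into data/ can be loaded, lists included. A common pitfall is that the file was written with saveRDS() (which stores an unnamed value) but given an .RData extension, so load() restores nothing usable. A sketch, where my_list and the file path are illustrative, not from the post:

```r
# data() works for any R object type, lists included -- what matters is
# that the .RData file was created with save(), so the object is stored
# under a name that load() can restore.
my_list <- list(counts = 1:3, labels = letters[1:5])

# in the package source this would be data/my_list.RData; tempdir() is a
# stand-in so the sketch is self-contained:
path <- file.path(tempdir(), "my_list.RData")
save(my_list, file = path)

# what data(my_list) does internally is essentially:
rm(my_list)
load(path)          # restores the object under its original name
str(my_list)
```

After installing the package, data(my_list) should then make the list available just like a data frame would be.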
[R] Avoiding slow for-loops (once again)
# Dear R-experts,
# (Once again) I want to avoid the use of for-loops but unfortunately I don't know how.
# I know functions like 'apply' but don't know how to use them to solve the following problem:

# I have a data frame 'a' giving the number of columns in data frame 'b' that belong to each group
a <- data.frame(group1=5, group2=4)
b <- data.frame(col1=c(0,0,0), col2=c(0,1,0.5), col3=c(0,0,0),
                col4=c(1/3,0,0.5), col5=c(2/3,0,0), col6=c(0,0,0),
                col7=c(1,1/3,0), col8=c(0,2/3,0), col9=c(0,0,0))
# ... thus columns 1-5 in 'b' belong to group 1 and columns 6-9 in 'b' belong to group 2

# then I created a data frame giving all possible row combinations of 'b'
r <- as.data.frame(t(combn(nrow(b), 2)))
# ... so e.g. the second row of 'r' tells me that I have to perform an equation with the values
# of the first and third row of table 'b'. The equation has to be calculated for each group
# separately, e.g. within group 2 (columns 6-9 in 'b') for rows 1 and 3 of 'b':
# (abs(b[1,"col6"] - b[3,"col6"]) + abs(b[1,"col7"] - b[3,"col7"]) + ... + abs(b[1,"col9"] - b[3,"col9"]))/2

# the resulting data frame shall look as follows:
result <- cbind(r, data.frame(group1=c(1,2/3,0.5), group2=c(2/3,0.5,0.5)))

# The original tables are much larger and I don't know how to solve this problem without a lot
# of very slow for-loops. Is there any possible solution without 'for'-loops?
# I'd be happy for any suggestions.
# Thank you very much in advance
# Best regards
# Thomas
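The quantity described above is half the Manhattan distance between two rows, restricted to one group's columns, so dist() can compute all row pairs of a group at once. A sketch: dist() returns its values in exactly the pair order that combn(nrow(b), 2) produces, so the results can be bound straight onto 'r'. Only the (short) loop over groups remains.

```r
# Half the Manhattan distance per group, computed with dist(); its output
# order matches the row pairs listed by combn(nrow(b), 2).
a <- data.frame(group1=5, group2=4)
b <- data.frame(col1=c(0,0,0), col2=c(0,1,0.5), col3=c(0,0,0),
                col4=c(1/3,0,0.5), col5=c(2/3,0,0), col6=c(0,0,0),
                col7=c(1,1/3,0), col8=c(0,2/3,0), col9=c(0,0,0))

grp <- rep(seq_along(a), times=unlist(a))   # column -> group mapping
r <- as.data.frame(t(combn(nrow(b), 2)))

for (g in seq_along(a)) {                   # loop over groups only
  r[[names(a)[g]]] <- as.vector(dist(b[, grp == g], method="manhattan")) / 2
}
```

For the example data this reproduces the 'result' data frame above; for large tables the per-pair work is done inside dist()'s compiled code rather than in R-level loops.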
[R] Avoiding loops using 'for' and pairwise comparison of columns
Dear R-experts,

I'd like to avoid the use of very slow 'for'-loops but I don't know how. My data look as follows (the original data have 1600 rows and 30 columns):

# data example
c1 <- c(1,1,1,0.25,0,1,1,1,0,1)
c2 <- c(0,0,1,1,0,1,0,1,0.5,1)
c3 <- c(0,1,1,1,0,0.75,1,1,0.5,0)
x <- data.frame(c1,c2,c3)

I need to compare every column with every other column and want to know the percentage of identical values for each column pair. To calculate this percentage I used the function 'agree' from the irr package. I solved the problem with a loop, but it is very slow:

library(irr)  # required for the function 'agree'

# empty data frame for the results
a <- as.data.frame(matrix(data=NA, nrow=3, ncol=3))
colnames(a) <- colnames(x)
rownames(a) <- colnames(x)

# the loop to fill in the results
for (j in 1:ncol(x)) {
  for (i in 1:ncol(x)) {
    a[i,j] <- agree(cbind(x[,j], x[,i]))$value
  }
}

I would be very pleased to receive your suggestions on how to avoid the loop. Furthermore, the resulting data frame could be displayed as a triangular matrix without duplicates of each pairwise comparison, but I don't know how to do that either.

Kind regards
Thomas
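For two raters with the default zero tolerance, irr::agree()$value is simply the percentage of rows with identical entries, so the whole matrix can be built with one vectorized comparison per column and no call to irr. A sketch (assumes exact equality is what is intended):

```r
# Pairwise percentage of identical values between all columns of x,
# computed without the double loop: x == x[, j] compares every column
# with column j at once, and colMeans() turns the matches into fractions.
c1 <- c(1,1,1,0.25,0,1,1,1,0,1)
c2 <- c(0,0,1,1,0,1,0,1,0.5,1)
c3 <- c(0,1,1,1,0,0.75,1,1,0.5,0)
x  <- as.matrix(data.frame(c1, c2, c3))

a <- 100 * sapply(seq_len(ncol(x)), function(j) colMeans(x == x[, j]))
dimnames(a) <- list(colnames(x), colnames(x))

# keep each pairwise comparison once (upper triangle plus diagonal)
a[lower.tri(a)] <- NA
```

The lower.tri() step answers the second question: it blanks the duplicate half of the symmetric matrix so each pair appears only once.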
[R] Rao's quadratic entropy with fuzzy coded trait data
Dear R-help list readers,

I wonder if it is possible to calculate Rao's quadratic entropy based on fuzzy-coded trait data with R. As I understand it, the packages 'FD' and 'ade4' don't seem to support fuzzy-coded data for calculating Rao's quadratic entropy.

Thank you very much in advance for your help.

Best regards
Thomas
[R] fisher.alpha warnings
I have two vectors (a and b) with counts of animals and wanted to calculate Fisher's alpha:

library(vegan)
a <- c(2043, 1258, 52, 1867, 107, 1624, 2, 157, 210, 402, 5, 107, 267, 2, 13683)
b <- c(2043, 1258, 52, 1867, 107, 1624, 2, 157, 210, 402, 5, 107, 267, 2, 3000)
fisher.alpha(a)
fisher.alpha(b)

fisher.alpha(a) gave the following warnings:

> fisher.alpha(a)
[1] 1.572964
Warning messages:
1: In log(p) : NaNs produced
2: In log(1 - x) : NaNs produced
3: In nlm(Dev.logseries, n.r = n.r, p = p, N = N, hessian = TRUE, ...) :
   NA/Inf replaced by maximum positive value

fisher.alpha(b) gave no warnings (note: only the last number in vector 'b' differs from 'a'!).

Why did vector 'a' give warnings, and what does that mean for the validity of the calculated alpha value?