[R] Comparing dates in two large data frames

2021-04-10 Thread Kulupp

Dear all,

I have two data frames (df1 and df2), and for each timepoint in df1 I 
want to know: does it fall within any of the timespans in df2? The result 
(e.g. "no"/"yes" or 0/1) should be stored in a new column of df1.


Here is the code to create the two data frames (the size of the two data 
frames is approx. the same as in my original data frames):


# create data frame df1
ti1 <- seq.POSIXt(from=as.POSIXct("2020/01/01", tz="UTC"), 
to=as.POSIXct("2020/06/01", tz="UTC"), by="10 min")

df1 <- data.frame(Time=ti1)

# create data frame df2 with random timespans, i.e. start and end dates
start <- sort(sample(seq(as.POSIXct("2020/01/01", tz="UTC"), 
as.POSIXct("2020/06/01", tz="UTC"), by="1 mins"), 5000))

end   <- start + 120
df2 <- data.frame(start=start, end=end)

Everything I have tried (ifelse combined with sapply, or for loops) has been 
extremely slow. Thus, I am looking for a reasonably fast solution.
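One possible base-R sketch uses findInterval(): because the sampled starts are sorted and every span has the same length (120 s), the only span that can contain a given timepoint is the one with the latest start not after it. For arbitrary overlapping spans, data.table::foverlaps() would be the more general tool. The set.seed() call is added here only to make the sample reproducible.

```r
# Sketch: vectorised interval membership via findInterval (base R).
# Assumes df2$start is sorted and all spans have equal length, as in the
# example data; otherwise consider data.table::foverlaps().
set.seed(1)
ti1 <- seq.POSIXt(from = as.POSIXct("2020/01/01", tz = "UTC"),
                  to   = as.POSIXct("2020/06/01", tz = "UTC"), by = "10 min")
df1 <- data.frame(Time = ti1)

start <- sort(sample(seq(as.POSIXct("2020/01/01", tz = "UTC"),
                         as.POSIXct("2020/06/01", tz = "UTC"),
                         by = "1 min"), 5000))
df2 <- data.frame(start = start, end = start + 120)

# index of the latest start <= Time (0 if Time precedes all starts)
i <- findInterval(df1$Time, df2$start)
df1$in_span <- i > 0 & df1$Time <= df2$end[pmax(i, 1)]
```

This is two vectorised passes over df1, so it should scale to the stated sizes without any explicit loop.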


Thanks a lot in advance for any hint!

Cheers,

Thomas

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] package 'gradientForest' and 'extendedForest'

2014-08-26 Thread Kulupp
Dear experts,

I have 5 environmental predictors and abundance data (300 samples, 60 
species; transformation: log(x + min(x, x > 0))) and use the function 
'gradientForest' to estimate (R²-weighted) predictor importance 
(regression trees). The resulting predictor importance in decreasing 
order is: pred1, pred2, pred3, pred4, pred5. The three species with the 
highest R² (goodness-of-fit; output value 'result' of the function 
'gradientForest') are species 1 (R² = 0.76), species 2 (R² = 0.74), and 
species 3 (R² = 0.72). To my understanding this means that the model (i.e. 
the predictor importance ranking) fits species 1, 2, and 3 best, in 
decreasing order. In a further step I want to know which predictors are 
most important for selected species. Thus, I ran separate forests with 
the 'extendedForest' function for species 1, 2, and 3 (and others), 
using the same parameter settings (and the same set.seed()) as in the 
'gradientForest' call. The resulting predictor importances are (in 
decreasing order): species 1: pred1, pred2, pred4, pred3, pred5; 
species 2: pred1, pred4, pred2, pred5, pred3; species 3: pred2, pred4, 
pred5, pred1, pred3. This seems strange to me, because I expected 
'extendedForest' to give predictor importance rankings similar to the 
'gradientForest' ranking for the species with the highest R² values 
obtained by 'gradientForest'. I'd be grateful for any help. Thanks a 
lot in advance.

Best regards

Thomas





[R] Calculate depth from regular xyz grid for any coordinate within the grid

2014-07-28 Thread Kulupp

Dear R-experts,

I have a regular grid dataframe (here: the first 50 rows) :

# data frame (regular grid) with x, y (UTM-coordinates) and z (depth)
# x=UTM coordinates (easting, zone 32)
# y=UTM coordinates (northing, zone 32)
# z=river-depth (meters)
df <- data.frame(x = c(rep(3454240, 5), rep(3454250, 13),
                       rep(3454260, 22), rep(3454270, 10)),
                 y = c(seq(5970610, 5970650, by = 10),
                       seq(5970610, 5970730, by = 10),
                       seq(5970610, 5970820, by = 10),
                       seq(5970610, 5970700, by = 10)),
                 z = c(-1.5621, -1.5758, -1.5911, -1.6079, -1.6247,
                       -1.5704, -1.5840, -1.5976, -1.6113, -1.6249,
                       -1.6385, -1.6521, -1.6658, -1.6794, -1.6930,
                       -1.7067, -1.7216, -1.7384,
                       -1.5786, -1.5922, -1.6059, -1.6195, -1.6331,
                       -1.6468, -1.6604, -1.6740, -1.6877, -1.7013,
                       -1.7149, -1.7285, -1.7422, -1.7558, -1.7694,
                       -1.7831, -1.7967, -1.8103, -1.8239, -1.8376,
                       -1.8522, -1.8690,
                       -1.5869, -1.6005, -1.6141, -1.6278, -1.6414,
                       -1.6550, -1.6686, -1.6823, -1.6959, -1.7095))
head(df)
plot(df[,1:2], las=3)   # to show that it's a regular grid

My question: is there a function to calculate the depth at any 
coordinate pair (e.g. x=3454263, y=5970687) within the grid, e.g. by 
bilinear interpolation or another suitable method?
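As a sketch of the bilinear idea in base R: locate the four grid nodes surrounding the query point and blend their depths by distance. Packages such as akima (interp/interpp) or fields (interp.surface) do this more robustly for real data. The helper name `bilinear`, the 10 m default spacing, and the tiny demo grid with made-up depths are assumptions for illustration; with the real df above, all four surrounding nodes must exist.

```r
# Minimal bilinear interpolation on a regular grid (base R sketch).
# Assumes the four nodes surrounding (xq, yq) are present in the data
# and that the grid spacing is dx/dy (10 m in the example grid).
bilinear <- function(df, xq, yq, dx = 10, dy = 10) {
  x0 <- floor(xq / dx) * dx; x1 <- x0 + dx
  y0 <- floor(yq / dy) * dy; y1 <- y0 + dy
  z  <- function(x, y) df$z[df$x == x & df$y == y]
  tx <- (xq - x0) / dx
  ty <- (yq - y0) / dy
  (1 - tx) * (1 - ty) * z(x0, y0) + tx * (1 - ty) * z(x1, y0) +
  (1 - tx) * ty       * z(x0, y1) + tx * ty       * z(x1, y1)
}

# tiny demo grid (hypothetical depths), query point in its interior
demo <- expand.grid(x = c(0, 10), y = c(0, 10))
demo$z <- -(demo$x + demo$y) / 10
bilinear(demo, 5, 5)   # midpoint: average of the four corner depths, -1
```

For points near the grid edge or in gaps of an irregular boundary (as in a river), a package routine with proper NA handling is preferable.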


Thanks a lot in advance for your help

Best wishes

Thomas



[R] Creating a R-package in R-Studio

2014-01-29 Thread Kulupp

Dear R-help community,

I am creating my first R package in RStudio and want to add datasets 
to it. I added an .RData file containing a data frame to the 'data' 
folder of the package and could load the data as usual by typing 
data(xyz) in the console. Then I added an .RData file containing a list 
and could not load the data. Can anybody explain how I can make data 
available in my package that is saved as a list? Thanks a lot in 
advance.
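For what it's worth, data() essentially load()s the .RData file, so lists work the same way as data frames; the usual pitfall is that the object is restored under the name it had when save()d, which need not match the file name. A base-R sketch of what happens under the hood (the file and object names here are made up):

```r
# data(mylist) in a package essentially does load("data/mylist.RData"),
# restoring every saved object under its original name -- lists included.
f <- tempfile(fileext = ".RData")     # stands in for data/mylist.RData
mylist <- list(a = 1:3, b = "x")      # hypothetical dataset: a plain list
save(mylist, file = f)                # in a package: save into data/
rm(mylist)
load(f)                               # what data(mylist) does internally
str(mylist)                           # the list is back under its old name
```

If the list was saved under a different name than the file, data(filename) succeeds silently but the expected object name never appears; exists("mylist") after loading is a quick check.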


Best regards

Thomas



[R] Avoiding slow for-loops (once again)

2013-08-05 Thread Kulupp

# Dear R-experts,

# (Once again) I want to avoid the use of for-loops but unfortunately
# I don't know how. I know functions like 'apply' but don't know how to
# use them to solve the following problem:
# I have a data frame 'a' giving the number of columns in data frame 'b'
# that belong to each group

a <- data.frame(group1 = 5, group2 = 4)

b <- data.frame(col1 = c(0, 0, 0),     col2 = c(0, 1, 0.5), col3 = c(0, 0, 0),
                col4 = c(1/3, 0, 0.5), col5 = c(2/3, 0, 0),
                col6 = c(0, 0, 0),     col7 = c(1, 1/3, 0),
                col8 = c(0, 2/3, 0),   col9 = c(0, 0, 0))


# ... thus columns 1-5 in 'b' belong to group 1 and columns 6-9 in 'b' 
belong to group 2


# then I created a data frame giving all possible row combinations of 'b'
r <- as.data.frame(t(combn(nrow(b), 2)))

# ... so e.g. the second row of 'r' tells me that I have to evaluate an
# expression on the values of the first and third row of 'b'. The expression
# has to be calculated for each group separately, e.g. within group 2
# (columns 6-9 in 'b') I have to calculate for rows 1 and 3 of 'b':
# (abs(b[row1, col6] - b[row3, col6]) + abs(b[row1, col7] - b[row3, col7]) +
#  ... + abs(b[row1, col9] - b[row3, col9])) / 2


# the resulting data frame shall look as follows:
result <- cbind(r, data.frame(group1 = c(1, 2/3, 0.5), group2 = c(2/3, 0.5, 0.5)))

# The original tables are much larger and I don't know how to solve this
# problem without a lot of very slow for-loops.

# Is there a possible solution without using 'for'-loops?
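The per-group quantity described above is half the Manhattan distance between rows, restricted to each group's columns, so dist() can compute all row pairs at once (it returns pairs in the same order as combn(nrow(b), 2)). A sketch; the helper objects grp and res are introduced here for illustration:

```r
# Per-group pairwise row differences without explicit loops:
# half the Manhattan distance between two rows equals the sum in the post.
a <- data.frame(group1 = 5, group2 = 4)
b <- data.frame(col1 = c(0, 0, 0),     col2 = c(0, 1, 0.5), col3 = c(0, 0, 0),
                col4 = c(1/3, 0, 0.5), col5 = c(2/3, 0, 0),
                col6 = c(0, 0, 0),     col7 = c(1, 1/3, 0),
                col8 = c(0, 2/3, 0),   col9 = c(0, 0, 0))

grp <- rep(seq_along(a), unlist(a))        # column -> group membership
r   <- as.data.frame(t(combn(nrow(b), 2)))

# one dist() call per group, each covering all row pairs at once
res <- sapply(seq_along(a), function(g)
  as.vector(dist(b[, grp == g], method = "manhattan")) / 2)

result <- cbind(r, setNames(as.data.frame(res), names(a)))
result   # matches the 'result' data frame shown in the post
```

dist() is implemented in C, so this should remain fast even for much larger tables.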

# I'd be happy about any suggestions
# Thank you very much in advance
# Best regards
# Thomas



[R] Avoiding loops using 'for' and pairwise comparison of columns

2013-06-24 Thread Kulupp

Dear R-experts,

I'd like to avoid very slow 'for'-loops but I don't know how. 
My data look as follows (the original data have 1600 rows and 30 columns):


# data example
c1 <- c(1, 1, 1, 0.25, 0, 1, 1, 1, 0, 1)
c2 <- c(0, 0, 1, 1, 0, 1, 0, 1, 0.5, 1)
c3 <- c(0, 1, 1, 1, 0, 0.75, 1, 1, 0.5, 0)
x  <- data.frame(c1, c2, c3)

I need to compare every pair of columns and want to know the 
percentage of identical values for each column pair. To calculate this 
percentage I used the function 'agree' from the irr package. I solved 
the problem with a loop, but it is very slow.


library(irr) # required for the function 'agree'

# empty data frame for the results
a <- as.data.frame(matrix(data = NA, nrow = 3, ncol = 3))
colnames(a) <- colnames(x)
rownames(a) <- colnames(x)

# the loop to fill in the results
for (j in 1:ncol(x)) {
  for (i in 1:ncol(x)) {
    a[i, j] <- agree(cbind(x[, j], x[, i]))$value
  }
}


I would be very pleased to receive suggestions on how to avoid the 
loop. Furthermore, the resulting data frame could be displayed as a 
triangular matrix without duplicates of each pairwise comparison, but I 
don't know how to achieve that either.
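Since agree() on two numeric columns reduces to the percentage of rows with equal values, the whole matrix can be filled from vectorised comparisons over combn() pairs, each pair computed only once. A base-R sketch (worth verifying against agree() on the real data, since agree() also covers tolerance settings and more than two raters):

```r
# Pairwise percentage agreement between columns, without the double loop
# and without duplicated comparisons (upper triangle only).
c1 <- c(1, 1, 1, 0.25, 0, 1, 1, 1, 0, 1)
c2 <- c(0, 0, 1, 1, 0, 1, 0, 1, 0.5, 1)
c3 <- c(0, 1, 1, 1, 0, 0.75, 1, 1, 0.5, 0)
x  <- data.frame(c1, c2, c3)

m    <- as.matrix(x)
cmb  <- combn(ncol(m), 2)        # each column pair exactly once
vals <- apply(cmb, 2, function(p) 100 * mean(m[, p[1]] == m[, p[2]]))

a <- matrix(NA_real_, ncol(m), ncol(m),
            dimnames = list(colnames(x), colnames(x)))
diag(a) <- 100                   # a column always agrees with itself
a[t(cmb)] <- vals                # fill the upper triangle via matrix indexing
a
```

Leaving the lower triangle as NA gives exactly the duplicate-free triangular display asked about; with 30 columns this is only choose(30, 2) = 435 vectorised comparisons.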


Kind regards

Thomas



[R] Rao's quadratic entropy with fuzzy coded trait data

2013-06-11 Thread Kulupp
Dear R-help-list-readers,

I wonder whether it is possible to calculate Rao's quadratic 
entropy based on fuzzy-coded trait data in R. As far as I understand, 
the packages 'FD' and 'ade4' do not seem to support fuzzy-coded data 
for calculating Rao's quadratic entropy. Thank you very much in 
advance for your help.
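For what it's worth, Rao's quadratic entropy is just Q = sum_ij d_ij * p_i * p_j, so given a species distance matrix built from the fuzzy-coded traits (however obtained), it can be computed directly in base R. A sketch; the toy trait table and the scaled-Euclidean distance choice are assumptions for illustration, not a substitute for a proper Gower-type distance on fuzzy variables:

```r
# Rao's quadratic entropy from first principles: Q = t(p) %*% D %*% p,
# with p relative abundances and D pairwise species trait distances.
# The fuzzy-coded trait rows (affinities summing to 1) are made up.
traits <- rbind(sp1 = c(0.5, 0.5, 0.0),
                sp2 = c(0.0, 1.0, 0.0),
                sp3 = c(0.2, 0.3, 0.5))
abund  <- c(sp1 = 10, sp2 = 5, sp3 = 5)

p <- abund / sum(abund)                  # relative abundances
D <- as.matrix(dist(traits)) / sqrt(2)   # Euclidean on fuzzy scores, scaled to [0, 1]
Q <- as.numeric(t(p) %*% D %*% p)
Q
```

Any distance matrix computed elsewhere (e.g. from ade4's fuzzy-coding utilities) can be dropped in for D, so the entropy calculation itself does not depend on package support for fuzzy data.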

Best regards

Thomas




[R] fisher.alpha warnings

2013-03-19 Thread Kulupp
I have two vectors (a and b) with counts of animals and want to 
calculate Fisher's alpha:


library(vegan)
a <- c(2043, 1258, 52, 1867, 107, 1624, 2, 157, 210, 402, 5, 107, 267,
       2, 13683)
b <- c(2043, 1258, 52, 1867, 107, 1624, 2, 157, 210, 402, 5, 107, 267,
       2, 3000)

fisher.alpha(a)
fisher.alpha(b)

fisher.alpha(a) gave the following warnings:

> fisher.alpha(a)
[1] 1.572964
Warning messages:
1: In log(p) : NaNs produced
2: In log(1 - x) : NaNs produced
3: In nlm(Dev.logseries, n.r = n.r, p = p, N = N, hessian = TRUE, ...) :
  NA/Inf replaced by maximum positive value

fisher.alpha(b) gave no warnings (note: only the last number in 
vector 'b' differs from 'a'!).

Why did vector 'a' give warnings, and what do they mean for the 
validity of the calculated alpha value?
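Such NaN warnings typically arise while nlm() probes invalid parameter values during the log-series fit (log(p) with p <= 0, log(1 - x) with x >= 1) and are often harmless if the returned estimate is finite; the strongly dominant last count in 'a' plausibly pushes the optimiser near that boundary. One way to sanity-check the estimate: Fisher's alpha satisfies S = alpha * log(1 + N/alpha), which base R can solve directly with uniroot():

```r
# Cross-check fisher.alpha: alpha solves S = alpha * log(1 + N / alpha),
# with S the number of species and N the total count.
a <- c(2043, 1258, 52, 1867, 107, 1624, 2, 157, 210, 402, 5, 107, 267,
       2, 13683)
S <- length(a)
N <- sum(a)
f <- function(alpha) alpha * log(1 + N / alpha) - S
alpha_hat <- uniroot(f, c(1e-8, 1e3), tol = 1e-10)$root
alpha_hat   # close to the 1.572964 reported by fisher.alpha(a)
```

If the closed-form root agrees with vegan's value, as it does here, the warnings can reasonably be treated as artifacts of intermediate optimisation steps rather than a sign of an invalid result.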

