Re: [R] take data from a file to another according to their correlation coefficient

2012-05-03 Thread jeff6868
Hi Rui, it's me again.
I have another question about the function process.all you explained to
me. But as you have already helped me a lot, and as I promised not to disturb
you again, I want to ask first whether you would agree to help me one more
time before I describe my problem more precisely (it is about adding an
automatic linear regression in order to have more realistic fill-in data in
the gaps).
I wrote you a personal message (I don't know if you got it), because I would
like to send you a present from the Alps to thank you for all the help you
gave me, and maybe for the new help (so I would need your home or work postal
address).
If you agree, let me know and send me your address by mail. I'll explain in
a new post what my boss now wants me to add to your function (this function
is so tricky for me to understand with my small knowledge + Google + R
help).

--
View this message in context: 
http://r.789695.n4.nabble.com/take-data-from-a-file-to-another-according-to-their-correlation-coefficient-tp4580054p4605385.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] take data from a file to another according to their correlation coefficient

2012-04-26 Thread jeff6868
Hello Rui,

For the write.table, it's OK!
And for the second one (for the 2nd best correlation) seems to work great!
You're too strong ^^
I have to check a bit more to be sure, but it seems to do it!

If you come to the Alps, it will rather be liqueurs such as Chartreuse or
Génépi (made from mountain plants), if you know them. I'll offer you a bottle
if you come one day. I could even send it to you in Portugal if you want.
Thanks a lot again for all.

Geoffrey



Re: [R] take data from a file to another according to their correlation coefficient

2012-04-25 Thread jeff6868
Seems to work great! 

I have a last question (or 2) for you about it, and I will leave you alone
afterwards, I promise :)

I tested your function process.all for the automation. It seems to be
OK.
The only issue is when I want to save the filled data files.
If I assign the result of process.all, for example:
test <- process.all(lst, corr2008)
and I save it: write.table(test, ...)
and then check the test file, it has filled my data, but all the files from
lst are in one file (the columns are: ST001, ST001_time, ST002,
ST002_time, ... (with ST001 for station 1, for example)).
How can I split these and save them automatically (one file for ST001,
another for ST002, ...) according to these column names?

And is it possible in your script to take the second best correlated
station's data instead of the best one, if there are NAs in this best
correlated station at the same lines as the NA gaps of the station to fill?

Thanks again for all your help. If you come one day to France near the Alps
or Chamonix (where I'm working), just tell me. I'll buy you some beers or a
restaurant meal! You deserve it ^^
By the way, where does my rescuer come from? Are you a statistician?

Geoffrey



Re: [R] take data from a file to another according to their correlation coefficient

2012-04-25 Thread Rui Barradas
Hello,

The first is easy.


> How can I cut these files and save them automatically (one file for ST001,
> another for ST002, ...) according to these columns names?

Similar to the way they were read, using lapply on the results list.
But first make a file names vector.
(I've used the file extension 'dat'.)


test <- process.all(lst, m)
# One file name per station, e.g. "Station1.dat"
fl.names <- paste(names(test), "dat", sep=".")
lapply(seq_len(length(test)), function(i) write.table(test[[i]],
    fl.names[i], ...))
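
A self-contained sketch of the same idea, assuming a small made-up list in place of the real result of process.all and a temporary output directory. The list contents, 'out.dir' and the write.table arguments shown here are assumptions for illustration only, not the settings used in the thread.

```r
# Hypothetical stand-in for the list returned by process.all()
test <- list(
  ST001 = data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                     data = c(1, 2), stringsAsFactors = FALSE),
  ST002 = data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                     data = c(8, 9), stringsAsFactors = FALSE)
)

out.dir <- tempdir()  # assumption: write to a temporary directory
fl.names <- file.path(out.dir, paste(names(test), "dat", sep = "."))

# One tab-separated file per station; invisible() hides the list of NULLs
invisible(lapply(seq_along(test), function(i)
  write.table(test[[i]], fl.names[i], sep = "\t",
              row.names = FALSE, quote = FALSE)))

file.exists(fl.names)
```

A tab separator is used because the time column contains a space, which would otherwise be read back as two columns.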


The second is trickier.


> And it is possible in your script to take the second best correlated
> station data instead of the best one, if there are NAs in this best
> correlated station at the same lines with the NA gaps of the station
> to fill?


In the function 'process.all', after the internal function 'f',
include the following.


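g <- function(station){
    x <- df.list[[station]]
    if(any(is.na(x$data))){
        # Exclude the station itself from the ranking
        mat[row(mat) == col(mat)] <- -Inf
        nas <- which(is.na(x$data))
        # Stations by decreasing correlation, dropping the first (the best
        # one, already used by 'f') and the last (the station itself)
        ord <- order(mat[station, ], decreasing = TRUE)[-c(1, ncol(mat))]
        for(i in nas){
            for(y in ord){
                # Use the first remaining station with a value at line i
                if(!is.na(df.list[[y]]$data[i])){
                    x$data[i] <- df.list[[y]]$data[i]
                    break
                }
            }
        }
    }
    x
}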


Then, change the second pass to

# Note that the two passes are different
df.list <- lapply(seq.int(n), f)
df.list <- lapply(seq.int(n), g)
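
To see which stations the 'ord' line in 'g' ends up trying, here is a small sketch with a made-up 4 x 4 correlation matrix. The matrix values and the fourth station are assumptions for illustration, they are not from the real data.

```r
# Hypothetical correlation matrix for four stations
nms <- paste0("Station", 1:4)
mat <- matrix(c(1.0, 0.9, 0.8, 0.6,
                0.9, 1.0, 0.7, 0.5,
                0.8, 0.7, 1.0, 0.4,
                0.6, 0.5, 0.4, 1.0),
              nrow = 4, dimnames = list(nms, nms))

# Same trick as in 'g': the diagonal becomes -Inf, so the station
# itself always sorts last
mat[row(mat) == col(mat)] <- -Inf

station <- 1
full <- order(mat[station, ], decreasing = TRUE)
full   # all stations, best correlated first, the station itself last

# Drop the best one (handled by 'f') and the station itself
ord <- full[-c(1, ncol(mat))]
ord    # the fallback candidates, in the order they are tried
```

For Station1 the best match is Station2, so the fallbacks are Station3 and then Station4.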


And I come from Portugal. I'm a mathematician (with 6 semesters of stats).
When I go to France, it's more to Charente-Maritime - Cognac, I have friends
there,
and I'll definitely have a couple of cognacs on you.
Good luck with your assignment.

Rui Barradas





Re: [R] take data from a file to another according to their correlation coefficient

2012-04-24 Thread jeff6868
Hi again Rui,

I tested your script as you wrote it with my examples, it works perfectly!
It seems to be exactly what I'm trying to do.
I just have a question about your function na.fill.
When I try to apply your script to my data, it doesn't work. I think
it's because in your example, the data.frames are already loaded in your list.
But in my case, these data.frames are in different files (as I have 70
files). I'm trying to apply your function na.fill to the result of
list.files.
That's why I think it tells me: Error in x$data : $ operator is invalid
for atomic vectors
I tried x[,2] instead, but it doesn't work either: incorrect number of
dimensions.
How can I do exactly the same with na.fill, but by calling a file (according
to the name of the file) and not directly a data.frame like you did
(s1,s2,s3)?



Re: [R] take data from a file to another according to their correlation coefficient

2012-04-24 Thread Rui Barradas
Hello,

Try putting the function call in an lapply, along the lines of

lst <- lapply(list.files(path, pattern), read.table, header=TRUE,
    stringsAsFactors=FALSE)

You don't strictly need a list for na.fill, but you do need two
data.frames, not filenames.
The list is used by the other functions.
(It's also a good idea to have related objects within the same data
structure.)
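
A runnable sketch of that reading step, assuming tab-separated files with a '.dat' extension in a temporary directory. The directory, the file names and the separator are assumptions made up for the example; the real files may need different read.table arguments.

```r
# Assumption: create two tiny example files so the sketch is self-contained
dir <- file.path(tempdir(), "stations")
dir.create(dir, showWarnings = FALSE)
write.table(data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                       data = c(1, NA)),
            file.path(dir, "ST001.dat"),
            sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(time = c("01/01/2008 00:00", "01/01/2008 00:15"),
                       data = c(8, 9)),
            file.path(dir, "ST002.dat"),
            sep = "\t", row.names = FALSE, quote = FALSE)

# full.names = TRUE returns paths that read.table can open directly
files <- list.files(dir, pattern = "\\.dat$", full.names = TRUE)
lst <- lapply(files, read.table, header = TRUE, sep = "\t",
              stringsAsFactors = FALSE)

# Name each data.frame after its file, without the extension
names(lst) <- sub("\\.dat$", "", basename(files))
str(lst)
```

Each element of 'lst' is then a data.frame, which is what na.fill expects.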

Rui Barradas




[R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi everyone.

I have a question about a work on R I have to do for my job.
I have temperature data coming from 70 weather stations. One data file
corresponds to one station for one year (so 70 files for one year). Each
file looks like this (important: each file contains NAs):

time  data
01/01/2008 00:00 -0.25 
01/01/2008 00:15 -0.18 
01/01/2008 00:30 -0.25 
01/01/2008 00:45 -0.25 

(one column with date + time every 15mn for the whole year, and one column
with data). 

I already did correlation matrices between my weather stations (in order to
find the nearest). For example:

          Station1 Station2 Station3 [...]
Station1       1      0.9      0.8
Station2     0.9        1      0.7
Station3     0.8      0.7        1
[...]

Now, I would like to fill the NA data gaps of a station with data from
another station according to their correlation coefficient.
Let's take an example for the Station 1: if the most correlated Station with
Station 1 is Station 2, it has to take data from Station 2 to fill NA gaps
of Station 1, for the same date and hour of course (or same lines as I'm
doing correlations for the same year). 
So for year 2008 (for example), if the correlation is the highest between
Station 1 and 2 (according to all the Stations), and if the data are:

time               data
01/01/2008 00:00   1
01/01/2008 00:15   2   FOR STATION 1
01/01/2008 00:30   *NA*
01/01/2008 00:45   4

and 

time               data
01/01/2008 00:00   8
01/01/2008 00:15   9   FOR STATION 2, for the same year and the same time
01/01/2008 00:30   *10*
01/01/2008 00:45   11

The Station1 file should become:

time               data
01/01/2008 00:00   1
01/01/2008 00:15   2   STATION 1
01/01/2008 00:30   *10*
01/01/2008 00:45   4

Hope you've understood what I would like to do :)
Thanks a lot for your ideas and your replies!








Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread Sarah Goslee
Hi,

Even your example should show why this is a bad way to fill in missing weather 
data: you end up with a sequence for station 1 of 1, 2, 10, 4 even though 
that's certainly wrong because Station 2 is reliably 7 units above Station 1. 
Correlated doesn't mean identical.

There are other better options. If you're only missing a single value, 
interpolation between the values you do have for that station is likely better. 
If you're missing lots, regression of that station with another correlated 
station would be the more reasonable way to do what you're trying to propose 
here.
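
A minimal sketch of the regression idea, using the made-up numbers from the original post. The data frame 'both' and its column names are assumptions for illustration, and a real application would need to check the fit before trusting the predictions.

```r
# Made-up values from the example: Station 1 has a gap, Station 2 does not
both <- data.frame(y = c(1, 2, NA, 4),    # station to fill
                   x = c(8, 9, 10, 11))   # correlated station

# Rows with NA in y are dropped by lm's default na.action
fit <- lm(y ~ x, data = both)

# Predict only where the target station is missing
gap <- is.na(both$y)
both$y[gap] <- predict(fit, newdata = both[gap, , drop = FALSE])
both$y
```

Here the gap is filled with 3 rather than 10, because the fitted line accounts for the offset between the two stations.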

But in fact interpolation of weather data is very complicated, and the subject 
of a lot of research. The most realistic methods use elevation as a covariate. 
These may well be overkill for your situation, though, unless you are missing 
whole days of data.

Sarah




Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi Sarah,

Thank you for your answer.
Yes, I know that my proposal is not necessarily the best way to do it. But
my problem concerns only big gaps of course (from more than half a day up to
several months of missing data).
I've already filled small gaps with the interpolation that you were talking
about in your message (with the function na.approx of the package zoo).
For the study, it's not important to have perfectly identical values
between the 2 correlated stations, because after the reconstruction I'll
calculate the daily mean of each station. For my boss, it's enough to
work on daily means. But before that, I need to rebuild the big missing data
gaps of my stations (in the way I explained in the first message of my
topic).
Do you have any idea of how to do it in R according to my first post?
I forgot to specify that my examples are completely fake! I chose these
numbers so that you could understand what I want to do (I chose easy and
readable numbers). I tested in Excel with 2 stations, and it was not too bad
when I filled the gaps (between the data of the 2 well correlated stations).
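
The daily means mentioned above could be computed along these lines. This is only a sketch, assuming the time column is a day/month/year string as in the example files; 'st' and 'daily' are hypothetical names and the values are invented.

```r
# Hypothetical station data spanning two days, with one remaining NA
st <- data.frame(
  time = c("01/01/2008 00:00", "01/01/2008 00:15",
           "02/01/2008 00:00", "02/01/2008 00:15"),
  data = c(1, 3, 10, NA),
  stringsAsFactors = FALSE
)

# Keep only the date part; the trailing hour is ignored by the format
st$day <- as.Date(st$time, format = "%d/%m/%Y")

# Mean per day; rows with NA are dropped by aggregate's default na.action
daily <- aggregate(data ~ day, data = st, FUN = mean)
daily
```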




Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread jeff6868
Hi Rui,

Yes you're right. It's me again ^^
This post is the last part (I hope) of my job. You helped me a lot last time
for the correlation matrices. 
I have to leave work now, so I'll check and test your proposal
tomorrow. But there is no doubt that it'll help me a lot again. 
I'll tell you tomorrow. Thanks Rui!



Re: [R] take data from a file to another according to their correlation coefficient

2012-04-23 Thread Rui Barradas
Hello,




I remember this data set from some time ago. (Weeks?)

First of all, please use ?dput to post your data, it makes it much easier
for everyone to
just copy and paste to an R session. The output you should post looks like
this:

> dput(s1)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(1L, 2L, NA, 
4L)), .Names = c("time", "data"), row.names = c(NA, -4L), class =
"data.frame")
> dput(s2)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = 8:11), .Names = c("time", 
"data"), row.names = c(NA, -4L), class = "data.frame")
> dput(s3)
structure(list(time = c("01/01/2008 00:00", "01/01/2008 00:15", 
"01/01/2008 00:30", "01/01/2008 00:45"), data = c(123L, NA, NA, 
NA)), .Names = c("time", "data"), row.names = c(NA, -4L), class =
"data.frame")
> dput(m)
structure(c(1, 0.9, 0.8, 0.9, 1, 0.7, 0.8, 0.7, 1), .Dim = c(3L, 
3L), .Dimnames = list(c("Station1", "Station2", "Station3"), 
c("Station1", "Station2", "Station3")))


I've named your data.frames 's1', 's2' and made up an 's3'; 'm' is the
correlation matrix.

Now the problem.
Sarah's comment seems sensible: just filling in missing values using some
other dataset isn't very canonical,
but here it goes.
It assumes the data frames are in a list.

lst <- list(s1, s2, s3)
names(lst) <- paste("Station", seq.int(length(lst)), sep="")
lst



# station - list number or name, not the data.frame
# mat - correlation matrix
get.max.cor <- function(station, mat){
    # Set the diagonal to -Inf so a station never matches itself
    mat[row(mat) == col(mat)] <- -Inf
    which( mat[station, ] == max(mat[station, ]) )
}

# x - data.frame to be transformed
# y - data.frame with greater correlation
na.fill <- function(x, y){
    i <- is.na(x$data)
    x$data[i] <- y$data[i]
    x
}

mx.cor <- get.max.cor(1, m)
mx.cor
na.fill(lst[[1]], lst[[mx.cor]])

As the comments before the function say, the call to the first
function could also be

get.max.cor("Station1", m)

The two functions above solve the problem; all that's left to do is to
automate their calls.
Note that there might be a need for two passes through 'na.fill', if the
data.frame with greater correlation
also has NAs. This is the case of Station1 filling in values for Station3.
Try commenting out the second pass
in the function below


process.all <- function(df.list, mat){
    # Fill one station from its best correlated station
    f <- function(station)
        na.fill(df.list[[ station ]], df.list[[ max.cor[station] ]])
    #
    n <- length(df.list)
    nms <- names(df.list)
    # First the max on each row (uses the argument 'mat', not a global 'm')
    max.cor <- sapply(seq.int(n), get.max.cor, mat)
    # Note the two passes
    df.list <- lapply(seq.int(n), f)
    df.list <- lapply(seq.int(n), f)
    # Makes nicer output
    names(df.list) <- nms
    df.list
}

process.all(lst, m)
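
The need for the second pass can be checked by hand on the three example data frames. The sketch below redefines na.fill and the data so that it runs on its own; the names 'one.pass', 's1.filled' and 'two.pass' are made up for the illustration, and the time column is left out for brevity.

```r
# Same filling rule as na.fill above
na.fill <- function(x, y){
    i <- is.na(x$data)
    x$data[i] <- y$data[i]
    x
}

s1 <- data.frame(data = c(1, 2, NA, 4))      # Station1
s2 <- data.frame(data = c(8, 9, 10, 11))     # Station2
s3 <- data.frame(data = c(123, NA, NA, NA))  # Station3

# First pass: Station3 is filled from Station1, its best match,
# but Station1 still has its own gap at line 3
one.pass <- na.fill(s3, s1)
one.pass$data

# Station1 is filled from Station2 during that same first pass
s1.filled <- na.fill(s1, s2)

# Second pass: Station3 now picks up the value Station1 got from Station2
two.pass <- na.fill(one.pass, s1.filled)
two.pass$data
```

After one pass Station3 still has an NA at the third line; it disappears only once Station1 has itself been filled.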



Hope this helps,

Rui Barradas

