> On 22 Jul 2016, at 01:34, lily li <chocol...@gmail.com> wrote:
> 
> I have a question about interpolating missing values in a dataframe.

First of all, filling in missing values must be done very carefully. You need 
to know the nature of the data you want to fill, and most of the time leaving 
the values as NA is the most appropriate action.

> The
> dataframe is in the following, Column C has no data before 2009-01-05 and
> after 2009-12-31, how to interpolate data for the blanks?

Why a dataframe? Is there any relationship between columns A, B and C? If there 
is, you might want to consider filling the missing values with a linear model 
instead of interpolation. You said that there is no data before 2009-01-05 and 
after 2009-12-31, but according to the dataframe there is no data after 
2009-11-20?

> That is to say,
> interpolate linearly between these two gaps using 5.4 and 6.1? Thanks.

Also, you mention interpolating blanks, but you want interpolation between two 
gaps? Do you want to fill the missing values before 2009-01-05 and after 
2009-11-20, or do you want to find intermediate values between 2009-01-05 and 
2009-11-20? This is a bit unclear.
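
To make the distinction concrete, here is a minimal sketch with made-up toy 
values (not your data; the vectors time and C below are assumptions for 
illustration only). approx() fills the values between known points linearly, 
and with its default rule = 1 it leaves everything outside the known range as NA:

```r
# Toy data, invented for illustration; not the poster's actual values.
time <- as.Date("2009-01-01") + 0:9
C    <- c(NA, NA, 5.4, NA, NA, 5.8, NA, 6.1, NA, NA)

known <- !is.na(C)

# Linear interpolation at every date, based only on the known points.
# With the default rule = 1, dates outside the range of the known points
# get NA, i.e. approx() interpolates but does not extrapolate.
filled <- approx(x    = as.numeric(time[known]),
                 y    = C[known],
                 xout = as.numeric(time))$y

print(data.frame(time, C, filled))
```

The interior gaps are filled, while the leading and trailing NAs stay NA. 
Filling those is extrapolation and needs a separate decision.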

> 
> 
> df
> time                A      B     C
> 2009-01-01    3      4.5
> 2009-01-02    4      5
> 2009-01-03    3.3   6
> 2009-01-04    4.1   7
> 2009-01-05    4.4   6.2   5.4
> ...
> 
> 2009-11-20    5.1   5.5   6.1
> 2009-11-21    5.4   4
> ...
> 2009-12-31    4.5   6


If you want to fill the missing values at the end-points of column C (before 
2009-01-05 and after 2009-11-20), and all the data you have is between 
2009-01-05 and 2009-11-20, then what you want is extrapolation (guessing 
unknown values that lie outside the range of the known values). In that case 
you can use only the values in column C to guess the missing end-point values. 
You can use the splinefun (or spline) function for this purpose. But let me 
note that this kind of approach might help only for a few missing values close 
to the end-points. Further away from the known data, the estimates can be 
badly wrong.
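
A minimal sketch of the splinefun idea, again on made-up toy values (the 
vectors time and C are assumptions, not your data). I use method = "natural" 
here as one possible choice; the default method may behave quite differently 
outside the known range:

```r
# Toy data, invented for illustration; not the poster's actual values.
time <- as.Date("2009-01-01") + 0:9
C    <- c(NA, NA, 5.4, 5.5, 5.7, 5.8, 6.0, 6.1, NA, NA)

known <- !is.na(C)
x     <- as.numeric(time)

# Build a spline function from the known points of column C only.
f <- splinefun(x[known], C[known], method = "natural")

# Evaluate the spline at the missing end-point dates (extrapolation).
C_filled         <- C
C_filled[!known] <- f(x[!known])

print(data.frame(time, C, C_filled))
```

Only the NA positions are replaced; the known values are left untouched. The 
further a date is from the known range, the less the filled value should be 
trusted.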

As I mentioned earlier, if there is a relationship between the columns, or if 
you have data for column C for other years (for instance, assume that you have 
data for column C for 2007, 2008, and 2010 but not 2009), you may want to try a 
statistical approach to fill the missing values.
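
As a rough sketch of the linear model idea, assuming C really does depend on A 
and B (that is an assumption you would have to check), something like the 
following could be used. The data frame is a toy example; the C values in 
particular are made up:

```r
# Toy data frame, invented for illustration; C values are not real.
df <- data.frame(
  A = c(3.0, 4.0, 3.3, 4.1, 4.4, 5.1, 5.4, 4.5),
  B = c(4.5, 5.0, 6.0, 7.0, 6.2, 5.5, 4.0, 6.0),
  C = c(NA,  NA,  5.4, 5.6, 5.9, 6.1, NA,  NA)
)

# Fit C on A and B; rows where C is NA are dropped by the default na.action.
fit <- lm(C ~ A + B, data = df)

# Predict C only for the rows where it is missing.
miss <- is.na(df$C)
df$C_filled       <- df$C
df$C_filled[miss] <- predict(fit, newdata = df[miss, ])

print(df)
```

Before using the predictions, look at summary(fit) to see whether the model 
explains C at all. With only a handful of complete rows, as in this toy 
example, the fit is not reliable.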

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.