Thanks, Ismail. For the gaps before 2009-01-05 and after 2009-11-20, I use the 2010 data to fill in the missing values for column C. There is no relationship between columns A, B, and C. For the missing values between 2009-01-05 and 2009-11-20, if there are any, I found this approach very helpful:

    with(df, approx(x = time, y = C, xout = seq(min(time), max(time), by = "days")))
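A minimal sketch of the two steps described above, using made-up toy values rather than the real series (this is not the poster's actual code). It assumes `time` is a Date column, missing entries in C are NA, and a hypothetical data frame `ref` holds complete values for the same days in 2010; `md()` is an illustrative helper, not part of any package. Dates are converted to numbers explicitly before calling approx():

```r
# Toy data (made-up values, not the poster's series): daily 2009 dates with
# gaps in column C at both ends and in the middle.
df <- data.frame(
  time = seq(as.Date("2009-01-01"), as.Date("2009-01-10"), by = "days"),
  C    = c(NA, NA, 5.4, NA, NA, 6.0, NA, 6.4, NA, NA)
)

# Hypothetical complete reference series for the same days in 2010.
ref <- data.frame(
  time = seq(as.Date("2010-01-01"), as.Date("2010-01-10"), by = "days"),
  C    = c(5.0, 5.1, 5.2, 5.3, 5.4, 5.5, 5.6, 5.7, 5.8, 5.9)
)

# Step 1: linear interpolation at every date in df. approx() drops the NA
# pairs before interpolating and, with the default rule = 1, returns NA
# outside the first and last observed dates.
lin <- approx(x = as.numeric(df$time), y = df$C, xout = as.numeric(df$time))
df$C_filled <- lin$y

# Step 2: fill the remaining (end) gaps with the value observed on the same
# month and day in 2010. md() is an illustrative helper.
md  <- function(d) format(d, "%m-%d")
gap <- is.na(df$C_filled)
df$C_filled[gap] <- ref$C[match(md(df$time[gap]), md(ref$time))]

print(df)
```

The interpolation runs first because approx() with its default rule = 1 leaves everything outside the first and last observed dates as NA, so only the end gaps are left for the 2010 lookup. Passing rule = 2 instead would repeat the nearest observed value at both ends, which is a different choice from borrowing another year's values.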
On Thu, Jul 21, 2016 at 5:14 PM, Ismail SEZEN <sezenism...@gmail.com> wrote:

> > On 22 Jul 2016, at 01:34, lily li <chocol...@gmail.com> wrote:
> >
> > I have a question about interpolating missing values in a dataframe.
>
> First of all, filling in missing values must be approached very
> carefully. You must know the nature of the data you want to fill, and
> most of the time, leaving them as NA is the most appropriate action.
>
> > The dataframe is below. Column C has no data before 2009-01-05 and
> > after 2009-12-31. How can I interpolate data for the blanks?
>
> Why a dataframe? Is there any relationship between columns A, B, and C?
> If there is, then you might want to consider filling the missing values
> with a linear model approach instead of interpolation. You said that
> there is no data before 2009-01-05 and after 2009-12-31, but according
> to the dataframe, there is no data after 2009-11-20?
>
> > That is to say, interpolate linearly between these two gaps using 5.4
> > and 6.1? Thanks.
>
> Also, you mention interpolating blanks, but you want interpolation
> between two gaps? Do you want to fill the missing values before
> 2009-01-05 and after 2009-11-20, or do you want to find intermediate
> values between 2009-01-05 and 2009-11-20? This is a bit unclear.
>
> > df
> > time        A    B    C
> > 2009-01-01  3    4.5
> > 2009-01-02  4    5
> > 2009-01-03  3.3  6
> > 2009-01-04  4.1  7
> > 2009-01-05  4.4  6.2  5.4
> > ...
> >
> > 2009-11-20  5.1  5.5  6.1
> > 2009-11-21  5.4  4
> > ...
> > 2009-12-31  4.5  6
>
> If you want to fill the missing end-point values for column C (before
> 2009-01-05 and after 2009-11-20), and all the data you have lies between
> 2009-01-05 and 2009-11-20, then what you want is extrapolation (guessing
> unknown values outside the range of the known values). In that case, you
> can only use the values in column C to guess the missing end-point
> values. You can use the splinefun (or spline) function for this purpose.
> But let me note that this kind of approach may help you only for a few
> missing values close to the end-points. Otherwise, you may end up making
> a serious mistake.
>
> As I mentioned in my first sentence, if there is a relationship among
> all the columns, or if you have data for column C for other years (for
> instance, assume that you have data for column C for 2007, 2008, and 2010
> but not 2009), you may want to try a statistical approach to fill in the
> missing values.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.