approx() has a 'rule' argument that controls how it handles extrapolation, i.e. xout values outside the range of x. Run help(approx) and read the details.
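A minimal sketch of the difference. It assumes a made-up data frame `known` that holds only the two C values visible in the quoted table (5.4 on 2009-01-05 and 6.1 on 2009-11-20), plus a hypothetical daily grid `all_days` for 2009. With the default rule = 1, days outside that range come back NA; with rule = 2 they take the value at the nearest end.

```r
# Illustrative data only: the two C values shown in the thread's table.
known <- data.frame(
  time = as.Date(c("2009-01-05", "2009-11-20")),
  C    = c(5.4, 6.1)
)
# Every day of 2009, as in lily's seq(..., by = "days") call
all_days <- seq(as.Date("2009-01-01"), as.Date("2009-12-31"), by = "days")

# Convert Dates to day counts explicitly so x and xout are plain numbers
x    <- as.numeric(known$time)
xout <- as.numeric(all_days)

# rule = 1 (the default): NA for xout outside range(x)
r1 <- approx(x, known$C, xout = xout, rule = 1)
# rule = 2: outside range(x), use the y value at the closest end
r2 <- approx(x, known$C, xout = xout, rule = 2)

filled <- data.frame(time = all_days, rule1 = r1$y, rule2 = r2$y)
head(filled, 6)  # 2009-01-01..04: rule1 is NA, rule2 is 5.4
tail(filled, 3)  # after 2009-11-20: rule1 is NA, rule2 is 6.1
```

'rule' may also have length 2 (e.g. rule = 2:1) if the left and right ends should be treated differently. Note that rule = 2 only repeats the end values; it does not estimate any trend.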
Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Fri, Jul 22, 2016 at 8:29 AM, lily li <chocol...@gmail.com> wrote:

> Thanks, Ismail.
> For the gaps before 2009-01-05 and after 2009-11-20, I use the year 2010 to
> fill in the missing values for column C. There is no relationship between
> columns A, B, and C.
> For the missing values between 2009-01-05 and 2009-11-20, if there are any,
> I found this approach very helpful:
> with(df, approx(x=time, y=C, xout=seq(min(time), max(time), by="days")))
>
> On Thu, Jul 21, 2016 at 5:14 PM, Ismail SEZEN <sezenism...@gmail.com>
> wrote:
>
> > On 22 Jul 2016, at 01:34, lily li <chocol...@gmail.com> wrote:
> > >
> > > I have a question about interpolating missing values in a dataframe.
> >
> > First of all, filling in missing values must be approached very
> > carefully. You need to know the nature of the data you want to fill, and
> > most of the time leaving them as NA is the most appropriate action.
> >
> > > The dataframe is below. Column C has no data before 2009-01-05 and
> > > after 2009-12-31; how can I interpolate data for the blanks?
> >
> > Why a dataframe? Is there any relationship between columns A, B and C? If
> > there is, you might want to consider filling the missing values with a
> > linear model instead of interpolation. You said that there is no data
> > before 2009-01-05 and after 2009-12-31, but according to the dataframe,
> > there is no data after 2009-11-20?
> >
> > > That is to say, interpolate linearly between these two gaps using 5.4
> > > and 6.1? Thanks.
> >
> > Also, you mention interpolating blanks, but you want interpolation between
> > two gaps? Do you want to fill the missing values before 2009-01-05 and
> > after 2009-11-20, or do you want to find intermediate values between
> > 2009-01-05 and 2009-11-20? This is a bit unclear.
> >
> > > df
> > > time        A    B    C
> > > 2009-01-01  3    4.5
> > > 2009-01-02  4    5
> > > 2009-01-03  3.3  6
> > > 2009-01-04  4.1  7
> > > 2009-01-05  4.4  6.2  5.4
> > > ...
> > >
> > > 2009-11-20  5.1  5.5  6.1
> > > 2009-11-21  5.4  4
> > > ...
> > > 2009-12-31  4.5  6
> >
> > If you want to fill missing values at the end-points of column C (before
> > 2009-01-05 and after 2009-11-20), and all the data you have lies between
> > 2009-01-05 and 2009-11-20, then you want extrapolation (guessing unknown
> > values outside the range of known values). In that case you can only use
> > the values in column C to guess the missing end-point values. You can use
> > the splinefun (or spline) functions for this purpose. But note that this
> > kind of approach may help only for a few missing values close to the
> > end-points; otherwise you may find yourself making a huge mistake.
> >
> > As I mentioned in my first sentence, if there is a relationship between
> > the columns, or you have data for column C for other years (for instance,
> > data for column C for 2007, 2008, and 2010 but not 2009), you may want to
> > try a statistical approach to fill in the missing values.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
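Ismail's warning about extrapolating with splinefun() can be illustrated with a minimal sketch. The series is hypothetical: sin(x/2) sampled at x = 1..10 stands in for column C, chosen only because its true values beyond the data are known. The names `truth`, `x_obs` and `x_new` are made up for illustration; splinefun() with its default method "fmm" is from base R's stats package.

```r
# Hypothetical smooth series standing in for column C: "observed" at
# x = 1..10, with values beyond x = 10 playing the missing end-points.
truth <- function(x) sin(x / 2)
x_obs <- 1:10
y_obs <- truth(x_obs)

# splinefun() returns an interpolating function (default method "fmm")
f <- splinefun(x_obs, y_obs)

# Inside the data range the spline reproduces the observations
max(abs(f(x_obs) - y_obs))

# Outside the range the end cubic keeps going; compare with the truth
x_new <- c(11, 12, 15, 20)
data.frame(x         = x_new,
           spline    = f(x_new),
           truth     = truth(x_new),
           abs_error = abs(f(x_new) - truth(x_new)))
```

Just past x = 10 the error is small, but further out the extrapolated cubic drifts far from the true curve, which is the "huge mistake" risk. splinefun(..., method = "natural") extrapolates linearly instead; that is tamer but still only a guess.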