Hi Peter and List,

I realize the error of my ways here. Thanks for the response; I appreciate the help. The struggles of self-taught statistics and maths continue!
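For anyone who finds this thread later, here is a minimal base-R sketch of Peter's point. A probability density must be non-negative and integrate to 1, but its height at a point is not capped at 1. The sketch uses only functions from the stats package (dnorm, integrate, density), and the data are simulated rather than taken from training.csv.

```r
# A density integrates to 1; its value at a single point can still exceed 1.

# Narrow normal: the peak height is 1 / (sd * sqrt(2 * pi)).
sd_narrow <- 0.1
peak <- dnorm(0, mean = 0, sd = sd_narrow)
print(peak)  # 3.989423

# Area under the curve over +/- 10 sd (essentially all of the mass) is 1.
area <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = sd_narrow)$value
print(area)

# The same happens with a kernel density estimate of tightly clustered data
# (simulated here purely for illustration).
set.seed(1)
x <- rnorm(200, mean = 0, sd = 0.05)
kde <- density(x)      # Gaussian kernel, default bw.nrd0 bandwidth
print(max(kde$y))      # well above 1

# Riemann-sum check that the estimate still integrates to about 1.
grid_step <- diff(kde$x[1:2])
print(sum(kde$y) * grid_step)
```

The narrower the distribution, the taller the peak has to be for the area to stay at 1, which is why estimates for tightly clustered data can sit far above 1.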
Chris

On 15 November 2010 04:34, P Ehlers <ehl...@ucalgary.ca> wrote:
> Chris Carleton wrote:
>
>> Hi List,
>>
>> I'm trying to get a density estimate for a point of interest from an
>> npudens object created for a sample of points. I'm working with 4
>> variables in total (3 continuous and 1 unordered discrete; the discrete
>> variable is the character column in training.csv). When I try to
>> evaluate the density for a point that was not used in the training
>> dataset, and when I extract the fitted values from the npudens object
>> itself, I'm getting values that are much greater than 1 in some cases,
>> which, if I understand correctly, shouldn't be possible considering a
>> pdf estimate can only be between 0 and 1. I think I must be doing
>> something wrong, but I can't see it. Attached I've included the
>> training data (training.csv) and the point of interest (origin.csv);
>> below I've included the code I'm using and the results I'm getting.
>> I also don't understand why, when trying to evaluate the npudens object
>> at one point, I'm receiving the same set of fitted values from the
>> npudens object with the predict() function. It should be noted that
>> I'm indexing the dataframe of training data in order to get samples of
>> the df for density estimation (the samples are from different
>> geographic locations measured on the same set of variables; hence my
>> use of sub-setting by [i] and removing columns from the df before
>> running the density estimation). Moreover, in the example I'm
>> providing here, the point of interest does happen to come from the
>> training dataset, but I'm receiving the same results when I compare
>> the point of interest to samples of which it is not a part (density
>> estimates that are either extremely small, which is acceptable, or
>> much greater than one, which doesn't seem right to me).
>>
>> Any thoughts would be greatly appreciated,
>>
>> Chris
>>
>
> I haven't looked at this in any detail, but why do you say that pdf
> values cannot exceed 1? That's certainly not true in general.
>
> -Peter Ehlers
>
>> > fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,]))
>>  [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>>  [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>> [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>> [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>> [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>> [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
>> [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
>>
>> > npu_dens <- npudens(tdat=training_df[training_cols_select][training_df$cat == i,])
>> > summary(npu_dens)
>>
>> Density Data: 35 training points, in 4 variable(s)
>>               aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope
>> Bandwidth(s):          29.22422          2.500559e-24         3.111467
>>               class_unsup_pc_iso
>> Bandwidth(s):          0.2304616
>>
>> Bandwidth Type: Fixed
>> Log Likelihood: 1531.598
>>
>> Continuous Kernel Type: Second-Order Gaussian
>> No. Continuous Vars.: 3
>>
>> Unordered Categorical Kernel Type: Aitchison and Aitken
>> No. Unordered Categorical Vars.: 1
>>
>> > predict(npu_dens, newdata=origin[training_cols_select])
>>  [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>>  [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>> [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>> [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>> [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>> [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
>> [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
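A further note on the summary output quoted above: the bandwidth selected for aster_srtm_dem_filled is 2.500559e-24, essentially zero. With a Gaussian kernel, each training point contributes a factor of 1/h for that variable, and 1/2.500559e-24 is about 4e23, so estimates at training points can become enormous. The sketch below illustrates this with a hypothetical helper, product_kde_at_points(), written only for this note. It is a plain product Gaussian kernel estimator, not np's implementation; it leaves out the categorical variable; and the data are simulated stand-ins (the column names aspect, dem and slope are made up), not training.csv.

```r
# Hypothetical helper written for this note (NOT the np package's code):
# a plain product Gaussian kernel density estimate, evaluated at each
# training point, with one fixed bandwidth per column.
product_kde_at_points <- function(X, h) {
  X <- as.matrix(X)
  n <- nrow(X)
  vapply(seq_len(n), function(j) {
    # k[i] = product over columns of K((x_j - x_i) / h) / h
    k <- rep(1, n)
    for (col in seq_len(ncol(X))) {
      k <- k * dnorm((X[j, col] - X[, col]) / h[col]) / h[col]
    }
    mean(k)  # average over all n training points, point j included
  }, numeric(1))
}

# Simulated stand-in data (35 rows, like the quoted summary); not training.csv.
set.seed(42)
X <- cbind(aspect = runif(35, 0, 360),
           dem    = runif(35, 0, 2000),
           slope  = runif(35, 0, 30))

h_sensible  <- c(30, 100, 3)           # illustrative bandwidths
h_collapsed <- c(30, 2.500559e-24, 3)  # dem bandwidth as reported in the summary

f_sensible  <- product_kde_at_points(X, h_sensible)
f_collapsed <- product_kde_at_points(X, h_collapsed)

print(range(f_sensible))   # small values for spread-out data
print(range(f_collapsed))  # both ends about 8.06e18

# With distinct dem values and a near-zero dem bandwidth, every other point's
# kernel underflows to 0, so each estimate reduces to the point's own term.
self_term <- prod(dnorm(0) / h_collapsed) / nrow(X)
print(range(f_collapsed / self_term))  # essentially 1 at both ends
```

The quoted fitted values (roughly 6e18 to 2e19) are in the same range, which fits this reading, though it is a back-of-envelope comparison rather than a diagnosis. It does not explain why the bandwidth selector settled on such a value, so the DEM column may be worth inspecting (for example, how many distinct values it takes) before relying on the estimates.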