Hi List,

I'm trying to get a density estimate for a point of interest from an npudens object created for a sample of points. I'm working with 4 variables in total (3 continuous and 1 unordered discrete; the discrete variable is the character column in training.csv).

Both when I evaluate the density at a point that was not used in the training dataset and when I extract the fitted values from the npudens object itself, I'm getting values that are much greater than 1 in some cases. If I understand correctly, that shouldn't be possible, since a pdf estimate can only be between 0 and 1. I think I must be doing something wrong, but I can't see it. I've attached the training data (training.csv) and the point of interest (origin.csv); the code I'm using and the results I'm getting are below.

I also don't understand why, when I try to evaluate the npudens object at a single point with predict(), I receive the same set of fitted values that I get from the npudens object itself.

Note that I'm indexing the data frame of training data to get samples for density estimation (the samples are from different geographic locations measured on the same set of variables; hence my sub-setting by [i] and removing columns from the df before running the density estimation). Moreover, in the example I'm providing here, the point of interest does happen to come from the training dataset, but I'm receiving the same results when I compare the point of interest to samples of which it is not a part (density estimates that are either extremely small, which is acceptable, or much greater than one, which doesn't seem right to me).

Any thoughts would be greatly appreciated,
Chris

> fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,]))
 [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
 [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
[11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
[16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
[21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
[26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
[31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18

> npu_dens <- npudens(tdat=training_df[training_cols_select][training_df$cat == i,])
> summary(npu_dens)

Density Data: 35 training points, in 4 variable(s)
              aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope
Bandwidth(s):          29.22422          2.500559e-24         3.111467
              class_unsup_pc_iso
Bandwidth(s):          0.2304616

Bandwidth Type: Fixed
Log Likelihood: 1531.598

Continuous Kernel Type: Second-Order Gaussian
No. Continuous Vars.: 3

Unordered Categorical Kernel Type: Aitchison and Aitken
No. Unordered Categorical Vars.: 1

> predict(npu_dens,newdata=origin[training_cols_select])
 [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
 [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
[11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
[16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
[21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
[26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
[31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.