Re: [R] R package 'np' problems

Chris Carleton Mon, 15 Nov 2010 05:46:02 -0800

Hi Peter and List,

I realized the error of my ways here. Thanks for the response; I appreciate
the help. The struggles of self-taught statistics and maths continue!


Chris

On 15 November 2010 04:34, P Ehlers <ehl...@ucalgary.ca> wrote:

> Chris Carleton wrote:
>
>> Hi List,
>>
>> I'm trying to get a density estimate for a point of interest from an
>> npudens object created for a sample of points. I'm working with 4
>> variables in total (3 continuous and 1 unordered discrete; the discrete
>> variable is the character column in training.csv). When I try to
>> evaluate the density at a point that was not used in the training
>> dataset, and when I extract the fitted values from the npudens object
>> itself, I'm getting values that are much greater than 1 in some cases,
>> which, if I understand correctly, shouldn't be possible, since a pdf
>> estimate can only be between 0 and 1. I think I must be doing something
>> wrong, but I can't see it. I've attached the training data
>> (training.csv) and the point of interest (origin.csv); below are the
>> code I'm using and the results I'm getting. I also don't understand
>> why, when I use predict() to evaluate the npudens object at a single
>> point, I get back the same set of fitted values as from the npudens
>> object itself. Note that I'm indexing the data frame of training data
>> to get samples of the df for density estimation (the samples are from
>> different geographic locations measured on the same set of variables;
>> hence my use of subsetting by [i] and removing columns from the df
>> before running the density estimation). Moreover, in the example I'm
>> providing here, the point of interest does happen to come from the
>> training dataset, but I get the same results when I compare the point
>> of interest to samples of which it is not a part (density estimates
>> that are either extremely small, which is acceptable, or much greater
>> than one, which doesn't seem right to me). Any thoughts would be
>> greatly appreciated,
>>
>> Chris
>>
>>
> I haven't looked at this in any detail, but why do you say that pdf values
> cannot exceed 1? That's certainly not true in general.
>
>  -Peter Ehlers
>
>
>> > fitted(npudens(tdat=training_df[training_cols_select][training_df$cat == i,]))
>>
>> [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>>  [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>> [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>> [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>> [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>> [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
>> [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
>>
>> > npu_dens <- npudens(tdat=training_df[training_cols_select][training_df$cat == i,])
>>
>> > summary(npu_dens)
>>
>> Density Data: 35 training points, in 4 variable(s)
>>              aster_srtm_aspect aster_srtm_dem_filled aster_srtm_slope
>> Bandwidth(s):          29.22422          2.500559e-24         3.111467
>>              class_unsup_pc_iso
>> Bandwidth(s):          0.2304616
>>
>> Bandwidth Type: Fixed
>> Log Likelihood: 1531.598
>>
>> Continuous Kernel Type: Second-Order Gaussian
>> No. Continuous Vars.: 3
>>
>> Unordered Categorical Kernel Type: Aitchison and Aitken
>> No. Unordered Categorical Vars.: 1
>>
>> > predict(npu_dens, newdata=origin[training_cols_select])
>>
>> [1] 7.762187e+18 9.385532e+18 6.514318e+18 7.583486e+18 6.283017e+18
>>  [6] 6.167344e+18 9.820551e+18 7.952821e+18 7.882741e+18 1.744266e+19
>> [11] 6.653258e+18 8.704722e+18 8.631365e+18 1.876052e+19 1.995445e+19
>> [16] 2.323802e+19 1.203780e+19 8.493055e+18 8.485279e+18 1.722033e+19
>> [21] 2.227207e+19 2.177740e+19 2.168679e+19 9.329572e+18 9.380505e+18
>> [26] 1.023311e+19 2.109676e+19 7.903112e+18 7.935457e+18 8.917777e+18
>> [31] 8.899827e+18 6.265440e+18 6.204720e+18 6.276559e+18 6.218002e+18
>>
>
>
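For reference, Peter's point is easy to check in base R: a probability density only has to be non-negative and integrate to 1, so its values are not bounded by 1. A minimal illustration using the standard dunif(), dnorm() and integrate() functions (the two distributions are just illustrative choices, not taken from Chris's data):

```r
# A uniform distribution on [0, 0.5] has constant density 1 / 0.5 = 2
p_unif <- dunif(0.25, min = 0, max = 0.5)

# A normal with a small standard deviation peaks at 1 / (sd * sqrt(2 * pi))
p_norm <- dnorm(0, mean = 0, sd = 0.1)

# Both densities still integrate to (numerically) 1 over their support
area_unif <- integrate(dunif, lower = 0, upper = 0.5, min = 0, max = 0.5)$value
area_norm <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)$value

print(c(p_unif = p_unif, p_norm = p_norm,
        area_unif = area_unif, area_norm = area_norm))
```

The narrower the distribution, the taller its density has to be for the area to stay at 1.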

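The size of the fitted values (around 1e18 to 1e19) also lines up with the extremely small bandwidth that summary(npu_dens) reports for aster_srtm_dem_filled (2.500559e-24). A rough back-of-the-envelope check is sketched below. It assumes the fixed-bandwidth estimate is an average over the n training points of a product of second-order Gaussian kernels (each divided by its bandwidth) times an Aitchison-Aitken weight of (1 - lambda) for a matching category, and that no other training point shares a point's exact DEM value. This is a plausibility check under those assumptions, not a description of np's internals, and the training data are not available here to confirm it.

```r
# Bandwidths and smoothing parameter copied from summary(npu_dens) above
h_aspect <- 29.22422       # aster_srtm_aspect
h_dem    <- 2.500559e-24   # aster_srtm_dem_filled
h_slope  <- 3.111467       # aster_srtm_slope
lambda   <- 0.2304616      # class_unsup_pc_iso (Aitchison-Aitken)
n        <- 35             # number of training points

# ASSUMPTION: product Gaussian kernel, each factor scaled by 1/h, times a
# categorical weight of (1 - lambda) when the category matches. This is the
# contribution of one training point to the estimate at that same point.
self_term <- (dnorm(0) / h_aspect) * (dnorm(0) / h_dem) *
  (dnorm(0) / h_slope) * (1 - lambda)

# ASSUMPTION: no other training point has exactly the same DEM value, so with
# h_dem this small their kernel terms are effectively 0 and only the self
# term survives in the 1/n average.
approx_density <- self_term / n
print(approx_density)   # roughly 6e18
```

The DEM factor alone contributes 1/h_dem, about 4e23, which is what pushes the estimate into that range. One possible explanation (not verified against the data) is that likelihood cross-validation has collapsed the DEM bandwidth onto tied or near-tied values, so that column may be worth inspecting.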

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
