Re: [R-sig-Geo] Holdout Sampling Adaptive Bandwidth SPGWR

Paul Bidanset Fri, 30 Aug 2013 07:09:17 -0700

Alrighty then!

Say I create this adaptive bandwidth model using the original dataset
"georgia":

library(spgwr)

# Coordinates of the observations
coords <- cbind(georgia$x, georgia$y)
# Select the adaptive bandwidth (a proportion of neighbours) by AIC
bwsel <- gwr.sel(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
                 PctBlack, data=georgia, coords=coords, adapt=TRUE,
                 gweight=gwr.Gauss, method="aic")
# Per-point bandwidths implied by that proportion
bw1 <- gw.adapt(coords, coords, quant=bwsel)
model1 <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov +
              PctBlack, data=georgia, coords=coords, bandwidth=bw1,
              hatmatrix=TRUE)
model1
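
To make the idea of an adaptive bandwidth concrete, here is a minimal base R sketch. It is a simplified stand-in, not spgwr's own algorithm: each fit point gets the distance to its k-th nearest data point, where k is the chosen share of the data. The function name adaptive_bw, the simulated coordinates and the share q = 0.1 are all made up for illustration.

```r
# Simplified stand-in for an adaptive bandwidth (NOT spgwr's own algorithm):
# for each fit point, take the distance to its k-th nearest data point,
# where k is the share `q` of the data points, rounded up.
adaptive_bw <- function(data_xy, fit_xy, q) {
  k <- max(1, ceiling(q * nrow(data_xy)))
  apply(fit_xy, 1, function(p) {
    d <- sqrt((data_xy[, 1] - p[1])^2 + (data_xy[, 2] - p[2])^2)
    sort(d)[k]
  })
}

set.seed(42)
data_xy <- cbind(x = runif(100), y = runif(100))  # hypothetical calibration coordinates
new_xy  <- cbind(x = runif(5),   y = runif(5))    # hypothetical holdout coordinates

bw_data <- adaptive_bw(data_xy, data_xy, q = 0.1)  # one bandwidth per calibration point
bw_new  <- adaptive_bw(data_xy, new_xy,  q = 0.1)  # one bandwidth per holdout point

length(bw_data)
length(bw_new)
```

The point of the sketch is that bandwidths for new locations can be derived from the calibration coordinates alone, so a bandwidth vector computed at the calibration points is not the one that applies at the new points.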

Suppose I receive an updated data set (same dependent and independent
variables) and I wish to test model1's ability to predict the dependent
variable at these new data points. If this were a basic lm regression in R,
I would use the predict() function. I wish to better understand how I would
do so using a GWR model. I found the procedure below, but I would like to
know, first, whether it is capable of accomplishing this task and, second,
whether I am specifying it correctly. It seems to me that this procedure, as
it stands, doesn't take into account the appropriate bandwidths for the new
data, say, "georgiaNewData":

PredictionsOfNewData  <- gwr(PctBach ~ TotPop90 + PctRural + PctEld + PctFB
+ PctPov + PctBlack, data=gSRDF, adapt=TRUE, gweight=gwr.Gauss, method =
"aic",  bandwidth=bw1,
predictions=TRUE, fit.points=georgiaNewData)
PredictionsOfNewData
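
For comparison, a hedged sketch of one way a holdout prediction might be set up with spgwr, using the georgia data shipped with the package (it loads an object called gSRDF). The 30-county split and the names train, hold, f, q and pred are invented for illustration. The assumption that the result carries a "pred" column should be checked with names() before relying on it. The sketch passes the selected proportion through adapt= rather than a precomputed bandwidth vector.

```r
library(spgwr)  # also loads sp

data(georgia)   # provides gSRDF, a SpatialPolygonsDataFrame of Georgia counties

# Hypothetical split: hold out 30 counties and calibrate on the rest.
set.seed(1)
hold_idx <- sample(nrow(gSRDF@data), 30)
train <- gSRDF[-hold_idx, ]
hold  <- gSRDF[hold_idx, ]

f <- PctBach ~ TotPop90 + PctRural + PctEld + PctFB + PctPov + PctBlack

# With adapt=TRUE, gwr.sel() returns a proportion of neighbours, not a distance.
q <- gwr.sel(f, data = train, adapt = TRUE, gweight = gwr.Gauss,
             method = "aic", verbose = FALSE)

# Passing that proportion as adapt= lets gwr() derive a bandwidth at each
# fit point from the calibration coordinates.
pred <- gwr(f, data = train, adapt = q, gweight = gwr.Gauss,
            fit.points = hold, predictions = TRUE)

names(pred$SDF)  # inspect which columns were actually returned

# Holdout error; assumes the predictions are stored in a column named "pred".
rmse <- sqrt(mean((hold$PctBach - pred$SDF$pred)^2))
```

Here the local regression at each holdout location is fitted from the calibration data only, with weights centred on that location, which is why no bandwidth needs to be calibrated on the holdout sample itself.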

Thanks in advance for guidance and insight...


On Fri, Aug 30, 2013 at 9:01 AM, Roger Bivand <roger.biv...@nhh.no> wrote:

> Provide a reproducible code example of your problem using a built in data
> set. No reproducible example, no response, as I cannot guess (and likely
> nobody else can either) what your specific misunderstanding is. Code using
> for example the Georgia data set in the package. You seem to be assuming
> that you understand how GWR works, I don't think that you do, so you have
> to show what you mean in code.
>
> Roger
>
>
> On Fri, 30 Aug 2013, Paul Bidanset wrote:
>
>> Roger,
>>
>> I think all I would like to know is whether it is possible to apply a
>> calibrated GWR model to a hold-out sample, and if so, what the most
>> accurate way to do so is. I understand the pitfalls of GWR but would like
>> to learn as much as I can before progressing to the next spatial
>> methodology I learn in R.
>>
>>
>> On Fri, Aug 30, 2013 at 3:37 AM, Roger Bivand <roger.biv...@nhh.no>
>> wrote:
>>
>>> Paul, Luis,
>>>
>>> I suspect that your speculations are completely wrong-headed. Please
>>> provide a reproducible example with a built-in data set, so that there is
>>> at least minimal clarity in what you are guessing. Note in addition that
>>> GWR as a technique should not be used for anything other than exploration
>>> of possible mis-specification in the underlying model with the given
>>> data, as patterning in coefficients is induced by GWR for simulated
>>> covariates with no pattern.
>>>
>>> Roger
>>>
>>>
>>> On Fri, 30 Aug 2013, Luis Guerra wrote:
>>>
>>>>> Thank you Luis. When calibrating the adaptive model, using adapt=TRUE in
>>>>> the bandwidth selection created the proportion you speak of, which then
>>>>> allowed me to create a bandwidth matrix using gw.adapt. However, this has
>>>>> not worked for me with holdout samples. Have you had success in this
>>>>> regard?
>>>>
>>>> Now I get what you mean. Let's show an example:
>>>>
>>>> bw <- gwr.sel(var ~ var1, data=yourdata, adapt=TRUE)
>>>> m <- gwr(var~var1, data=yourdata, adapt=bw, fit.points=newdata)
>>>>
>>>> So an adaptive bandwidth (bw) is calculated based on "yourdata", while
>>>> you are fitting "newdata" later on using that previously found bw. I had
>>>> not thought about it previously. Let's see whether someone else can help
>>>> you (us).
>>>>
>>>>
>>>>> I do not know the intended influence of these "fit.points". I would
>>>>> think that new localized regressions are not calculated, as we're
>>>>> testing the model and previous data points' ability to predict for these
>>>>> new ones, but I could be wrong. My current method, however, is producing
>>>>> much poorer results with the holdouts, which I am fairly sure is related
>>>>> to my inability to incorporate the new points' necessary bandwidths.
>>>>
>>>> Coming back to the previously created example, imagine that "newdata" is
>>>> a single point that you want to fit. Imagine now that "yourdata" is a
>>>> sample with 1000 cases. Then you are getting 1000 models with 1000
>>>> different intercepts and 1000 different beta values to adjust var1,
>>>> right? Which of all these parameters do you use for fitting "newdata"?
>>>> And something else: what would happen with "newdata" if it is far enough
>>>> away from "yourdata" and we are using a fixed bandwidth?
>>>>
>>>>  On Aug 29, 2013 8:56 PM, "Luis Guerra" <luispelay...@gmail.com> wrote:
>>>>
>>>>>
>>>>>> Dear Paul,
>>>>>>
>>>>>> I am dealing with this kind of problem right now, and if I am not
>>>>>> wrong, when you want to apply an adaptive bandwidth, you should supply
>>>>>> a value for the "adapt" parameter instead of the "bandwidth" parameter.
>>>>>> This value lies between 0 and 1 and indicates the proportion of cases
>>>>>> around your regression point that should be included to estimate each
>>>>>> local model. So, depending on the number of points around each case,
>>>>>> the model will use a different bandwidth for each point to be fitted.
>>>>>>
>>>>>> Related to your question, do you know how the data passed in the "data"
>>>>>> parameter influence the data to be fitted (passed in the "fit.points"
>>>>>> parameter)? I mean, you have to obtain new local models (one for each
>>>>>> point to be fitted), so I do not understand whether the "data"
>>>>>> parameter is used somehow...
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Luis
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 30, 2013 at 1:26 AM, Paul Bidanset <pbidan...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Folks,
>>>>>>>
>>>>>>> I was curious if anyone has had experience applying an SPGWR model with
>>>>>>> an adaptive bandwidth matrix to a holdout or validation sample. I am
>>>>>>> using the "fit.points" argument, which does not seem to allow for a new
>>>>>>> bandwidth calibrated around the holdout sample's XY coordinates. Any
>>>>>>> direction would be greatly appreciated. I am also open to other viable
>>>>>>> methods.
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Paul
>>>>>>>
>>>>>>>         [[alternative HTML version deleted]]
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> R-sig-Geo mailing list
>>>>>>> R-sig-Geo@r-project.org
>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>  --
>>> Roger Bivand
>>> Department of Economics, NHH Norwegian School of Economics,
>>> Helleveien 30, N-5045 Bergen, Norway.
>>> voice: +47 55 95 93 55; fax +47 55 95 95 43
>>> e-mail: roger.biv...@nhh.no
>>>
>>>
>>>
>>
>>
>>
> --
> Roger Bivand
> Department of Economics, NHH Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 95 43
> e-mail: roger.biv...@nhh.no
>
>


-- 
Paul Bidanset
(757) 412-9217
pbidan...@gmail.com

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-geo
