Hi MADlib Developers,     
    Following Ivan and Frank's suggestion, I would like to propose the
description and interface of Geographically Weighted Regression (GWR). PostGIS
functions will be invoked to compute distances in a given coordinate reference
system (CRS) and to extract the bounding-rectangle coordinates of the study
area. If MADlib does not have access to PostGIS routines, we can only
implement some simple GIS utilities in our own code.
     GWR models the local relationship between a numerical dependent variable
and one or more explanatory (independent) variables, producing a model of
spatially varying relationships. It has been widely used to understand the
spatial patterns of natural and social phenomena.
     GWR constructs a separate local equation for each location in the table,
incorporating the dependent and independent variables that fall within the
bandwidth of each target geometry. The shape and extent of the bandwidth
depend on the spatial kernel type (gauss, exp or bisquare) and on either the
distance (in fixed methods) or the number of neighbors (in adaptive methods).
Because a separate weighted regression must be solved at every location, the
computational burden of GWR grows with the number of prediction locations, so
a parallelized GWR is necessary in high-performance environments such as GPDB.
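    To make the per-location computation concrete, here is a minimal,
dependency-free Python sketch of what "constructing a local equation" means:
at each target location an ordinary weighted least-squares problem,
beta = (X^T W X)^-1 X^T W y, is solved, where W holds the Gaussian kernel
weights listed under kernel_params. The names gauss_weight, solve and
gwr_local_fit are hypothetical, and plain Euclidean distance on projected
(x, y) coordinates stands in for the PostGIS distance calls. It only
illustrates the math, not how the MADlib module would actually be built.

```python
import math


def gauss_weight(dist, bw):
    """Gaussian kernel from kernel_params: wgt = exp(-0.5 * (dist / bw)^2)."""
    return math.exp(-0.5 * (dist / bw) ** 2)


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination
    with partial pivoting (a and b are plain Python lists)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("local design matrix is (nearly) singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def gwr_local_fit(coords, X, y, target, bw):
    """Fit one local regression at `target`:
    beta(target) = (X^T W X)^-1 X^T W y, with W = diag(gauss weights)."""
    k = len(X[0])
    xtwx = [[0.0] * k for _ in range(k)]
    xtwy = [0.0] * k
    for (px, py), xi, yi in zip(coords, X, y):
        # Euclidean distance stands in for a PostGIS distance call.
        w = gauss_weight(math.hypot(px - target[0], py - target[1]), bw)
        for i in range(k):
            xtwy[i] += w * xi[i] * yi
            for j in range(k):
                xtwx[i][j] += w * xi[i] * xi[j]
    return solve(xtwx, xtwy)


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 1), (3, 3), (4, 2)]
    # Explicit intercept column of 1s, as recommended for independent_varname.
    X = [[1.0, v] for v in (0.5, 1.0, 2.0, 3.5, 4.0)]
    y = [2 + 3 * row[1] for row in X]
    for site in coords:  # one local regression per regression location
        print(site, gwr_local_fit(coords, X, y, site, bw=1.5))
```

    Because y is an exact linear function of x in this toy data, every local
fit should return coefficients close to [2.0, 3.0] whatever the weights are;
with real data the coefficients differ from location to location. Each
iteration of the loop over sites is independent, which is what makes GWR a
natural candidate for parallel execution.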
    There are two important caveats about GWR. First, GWR can estimate
coefficients at any location, but it can provide diagnostic information only
at observation locations. Second, according to Páez et al. (2011), basic GWR
is not an appropriate method for small sample sizes (fewer than 160
observations). Many more advanced geographically weighted methods have been
proposed (see Wheeler DC 2009, Brunsdon C et al. 2012, Gollini I et al. 2015),
and I plan to implement them in the future.
    The description of the interface and functions for GWR is provided below.
The coefficients are returned in separate columns so that the results can be
mapped easily in a GIS. Could you kindly take a look and give me advice or
feedback to improve it? Many thanks!
Best,
ChenLiang Wang
--------------------------------------------------------------------------------------------------------------------------------------
Description of Geographically Weighted Regression (Spatial Statistics ->
Regression Models)
Training Function
The geographically weighted regression training function has the following
syntax:
gwregr_train(source_table,
        out_table,
        dependent_varname,
        independent_varname,
        kernel_params,
        adaptive_option,
        ftest_option,
        regression_location,
        prediction_location,
        grouping_cols,
        verbose
    )
-----------------------------------------------------------------------------------------------------------------------------------
Arguments 
source_table 
    TEXT. The name of the table containing the training data.
out_table 
    TEXT. Name of the generated table containing the output model.

    The output table contains the following columns. 
    <...>     Any grouping columns provided during training. Present only if 
the grouping option is used.
    coef_<independent_varname1>, coef_<independent_varname2> ...   FLOAT8[].
One column per independent variable, holding the regression coefficient of
that variable at each location.
    r2     FLOAT8. R-squared coefficient of determination of the model. 
    adjr2    FLOAT8. Adjusted-R-squared coefficient of determination of the 
model.
    local_cond_no     FLOAT8[]. The local condition number of GWR at each
location (see Wheeler D 2007). Values above 30 indicate that results may be
unstable due to local multicollinearity.
    F1_stats     FLOAT8[]. The F1 test array {F-statistic, numerator DF,
denominator DF, p-value} for comparing Ordinary Linear Regression (OLR) and
GWR models (see Leung et al. 2000).
    F2_stats     FLOAT8[]. The F2 test array {F-statistic, numerator DF,
denominator DF, p-value} for comparing Ordinary Linear Regression (OLR) and
GWR models (see Leung et al. 2000).
    F3_stats     FLOAT8[]. The spatial stationarity test statistic for each GWR
coefficient (see Leung et al. 2000).
    F3_ndf     FLOAT8[]. The numerator DF of the spatial stationarity test for
each GWR coefficient (see Leung et al. 2000).
    F3_ddf     FLOAT8[]. The denominator DF of the spatial stationarity test for
each GWR coefficient (see Leung et al. 2000).
    F3_pv     FLOAT8[]. The p-value of the spatial stationarity test for each
GWR coefficient (see Leung et al. 2000).
    F4_stats     FLOAT8[]. The F4 test array {F-statistic, numerator DF,
denominator DF, p-value} for comparing Ordinary Linear Regression (OLR) and
GWR models (see the GWR book, p. 92).
    num_missing_rows_skipped     INTEGER. The number of rows in each group that
have NULL values in the dependent or independent variables and were therefore
skipped in the computation.

    A summary table named <out_table>_summary is created together with the 
output table. It has the following columns: 
    source_table     The data source table name 
    out_table     The output table name 
    dependent_varname     The dependent variable 
    independent_varname     The independent variables 
    num_rows_processed     The total number of rows that were used in the 
computation. 
    num_missing_rows_skipped     The total number of rows that were skipped 
because of NULL values in them. 
    kernel_function    The spatial kernel function
    bandwidth    The bandwidth parameter
    adaptive_option    Boolean flag indicating whether an adaptive kernel
function was used.
dependent_varname 
    TEXT. Expression to evaluate for the dependent variable.
independent_varname 
    TEXT. Expression list to evaluate for the independent variables. An 
intercept variable is not assumed. It is common to provide an explicit 
intercept term by including a single constant 1 term in the independent 
variable list.
kernel_params (optional)
    TEXT, default: 'kernel=gauss,bw=CV'. Parameters for the kernel function.
    The kernel parameter is the name of the kernel function to use:
    'gauss': wgt = exp(-.5*(vdist/bw)^2);
    'exp': wgt = exp(-vdist/bw);
    'bisquare': wgt = (1-(vdist/bw)^2)^2 if vdist < bw, wgt = 0 otherwise;
    where wgt is the weight, vdist is the vector of distances, and bw is the
bandwidth.
    If you are not sure what to use for the distance or number-of-neighbors
parameter, set bw to CV (cross-validation) or AICc (corrected Akaike
Information Criterion) to select it automatically. A numerical value can also
be specified for bw. If bw is large enough relative to the distances between
locations (for example, above 1e7), all weights approach 1 and the GWR
coefficient estimates approach the global ordinary linear regression
estimates.
adaptive_option (optional)
    BOOLEAN, default: FALSE. When TRUE, an adaptive kernel is used, in which
the bandwidth corresponds to the number of nearest neighbors (i.e., an
adaptive distance).
ftest_option (optional)
    BOOLEAN, default: FALSE. When TRUE, three F-tests and a spatial-stationarity
test of the coefficients are also conducted and returned with the results,
following Leung et al. (2000).
regression_location
    2D point or polygon geometry. A geometry (usually a 2D point geometry)
representing the locations where training should be conducted. There must be
exactly one regression location per row of source_table; in most cases it is
a geometry column of source_table.
prediction_location (optional)
    2D point or polygon geometry, default: regression_location. A geometry
(usually a 2D point geometry) representing the locations where the
coefficients should be estimated.
grouping_cols (optional) 
    TEXT, default: NULL. An expression list used to group the input dataset 
into discrete groups, running one regression per group. Similar to the SQL 
GROUP BY clause. When this value is null, no grouping is used and a single 
result model is generated.
verbose (optional)
    BOOLEAN, default: FALSE. Provides verbose output of the results of training.
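    The three kernels in kernel_params and the adaptive_option switch can be
sketched together as follows. This is a minimal Python illustration under
stated assumptions: kernel_weights is a hypothetical helper, and in adaptive
mode it assumes bw is a neighbor count N whose effective bandwidth is the
distance to the N-th nearest observation (the proposal says only that the
bandwidth "corresponds to the number of nearest neighbors").

```python
import math


def kernel_weights(dists, bw, kernel="gauss", adaptive=False):
    """Return one weight per distance for a single regression location.

    Fixed mode: bw is a distance in the units of the CRS.
    Adaptive mode (assumed semantics): bw is a neighbor count N, and the
    effective bandwidth is the distance to the N-th nearest observation.
    """
    if adaptive:
        n = int(bw)
        if not 1 <= n <= len(dists):
            raise ValueError("adaptive bw must be between 1 and len(dists)")
        bw = sorted(dists)[n - 1]
    if bw <= 0:
        raise ValueError("bandwidth must be positive")
    if kernel == "gauss":
        return [math.exp(-0.5 * (d / bw) ** 2) for d in dists]
    if kernel == "exp":
        return [math.exp(-d / bw) for d in dists]
    if kernel == "bisquare":
        # Zero weight at or beyond the bandwidth, as in the proposal.
        return [(1 - (d / bw) ** 2) ** 2 if d < bw else 0.0 for d in dists]
    raise ValueError("unknown kernel: %s" % kernel)


if __name__ == "__main__":
    d = [3.0, 1.0, 2.0, 0.0]
    # Fixed bandwidth of 2.0 distance units.
    print(kernel_weights(d, 2.0, "bisquare"))
    # Adaptive: bandwidth = distance to the 4th nearest point (3.0 here).
    print(kernel_weights(d, 4, "bisquare", adaptive=True))
    # A huge fixed bandwidth makes every weight close to 1 (global OLR).
    print(kernel_weights(d, 1e7, "gauss"))
```

    The last call illustrates the remark under kernel_params: once bw dwarfs
every distance, all observations receive nearly equal weight and each local
fit collapses toward the global ordinary linear regression.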
---------------------------------------------------------------------------------------------------------------------------------------------
Prediction Function
gwregr_predict(coef, col_ind, newdata_table)
Arguments 
coef
    FLOAT8[][]. The regression coefficients at each location.
col_ind
    FLOAT8[]. An array containing the values of the independent variables, in
the same order as the independent_varname list used for training.
newdata_table (optional)
    TEXT, default: NULL. The name of the table that provides new data at the
prediction locations. If prediction_location is the same as
regression_location (the default) in the training function, this parameter is
ignored. Otherwise, newdata_table is required and must provide the
independent variables at the prediction locations, using the same column
names as in source_table.
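    The final step of prediction is a local linear predictor: at each
location, the estimated value is the dot product of that location's
coefficients with its independent-variable values. The Python sketch below
assumes a row-per-location layout (coef[i] and col_ind[i] belong to the same
location); gwr_predict is a hypothetical helper, not a finalized signature,
and it does not cover estimating coefficients at new locations, which would
happen in training via prediction_location.

```python
def gwr_predict(coef, col_ind):
    """Predict the dependent variable at each location.

    coef[i] is the local coefficient vector for location i and col_ind[i] holds
    that location's independent-variable values in the training order (an
    assumed row-per-location layout, not a finalized MADlib signature).
    """
    if len(coef) != len(col_ind):
        raise ValueError("need one coefficient vector per location")
    preds = []
    for beta, x in zip(coef, col_ind):
        if len(beta) != len(x):
            raise ValueError("coefficient and variable counts differ")
        # Local linear predictor: y_hat = sum_j beta_j * x_j
        preds.append(sum(b * v for b, v in zip(beta, x)))
    return preds


if __name__ == "__main__":
    local_coef = [[2.0, 3.0], [1.0, -1.0]]  # two locations: intercept + slope
    new_x = [[1.0, 4.0], [1.0, 2.0]]        # leading 1 multiplies the intercept
    print(gwr_predict(local_coef, new_x))
```

    Checking lengths per location mirrors the requirement that newdata_table
use the same independent variables, in the same order, as source_table.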

> Date: Fri, 18 Dec 2015 09:18:22 -0800
> Subject: Re: How to contribute a spatial module to MADlib manipulating 
> objects from PostGIS
> From: fmcquil...@pivotal.io
> To: dev@madlib.incubator.apache.org
> 
> Thanks ChenLiang Wang for your interest.
> 
> I would repeat Ivan's welcome to you, and I look forward to your
> contributions in the area of GIS.
> 
> To answer your questions:
> 
> 1.  Yes, it is possible to call PostGIS functions from MADlib.
> 
> 2.  Yes, spatial statistics are suitable for MADlib.
> 
> For documentation, please refer to the Apache MADlib wiki
> http://madlib.incubator.apache.org/
> 
> which includes:
> Quick Start Guides
> 
> Get going with a minimum of fuss.
> 
>    - Installation Guide
>    <https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide>
>    - Quick Start Guide for Users
>    <https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Users>
>    - Quick Start Guide for Developers
>    <https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers>
> 
> 
> As Ivan mentioned, writing down the functions you would like to build and
> the interface is a good place to begin.  Then we can discuss on the open
> mailing list.
> 
> Regards,
> Frank
> 
> On Thu, Dec 17, 2015 at 8:11 PM, 王晨 亮 <hi181904...@msn.com> wrote:
> 
> > Thanks for your quick reply. Your suggestion is great. I will give
> > definitions and descriptions of the spatial statistics functions and a
> > comparison with ordinary statistical models.
> >
> >
> > > Date: Thu, 17 Dec 2015 21:56:06 -0500
> > > Subject: Re: How to contribute a spatial module to MADlib manipulating
> > > objects from PostGIS
> > > From: inov...@pivotal.io
> > > To: dev@madlib.incubator.apache.org
> > >
> > > Hi ChenLiang,
> > >
> > > I think your proposal is good and worth trying to do it!
> > >
> > > Can I suggest the first steps if you send a proposal of the function
> > > definitions and the parameters and return values as well as description of
> > > the functions and what they do.
> > >
> > > Based on that we can discuss the design of the interface and once it looks
> > > good you can start working on the actual implementation of the coding.
> > > When you get to implementation we can help you on technical challenges.
> > >
> > > Cheers,
> > > Ivan
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Dec 17, 2015 at 9:50 PM, 王晨 亮 <hi181904...@msn.com> wrote:
> > >
> > > > Hi MADlib Developers,
> > > >
> > > > I am a GIS researcher and have some knowledge of PostGIS, Python,
> > > > C/C++, Java and R.
> > > >
> > > > I learned some spatial statistical models during my PhD research in
> > > > GIS. Recently, I translated GWR (Geographically Weighted Regression)
> > > > from R into Java for my company, and I would like to contribute to
> > > > MADlib if possible. I believe PostGIS and MADlib are the most powerful
> > > > extensions of PostgreSQL. Therefore, a spatial statistics module
> > > > connecting the two libraries could be significant. If I can start
> > > > the task, the first goal will be to implement the GWR model.
> > > >
> > > > I am now reading the MADlib developer guide. I am not quite sure how
> > > > to contribute a geospatial module to MADlib. Is it possible to
> > > > manipulate spatial objects or attributes from PostGIS in MADlib?
> > > >
> > > > Could anyone suggest a few pointers and links that I can follow to
> > > > learn:
> > > >
> > > > 1. how to deal with these dependencies in MADlib?
> > > >
> > > > 2. whether the spatial statistics module is suitable for MADlib?
> > > >
> > > > Thank you in advance.
> > > >
> > > > ChenLiang Wang