[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...

2016-01-14 Thread cwelton
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49820902
  
--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
 }
 
 /*
+ * This function apply cos function to each element.
+ */
+PG_FUNCTION_INFO_V1(array_cos);
+Datum
+array_cos(PG_FUNCTION_ARGS){
+if (PG_ARGISNULL(0)) { PG_RETURN_NULL(); }
+
+ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(0);
+Oid element_type = ARR_ELEMTYPE(v1);
+Datum v2 = float8_datum_cast(0, element_type);
+
+ArrayType *res = General_Array_to_Array(v1, v2, element_cos);
+
+PG_FREE_IF_COPY(v1, 0);
--- End diff --

In answer to your question, you can think of the parameter that is being 
received as a union of (toastid, text_pointer).  If you receive a pointer 
then you do not get a copy; if you receive a toastid then the 
PG_GETARG_ARRAYTYPE_P macro will detoast it and return a pointer to you.  In 
neither case do you "get a copy" of a pointer that was passed as input, which 
is what led to the initial confusion here: "free_if_copy" is a misleading name 
for the macro.
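
For readers following the thread, the pointer comparison behind the macro can be sketched outside PostgreSQL. This is a stand-alone simulation under stated assumptions, not PostgreSQL code: FakeArg, fake_getarg and fake_free_if_copy are hypothetical stand-ins for the argument slot, PG_GETARG_ARRAYTYPE_P and PG_FREE_IF_COPY, and plain malloc/free replaces palloc/pfree. It assumes the macro frees the pointer only when it differs from the original argument in slot n.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one function argument slot. */
typedef struct {
    char *stored;     /* pointer owned by the caller */
    int   is_toasted; /* nonzero: value must be expanded before use */
} FakeArg;

static int frees_performed = 0;

/* Stand-in for PG_GETARG_ARRAYTYPE_P: hands back the caller's pointer when
 * nothing needs expanding, otherwise a freshly allocated expanded value. */
char *fake_getarg(const FakeArg *arg)
{
    if (!arg->is_toasted)
        return arg->stored;

    size_t len = strlen(arg->stored) + 1;
    char *expanded = malloc(len);
    assert(expanded != NULL);
    memcpy(expanded, arg->stored, len);
    return expanded;
}

/* Stand-in for PG_FREE_IF_COPY: free only what fake_getarg allocated,
 * never the pointer that belongs to the caller. */
void fake_free_if_copy(char *ptr, const FakeArg *arg)
{
    if (ptr != arg->stored) {
        free(ptr);
        frees_performed++;
    }
}

/* Simulates one function call; returns how many frees it performed. */
int run_once(int is_toasted)
{
    char buffer[] = "array data";
    FakeArg arg = { buffer, is_toasted };
    int before = frees_performed;

    char *v1 = fake_getarg(&arg);
    fake_free_if_copy(v1, &arg);
    return frees_performed - before;
}
```

In this simulation run_once(0) performs no free because the caller's pointer came back unchanged, while run_once(1) frees exactly the allocation made during the simulated detoast.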


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS

2016-01-14 Thread Greg Chase
We always share the presentation and post the video replay as well.

You can also drink lots of coffee, and join us if you are crazy enough.

We like crazy people.

This email encrypted by tiny buttons & fat thumbs, beta voice recognition, and 
autocorrect on my iPhone.

> On Jan 14, 2016, at 6:35 PM, Kuien Liu  wrote:
> 
> Yes, 2AM on Saturday... Could you please share the video of Gautam's 
> presentation after the call?
> 
> Cheers,
> Kuien Liu
> 
>> On Wed, Jan 13, 2016 at 6:36 PM, Greg Chase  wrote:
>> As I said, our next call is not China-friendly: 
>> http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3CCAMg1VtnKB-WoyVqCstfMNCcJVOn2HKQQ6wNfqdovhgnB7zd5cw%40mail.gmail.com%3E
>> 
>> This is this Friday, 10AM Pacific Standard Time which is 2AM Saturday Beijing 
>> time.
>> 
>> We will arrange a next call in a couple weeks at an Asia friendly time to 
>> support contributors in Asia.
>> 
>> However, if you make the next call, we will make time for you to talk :)
>> 
>> Regards,
>> 
>> -Greg
>> 
>>> On Wed, Jan 13, 2016 at 2:18 AM, Kuien Liu  wrote:
>>> Great, I would like to join it, please send me an invitation if possible.
>>> 
>>> Cheers,
>>> Kuien Liu
>>> 

[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...

2016-01-14 Thread cwelton
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49812120
  
--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
 }
 
 /*
+ * This function apply cos function to each element.
+ */
+PG_FUNCTION_INFO_V1(array_cos);
+Datum
+array_cos(PG_FUNCTION_ARGS){
+if (PG_ARGISNULL(0)) { PG_RETURN_NULL(); }
+
+ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(0);
+Oid element_type = ARR_ELEMTYPE(v1);
+Datum v2 = float8_datum_cast(0, element_type);
+
+ArrayType *res = General_Array_to_Array(v1, v2, element_cos);
+
+PG_FREE_IF_COPY(v1, 0);
--- End diff --

Hmm, taking a closer look at the PG_FREE_IF_COPY macro I see I misread how 
it examines the second parameter.

Ignore my previous comment, this looks safe.




Re: Bayesian Analysis using MADlib (Gibbs Sampling for Probit Regression)

2016-01-14 Thread Caleb Welton
Great seeing the prototype work here, I'm sure that there is something that
we can find from this work that we can bring into MADlib.

However... It is a very different implementation from the existing
algorithms, calling into the madlib matrix functions directly rather than
having the majority of the work done within the abstraction layer.
Unfortunately this leads to a very inefficient implementation.

As a demonstration of this I ran this test case:

Dataset: 1 dependent variable, 4 independent variables + intercept,
10,000,00 observations

Run using Postgres 9.4 on a Macbook Pro:

Creating the X matrix from source table: 13.9s
Creating the Y matrix from source table: 9.1s
Computing X_T_X via matrix_mult: 169.2s
Computing X_T_Y via matrix_mult: 114.8s

Calling madlib.linregr_train directly (implicitly calculates all of the
above as well as inverting the X_T_X matrix and calculating some other
statistics): 10.3s

So in total about 30X slower than our existing methodology for doing the
same calculations.  I would expect this delta to potentially get even
larger if it was to move from Postgres to Greenplum or HAWQ where we would
be able to start applying parallelism.  (the specialized XtX multiplication
in linregr parallelizes perfectly, but the more general matrix_mult
functionality may not)
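
One way to see why a specialized X^T X computation parallelizes well: the product is a sum of per-row outer products, so each segment can accumulate its own partial sums and the partial states can simply be added together. The sketch below illustrates that idea only; XtxState, xtx_init, xtx_add_row and xtx_merge are hypothetical names, K is an arbitrary column count, and this is not MADlib's linregr code.

```c
#include <assert.h>
#include <stddef.h>

#define K 3 /* columns per row, intercept included (illustrative choice) */

/* Partial sums for one segment of the data. */
typedef struct {
    double xtx[K][K]; /* running sum of x * x^T */
    double xty[K];    /* running sum of x * y   */
    size_t rows;      /* rows folded in so far  */
} XtxState;

void xtx_init(XtxState *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < K; i++) {
        s->xty[i] = 0.0;
        for (size_t j = 0; j < K; j++)
            s->xtx[i][j] = 0.0;
    }
    s->rows = 0;
}

/* Transition step: fold one observation (x, y) into the state. */
void xtx_add_row(XtxState *s, const double *x, double y)
{
    for (size_t i = 0; i < K; i++) {
        s->xty[i] += x[i] * y;
        for (size_t j = 0; j < K; j++)
            s->xtx[i][j] += x[i] * x[j];
    }
    s->rows++;
}

/* Merge step: combine two partial states computed independently. */
void xtx_merge(XtxState *into, const XtxState *from)
{
    for (size_t i = 0; i < K; i++) {
        into->xty[i] += from->xty[i];
        for (size_t j = 0; j < K; j++)
            into->xtx[i][j] += from->xtx[i][j];
    }
    into->rows += from->rows;
}
```

Because the merge is plain addition, the order in which segments finish does not matter, and no intermediate matrix has to be written out between steps.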

As performance has been a key aspect to our development I'm not sure that
we want to architecturally go down the path outlined in this example code.

That said... I can certainly see how this layer of abstraction could be a
valuable way of expressing things from a development perspective. So the
question for the development community is whether there is a way that we can
enable people to write code more similar to what Gautam has expressed while
preserving the performance of our existing implementations.

The idea that comes to mind would be to take an API abstraction approach
more akin to what we can see in Theano, where we can express a series of
matrix transformations abstractly and then let the framework work out the
best way to calculate the pipeline.  It would be a large project to do that...
but it could be one answer to the long-held question "how should we define our
python abstraction layer?".

As a whole I'd be pretty resistant to adding dependencies on numpy/scipy
unless there was a compelling use case where the performance overhead of
implementing the MATH (instead of the control flow) in python was not
unacceptably large.

-Caleb

On Thu, Dec 24, 2015 at 12:51 PM, Frank McQuillan 
wrote:

> Gautam,
>
> Thank you for working on this; it can be a great addition to MADlib.  A couple
> of comments below:
>
> 0) Dependencies on numpy and scipy.  Currently the platforms PostgreSQL,
> GPDB and HAWQ do not ship with numpy or scipy by default, so we may need to
> look at this dependency more closely.
>
> 2a,b) The following creation methods will exist in MADlib 1.9.  They are
> already in the MADlib code base:
>
> -- Create a matrix initialized with ones of given row and column dimension
>   matrix_ones( row_dim, col_dim, matrix_out, out_args)
>
> -- Create a matrix initialized with zeros of given row and column dimension
>   matrix_zeros( row_dim, col_dim, matrix_out, out_args)
>
> -- Create a square identity matrix of size dim x dim
>   matrix_identity( dim, matrix_out, out_args)
>
> -- Create a diag matrix initialized with given diagonal elements
>   matrix_diag( diag_elements, matrix_out, out_args)
>
> 2c) As for “Sampling matrices and scalars from certain distributions. We
> could start with Gaussian (multi-variate), truncated normal, Wishart,
> Inverse-Wishart, Gamma, and Beta.”  I created a JIRA for that here:
> https://issues.apache.org/jira/browse/MADLIB-940
> I agree with your recommendation.
>
> 3) Pipelining
> * it’s an architecture question that I agree we need to address, to reduce
> disk I/O between steps
> * Could be a platform implementation, or we can think about if MADlib can
> do something on top of the existing platform by coming up with a way to
> chain operations in-memory
>
> 4) I would *strongly* encourage you to go the next/last mile and get this
> into MADlib.  The community can help you do it.  And as you say we need to
> figure out how/if to support numpy and scipy, or alternatively implement the
> functionality as MADlib functions via Eigen or Boost.
>
> Frank
>
> On Thu, Dec 24, 2015 at 12:29 PM, Gautam Muralidhar <
> gautam.s.muralid...@gmail.com> wrote:
>
> > > Hi Team MADlib,
> > >
> > > I managed to complete the implementation of the Bayesian analysis of the
> > > binary Probit regression model on MPP. The code has been tested on the
> > > greenplum sandbox VM and seems to work fine. You can find the code here:
> > >
> > > https://github.com/gautamsm/data-science-on-mpp/tree/master/BayesianAnalysis
> > >
> > > In the git repo, probit_regression.ipynb is the stand alone python
> > > implementation. To verify correctness, I compared against R's MCMCpack
> > > library that can also be run in the Jupyter notebook!
> > >
> > > pro

[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...

2016-01-14 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49807227
  
--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
 }
 
 /*
+ * This function apply cos function to each element.
+ */
+PG_FUNCTION_INFO_V1(array_cos);
+Datum
+array_cos(PG_FUNCTION_ARGS){
+if (PG_ARGISNULL(0)) { PG_RETURN_NULL(); }
+
+ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(0);
+Oid element_type = ARR_ELEMTYPE(v1);
+Datum v2 = float8_datum_cast(0, element_type);
+
+ArrayType *res = General_Array_to_Array(v1, v2, element_cos);
+
+PG_FREE_IF_COPY(v1, 0);
--- End diff --

So during the detoast, PG_GETARG_ARRAYTYPE_P does not create a copy? I was 
under the impression that we have to free that pointer since a copy is always 
created. All array ops functions in MADlib perform that free, based on similar 
functions in pg source code. 
If that's wrong then we'll have to make a pretty big change in our 
array_ops. 

Snippet from /src/backend/utils/adt/arrayfuncs.c: 
```
Datum
array_eq(PG_FUNCTION_ARGS)
{
ArrayType  *array1 = PG_GETARG_ARRAYTYPE_P(0);
ArrayType  *array2 = PG_GETARG_ARRAYTYPE_P(1);
Oid collation = PG_GET_COLLATION();
int ndims1 = ARR_NDIM(array1);
int ndims2 = ARR_NDIM(array2);
int*dims1 = ARR_DIMS(array1);
int*dims2 = ARR_DIMS(array2);
...
...
...
/* Avoid leaking memory when handed toasted input. */
PG_FREE_IF_COPY(array1, 0);
PG_FREE_IF_COPY(array2, 1);

PG_RETURN_BOOL(result);
}
```




Re: New MADlib committer: Xiaocheng Tang

2016-01-14 Thread Ivan Novick
+1 and congrats

On Fri, Jan 15, 2016 at 4:53 AM, Caleb Welton  wrote:

> Welcome Xiaocheng!   Based on the contributions you've already made it's
> clear you'll be a great addition to the community.  Keep it up!
>
> It's great seeing the community growing.
>
> -Caleb
>
> On Wed, Jan 13, 2016 at 6:38 PM, Roman Shaposhnik 
> wrote:
>
> > Congrats Xiaocheng! Welcome to the club!
> >
> > Thanks,
> > Roman.
> >
> > On Wed, Jan 13, 2016 at 6:22 PM, Frank McQuillan 
> > wrote:
> > > Dear MADlib dev community,
> > >
> > > The Project Management Committee (PMC) for Apache MADlib has asked
> > > Xiaocheng Tang to become a committer and we are pleased to announce that
> > > he has accepted.
> > >
> > > Recently Xiaocheng has been working on a completely new version of
> > > Support Vector Machines in addition to making various bug fixes and
> > > refinements to existing algorithms.
> > >
> > > Being a committer enables easier contribution to the project since there
> > > is no need to go via the patch submission process.  This should enable
> > > better productivity.  Being a PMC member enables assistance with the
> > > management and to guide the direction of the project.
> > >
> > > Welcome Xiaocheng!
> > >
> > > Regards,
> > > Frank
> >
>


[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...

2016-01-14 Thread cwelton
Github user cwelton commented on the pull request:

https://github.com/apache/incubator-madlib/pull/10#issuecomment-171812287
  
-1 from me as the code currently stands. 

Freeing memory passed to a function can lead to instability of the database 
system and is a complete blocker for merge.




[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...

2016-01-14 Thread cwelton
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49795535
  
--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
 }
 
 /*
+ * This function apply cos function to each element.
+ */
+PG_FUNCTION_INFO_V1(array_cos);
+Datum
+array_cos(PG_FUNCTION_ARGS){
+if (PG_ARGISNULL(0)) { PG_RETURN_NULL(); }
+
+ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(0);
+Oid element_type = ARR_ELEMTYPE(v1);
+Datum v2 = float8_datum_cast(0, element_type);
+
+ArrayType *res = General_Array_to_Array(v1, v2, element_cos);
+
+PG_FREE_IF_COPY(v1, 0);
--- End diff --

This is a weird use of PG_FREE_IF_COPY that simply looks wrong to me.  This 
call is equivalent to pfree(v1), and since that is a passed in argument you are 
freeing something that does not belong to this function.

A correct call would be PG_FREE_IF_COPY(v1, PG_GETARG_ARRAYTYPE_P(0)), 
except that will never free anything so would be a no-op.  Ultimately this line 
of code should simply be removed.




Re: New MADlib committer: Xiaocheng Tang

2016-01-14 Thread Caleb Welton
Welcome Xiaocheng!   Based on the contributions you've already made it's
clear you'll be a great addition to the community.  Keep it up!

It's great seeing the community growing.

-Caleb

On Wed, Jan 13, 2016 at 6:38 PM, Roman Shaposhnik 
wrote:

> Congrats Xiaocheng! Welcome to the club!
>
> Thanks,
> Roman.
>
> On Wed, Jan 13, 2016 at 6:22 PM, Frank McQuillan 
> wrote:
> > Dear MADlib dev community,
> >
> > The Project Management Committee (PMC) for Apache MADlib has asked
> > Xiaocheng Tang to become a committer and we are pleased to announce that
> > he has accepted.
> >
> > Recently Xiaocheng has been working on a completely new version of
> > Support Vector Machines in addition to making various bug fixes and
> > refinements to existing algorithms.
> >
> > Being a committer enables easier contribution to the project since there
> > is no need to go via the patch submission process.  This should enable
> > better productivity.  Being a PMC member enables assistance with the
> > management and to guide the direction of the project.
> >
> > Welcome Xiaocheng!
> >
> > Regards,
> > Frank
>


Reminder! [VIRTUAL] MADlib Meeting: Bayesian Analysis of Binomial Response Models on MPP Databases (Greenplum & HAWQ) using MADlib Matrix Operations

2016-01-14 Thread Karen Vuong
Hi everyone,

Just a reminder that the MADlib virtual community meeting is happening
tomorrow at 9:45AM PST. Details are below.

Thanks,

Karen

-- Forwarded message --
From: Karen Vuong 
Date: Thu, Jan 7, 2016 at 6:52 PM
Subject: [VIRTUAL] MADlib Meeting: Bayesian Analysis of Binomial Response
Models on MPP Databases (Greenplum & HAWQ) using MADlib Matrix Operations
To: dev@madlib.incubator.apache.org


Hello MADlib contributors,

We'd like to invite you to the next MADlib virtual community meeting on
Friday, January 15th.

Gautam will present a 20-minute overview of some recent R&D work that he
has been doing using MADlib. Gautam will present Bayesian analysis of
binomial response models on MPP Databases like Greenplum and HAWQ using
MADlib matrix operations. Specifically, he will walk the audience through
Bayesian analysis involving MCMC sampling techniques of the Probit and
Logistic regression models that accept arbitrary user specified parameter
priors. The code for this analysis can be found on the following Github
page:
https://github.com/gautamsm/data-science-on-mpp/tree/master/BayesianAnalysis

About Gautam

Gautam Muralidhar is currently a Sr. Data Scientist at Pivotal where he
helps customers derive actionable insights from data by solving machine
learning problems for them using state of the art analytics infrastructure
and tools from Pivotal's stack. His areas of expertise include machine
learning, image processing, and computer vision. At Pivotal, his work has
spanned multiple verticals including Automotive, Logistics, Finance, and
Healthcare. He holds an undergraduate degree in Electronics and
Communications Engineering from R. V. College of Engineering, Bangalore,
India, and a masters and a Ph.D. degree in Biomedical Engineering from The
University of Texas at Austin, USA.

We look forward to having you join us!

Please join us on January 15th, 2016 at:
https://pivotalcommunity.adobeconnect.com/madlib/

1/15 San Francisco, CA 9:45 AM PST UTC-8 hours
1/15 New York, NY 12:45 PM EST UTC-5 hours

Adobe Connect tips: If you have issues with Chrome, look for a small icon that
appears in the address bar.  It's EASY to miss.  Click it to allow Adobe to use
the mic.  Even if you select 'allow' in the popup, the mic doesn't work until
you also change the setting in the address bar.


If you have never attended an Adobe Connect meeting before:

Test your connection:
https://pivotalcommunity.adobeconnect.com/common/help/en/support/meeting_test.htm

Get a quick overview: http://www.adobe.com/products/adobeconnect.html

Thanks,

Karen Vuong


Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS

2016-01-14 Thread Greg Chase
Hi Chenliang,
Will we hear from you tomorrow at 10AM Pacific, or in a few weeks when the
call is a better time for Asia-based callers?

-Greg

On Thu, Jan 14, 2016 at 8:18 AM, chenliang wang  wrote:

> Cool!  I'd like to join the next discussion.
>
> Best,
> Chenliang Wang
>
>
> On 01/13/2016 06:36 PM, Greg Chase wrote:
>
>> As I said, our next call is not China-friendly:
>>
>> http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3CCAMg1VtnKB-WoyVqCstfMNCcJVOn2HKQQ6wNfqdovhgnB7zd5cw%40mail.gmail.com%3E
>>
>> This is this Friday, 10AM Pacific Standard Time which is 2AM Saturday
>> Beijing time.
>>
>> We will arrange a next call in a couple weeks at an Asia friendly time to
>> support contributors in Asia.
>>
>> However, if you make the next call, we will make time for you to talk :)
>>
>> Regards,
>>
>> -Greg
>>

Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS

2016-01-14 Thread chenliang wang

Cool!  I'd like to join the next discussion.

Best,
Chenliang Wang

On 01/13/2016 06:36 PM, Greg Chase wrote:

As I said, our next call is not China-friendly:
http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3CCAMg1VtnKB-WoyVqCstfMNCcJVOn2HKQQ6wNfqdovhgnB7zd5cw%40mail.gmail.com%3E

This is this Friday, 10AM Pacific Standard Time which is 2AM Saturday
Beijing time.

We will arrange a next call in a couple weeks at an Asia friendly time to
support contributors in Asia.

However, if you make the next call, we will make time for you to talk :)

Regards,

-Greg

On Wed, Jan 13, 2016 at 2:18 AM, Kuien Liu  wrote:


Great, I would like to join it, please send me an invitation if possible.

Cheers,
Kuien Liu

On Wed, Jan 13, 2016 at 6:10 PM, Greg Chase  wrote:


Perhaps ChenLiang would like to join a call with the MADlib community and
discuss his contribution?

We have a call this Friday 10AM PST which is not a friendly time for
China, but we can schedule a next call at a friendlier time.

This email encrypted by tiny buttons & fat thumbs, beta voice
recognition, and autocorrect on my iPhone.


On Jan 13, 2016, at 1:53 AM, Ivan Novick  wrote:

Cool!


On Wed, Jan 13, 2016 at 5:52 PM, Kuien Liu  wrote:

Got it, I think I can have a (f2f) talk with Chenliang Wang, as he
graduated from an institute of CAS which is not far from our Beijing
office, and I am familiar with his supervisor and lab director. So I think
it is highly possible to find him directly in Beijing.

Cheers,
Kuien Liu


On Wed, Jan 13, 2016 at 3:05 PM, Ivan Novick wrote:

Hello ChenLiang,

I have read your description of the interface and to my understanding
this is a supervised machine learning algorithm that supports geometry
data.  Am I correct?

What could be a good industrial use case for this model for some
examples?  Could you train a system based on locations and weather to find
bad signals for cell phone?  Can you provide any real world example
scenario where this type of model will be useful for end users?

Also I am adding CC to some of my colleagues at work.  Kuien, Max,
Yandong can you provide any feedback on this proposal from your Point of
View?




http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3cblu175-w72199bca72716d8c1a99bf4...@phx.gbl%3E

Cheers,
Ivan


On Wed, Jan 13, 2016 at 11:20 AM, WangChenLiang 
wrote:


Sorry, the link to the attachment (http://1drv.ms/1ZjAiCg) was lost in the
previous letter.


From: hi181904...@msn.com
To: dev@madlib.incubator.apache.org
Subject: RE: How to contribute a spatial module to MADlib manipulating
objects from PostGIS
Date: Wed, 13 Jan 2016 11:09:17 +0800



Hi Caleb and Ivan!
Thanks for your attention and help. I reviewed the previous draft and found
some things that were inappropriate. The archive containing the new draft and
example code is attached to this letter; it should be more reasonable than the
earlier edition. Please go over the manuscript and give suggestions again.
The following are my answers to Caleb's questions.

- Does this function require PostGIS to also be installed? If yes, it would
be better if we disable the function if PostGIS is not present rather than
introduce PostGIS as a dependency. (Similar to what we do with our
requirement on the xml module with our PMML export functionality).

A: Yes. I am trying to avoid passing any spatial datatypes through the
interface of GWR. But I have no idea whether it is necessary to provide a
simple alternative when PostGIS is not available.

- What are the exact datatypes in the function definition for
regression_location and prediction_location?

A: I changed the datatype to TEXT: it holds the name of a POINT or
MULTIPOLYGON column (for polygons, the centroid of each polygon is used for
GWR estimation).

- In the description it describes regression_location as "The length of
regression_location must be equal to the length of source_table", which
signals to me that it is likely intended to be a column of the source table?
If not then how is this length represented?

A: In the previous interface, I was trying to pass in a geometry field that
could come from another table with a different number of rows. Now I have
changed the argument definition to TEXT. It must be the name of a geometry
field in the source table.

- You didn't mark regression_location as (optional). Due to the way Postgres
functions work all optional arguments must come after all required arguments,
so having a non-optional argument in the middle of the optional list must be
avoided.

A: Thanks for reminding me of this mistake. It is really my fault. The order
of the arguments is changed in this edition.

- I haven't read through the literature, but it is not immediately clear to
me why prediction_location is a parameter to gwregr_train() rather than
gwregr_predict(). Can you provide a brief description to the way that
prediction_location is used in the model and its relationship to training and
prediction.

A: Actually, there are three ki

Re: FW: How to contribute a spatial module to MADlib manipulating objects from PostGIS

2016-01-14 Thread chenliang wang

Hello Ivan,
Yes, GWR is a local form of OLR that takes the distance between
locations into account during estimation.

Actually, GWR and other spatial models are not as widely applied in
industry as classical statistical methods or ML models. A
representative example is the automated valuation model (AVM) for the
housing market. AVM is a technology and service that generates a
residential valuation report for a consumer in a matter of seconds. The
housing market exhibits different characteristics across space:
people's preferences vary with location, and environmental influences
may decay with distance. For example, an old house in the CBD will be
more expensive than a suburban one with otherwise identical features.
Many papers show that an AVM using spatial models such as GWR can
produce more accurate estimates than classical models. I have
implemented a basic GWR in Java for our AVM; it is able to capture the
spatial variability that is the key distinguishing feature of the real
estate market.

I haven't researched the relationship between weather and signals. I
guess it would be a global correlation, and OLR would be enough to
model it. However, we should run some statistical tests to detect
spatial non-stationarity if we have some data. And if weather, or any
other geographically distributed influencing factor, were absent from
our data, GWR would be useful for modeling the hidden pattern in a
spatial context.

GWR would be useful for business analysis involving locations. There is
a small example demonstrating the analysis of 911 phone calls using OLS
and GWR
(http://eclectic.ss.uci.edu/~drwhite/pdf/Tutorial-RegressionAnalysis.pdf).
In my opinion, many business scenarios such as LBS may have a chance to
get value from the varying relationship between consumers' preferences
and influencing factors.
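
The description of GWR as a local form of ordinary regression that takes
distance between locations into account can be illustrated with a small
sketch. This is only an illustration under stated assumptions, not the
Java implementation mentioned above nor the proposed MADlib interface:
the Gaussian kernel, the fixed bandwidth, the single predictor, and the
names gaussian_weight and local_fit are all hypothetical choices made
for the example.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Assumed Gaussian kernel: weight decays smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(points, target, bandwidth):
    """Weighted least squares for y = a + b*x, centered on `target`.

    points: list of (x_coord, y_coord, predictor, response)
    target: (x_coord, y_coord) of the regression location
    Returns (intercept, slope) estimated for that location.
    """
    sw = swx = swy = swxx = swxy = 0.0
    for px, py, x, y in points:
        d = math.hypot(px - target[0], py - target[1])
        w = gaussian_weight(d, bandwidth)
        sw += w
        swx += w * x
        swy += w * y
        swxx += w * x * x
        swxy += w * x * y
    mean_x = swx / sw
    mean_y = swy / sw
    var_x = swxx / sw - mean_x ** 2
    cov_xy = swxy / sw - mean_x * mean_y
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope

# Made-up data: the response follows y = x in the west cluster and
# y = 3x in the east cluster, so the relationship varies over space.
sample = [
    (0.0, 0.0, 1.0, 1.0), (0.0, 1.0, 2.0, 2.0), (1.0, 0.0, 3.0, 3.0),
    (100.0, 0.0, 1.0, 3.0), (100.0, 1.0, 2.0, 6.0), (101.0, 0.0, 3.0, 9.0),
]
west = local_fit(sample, (0.0, 0.0), bandwidth=5.0)
east = local_fit(sample, (100.0, 0.0), bandwidth=5.0)
print("west (intercept, slope):", west)
print("east (intercept, slope):", east)
```

A single global regression over all six rows would blend the two
relationships into one slope, whereas the local fits recover a separate
slope near each cluster. That difference is the spatial
non-stationarity the statistical tests mentioned above are meant to
detect.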


It was great to have a chance to talk with Kuien, although I have left
Beijing. I will keep in touch with him about spatial statistics modules.


Best,
Chenliang

On 01/13/2016 03:05 PM, Ivan Novick wrote:

Hello ChenLiang,

I have read your description of the interface and to my understanding this
is a supervised machine learning algorithm that supports geometry data.  Am
I correct?

What could be a good industrial use case for this model for some examples?
Could you train a system based on locations and weather to find bad signals
for cell phone?  Can you provide any real world example scenario where this
type of model will be useful for end users?

Also I am adding CC to some of my colleagues at work.  Kuien, Max, Yandong
can you provide any feedback on this proposal from your Point of View?

http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3cblu175-w72199bca72716d8c1a99bf4...@phx.gbl%3E

Cheers,
Ivan


On Wed, Jan 13, 2016 at 11:20 AM, WangChenLiang  wrote:


Sorry, the link to the attachment (http://1drv.ms/1ZjAiCg) was missing
from the previous letter.


From: hi181904...@msn.com
To: dev@madlib.incubator.apache.org
Subject: RE: How to contribute a spatial module to MADlib manipulating objects from PostGIS

Date: Wed, 13 Jan 2016 11:09:17 +0800



Hi Caleb and Ivan!
Thanks for your attention and help. I reviewed the previous draft and
found some things that were inappropriate. The archive containing the
new draft and example code is attached to this letter; it should be
more reasonable than the earlier edition. Please go over the manuscript
and give me your suggestions again.
The following are my answers to Caleb's questions.
- Does this function require PostGIS to also be installed? If yes, it
would be better if we disable the function if PostGIS is not present
rather than introduce PostGIS as a dependency. (Similar to what we do
with our requirement on the xml module with our PMML export
functionality).



A: Yes. I am trying to avoid taking any spatial datatypes as input in
the GWR interface, but I am not sure whether it is necessary to provide
a simple alternative when PostGIS is not available.
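
One way the "disable the function if PostGIS is not present" idea could
look is sketched below. Everything here is an assumption made for
illustration: run_query is a hypothetical callable that takes an SQL
string and returns a list of rows (inside PL/Python it might wrap
plpy.execute), postgis_available and require_postgis are made-up names,
and the pg_extension catalog lookup only works on PostgreSQL versions
that have CREATE EXTENSION (9.1 and later). It is not how MADlib guards
its PMML export.

```python
def postgis_available(run_query):
    """Return True if an extension named 'postgis' is registered.

    run_query: hypothetical callable, SQL string -> list of rows.
    """
    rows = run_query("SELECT 1 FROM pg_extension WHERE extname = 'postgis'")
    return len(rows) > 0

def require_postgis(run_query, feature="gwregr_train"):
    """Raise a clear error instead of failing later on a missing type."""
    if not postgis_available(run_query):
        raise RuntimeError(feature + " is disabled: PostGIS is not installed")

# Stand-ins for a real database connection, used only for demonstration.
with_postgis = lambda sql: [(1,)]
without_postgis = lambda sql: []

print(postgis_available(with_postgis))
print(postgis_available(without_postgis))
```

Checking at call time keeps PostGIS an optional dependency: the rest of
the library installs without it, and only the spatial entry points
report that they are unavailable.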



- What are the exact datatypes in the function definition for
regression_location and prediction_location?





A: I changed the datatype to TEXT; it is the name of a POINT or
MULTIPOLYGON field (for GWR estimation, the centroid of each polygon is
used).



- In the description it describes regression_location as "The length of
regression_location must be equal to the length of source_table", which
signals to me that it is likely intended to be a column of the source
table? If not then how is this length represented?


A: In the previous interface, I was trying to accept a geometry field
that could come from another table with a different number of rows. I
have now changed the argument's type to TEXT; it must be the name of a
geometry field in the source table.



- You didn't mark regression_location as (optional). Due to the way
Postgres functions work all optional arguments must come after all
required arguments, so having a non-optional argument in the middle of
the optional list must be avoided.



A:Thanks for
reminding me of this mist