[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49820902

--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
```diff
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
 }
 
 /*
+ * This function apply cos function to each element.
+ */
+PG_FUNCTION_INFO_V1(array_cos);
+Datum
+array_cos(PG_FUNCTION_ARGS){
+    if (PG_ARGISNULL(0)) { PG_RETURN_NULL(); }
+
+    ArrayType *v1 = PG_GETARG_ARRAYTYPE_P(0);
+    Oid element_type = ARR_ELEMTYPE(v1);
+    Datum v2 = float8_datum_cast(0, element_type);
+
+    ArrayType *res = General_Array_to_Array(v1, v2, element_cos);
+
+    PG_FREE_IF_COPY(v1, 0);
```
--- End diff --

In answer to your question, you can think of the parameter that is being received as a union of (toastid, text_pointer). If you receive a pointer, then you do not get a copy; if you get a toastid, then PG_GETARG_ARRAYTYPE_P will detoast it and return a pointer to you. In neither case do you "get a copy" of a pointer that was passed as input, which is what led to the initial confusion here: "free_if_copy" is a misleading name for the macro.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
We always share the presentation and post the video replay as well. You can also drink lots of coffee, and join us if you are crazy enough. We like crazy people.

This email encrypted by tiny buttons & fat thumbs, beta voice recognition, and autocorrect on my iPhone.

> On Jan 14, 2016, at 6:35 PM, Kuien Liu wrote:
>
> Yes, 2AM on Saturday... May you please share Gautam's representation video
> after the call?
>
> Cheers,
> Kuien Liu
[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49812120

--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
```diff
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
[...]
+    PG_FREE_IF_COPY(v1, 0);
```
--- End diff --

Hmm, taking a closer look at the PG_FREE_IF_COPY macro, I see I misread how it examines the second parameter. Ignore my previous comment; this looks safe.
Re: Bayesian Analysis using MADlib (Gibbs Sampling for Probit Regression)
Great seeing the prototype work here; I'm sure there is something from this work that we can bring into MADlib. However... It is a very different implementation from the existing algorithms: it calls into the MADlib matrix functions directly rather than having the majority of the work done within the abstraction layer. Unfortunately this leads to a very inefficient implementation. As a demonstration, I ran this test case:

Dataset: 1 dependent variable, 4 independent variables + intercept, 10,000,00 observations
Run using Postgres 9.4 on a MacBook Pro:
- Creating the X matrix from source table: 13.9s
- Creating the Y matrix from source table: 9.1s
- Computing X_T_X via matrix_mult: 169.2s
- Computing X_T_Y via matrix_mult: 114.8s
- Calling madlib.linregr_train directly (implicitly calculates all of the above as well as inverting the X_T_X matrix and calculating some other statistics): 10.3s

So in total (307.0s vs. 10.3s) the matrix approach is about 30X slower than our existing methodology for doing the same calculations. I would expect this delta to get even larger if this were to move from Postgres to Greenplum or HAWQ, where we would be able to start applying parallelism (the specialized XtX multiplication in linregr parallelizes perfectly, but the more general matrix_mult functionality may not). As performance has been a key aspect of our development, I'm not sure that we want to architecturally go down the path outlined in this example code. That said... I can certainly see how this layer of abstraction could be a valuable way of expressing things from a development perspective, so the question for the development community is: is there a way that we can enable people to write code more similar to what Gautam has expressed while preserving the performance of our existing implementations?
The ideas that come to mind would be to take an API abstraction approach more akin to what we can see in Theano, where we can express a series of matrix transformations abstractly and then let the framework work out the best way to calculate the pipeline? Large project to do that... but it could be one answer to the long-held question "how should we define our python abstraction layer?". As a whole I'd be pretty resistant to adding dependencies on numpy/scipy unless there was a compelling use case where the performance overhead of implementing the MATH (instead of the control flow) in python was not unacceptably large. -Caleb On Thu, Dec 24, 2015 at 12:51 PM, Frank McQuillan wrote: > Gautam, > > Thank you for working on this, it can be a great addition to MADlib. Cpl > comments below: > > 0) Dependencies on numpy and scipy. Currently the platforms PostgreSQL, > GPDB and HAWQ do not ship with numpy or scipy by default, so we may need to > look at this dependency more closely. > > 2a,b) The following creation methods will exist in MADlib 1.9. They are > already in the MADlib code base: > > -- Create a matrix initialized with ones of given row and column dimension > matrix_ones( row_dim, col_dim, matrix_out, out_args) > > -- Create a matrix initialized with zeros of given row and column dimension > matrix_zeros( row_dim, col_dim, matrix_out, out_args) > > -- Create a square identity matrix of size dim x dim > matrix_identity( dim, matrix_out, out_args) > > -- Create a diag matrix initialized with given diagonal elements > matrix_diag( diag_elements, matrix_out, out_args) > > 2c) As for “Sampling matrices and scalars from certain distributions. We > could start with Gaussian (multi-variate), truncated normal, Wishart, > Inverse-Wishart, Gamma, and Beta.” I created a JIRA for that here: > https://issues.apache.org/jira/browse/MADLIB-940 > I agree with your recommendation.
> > 3) Pipelining > * it’s an architecture question that I agree we need to address, to reduce > disk I/O between steps > * Could be a platform implementation, or we can think about if MADlib can > do something on top of the existing platform by coming up with a way to > chain operations in-memory > > 4) I would *strongly* encourage you to go the next/last mile and get this > into MADlib. The community can help you do it. And as you say we need to > figure out how/if to support numpy and scipy, or do MADlib functions via > Eigen or Boost to handle alternatively. > > Frank > > On Thu, Dec 24, 2015 at 12:29 PM, Gautam Muralidhar < > gautam.s.muralid...@gmail.com> wrote: > > > > Hi Team MADlib, > > > > > > I managed to complete the implementation of the Bayesian analysis of > the > > binary Probit regression model on MPP. The code has been tested on the > > greenplum sandbox VM and seems to work fine. You can find the code here: > > > > > > > > > https://github.com/gautamsm/data-science-on-mpp/tree/master/BayesianAnalysis > > > > > > In the git repo, probit_regression.ipynb is the stand alone python > > implementation. To verify correctness, I compared against R's MCMCpack > > library that can also be run in the Jupyter notebook! > > > > > > pro
[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49807227

--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
```diff
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
[...]
+    PG_FREE_IF_COPY(v1, 0);
```
--- End diff --

So during the detoast, PG_GETARG_ARRAYTYPE_P does not create a copy? I was under the impression that we have to free that pointer since a copy is always created. All array ops functions in MADlib perform that free, based on similar functions in the pg source code. If that's wrong, then we'll have to make a pretty big change in our array_ops.

Snippet from /src/backend/utils/adt/arrayfuncs.c:

```
Datum
array_eq(PG_FUNCTION_ARGS)
{
    ArrayType  *array1 = PG_GETARG_ARRAYTYPE_P(0);
    ArrayType  *array2 = PG_GETARG_ARRAYTYPE_P(1);
    Oid         collation = PG_GET_COLLATION();
    int         ndims1 = ARR_NDIM(array1);
    int         ndims2 = ARR_NDIM(array2);
    int        *dims1 = ARR_DIMS(array1);
    int        *dims2 = ARR_DIMS(array2);
    ...
    ...
    ...
    /* Avoid leaking memory when handed toasted input. */
    PG_FREE_IF_COPY(array1, 0);
    PG_FREE_IF_COPY(array2, 1);

    PG_RETURN_BOOL(result);
}
```
Re: New MADlib committer: Xiaocheng Tang
+1 and congrats On Fri, Jan 15, 2016 at 4:53 AM, Caleb Welton wrote: > Welcome Xiaocheng! Based on the contributions you've already made it's > clear you'll be a great addition to the community. Keep it up! > > It's great seeing the community growing. > > -Caleb > > On Wed, Jan 13, 2016 at 6:38 PM, Roman Shaposhnik > wrote: > > > Congrats Xiaocheng! Welcome to the club! > > > > Thanks, > > Roman. > > > > On Wed, Jan 13, 2016 at 6:22 PM, Frank McQuillan > > wrote: > > > Dear MADlib dev community, > > > > > > The Project Management Committee (PMC) for Apache MADlib has asked > > > Xiaocheng Tang to become a committer and we are pleased to announce > that > > he > > > has accepted. > > > > > > Recently Xiaocheng has been working on a completely new version of > > Support > > > Vector Machines in addition to making various bug fixes and refinements > > to > > > existing algorithms. > > > > > > Being a committer enables easier contribution to the project since > there > > is > > > no need to go via the patch submission process. This should enable > > better > > > productivity. Being a PMC member enables assistance with the > management > > > and to guide the direction of the project. > > > > > > Welcome Xiaocheng! > > > > > > Regards, > > > Frank > > >
[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...
Github user cwelton commented on the pull request:

https://github.com/apache/incubator-madlib/pull/10#issuecomment-171812287

-1 from me as the code currently stands. Freeing memory passed to a function can lead to instability of the database system and is a complete blocker for merge.
[GitHub] incubator-madlib pull request: SVM: Add Gaussian kernel feature ma...
Github user cwelton commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/10#discussion_r49795535

--- Diff: methods/array_ops/src/pg_gp/array_ops.c ---
```diff
@@ -824,6 +836,25 @@ array_fill(PG_FUNCTION_ARGS){
[...]
+    PG_FREE_IF_COPY(v1, 0);
```
--- End diff --

This is a weird use of PG_FREE_IF_COPY that simply looks wrong to me. This call is equivalent to pfree(v1), and since that is a passed-in argument, you are freeing something that does not belong to this function. A correct call would be PG_FREE_IF_COPY(v1, PG_GETARG_ARRAYTYPE_P(0)), except that would never free anything, so it would be a no-op. Ultimately this line of code should simply be removed.
Re: New MADlib committer: Xiaocheng Tang
Welcome Xiaocheng! Based on the contributions you've already made it's clear you'll be a great addition to the community. Keep it up! It's great seeing the community growing. -Caleb On Wed, Jan 13, 2016 at 6:38 PM, Roman Shaposhnik wrote: > Congrats Xiaocheng! Welcome to the club! > > Thanks, > Roman. > > On Wed, Jan 13, 2016 at 6:22 PM, Frank McQuillan > wrote: > > Dear MADlib dev community, > > > > The Project Management Committee (PMC) for Apache MADlib has asked > > Xiaocheng Tang to become a committer and we are pleased to announce that > he > > has accepted. > > > > Recently Xiaocheng has been working on a completely new version of > Support > > Vector Machines in addition to making various bug fixes and refinements > to > > existing algorithms. > > > > Being a committer enables easier contribution to the project since there > is > > no need to go via the patch submission process. This should enable > better > > productivity. Being a PMC member enables assistance with the management > > and to guide the direction of the project. > > > > Welcome Xiaocheng! > > > > Regards, > > Frank >
Reminder! [VIRTUAL] MADlib Meeting: Bayesian Analysis of Binomial Response Models on MPP Databases (Greenplum & HAWQ) using MADlib Matrix Operations
Hi everyone, Just a reminder that the MADlib virtual community meeting is happening tomorrow at 9:45AM PST. Details are below. Thanks, Karen -- Forwarded message -- From: Karen Vuong Date: Thu, Jan 7, 2016 at 6:52 PM Subject: [VIRTUAL] MADlib Meeting: Bayesian Analysis of Binomial Response Models on MPP Databases (Greenplum & HAWQ) using MADlib Matrix Operations To: dev@madlib.incubator.apache.org Hello MADlib contributors, We'd like to invite you to the next MADlib virtual community meeting on Friday, January 15th. Gautam will present a 20-minute overview of some recent R&D work that he has been doing using MADlib. Gautam will present Bayesian analysis of binomial response models on MPP Databases like Greenplum and HAWQ using MADlib matrix operations. Specifically, he will walk the audience through Bayesian analysis involving MCMC sampling techniques of the Probit and Logistic regression models that accept arbitrary user specified parameter priors. The code for this analysis can be found on the following Github page: https://github.com/gautamsm/data-science-on-mpp/tree/master/BayesianAnalysis About Gautam Gautam Muralidhar is currently a Sr. Data Scientist at Pivotal where he helps customers derive actionable insights from data by solving machine learning problems for them using state of the art analytics infrastructure and tools from Pivotal's stack. His areas of expertise include machine learning, image processing, and computer vision. At Pivotal, his work has spanned multiple verticals including Automotive, Logistics, Finance, and Healthcare. He holds an undergraduate degree in Electronics and Communications Engineering from R. V. College of Engineering, Bangalore, India, and a masters and a Ph.D. degree in Biomedical Engineering from The University of Texas at Austin, USA. We look forward to having you join us! 
Please join us on January 15th, 2016 at: https://pivotalcommunity.adobeconnect.com/madlib/

1/15 San Francisco, CA: 9:45 AM PST (UTC-8 hours)
1/15 New York, NY: 12:45 PM EST (UTC-5 hours)

Adobe Connect tips: For issues with Chrome: a little icon appears in the address bar, and it's EASY to miss. Click it to allow Adobe to use the mic. Even if you select 'allow' in the popup, the mic doesn't work until you also change the setting in the address bar.

If you have never attended an Adobe Connect meeting before:
Test your connection: https://pivotalcommunity.adobeconnect.com/common/help/en/support/meeting_test.htm
Get a quick overview: http://www.adobe.com/products/adobeconnect.html

Thanks,
Karen Vuong
Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
Hi Chenliang,

Will we hear from you tomorrow at 10AM Pacific, or in a few weeks when the call is a better time for Asia-based callers?

-Greg

On Thu, Jan 14, 2016 at 8:18 AM, chenliang wang wrote:
> Cool! I'd like to join the next discussion.
>
> Best,
> Chenliang Wang
Re: How to contribute a spatial module to MADlib manipulating objects from PostGIS
Cool! I'd like to join the next discussion.

Best,
Chenliang Wang

On 01/13/2016 06:36 PM, Greg Chase wrote:
As I said, our next call is not China-friendly: http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3CCAMg1VtnKB-WoyVqCstfMNCcJVOn2HKQQ6wNfqdovhgnB7zd5cw%40mail.gmail.com%3E This is this Friday, 10AM Pacific Standard Time, which is 2AM Saturday Beijing time. We will arrange a next call in a couple of weeks at an Asia-friendly time to support contributors in Asia. However, if you make the next call, we will make time for you to talk :) Regards, -Greg

On Wed, Jan 13, 2016 at 2:18 AM, Kuien Liu wrote:
Great, I would like to join it; please send me an invitation if possible. Cheers, Kuien Liu

On Wed, Jan 13, 2016 at 6:10 PM, Greg Chase wrote:
Perhaps ChenLiang would like to join a call with the MADlib community and discuss his contribution? We have a call this Friday 10AM PST, which is not a friendly time for China, but we can schedule a next call at a friendlier time.

On Jan 13, 2016, at 1:53 AM, Ivan Novick wrote:
Cool!

On Wed, Jan 13, 2016 at 5:52 PM, Kuien Liu wrote:
Got it. I think I can have a (f2f) talk with Chenliang Wang, as he graduated from an institute of CAS which is not far from our Beijing office, and I am familiar with his supervisor and lab director. So I think it is highly possible to find him directly in Beijing. Cheers, Kuien Liu

On Wed, Jan 13, 2016 at 3:05 PM, Ivan Novick wrote:
Hello ChenLiang, I have read your description of the interface, and to my understanding this is a supervised machine learning algorithm that supports geometry data. Am I correct? What would be some good industrial use cases for this model? Could you train a system based on locations and weather to find bad signals for cell phones? Can you provide any real-world example scenario where this type of model will be useful for end users?
Also, I am adding some of my colleagues at work on CC. Kuien, Max, Yandong, can you provide any feedback on this proposal from your point of view? http://mail-archives.apache.org/mod_mbox/incubator-madlib-dev/201601.mbox/%3cblu175-w72199bca72716d8c1a99bf4...@phx.gbl%3E Cheers, Ivan

On Wed, Jan 13, 2016 at 11:20 AM, WangChenLiang wrote:
Sorry, the link to the attachment (http://1drv.ms/1ZjAiCg) was missing from the previous letter.

From: hi181904...@msn.com
To: dev@madlib.incubator.apache.org
Subject: RE: How to contribute a spatial module to MADlib manipulating objects from PostGIS
Date: Wed, 13 Jan 2016 11:09:17 +0800

Hi Caleb and Ivan! Thanks for your attention and help. I reviewed the previous draft and found some inappropriate parts. The archive containing the new draft and example code, which should be more reasonable than the earlier edition, is attached to this letter. Please go over the manuscript and give suggestions again. The following are my answers to Caleb's questions.

- Does this function require PostGIS to also be installed? If yes, it would be better if we disable the function if PostGIS is not present rather than introduce PostGIS as a dependency. (Similar to what we do with our requirement on the xml module with our PMML export functionality).

A: Yes. I am trying to avoid taking any spatial datatypes as input in the GWR interface, but I am not sure whether it is necessary to provide a simple alternative when PostGIS is not available.

- What are the exact datatypes in the function definition for regression_location and prediction_location?

A: I changed the datatype to TEXT, holding the name of a POINT or MULTIPOLYGON column (for polygons, the centroid of each polygon is used in the GWR estimation).

- In the description it describes regression_location as "The length of regression_location must be equal to the length of source_table", which signals to me that it is likely intended to be a column of the source table? If not, then how is this length represented?
A: In the previous interface, I was trying to input a geometry field which could come from another table with a different number of rows. Now I have changed the argument definition to TEXT: it must be the name of a geometry field in the source table.

- You didn't mark regression_location as (optional). Due to the way Postgres functions work, all optional arguments must come after all required arguments, so having a non-optional argument in the middle of the optional list must be avoided.

A: Thanks for reminding me of this mistake; it is my fault. The order of the arguments is changed in this edition.

- I haven't read through the literature, but it is not immediately clear to me why prediction_location is a parameter to gwregr_train() rather than gwregr_predict(). Can you provide a brief description of the way that prediction_location is used in the model and its relationship to training and prediction?

A: Actually, there are three ki
Re: FW: How to contribute a spatial module to MADlib manipulating objects from PostGIS
Hello Ivan,

Yes, GWR is a local form of ordinary linear regression (OLR) that takes the distance between locations into account during estimation. In practice, GWR and other spatial models are not as widely applied in industry as classical statistical methods or ML models.

A representative example would be an automated valuation model (AVM) for the housing market. An AVM is a technology and service that generates a residential valuation report for a consumer in a matter of seconds. Housing markets behave differently across space: people's preferences vary with location, and environmental influences may decay with distance. For example, an old house in the CBD will be more expensive than a suburban one with otherwise identical features. Many papers show that AVMs using spatial models such as GWR can produce more accurate estimates than classical models. I have also implemented a basic GWR in Java for our AVM, which is able to capture spatial variability, the key distinguishing feature of the real estate market.

I haven't researched the relationship between weather and signals. I would guess it is a global correlation, in which case OLR would be enough to model it. However, if we have some data, we should run a statistical test to detect spatial non-stationarity. And if weather, or any other influencing factor distributed in a geographic pattern, is absent from our data, GWR will be useful for modeling that hidden relationship in a spatial context.

GWR would be useful for business analyses that involve locations. There is a small example demonstrating the analysis of 911 phone calls using OLS and GWR (http://eclectic.ss.uci.edu/~drwhite/pdf/Tutorial-RegressionAnalysis.pdf). In my opinion, many business scenarios, such as location-based services (LBS), could gain value from the varying relationship between consumers' preferences and influencing factors.

It is great to have a chance to communicate with Kuien, although I have left Beijing. I will keep in touch with him about spatial statistics modules.
Best,
Chenliang