[jira] [Comment Edited] (MADLIB-1185) Postgres 10 support for MADlib with large tables

Rahul Iyer (JIRA) Wed, 20 Dec 2017 16:10:28 -0800

    [ 
https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299288#comment-16299288
 ]


Rahul Iyer edited comment on MADLIB-1185 at 12/21/17 12:09 AM:
---------------------------------------------------------------

I see three possible solutions: 
1. Use the {{lookup_type_cache}} function in Postgres to get a {{TypeCacheEntry}} 
struct and read the type information from it.
2. Hard-wire the values.
3. Refactor all five modules so that they do not circumvent the abstraction 
layer. In most cases, we can use the MADlib Allocator to get the array. 

In each case, I suggest deleting {{__type_info}}. Of the three options, option 
2 is the simplest to implement. In each of the modules mentioned above, we only 
need type information for FLOAT8, INT4, or INT8, and I don't expect the storage 
or alignment of these types to change. Hence I suggest we hard-wire all values 
directly at the caller. We have precedent for this in most of our C files. 
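
A minimal sketch of what hard-wiring (option 2) could look like, written as 
standalone C so it compiles without the Postgres headers. The struct 
{{HardWiredTypeInfo}}, the function {{hard_wired_type_info}} and the macro 
{{EIGHT_BYTE_BYVAL}} are hypothetical names used only for illustration; they 
are not MADlib or Postgres identifiers. The OIDs, lengths and alignment codes 
are the ones listed in the {{pg_type}} catalog. Whether 8-byte types are passed 
by value is a build-time setting in Postgres ({{FLOAT8PASSBYVAL}}); the sketch 
assumes the usual default of by-value on 64-bit builds, and real code should use 
that macro instead.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stdint.h>

/* Type OIDs as listed in the pg_type catalog. */
enum { INT8_OID = 20, INT4_OID = 23, FLOAT8_OID = 701 };

/* Hypothetical container for the three values an array constructor needs. */
typedef struct {
    int16_t len;    /* typlen: size of one element in bytes */
    bool    byval;  /* typbyval: passed by value or by reference */
    char    align;  /* typalign: 'i' = int alignment, 'd' = double alignment */
} HardWiredTypeInfo;

/* The hard-wired length for FLOAT8 only holds if double is 8 bytes wide. */
static_assert(sizeof(double) == 8, "FLOAT8 is assumed to be an 8-byte double");

/*
 * Assumption: 8-byte types are pass-by-value on 64-bit builds only.
 * In real extension code, use FLOAT8PASSBYVAL from the Postgres headers.
 */
#define EIGHT_BYTE_BYVAL (sizeof(void *) >= 8)

/*
 * Fill *out with the hard-wired values for the given type OID.
 * Returns false for any type other than FLOAT8, INT4 or INT8.
 */
static bool hard_wired_type_info(uint32_t oid, HardWiredTypeInfo *out)
{
    switch (oid) {
    case FLOAT8_OID:
        out->len = 8;
        out->byval = EIGHT_BYTE_BYVAL;
        out->align = 'd';
        return true;
    case INT4_OID:
        out->len = 4;
        out->byval = true;
        out->align = 'i';
        return true;
    case INT8_OID:
        out->len = 8;
        out->byval = EIGHT_BYTE_BYVAL;
        out->align = 'd';
        return true;
    default:
        return false;
    }
}
```

At a call site that builds a FLOAT8 array, these three values would be passed 
straight to the array constructor in place of a lookup through 
{{__type_info}}.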


was (Author: riyer):
I see three possible solutions: 
1. Use {{lookup_type_cache}} function in postgres to get a {{TypeCacheEntry}} 
struct and get the type information from it.
2. Hard-wire the values
3. Refactor all the 5 modules to not circumvent the abstraction layer. In most 
cases, we can use the MADlib Allocator to get the array. 

In each case, I suggest deleting the {{__type_info}}. Of the 3 options, option 
2 is simple to implement. In each of the modules, we only need info for FLOAT8, 
INT4, or INT8. I don't expect any change in the storage or alignment of these 
types. Hence I suggest we hard-wire all values directly at the caller. We have 
precedence of this in most of our {{c}} files. 

> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
>                 Key: MADLIB-1185
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1185
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: DB Abstraction Layer
>            Reporter: Nikhil
>             Fix For: v1.14
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double 
> array column) causes the database to crash.
> Repro Steps
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert at least 1 million rows like these
>   id   |            x            | y
> -------+-------------------------+---
>  97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Now running any C madlib UDF followed by a count(*) of foo will cause the 
> database to crash
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
