[ https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299288#comment-16299288 ]
Rahul Iyer edited comment on MADLIB-1185 at 12/21/17 12:09 AM:
---------------------------------------------------------------
I see three possible solutions:
1. Use the {{lookup_type_cache}} function in Postgres to get a {{TypeCacheEntry}}
struct and read the type information from it.
2. Hard-wire the values.
3. Refactor all five modules so they do not circumvent the abstraction layer. In
most cases, we can use the MADlib Allocator to get the array.
In each case, I suggest deleting {{__type_info}}. Of the three options, option
2 is the simplest to implement. In each of the above-mentioned modules, we only
need info for FLOAT8, INT4, or INT8. I don't expect any change in the storage
or alignment of these types. Hence I suggest we hard-wire all values directly
at the caller. We have precedent for this in most of our {{c}} files.
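As a rough illustration of option 2, the hard-wired values could be kept in a small table like the one below. This is only a sketch, not MADlib code: the struct {{HardWiredTypeInfo}}, the function {{hard_wired_type_info}}, and the macro {{ASSUME_FLOAT8_BYVAL}} are hypothetical names made up for this example. The OIDs, lengths, and alignment characters are the standard Postgres catalog values for FLOAT8, INT4, and INT8; the by-value flag for the 8-byte types is assumed to be true, which holds for typical 64-bit builds (it mirrors {{FLOAT8PASSBYVAL}}).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 8-byte types are passed by value (typical 64-bit build). */
#define ASSUME_FLOAT8_BYVAL true

/* Hypothetical stand-in for the fields a caller would otherwise read
 * from __type_info: type OID, typlen, typbyval, typalign. */
typedef struct {
    uint32_t oid;
    int16_t  len;
    bool     byval;
    char     align;
} HardWiredTypeInfo;

static const HardWiredTypeInfo hard_wired_types[] = {
    { 701, 8, ASSUME_FLOAT8_BYVAL, 'd' },  /* FLOAT8 (FLOAT8OID) */
    {  23, 4, true,                'i' },  /* INT4   (INT4OID)   */
    {  20, 8, ASSUME_FLOAT8_BYVAL, 'd' },  /* INT8   (INT8OID)   */
};

/* Return the hard-wired entry for a type OID, or NULL if the type is
 * not one of the three that the affected modules need. */
const HardWiredTypeInfo *hard_wired_type_info(uint32_t oid)
{
    size_t n = sizeof(hard_wired_types) / sizeof(hard_wired_types[0]);
    for (size_t i = 0; i < n; i++) {
        if (hard_wired_types[i].oid == oid)
            return &hard_wired_types[i];
    }
    return NULL;
}
```

In practice the values would more likely be written inline at each call site, as suggested above; the table form is used here only to keep the three types side by side.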
was (Author: riyer):
I see three possible solutions:
1. Use {{lookup_type_cache}} function in postgres to get a {{TypeCacheEntry}}
struct and get the type information from it.
2. Hard-wire the values
3. Refactor all the 5 modules to not circumvent the abstraction layer. In most
cases, we can use the MADlib Allocator to get the array.
In each case, I suggest deleting the {{__type_info}}. Of the 3 options, option
2 is simple to implement. In each of the modules, we only need info for FLOAT8,
INT4, or INT8. I don't expect any change in the storage or alignment of these
types. Hence I suggest we hard-wire all values directly at the caller. We have
precedent for this in most of our {{c}} files.
> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
> Key: MADLIB-1185
> URL: https://issues.apache.org/jira/browse/MADLIB-1185
> Project: Apache MADlib
> Issue Type: Bug
> Components: DB Abstraction Layer
> Reporter: Nikhil
> Fix For: v1.14
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double
> array column) causes the database to crash.
> Repro Steps
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert at least 1 million rows like these
> id | x | y
> -------+-------------------------+---
> 97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Now running any C madlib UDF followed by a count(*) of foo will cause the
> database to crash
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}