[ https://issues.apache.org/jira/browse/MADLIB-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284009#comment-16284009 ]
Nikhil edited comment on MADLIB-1185 at 12/11/17 5:31 PM:
----------------------------------------------------------
The problem seems to be that calling a MADlib C UDF and then running a
count(*) on foo causes the database to crash.
Even a simple C UDF like poisson_random crashes the database. We even tried
commenting out everything in the implementation of poisson_random so that it
just returns a constant, but it still crashes the database:
{code}
AnyType
poisson_random::run(AnyType &args) {
    return 1;
}
{code}
Here is the Postgres log:
{code}
libc++abi.dylib: terminating with uncaught exception of type madlib::dbconnector::postgres::PGException: The backend raised an exception.
2017-12-08 10:36:04.632 PST [72270] LOG: worker process: parallel worker for PID 86327 (PID 86328) was terminated by signal 6: Abort trap
2017-12-08 10:36:04.632 PST [72270] LOG: terminating any other active server processes
2017-12-08 10:36:04.632 PST [86327] WARNING: terminating connection because of crash of another server process
2017-12-08 10:36:04.632 PST [86327] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-12-08 10:36:04.632 PST [86327] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-12-08 10:36:04.632 PST [86216] WARNING: terminating connection because of crash of another server process
2017-12-08 10:36:04.632 PST [86216] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-12-08 10:36:04.632 PST [86216] HINT: In a moment you should be able to reconnect to the database and repeat your command.
2017-12-08 10:36:04.633 PST [72270] LOG: all server processes terminated; reinitializing
2017-12-08 10:36:04.642 PST [86330] LOG: database system was interrupted; last known up at 2017-12-08 10:32:44 PST
2017-12-08 10:36:04.642 PST [86331] FATAL: the database system is in recovery mode
2017-12-08 10:36:04.729 PST [86330] LOG: database system was not properly shut down; automatic recovery in progress
2017-12-08 10:36:04.730 PST [86330] LOG: redo starts at 0/6F83E460
2017-12-08 10:36:04.730 PST [86330] LOG: invalid record length at 0/6F83E498: wanted 24, got 0
2017-12-08 10:36:04.730 PST [86330] LOG: redo done at 0/6F83E460
2017-12-08 10:36:04.734 PST [72270] LOG: database system is ready to accept connections
{code}
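The log shows that the process that aborted was a parallel worker (PID 86328, started for PID 86327). A possible follow-up check, not something tried in the notes here, is to rerun the same statements in one session with parallel query turned off and see whether the crash still occurs. The sketch below assumes the table foo from the issue description and uses only the standard Postgres setting max_parallel_workers_per_gather; it is a diagnostic, not a fix.
{code}
-- Show the plan first: a Gather node means count(*) would use parallel workers.
EXPLAIN SELECT count(*) FROM foo;

-- Disable parallel workers for this session only (diagnostic assumption).
SET max_parallel_workers_per_gather = 0;

-- Same two statements that crash the database in the original report.
SELECT madlib.poisson_random(1);
SELECT count(*) FROM foo;

-- Restore the server default afterwards.
RESET max_parallel_workers_per_gather;
{code}
If the crash disappears with the setting at 0, that would suggest the problem is tied to how the library behaves inside a parallel worker. If it still crashes, parallel query is probably not the trigger.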
Notes
1. A table without the double precision array column does not crash the
database.
2. We also tried altering the storage type for column x of table foo to each of
PLAIN, EXTENDED, MAIN and EXTERNAL, but it still crashed the database.
3. We also tried taking MADlib out of the equation by writing our own extension
in C and then calling count(*) on foo, but this did not crash the database.
was (Author: nikhilkak):
(Same description, code, log and notes as in the edited comment above, followed by:)
Hence our suspicion with the db abstraction layer.
> Postgres 10 support for MADlib with large tables
> ------------------------------------------------
>
> Key: MADLIB-1185
> URL: https://issues.apache.org/jira/browse/MADLIB-1185
> Project: Apache MADlib
> Issue Type: Bug
> Components: DB Abstraction Layer
> Reporter: Nikhil
> Fix For: v1.13
>
>
> Running MADlib on Postgres 10 with a large dataset (98000 rows with a double
> precision array column) causes the database to crash.
> Repro Steps
> {code}
> 1. create table foo (id integer, x double precision[], y integer);
> 2. Insert at least 1 million rows like these
> id | x | y
> -------+-------------------------+---
> 97440 | {1,0.2,0,1,0,1,0,0,0,0} | 1
> 3. Now running any MADlib C UDF followed by a count(*) of foo will cause the
> database to crash
> select madlib.poisson_random(1); select count(*) from foo;
> or
> select madlib.svec_plus('{1}:{5}', '{1}:{4}'); select count(*) from foo;
> {code}
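The repro steps above can be written as a single script. This is a minimal sketch: using generate_series to repeat the one sample row from step 2 is an assumption, since the issue does not say how the original table was populated or whether the array values matter.
{code}
-- Step 1: table definition taken from the issue description.
CREATE TABLE foo (id integer, x double precision[], y integer);

-- Step 2: insert 1 million rows, each repeating the sample array
-- (assumption: the original data may have varied from row to row).
INSERT INTO foo (id, x, y)
SELECT i,
       ARRAY[1, 0.2, 0, 1, 0, 1, 0, 0, 0, 0]::double precision[],
       1
FROM generate_series(1, 1000000) AS i;

-- Step 3: any MADlib C UDF followed by count(*) in the same session.
SELECT madlib.poisson_random(1);
SELECT count(*) FROM foo;
{code}
Run all statements in one session, since the report describes the crash as happening when count(*) follows the UDF call.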