[
https://issues.apache.org/jira/browse/MADLIB-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16655449#comment-16655449
]
Orhan Kislal commented on MADLIB-1257:
--------------------------------------
I ran the exact same query on OSX 10.13, PG 10.5, MADlib 1.15.1, and the run
completed without error. Memory usage did grow over time, up to 4-5 GB, but
that was not high enough to crash my system (16 GB RAM).
> PostgreSQL crashed during random forest training
> ------------------------------------------------
>
> Key: MADLIB-1257
> URL: https://issues.apache.org/jira/browse/MADLIB-1257
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Random Forest
> Reporter: Rahul Iyer
> Assignee: Orhan Kislal
> Priority: Major
> Fix For: v2.0
>
> Attachments: train_data.gz
>
>
> User reported bug:
> I ran into a problem when training grouped data with random forest (300
> features). Small data was fine (e.g., 56K instances in 56 groups), but training
> failed for 240K instances in 250 groups. Postgres forcibly disconnected the
> session after showing the message below in verbose mode:
> {code:sql}
> NOTICE: view "__madlib_temp_60124179_1532371657_7130296__" will be a
> temporary view
> NOTICE: sql_create_empty_result_table:
> CREATE TABLE analysis.dx_rf_train_output_1 (
> gid integer,
> sample_id integer,
> tree madlib.bytea8);
> NOTICE: sql_refresh_training_pois_cnt:
> TRUNCATE TABLE
> __madlib_temp_91155016_1532371657_5660955__ CASCADE;
> INSERT INTO
> __madlib_temp_91155016_1532371657_5660955__
> SELECT
> *,
> madlib.poisson_random(1) AS poisson_count
> FROM
> (
> SELECT
> *,
> 0.::double precision AS
> __madlib_temp_14328459_1532371657_7318497__
> FROM analysis.dxpredict_svec
> ) subq
> WHERE __madlib_temp_14328459_1532371657_7318497__
> < 1
> NOTICE:
> src_cnt: 158360,
> oob_cnt: 92418,
> dup_cnt: 250617.
> NOTICE: Started tree building for all groups
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> PostgreSQL did not capture a detailed log even after I set
> log_statement to "all":
> 2018-07-23 14:47:50.229 EDT [1090] LOG: server process (PID 1980) was
> terminated by signal 11: Segmentation fault
> 2018-07-23 14:47:50.229 EDT [1090] DETAIL: Failed process was running:
> SELECT madlib.forest_train('analysis.dxpredict_svec',
> 'analysis.dx_rf_train_output_1',
> 'rowid',
> 'positive',
> '*',
> 'rowid,positive,case_icd',
> 'case_icd',
> 30::integer,
> 30::integer,
> TRUE::boolean,
> 1::integer,
> 10::integer,
> 3::integer,
> 1::integer,
> 10::integer,
> NULL,
> TRUE
> );
> 2018-07-23 14:47:50.229 EDT [1090] LOG: terminating any other active server
> processes
> 2018-07-23 14:47:50.229 EDT [1401] WARNING: terminating connection because
> of crash of another server process
> {code}
> Another observation: it also crashed with 84 groups and 73K instances. In this
> scenario, I should have had plenty of memory and disk.
> It also seems that, as the number of groups increases, a lot of temporary
> disk space is used once the data exceeds a certain number of groups.
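The counts in the NOTICE output above can be sanity-checked with a little arithmetic. The sketch below assumes (this is an interpretation, not something the log states) that src_cnt is the number of rows with poisson_count > 0, oob_cnt is the number of rows with poisson_count = 0, and dup_cnt is the sum of all poisson_count values. Under that reading, sampling with madlib.poisson_random(1) should leave roughly exp(-1) of the rows out of bag and draw about one sample per row on average, which would suggest the sampling step itself behaved normally before the crash.

```python
import math

# Counts copied from the NOTICE output in the report.
src_cnt = 158360   # assumed: rows with poisson_count > 0
oob_cnt = 92418    # assumed: rows with poisson_count = 0 (out of bag)
dup_cnt = 250617   # assumed: sum of poisson_count over all rows

# Assumption: src_cnt and oob_cnt together cover every input row.
total_rows = src_cnt + oob_cnt

# Observed fraction of rows that were never sampled.
oob_fraction = oob_cnt / total_rows

# Observed average number of draws per input row.
mean_draws = dup_cnt / total_rows

# For a Poisson(1) draw, P(count == 0) = exp(-1) and the mean is 1.
expected_oob_fraction = math.exp(-1)

print(f"total rows:            {total_rows}")
print(f"observed OOB fraction: {oob_fraction:.4f}")
print(f"expected OOB fraction: {expected_oob_fraction:.4f}")
print(f"mean draws per row:    {mean_draws:.4f}")
```

If the two fractions agree closely, the row counts are consistent with a Poisson(1) bootstrap, and the failure is more likely to be in the tree-building step that follows.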