Nandish Jayaram created MADLIB-1259:
---------------------------------------

Summary: PostgreSQL out of memory issue with Neural networks training
Key: MADLIB-1259
URL: https://issues.apache.org/jira/browse/MADLIB-1259
Project: Apache MADlib
Issue Type: Bug
Components: Module: Neural Networks
Reporter: Nandish Jayaram
Fix For: v2.0

Neural network training results in an out of memory exception in the following scenario:
* 16 GB RAM
* Dataset: Same as the one used in https://issues.apache.org/jira/browse/MADLIB-1257. 23K instances / 300 features in 263 groups
* PostgreSQL memory setup:
checkpoint_completion_target = '0.9';
default_statistics_target = '500';
effective_cache_size = '12GB';
effective_io_concurrency = '200';
maintenance_work_mem = '2GB';
max_connections = '20';
max_parallel_workers = '4';
max_parallel_workers_per_gather = '2';
max_wal_size = '8GB';
max_worker_processes = '4';
min_wal_size = '4GB';
random_page_cost = '1.1';
shared_buffers = '4GB';
wal_buffers = '16MB';
work_mem = '52428kB';
sysctl -w vm.overcommit_memory=2 to avoid the crash of postmaster

With the above database settings and dataset size, the following query resulted in an error:
{code:java}
SELECT madlib.mlp_classification(
    'train_data_sub',   -- Source table
    'mlp_model',        -- Destination table
    'features',         -- Input features
    'positive',         -- Label
    ARRAY[5],           -- Number of units per layer
    'learning_rate_init=0.003,
    n_iterations=500,
    tolerance=0',       -- Optimizer params
    'tanh',             -- Activation function
    NULL,               -- Default weight (1)
    FALSE,              -- No warm start
    true,               -- verbose
    'case_icd'          -- Grouping
);
ERROR:  spiexceptions.OutOfMemory: out of memory
DETAIL:  Failed on request of size 32800.
CONTEXT:  Traceback (most recent call last):
  PL/Python function "mlp_classification", line 36, in <module>
    grouping_col
  PL/Python function "mlp_classification", line 45, in wrapper
  PL/Python function "mlp_classification", line 325, in mlp
  PL/Python function "mlp_classification", line 580, in update
PL/Python function "mlp_classification"
{code}
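
To put the memory settings in the report into rough numbers, here is a minimal sketch of the arithmetic. It is not a diagnosis of this issue. It assumes values the report does not state: no swap, and the Linux default vm.overcommit_ratio of 50, which under vm.overcommit_memory=2 gives a commit limit of roughly swap plus half of RAM. The function names are hypothetical, and the "budget" is only a crude estimate, since work_mem applies per sort or hash operation rather than per connection.

```python
# Rough memory arithmetic for the settings listed in the report.
# ASSUMPTIONS (not in the report): swap = 0, vm.overcommit_ratio = 50.

KB = 1024
MB = 1024 * KB
GB = 1024 * MB

TOTAL_RAM = 16 * GB          # "16 GB RAM"
SHARED_BUFFERS = 4 * GB      # shared_buffers = '4GB'
WORK_MEM = 52428 * KB        # work_mem = '52428kB'
MAX_CONNECTIONS = 20         # max_connections = '20'


def commit_limit(total_ram, swap, overcommit_ratio=50):
    """Approximate Linux commit limit with vm.overcommit_memory=2 (huge pages ignored)."""
    return swap + total_ram * overcommit_ratio // 100


def configured_budget(shared_buffers, work_mem, max_connections):
    """shared_buffers plus one work_mem-sized allocation per allowed connection."""
    return shared_buffers + work_mem * max_connections


budget = configured_budget(SHARED_BUFFERS, WORK_MEM, MAX_CONNECTIONS)
limit = commit_limit(TOTAL_RAM, swap=0)  # swap=0 is an assumption

print(f"configured budget: {budget / GB:.2f} GiB")
print(f"commit limit:      {limit / GB:.2f} GiB")
print(f"headroom:          {(limit - budget) / GB:.2f} GiB")
```

Under these assumptions the headroom is what remains for everything else on the host, including the memory the training query itself requests. The actual swap size and overcommit ratio on the affected machine would need to be substituted before drawing any conclusion.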