[ https://issues.apache.org/jira/browse/MADLIB-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Frank McQuillan closed MADLIB-1406.
-----------------------------------
    Resolution: Fixed

> DL: fit multiple takes up unnecessary disk space
> ------------------------------------------------
>
>                 Key: MADLIB-1406
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1406
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Deep Learning
>            Reporter: Nikhil Kak
>            Assignee: Nikhil Kak
>            Priority: Major
>             Fix For: v1.17
>
>
> While testing places10 with fit multiple (gpdb5, 10 iterations and 20 msts),
> we ran out of disk space even though at least 1.5 TB was free when the query
> started. Fit multiple should not need this much space, so this probably
> points to a bug in the code.
> Here are the query and the failure:
> {code:sql}
> DROP TABLE IF EXISTS mst_table, mst_table_summary;
> SELECT load_model_selection_table(
>     'model_arch_places10',
>     'mst_table',
>     ARRAY[1],
>     ARRAY[
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.1, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.01, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.0001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=False)', metrics=['accuracy']$$
>     ],
>     ARRAY[
>         $$batch_size=16, epochs=1, verbose=0$$,
>         $$batch_size=20, epochs=1, verbose=0$$,
>         $$batch_size=32, epochs=1, verbose=0$$,
>         $$batch_size=40, epochs=1, verbose=0$$
>     ]
> );
> DROP TABLE IF EXISTS places10_train_mult_model,
>     places10_train_mult_model_summary, places10_train_mult_model_info;
> SELECT madlib_keras_fit_multiple_model(
>     'places10_train_bytea_batched',
>     'places10_train_mult_model',
>     'mst_table',
>     10,
>     TRUE
> );
> -- failed in the 7th iteration
> ....
> Time for training in iteration 6: 6403.70687222 sec
> ERROR: plpy.SPIError: could not extend relation 1663/3721274/1121877: No space left on device (seg1)
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
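The relation in the error, 1663/3721274/1121877, follows the PostgreSQL relation-path convention of tablespace OID / database OID / relfilenode, where 1663 is the built-in pg_default tablespace. Below is a minimal sketch of catalog lookups that might help identify what was being extended, assuming a psql session on the same Greenplum 5 cluster. These are generic diagnostic queries, not part of the fix for this issue. Note that a table created inside the failed transaction is dropped on rollback, so the relfilenode lookup may return no rows after the error.

```sql
-- Decode the path <tablespace OID>/<database OID>/<relfilenode> from the error.
SELECT spcname FROM pg_tablespace WHERE oid = 1663;     -- expected: pg_default
SELECT datname FROM pg_database  WHERE oid = 3721274;

-- The error was raised on seg1. On Greenplum a relation's relfilenode can differ
-- between master and segments, so search every segment's pg_class via the
-- Greenplum-specific gp_dist_random() wrapper.
SELECT gp_segment_id, relname, relkind
FROM gp_dist_random('pg_class')
WHERE relfilenode = 1121877;

-- List the largest ordinary tables in the current database (sizes summed
-- across segments) to see which outputs or intermediates are growing.
SELECT n.nspname, c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC
LIMIT 10;
```

Running the size query between iterations, rather than only after the failure, would show whether the growth comes from tables that persist across iterations.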