Nikhil Kak created MADLIB-1406:
----------------------------------
Summary: DL: fit multiple takes up unnecessary disk space
Key: MADLIB-1406
URL: https://issues.apache.org/jira/browse/MADLIB-1406
Project: Apache MADlib
Issue Type: Bug
Components: Deep Learning
Reporter: Nikhil Kak
Fix For: v1.17
While testing places10 with fit multiple (gpdb5, 10 iterations and 20 msts), we
ran out of disk space even though at least 1.5 TB was free when the query
started. A run like this should not need that much space, which probably means
there is a bug in the code.
Here are the query and the failure:
{code:java}
DROP TABLE IF EXISTS mst_table, mst_table_summary;
-- 1 model architecture x 5 compile param sets x 4 fit param sets = 20 msts
SELECT load_model_selection_table(
    'model_arch_places10',
    'mst_table',
    ARRAY[1],
    ARRAY[
        $$loss='categorical_crossentropy', optimizer='SGD(lr=0.1, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
        $$loss='categorical_crossentropy', optimizer='SGD(lr=0.01, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
        $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
        $$loss='categorical_crossentropy', optimizer='SGD(lr=0.0001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
        $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=False)', metrics=['accuracy']$$
    ],
    ARRAY[
        $$batch_size=16, epochs=1, verbose=0$$,
        $$batch_size=20, epochs=1, verbose=0$$,
        $$batch_size=32, epochs=1, verbose=0$$,
        $$batch_size=40, epochs=1, verbose=0$$
    ]
);
DROP TABLE IF EXISTS places10_train_mult_model,
    places10_train_mult_model_summary, places10_train_mult_model_info;
SELECT madlib_keras_fit_multiple_model(
    'places10_train_bytea_batched',  -- source table
    'places10_train_mult_model',     -- model output table
    'mst_table',                     -- model selection table
    10,                              -- number of iterations
    TRUE                             -- use GPUs
);
-- failed in the 7th iteration
....
Time for training in iteration 6: 6403.70687222 sec
ERROR: plpy.SPIError: could not extend relation 1663/3721274/1121877: No space left on device (seg1)
{code}
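One possible way to narrow this down is sketched below. It was not run as part of this report. The last component of the path in the error, 1121877, is the relfilenode of the relation that could not be extended, so it can be looked up in pg_class on the segments. Disk usage can also be watched from a second session while fit multiple is running. The sketch assumes gpdb5 with the gp_toolkit schema installed. The lookup may return no rows if the relation was created inside the failed transaction and has already been rolled back.

{code:sql}
-- Map the relfilenode from the error message to a relation name on each segment.
SELECT gp_segment_id, relname, relkind
FROM gp_dist_random('pg_class')
WHERE relfilenode = 1121877;

-- From a second session during training: free disk space per segment.
SELECT * FROM gp_toolkit.gp_disk_free;

-- Total on-disk size of the current database.
SELECT pg_size_pretty(pg_database_size(current_database())) AS db_size;
{code}

Recording the last two results after each iteration would show whether usage grows steadily per iteration or jumps at a particular step.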