[ https://issues.apache.org/jira/browse/MADLIB-1406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Frank McQuillan closed MADLIB-1406.
-----------------------------------
    Resolution: Fixed

> DL: fit multiple takes up unnecessary disk space
> ------------------------------------------------
>
>                 Key: MADLIB-1406
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1406
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Deep Learning
>            Reporter: Nikhil Kak
>            Assignee: Nikhil Kak
>            Priority: Major
>             Fix For: v1.17
>
>
> While testing places10 with fit multiple (gpdb5, 10 iterations, 20 MSTs),
> we ran out of disk space even though at least 1.5 TB was free at the start
> of the query. Fit multiple should not need this much space, so this
> probably points to a bug in the code.
> Here are the query and the failure:
> {code:java}
> DROP TABLE IF EXISTS mst_table, mst_table_summary;
> SELECT load_model_selection_table(
>     'model_arch_places10',
>     'mst_table',
>     ARRAY[1],
>     ARRAY[
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.1, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.01, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.0001, decay=1e-6, nesterov=True)', metrics=['accuracy']$$,
>         $$loss='categorical_crossentropy', optimizer='SGD(lr=0.001, decay=1e-6, nesterov=False)', metrics=['accuracy']$$
>     ],
>     ARRAY[
>         $$batch_size=16, epochs=1, verbose=0$$,
>         $$batch_size=20, epochs=1, verbose=0$$,
>         $$batch_size=32, epochs=1, verbose=0$$,
>         $$batch_size=40, epochs=1, verbose=0$$
>     ]
> );
> DROP TABLE IF EXISTS places10_train_mult_model, places10_train_mult_model_summary, places10_train_mult_model_info;
> SELECT madlib_keras_fit_multiple_model(
>     'places10_train_bytea_batched',
>     'places10_train_mult_model',
>     'mst_table',
>     10,
>     TRUE
> );
> -- failed in the 7th iteration
> ....
> Time for training in iteration 6: 6403.70687222 sec
> ERROR:  plpy.SPIError: could not extend relation 1663/3721274/1121877: No space left on device  (seg1)
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
