[
https://issues.apache.org/jira/browse/MADLIB-1334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nikhil Kak reopened MADLIB-1334:
--------------------------------
Assignee: Nikhil Kak
> Mini-batch preprocessor for DL running very slowly
> --------------------------------------------------
>
> Key: MADLIB-1334
> URL: https://issues.apache.org/jira/browse/MADLIB-1334
> Project: Apache MADlib
> Issue Type: Bug
> Components: Module: Utilities
> Reporter: Frank McQuillan
> Assignee: Nikhil Kak
> Priority: Major
> Fix For: v1.16
>
>
> Observed on a 2-segment Greenplum 5.x cluster using the latest build from MASTER:
> current `minibatch_preprocessor`
> 1) 60K MNIST training examples = 28.1 sec
> 2) 10K MNIST test examples = 5.9 sec
> new `minibatch_preprocessor_dl`
> 3) 60K MNIST training examples = 1912.3 sec
> 4) 10K MNIST test examples = 24.2 sec
> I wonder whether there is a bug here, or at least a performance issue. I thought
> `minibatch_preprocessor_dl` was supposed to be faster than
> `minibatch_preprocessor`.
> (1)
> {code}
> madlib=#
> madlib=# SELECT madlib.minibatch_preprocessor('mnist_train', -- Source table
> madlib(# 'mnist_train_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variables
> madlib(# NULL, -- Grouping
> madlib(# NULL, -- Buffer size
> madlib(# TRUE -- One-hot encode integer dependent var
> madlib(# );
> minibatch_preprocessor
> ------------------------
>
> (1 row)
> Time: 28093.977 ms
> {code}
> (2)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor('mnist_test', -- Source table
> madlib(# 'mnist_test_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variables
> madlib(# NULL, -- Grouping
> madlib(# NULL, -- Buffer size
> madlib(# TRUE -- One-hot encode integer dependent var
> madlib(# );
> minibatch_preprocessor
> ------------------------
>
> (1 row)
> Time: 5934.194 ms
> {code}
> (3)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_train', -- Source table
> madlib(# 'mnist_train_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variable
> madlib(# NULL, -- Buffer size
> madlib(# 255, -- Normalizing constant
> madlib(# NULL
> madlib(# );
> minibatch_preprocessor_dl
> ---------------------------
>
> (1 row)
> Time: 1912268.396 ms
> {code}
> (4)
> {code}
> madlib=# SELECT madlib.minibatch_preprocessor_dl('mnist_test', -- Source table
> madlib(# 'mnist_test_packed', -- Output table
> madlib(# 'y', -- Dependent variable
> madlib(# 'x', -- Independent variable
> madlib(# NULL, -- Buffer size
> madlib(# 255, -- Normalizing constant
> madlib(# NULL
> madlib(# );
> minibatch_preprocessor_dl
> ---------------------------
>
> (1 row)
> Time: 24192.195 ms
> {code}
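To put the four timings quoted above side by side, here is a minimal sketch in Python that computes how much slower `minibatch_preprocessor_dl` was on each data set, and how each function's run time grew between the 10K and 60K runs. The timings are copied from the report; the names `timings`, `slowdown`, and `scaling` are illustrative only and are not part of MADlib.

```python
# Wall-clock timings in seconds, copied from the report above.
timings = {
    "minibatch_preprocessor": {"train_60k": 28.1, "test_10k": 5.9},
    "minibatch_preprocessor_dl": {"train_60k": 1912.3, "test_10k": 24.2},
}

# Row counts for the two MNIST tables, as stated in the report.
rows = {"train_60k": 60_000, "test_10k": 10_000}


def slowdown(dataset):
    """How many times slower the _dl variant was on one data set."""
    new = timings["minibatch_preprocessor_dl"][dataset]
    old = timings["minibatch_preprocessor"][dataset]
    return new / old


def scaling(function_name):
    """Ratio of the 60K run time to the 10K run time for one function."""
    t = timings[function_name]
    return t["train_60k"] / t["test_10k"]


if __name__ == "__main__":
    for dataset in rows:
        print(f"{dataset}: _dl slowdown = {slowdown(dataset):.1f}x")
    row_ratio = rows["train_60k"] / rows["test_10k"]
    for name in timings:
        print(f"{name}: {row_ratio:.0f}x rows -> {scaling(name):.1f}x time")
```

On these numbers the new function is roughly 68 times slower on the training set and roughly 4 times slower on the test set. For 6 times as many rows, its run time grows by about 79 times, against about 4.8 times for the current function. That may point to worse-than-linear scaling in `minibatch_preprocessor_dl`, but two data points on one cluster are not enough to confirm it.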
--
This message was sent by Atlassian Jira
(v8.3.4#803005)