[
https://issues.apache.org/jira/browse/MADLIB-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Domino Valdano updated MADLIB-1340:
-----------------------------------
Description:
The minibatcher's internal logic for picking a default batch size isn't strict
enough. It can crash for arrays of datatypes narrower than 32 bits. I tried to
come up with a simple repro, but it still needs some work. Here's what I have
so far for REAL[] (a 32-bit type in PostgreSQL, not 16-bit); I haven't had a
chance to test it yet:
madlib=# CREATE TABLE foo AS
         SELECT ARRAY[i,i,i,i,i] AS x, 1 AS y
         FROM (SELECT ARRAY[i,i,i,i,i] AS i
               FROM (SELECT GENERATE_SERIES(1,6*1024*1024) AS i) a1) a;
madlib=# \d foo;
     Table "public.foo"
 Column |   Type    | Modifiers
--------+-----------+-----------
 id     | integer   |
 x      | integer[] |
 y      | integer   |
Distributed randomly
madlib=# SELECT madlib.minibatch_preprocessor_dl('foo','foo_batched', 'y', 'x');
was:
The minibatcher's internal logic for picking a default batch size isn't strict
enough. It can crash for arrays of datatypes narrower than 32 bits. I tried to
come up with a simple repro, but it still needs some work. Here's what I have
so far for REAL[] (a 32-bit type in PostgreSQL, not 16-bit); I haven't had a
chance to test it yet:
madlib=# CREATE TABLE foo AS
         SELECT id, ARRAY[1.0,2.0,3.0,4.0]::REAL[] AS x, 1 AS y
         FROM (SELECT GENERATE_SERIES(1,33*1024*1024) AS id) ids;
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause. Creating a NULL policy entry.
SELECT 2097152
madlib=# \d foo;
    Table "public.foo"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
 x      | real[]  |
 y      | integer |
Distributed randomly
madlib=# SELECT madlib.minibatch_preprocessor_dl('foo','foo_batched', 'y', 'x');
The above example takes a long time to generate the table. I'm working on
something that runs a bit faster, but it's not quite finished yet. We need
something like this:
CREATE TABLE foo AS
SELECT ARRAY[i,i,i,i,i]
FROM (SELECT ARRAY[i,i,i,i,i] AS i
      FROM (SELECT GENERATE_SERIES(1,9*1024) AS i) a1) a2;
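For a rough sense of how the table definitions in this message compare in
size, here is a small arithmetic sketch in Python. It only multiplies out the
numbers that appear in the queries and does not model the preprocessor or
explain the crash. It assumes 4 bytes per INTEGER or REAL element (PostgreSQL's
documented sizes) and ignores array headers and per-row overhead, so the
figures are lower bounds. The helper name payload_bytes is made up for this
illustration.

```python
# Rough payload-size arithmetic for the repro tables in this message.
# Assumptions: INTEGER and REAL elements take 4 bytes each; array headers
# and per-row storage overhead are ignored, so these are lower bounds.

ELEM_BYTES = 4


def payload_bytes(rows, elems_per_row, elem_bytes=ELEM_BYTES):
    """Raw element bytes stored in column x, ignoring all overhead."""
    return rows * elems_per_row * elem_bytes


# (row count from GENERATE_SERIES, elements per row in column x)
repros = {
    "5x5 INTEGER[], 6*1024*1024 rows": (6 * 1024 * 1024, 25),
    "4-element REAL[], 33*1024*1024 rows": (33 * 1024 * 1024, 4),
    "5x5 INTEGER[], 9*1024 rows": (9 * 1024, 25),
}

for name, (rows, elems) in repros.items():
    size = payload_bytes(rows, elems)
    print(f"{name}: {rows} rows, {size} bytes ({size / 2**20:.1f} MiB)")
```

Under these assumptions the two large tables carry several hundred MiB of
element data each, while the 9*1024-row variant stays under 1 MiB, which fits
with the first being slow to generate.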
> minibatch_preprocessor_dl crashes with default batch size
> ---------------------------------------------------------
>
> Key: MADLIB-1340
> URL: https://issues.apache.org/jira/browse/MADLIB-1340
> Project: Apache MADlib
> Issue Type: Bug
> Components: Deep Learning
> Affects Versions: v1.16
> Reporter: Domino Valdano
> Priority: Minor
> Fix For: v1.16