[ 
https://issues.apache.org/jira/browse/MADLIB-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16851415#comment-16851415
 ] 

Jingyi Mei commented on MADLIB-1340:
------------------------------------

Here is the reason why it crashes:

 

In the buffer size calculator, we assumed the array to be packed is
one-dimensional, and used array_upper(x, 1) to get the length
of the array. In the DL input data preprocessor, we passed the first element
in array_ndims as the length, which is not right because the
array can be multi-dimensional. Since this underestimates the size of each
row, default_buffer_size_calculator returns a bigger buffer size than it
should, so the rows get too big and crash the database.

Instead, we should use the
product of all the elements from array_ndims to represent the actual length of
the array. For example, if array_ndims returns [32,32,3], we should pass
32*32*3 = 3072 instead of 32.
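
To make the difference concrete, here is a minimal Python sketch of the length
computation, not the actual MADlib patch. The helper name flattened_length and
the variable names are hypothetical; only the shape [32,32,3] comes from the
example above.

```python
from functools import reduce
from operator import mul


def flattened_length(shape):
    """Return the total number of elements in an array with the given shape.

    Hypothetical helper for illustration; not part of MADlib.
    """
    # Multiply all dimension sizes together; an empty shape yields 1.
    return reduce(mul, shape, 1)


image_shape = [32, 32, 3]  # example dimensions from the comment above

# What was effectively passed before: only the first dimension.
wrong_length = image_shape[0]

# What should be passed: the product of all dimensions.
right_length = flattened_length(image_shape)
```

With this shape the first-dimension value is 96 times smaller than the real
element count, so any buffer size derived from it would be far too large.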

> minibatch_preprocessor_dl crashes with default batch size
> ---------------------------------------------------------
>
>                 Key: MADLIB-1340
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1340
>             Project: Apache MADlib
>          Issue Type: Bug
>          Components: Deep Learning
>    Affects Versions: v1.16
>            Reporter: Domino Valdano
>            Priority: Minor
>             Fix For: v1.16
>
>
> The minibatcher's internal logic for picking a default batch size isn't 
> strict enough.  It can crash for arrays of datatypes which are less than 
> 32-bits.  I tried to come up with a simple repro, but it still needs some 
> work.  Here's what I have now, for 16-bit type REAL[], haven't had a chance 
> to test it yet:
> madlib=# CREATE TABLE foo AS SELECT ARRAY[i,i,i,i,i] AS x, 1 as y FROM 
> (SELECT ARRAY[i,i,i,i,i] AS i FROM (SELECT GENERATE_SERIES(1,6*1024*1024) AS 
> i) a1 ) a;
> madlib=# \d foo;
>       Table "public.foo"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  x      | integer[]  |
>  y      | integer |
> Distributed randomly
> madlib=# SELECT madlib.minibatch_preprocessor_dl('foo','foo_batched',   'y',  
>   'x');
> TODO:  above example doesn't actually work, because it only has 6-million 
> rows.  Generate an example with at least 150-million rows, and it should work 
> (ie, crash).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
