Jingyi Mei created MADLIB-1224:
----------------------------------

             Summary: Select default buffer size for mini-batch preprocessor
                 Key: MADLIB-1224
                 URL: https://issues.apache.org/jira/browse/MADLIB-1224
             Project: Apache MADlib
          Issue Type: Improvement
          Components: Module: Utilities
            Reporter: Jingyi Mei
             Fix For: v1.14


This is a follow-up to https://issues.apache.org/jira/browse/MADLIB-1200

 

In minibatch_preprocessor, we made buffer_size an optional parameter. If it 
is not set, a default value will be assigned. Current considerations for 
choosing that default are:
 # Within a segment, each cell has a 1GB limit, so we can't pack so many rows 
into one super row that it exceeds the limit.
 # Across segments, data should be distributed as evenly as possible to avoid 
data skew, so that GPDB can work more efficiently.
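
One possible way to combine the two considerations above is sketched below. 
This is only an illustration, not the formula MADlib uses: the function name 
default_buffer_size, its parameters, and the per-row byte estimate are all 
hypothetical. It caps the rows per super row so one cell stays under the 
limit, then rounds the number of super rows up to a multiple of the segment 
count so each segment can receive a similar share.

```python
ONE_GB = 1024 ** 3  # per-cell size limit from consideration 1


def ceil_div(a, b):
    """Integer ceiling division (avoids float rounding on large counts)."""
    return -(-a // b)


def default_buffer_size(num_rows, num_segments, bytes_per_row,
                        cell_limit_bytes=ONE_GB):
    """Hypothetical heuristic for a default buffer_size (rows per super row).

    All parameter names are assumptions made for this sketch.
    """
    if num_rows <= 0 or num_segments <= 0 or bytes_per_row <= 0:
        raise ValueError("all arguments must be positive")

    # Consideration 1: the most rows that fit in a single cell.
    max_rows_per_buffer = max(1, cell_limit_bytes // bytes_per_row)

    # Fewest super rows that respect the cell limit.
    min_buffers = ceil_div(num_rows, max_rows_per_buffer)

    # Consideration 2: aim for a super-row count that is a multiple of
    # the segment count, so the rows can be spread evenly.
    target_buffers = ceil_div(min_buffers, num_segments) * num_segments

    # Rows per super row needed to hit that target (never above the cap,
    # because target_buffers >= min_buffers).
    return ceil_div(num_rows, target_buffers)
```

For example, with 1,000,000 rows of roughly 8,000 bytes each on 3 segments, 
the sketch targets 9 super rows and returns a buffer size of 111112 rows. 
The actual super-row count can fall slightly below the target when num_rows 
is small, so this only approximates an even distribution.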



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
