Jingyi Mei created MADLIB-1224:
----------------------------------
Summary: Select default buffer size for mini-batch preprocessor
Key: MADLIB-1224
URL: https://issues.apache.org/jira/browse/MADLIB-1224
Project: Apache MADlib
Issue Type: Improvement
Components: Module: Utilities
Reporter: Jingyi Mei
Fix For: v1.14
As a follow-up to https://issues.apache.org/jira/browse/MADLIB-1200:
In minibatch_preprocessor, we made buffer_size an optional parameter. If it is not set, a default value will be assigned. Current considerations for choosing that default are:
# Within a segment, each cell has a 1 GB limit, so we cannot pack so many rows into one super row that it exceeds that limit.
# Across segments, data should be distributed as evenly as possible to avoid data skew, so that GPDB can work more efficiently.
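A minimal sketch of how these two constraints might combine into a default value, written in Python. Everything in it is an illustrative assumption, not MADlib's implementation: the function name `default_buffer_size`, its parameters, the estimate of 8 bytes per feature value (double precision), and the 50% `safety_factor` headroom are all hypothetical. The 1 GB cap itself is PostgreSQL's maximum size for a single field value, which Greenplum inherits.

```python
import math

# PostgreSQL (and therefore Greenplum) caps a single field value at 1 GB.
MAX_CELL_BYTES = 1024 ** 3


def default_buffer_size(num_rows, num_segments, num_features,
                        bytes_per_value=8, safety_factor=0.5):
    """Hypothetical heuristic for a default buffer_size (rows per super row).

    Assumes each packed row costs num_features * bytes_per_value bytes and
    keeps only safety_factor of the 1 GB cell limit as usable space.
    """
    if min(num_rows, num_segments, num_features) <= 0:
        raise ValueError("num_rows, num_segments and num_features must be positive")

    # Constraint 1: one super row must stay under the (discounted) cell limit.
    bytes_per_row = num_features * bytes_per_value
    max_rows_per_buffer = max(1, int(MAX_CELL_BYTES * safety_factor) // bytes_per_row)

    # Constraint 2: spread rows evenly, assuming each segment holds an equal share.
    rows_per_segment = math.ceil(num_rows / num_segments)

    # Use the fewest buffers per segment that respect the cell limit, then
    # size them equally so no segment ends up with one tiny leftover buffer.
    buffers_per_segment = math.ceil(rows_per_segment / max_rows_per_buffer)
    return math.ceil(rows_per_segment / buffers_per_segment)
```

With 1,000 rows, 4 segments and 10 features, the cell limit is far away, so the sketch returns 250 (one buffer per segment). With 10,000,000 rows, 2 segments and 1,000 features, it splits each segment's 5,000,000 rows into 75 equal buffers of 66,667 rows. Whether buffers actually land evenly on segments still depends on the table's distribution policy.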