[ https://issues.apache.org/jira/browse/MADLIB-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16043374#comment-16043374 ]
ASF GitHub Bot commented on MADLIB-1117:
----------------------------------------

Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-madlib/pull/138

> Add "columns to process per pass" as an optional param for summary()
> --------------------------------------------------------------------
>
>                 Key: MADLIB-1117
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1117
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Sketch-based Estimators
>            Reporter: Frank McQuillan
>            Assignee: Rahul Iyer
>            Priority: Minor
>             Fix For: v1.12
>
>
> Context
> The summary() function
> http://madlib.incubator.apache.org/docs/latest/group__grp__summary.html
> currently processes 15 columns per pass to keep memory usage below the 1 GB
> limit. This is a somewhat arbitrary limit, since memory usage depends on many
> things, including the data set and which params in summary() are set. If more
> columns per pass could be used, summary() would run faster.
> Story
> As a MADlib developer, I want to add "columns to process per pass" as an
> optional param for the summary() function. Default: use 15 columns (which is
> the current setting). Suggested param name: "columns_per_pass", though if you
> have a better name, that's fine.
> Acceptance
> 1) Add a new optional parameter and update the docs. Please add a note so it
> is clear what this control does.
> 2) Write and pass tests.
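
To illustrate what a "columns per pass" control does, here is a minimal Python sketch. It is not MADlib's implementation: the helper column_batches() is hypothetical, and only the name "columns_per_pass" and the default of 15 come from the issue text. It splits a column list into consecutive groups so that each group could be summarized in one pass over the table.

```python
from typing import Iterator, List, Sequence

# Default taken from the issue text; everything else here is illustrative.
DEFAULT_COLUMNS_PER_PASS = 15


def column_batches(columns: Sequence[str],
                   columns_per_pass: int = DEFAULT_COLUMNS_PER_PASS
                   ) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `columns_per_pass` column names.

    Hypothetical helper: each yielded group stands for one pass over the data.
    """
    if columns_per_pass < 1:
        raise ValueError("columns_per_pass must be a positive integer")
    for start in range(0, len(columns), columns_per_pass):
        yield list(columns[start:start + columns_per_pass])


# A 40-column table, batched at the default and at a wider setting.
cols = ["c{}".format(i) for i in range(40)]
passes_default = list(column_batches(cols))
passes_wider = list(column_batches(cols, columns_per_pass=20))
print(len(passes_default), len(passes_wider))
```

Fewer, wider groups mean fewer passes over the table, at the cost of more memory per pass, which is the trade-off the issue describes.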