[ https://issues.apache.org/jira/browse/MADLIB-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16308960#comment-16308960 ]
ASF GitHub Bot commented on MADLIB-1167:
----------------------------------------

GitHub user fmcquillan99 opened a pull request:

    https://github.com/apache/madlib/pull/222

    minor update to summary() user docs to finish off https://issues.apache.org/jira/browse/MADLIB-1167

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fmcquillan99/incubator-madlib summary-v1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/222.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #222

----
commit 15628d63bccd4b04789d8963ad1291531f312dc1
Author: Frank McQuillan <fmcquillan@...>
Date:   2018-01-03T01:10:29Z

    minor update to summary() user docs

----

> Summary - add more statistics
> -----------------------------
>
>                 Key: MADLIB-1167
>                 URL: https://issues.apache.org/jira/browse/MADLIB-1167
>             Project: Apache MADlib
>          Issue Type: Improvement
>          Components: Module: Descriptive Statistics
>            Reporter: Frank McQuillan
>            Assignee: Jingyi Mei
>             Fix For: v1.14
>
>
> In the summary function
> http://madlib.apache.org/docs/latest/group__grp__summary.html
> add additional statistics:
> 1) % positive values
> 2) % negative values
> 3) % zero values
> 4) confidence intervals (95% ?) on mean
> * does this make sense, since need to assume a distribution for the data
> which we probably cannot infer?
> 5) Also please check why min and max are being reported for non-numeric cols.
> Is this a bug?
> {code}
> madlib=# SELECT * FROM houses_summary where target_column='zipcode';
> -[ RECORD 1 ]--------+----------------
> group_by             |
> group_by_value       |
> target_column        | zipcode
> column_number        | 8
> data_type            | text
> row_count            | 15
> distinct_values      | 2
> missing_values       | 0
> blank_values         | 0
> fraction_missing     | 0
> fraction_blank       | 0
> mean                 |
> variance             |
> min                  | 6
> max                  | 6
> first_quartile       |
> median               |
> third_quartile       |
> most_frequent_values | {94301y,84301x}
> mfv_frequencies      | {10,5}
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
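
For readers who want to see what the statistics requested in MADLIB-1167 amount to, here is a minimal Python sketch of items 1 through 4 for a single numeric column. It is not MADlib's implementation. The function name `sign_fractions_and_mean_ci` is hypothetical, NULLs (Python `None`) are assumed to be excluded from the denominators, and the 95% interval uses a normal approximation with z = 1.96, which is exactly the distributional assumption the issue questions.

```python
import math


def sign_fractions_and_mean_ci(values, z=1.96):
    """Illustrative only: fraction of positive/negative/zero values and a
    normal-approximation confidence interval on the mean.

    Assumptions (not taken from MADlib): None entries are ignored, and the
    sample mean is treated as approximately normal (z=1.96 gives ~95%).
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    if n == 0:
        return None  # nothing to summarize

    mean = sum(xs) / n
    result = {
        "positive": sum(1 for v in xs if v > 0) / n,
        "negative": sum(1 for v in xs if v < 0) / n,
        "zero": sum(1 for v in xs if v == 0) / n,
        "mean": mean,
        "mean_ci": None,  # undefined for a single observation
    }

    if n > 1:
        # Sample variance (n - 1 denominator), then standard error of the mean.
        variance = sum((v - mean) ** 2 for v in xs) / (n - 1)
        half_width = z * math.sqrt(variance / n)
        result["mean_ci"] = (mean - half_width, mean + half_width)

    return result


stats = sign_fractions_and_mean_ci([-2, 0, 0, 1, 3, None])
print(stats)
```

For small samples or heavily skewed columns the normal approximation can be poor, so any interval produced this way should be read as a rough guide rather than an exact bound.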