[ https://issues.apache.org/jira/browse/OAK-6381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063301#comment-16063301 ]
Thomas Mueller edited comment on OAK-6381 at 11/15/17 11:08 AM:
----------------------------------------------------------------

svn.apache.org/r1799938 (trunk)
New method LuceneIndex.getFieldTermsInfo(path, field, max).
* path is the index path (for example /oak:index/lucene),
* field is the field name (for example ":path"),
* max is the number of entries to list (for example 100)

was (Author: tmueller):
svn.apache.org/r1799938 (trunk)
New method LuceneIndex.getFieldTerms(path, field, max).
* path is the index path (for example /oak:index/lucene),
* field is the field name (for example ":path"),
* max is the number of entries to list (for example 100)

> Improved index analysis tools
> -----------------------------
>
> Key: OAK-6381
> URL: https://issues.apache.org/jira/browse/OAK-6381
> Project: Jackrabbit Oak
> Issue Type: Improvement
> Reporter: Thomas Mueller
> Assignee: Thomas Mueller
> Fix For: 1.8
>
>
> It would be good to have more tools to analyze indexes:
> * For Lucene indexes, get a histogram of samples (terms). We have
> "getFieldInfo", which shows how common each field is, but we don't have
> terms. For example, the /oak:index/lucene index contains 1 million fulltext
> fields and node names for 1 million nodes, but I wonder why, what typical
> node names are, and whether the fulltext for most nodes is actually empty.
> Maybe a new method such as "getTermHistogram(int sampleCount)".
> * For property indexes, the number of updated nodes per second or similar.
> Right now we can only analyze the counts per key, but some indexes / keys
> are very volatile (we see many short-lived entries).
> * For Lucene indexes, writes per second (in MB) or similar.
> * How indexes are used (approximate nodes read / MB read per hour).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
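The term histogram proposed in the issue description can be sketched in plain Java, independent of Oak and Lucene: count how often each sampled term occurs and keep only the `max` most frequent entries, similar in spirit to the `max` parameter of getFieldTermsInfo. This is a minimal sketch under stated assumptions, not Oak's implementation. `TermHistogram` and `topTerms` are hypothetical names, the sample node names in `main` are made up for illustration, and reading the terms from a real Lucene index is assumed to happen elsewhere.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of Oak): builds a term histogram from a
 * sample of index terms and returns the most frequent entries.
 */
public class TermHistogram {

    /**
     * Counts how often each term occurs and returns at most {@code max}
     * entries, ordered by descending count (ties broken by the term's
     * natural String order).
     */
    public static List<Map.Entry<String, Long>> topTerms(Iterable<String> terms, int max) {
        if (max < 0) {
            throw new IllegalArgumentException("max must not be negative");
        }
        // Frequency count per distinct term.
        Map<String, Long> counts = new HashMap<>();
        for (String term : terms) {
            counts.merge(term, 1L, Long::sum);
        }
        // Sort by count (highest first), then by term for a stable, readable order.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));
        // Keep only the first max entries.
        return new ArrayList<>(entries.subList(0, Math.min(max, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up sample of node names, standing in for terms of a ":path"-like field.
        List<String> sample = Arrays.asList("jcr:content", "renditions", "jcr:content",
                "metadata", "jcr:content", "renditions");
        for (Map.Entry<String, Long> e : topTerms(sample, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Run as is, `main` prints "jcr:content 3" followed by "renditions 2". Counting the full sample and sorting afterwards keeps the sketch simple; for very large term sets a bounded priority queue of size `max` would avoid sorting every distinct term.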