[ https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17055035#comment-17055035 ]
juan camilo rodriguez duran commented on LUCENE-9236:
-----------------------------------------------------

[~rcmuir] could you please elaborate a bit more on why introducing sub-formats that each have a single responsibility, with the code in the same file, would increase complexity? Today Lucene80DocValuesProducer is 1565 lines to read; with the approach I'm proposing we would start with 3 classes of 500 lines each, which is a bit easier to digest. I just want to know which factors make it important to keep the code as it is.

> Having a modular Doc Values format
> ----------------------------------
>
>                 Key: LUCENE-9236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9236
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: juan camilo rodriguez duran
>            Priority: Minor
>              Labels: docValues
>
> Today the DocValues Consumer/Producer require overriding 5 different methods,
> even if you only want to use one, and even though a given field can only
> support one doc values type at a time.
>
> In the attached PR I’ve implemented a new modular version of those classes
> (consumer/producer), each one having a single responsibility and all writing
> to the same single file.
> This is mainly a refactoring of the existing format that opens the
> possibility of overriding or implementing only the sub-format you need.
>
> I’ll do this in 3 steps:
> # Create a CompositeDocValuesFormat and move the code of
> Lucene80DocValuesFormat into separate classes, without modifying the inner
> code. At the same time I created a Lucene85CompositeDocValuesFormat based on
> these changes.
> # I’ll introduce some basic components for writing doc values in general,
> such as:
> ## DocumentIdSetIterator Serializer: used in each type of field, based on an
> IndexedDISI.
> ## Document Ordinals Serializer: used in Sorted and SortedSet to
> deduplicate values using a dictionary.
> ## Document Boundaries Serializer (optional; used only for multivalued
> fields: SortedNumeric and SortedSet).
> ## TermsEnum Serializer: used to write and read the terms dictionary for
> Sorted and SortedSet doc values.
> # I’ll create the new Sub-DocValues format using the previous components.
>
> PR: [https://github.com/apache/lucene-solr/pull/1282]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
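
For illustration only, the composite idea described in the issue (one sub-format per doc values type, all writing to one shared output) could be sketched roughly as follows. Every name here (DocValuesKind, SubFormatWriter, NumericSubFormatWriter, CompositeWriter) is hypothetical; this is neither Lucene's API nor the code in the PR, and a StringBuilder stands in for the single shared file.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names throughout; these are NOT Lucene classes.
enum DocValuesKind { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

/** A sub-format with a single responsibility: exactly one doc values kind. */
interface SubFormatWriter {
    DocValuesKind kind();

    /** Appends a record for this field to the shared output. */
    void write(String field, long[] values, StringBuilder sharedOutput);
}

/** Toy numeric sub-format: records only the field name and value count. */
final class NumericSubFormatWriter implements SubFormatWriter {
    @Override
    public DocValuesKind kind() {
        return DocValuesKind.NUMERIC;
    }

    @Override
    public void write(String field, long[] values, StringBuilder sharedOutput) {
        sharedOutput.append(field).append(":numeric:").append(values.length).append('\n');
    }
}

/** Dispatches each field to the sub-format registered for its kind. */
final class CompositeWriter {
    private final Map<DocValuesKind, SubFormatWriter> writers =
            new EnumMap<>(DocValuesKind.class);
    // Assumption: a StringBuilder stands in for the one shared data file.
    private final StringBuilder sharedOutput = new StringBuilder();

    CompositeWriter register(SubFormatWriter writer) {
        writers.put(writer.kind(), writer);
        return this;
    }

    void addField(String field, DocValuesKind kind, long[] values) {
        SubFormatWriter writer = writers.get(kind);
        if (writer == null) {
            throw new UnsupportedOperationException("No sub-format registered for " + kind);
        }
        writer.write(field, values, sharedOutput);
    }

    String output() {
        return sharedOutput.toString();
    }
}
```

In this sketch, someone who only needs numeric doc values implements and registers a single SubFormatWriter instead of overriding five methods; any kind without a registered sub-format fails fast with an exception.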