[ https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046339#comment-17046339 ]

juan camilo rodriguez duran commented on LUCENE-9236:
-----------------------------------------------------

[~dsmiley] Just throwing the exception wouldn't work, at least if you want to 
run all the tests (and stay compliant with the API). And that is the main point 
about this API: even when a format is used for a single doc values type, the 
API forces you to support the other sub-formats, which can't co-exist for a 
given field anyway. The same pattern is replicated in EmptyDocValuesProducer; 
ideally, DocValues#checkField would also be simpler if we used only the 
sub-formats.
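
The objection above can be made concrete with a minimal Java sketch. It uses simplified stand-in types. `DvType`, `FiveGetterProducer`, `ThrowingProducer`, `NumericOnlyProducer` and `FieldTypes.checkField` are hypothetical and are not Lucene classes. They only mirror three things: a producer with one getter per doc values type, a throw-by-default base in the spirit of EmptyDocValuesProducer, and a per-field type check in the spirit of DocValues#checkField.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins, NOT Lucene classes: one getter per doc values type,
// mirroring the shape of the five-method producer contract.
enum DvType { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

abstract class FiveGetterProducer {
  abstract long[] getNumeric(String field);
  abstract byte[][] getBinary(String field);
  abstract int[] getSorted(String field);
  abstract long[][] getSortedNumeric(String field);
  abstract int[][] getSortedSet(String field);
}

// Throw-by-default base (hypothetical): subclasses override only what they
// support, but the other four getters are still part of their API.
abstract class ThrowingProducer extends FiveGetterProducer {
  @Override long[] getNumeric(String f) { throw unsupported(f, DvType.NUMERIC); }
  @Override byte[][] getBinary(String f) { throw unsupported(f, DvType.BINARY); }
  @Override int[] getSorted(String f) { throw unsupported(f, DvType.SORTED); }
  @Override long[][] getSortedNumeric(String f) { throw unsupported(f, DvType.SORTED_NUMERIC); }
  @Override int[][] getSortedSet(String f) { throw unsupported(f, DvType.SORTED_SET); }

  private static UnsupportedOperationException unsupported(String f, DvType t) {
    return new UnsupportedOperationException(t + " not supported for field " + f);
  }
}

// A numeric-only format: one real method, four inherited stubs.
class NumericOnlyProducer extends ThrowingProducer {
  private final Map<String, long[]> values = new HashMap<>();

  NumericOnlyProducer put(String field, long[] v) {
    values.put(field, v);
    return this;
  }

  @Override long[] getNumeric(String field) { return values.get(field); }
}

// Hypothetical per-field type check: a field has exactly one doc values type,
// so reading it as any other type is an error.
final class FieldTypes {
  static void checkField(Map<String, DvType> types, String field, DvType expected) {
    DvType actual = types.get(field);
    if (actual != expected) {
      throw new IllegalStateException(
          "field " + field + " has type " + actual + ", expected " + expected);
    }
  }
}
```

A shared test suite that reads every doc values type from every format would still reach the four inherited stubs. That is why throwing alone does not let a single-type format pass such a suite.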

But that is still not the point of this PR. The first objective is simply to 
improve code readability by splitting the big DocValuesProducer/Consumer 
classes into single-responsibility classes, then to refactor them so that the 
reading and writing classes are more symmetric, and finally, if it is worth it, 
to factor out components common to all formats, such as the reading and 
writing of the DISI iterator.
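
One way to picture the split into single-responsibility, symmetric classes is a composite that routes each field to a per-type sub-format, where each sub-format has a matching writer and reader. This is a hedged sketch, not the PR's code. `SubType`, `SubFormat`, `NumericSubFormat` and `CompositeFormat` are hypothetical names, and an in-memory byte stream stands in for the single shared data file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical names throughout; a sketch of the idea, not the PR's classes.
enum SubType { NUMERIC, BINARY, SORTED, SORTED_NUMERIC, SORTED_SET }

// One doc values type per sub-format, with a writer and a symmetric reader.
interface SubFormat {
  void write(Object values, DataOutputStream out) throws IOException;
  Object read(DataInputStream in) throws IOException;
}

final class NumericSubFormat implements SubFormat {
  @Override public void write(Object values, DataOutputStream out) throws IOException {
    long[] v = (long[]) values;
    out.writeInt(v.length);
    for (long x : v) out.writeLong(x);
  }

  @Override public Object read(DataInputStream in) throws IOException {
    long[] v = new long[in.readInt()];
    for (int i = 0; i < v.length; i++) v[i] = in.readLong();
    return v;
  }
}

// Routes each field to the sub-format registered for its type. All fields go
// into one byte stream that stands in for the single shared data file.
final class CompositeFormat {
  private final Map<SubType, SubFormat> subs = new EnumMap<>(SubType.class);
  private final ByteArrayOutputStream file = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(file);

  CompositeFormat register(SubType type, SubFormat sub) {
    subs.put(type, sub);
    return this;
  }

  private SubFormat subFor(SubType type) {
    SubFormat sub = subs.get(type);
    if (sub == null) throw new IllegalArgumentException("no sub-format for " + type);
    return sub;
  }

  void addField(String field, SubType type, Object values) {
    SubFormat sub = subFor(type); // fail before writing anything
    try {
      out.writeUTF(field);           // per-field header: name...
      out.writeByte(type.ordinal()); // ...and type, so the reader can route back
      sub.write(values, out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int size() { return file.size(); }

  // Reads every field back, dispatching on the stored type byte.
  Map<String, Object> readAll() {
    Map<String, Object> fields = new LinkedHashMap<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file.toByteArray()))) {
      while (in.available() > 0) {
        String field = in.readUTF();
        SubType type = SubType.values()[in.readByte()];
        fields.put(field, subFor(type).read(in));
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return fields;
  }
}
```

In this design only the types a format actually supports need to be registered. An unsupported type is rejected by the router rather than by four stub methods.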

> Having a modular Doc Values format
> ----------------------------------
>
>                 Key: LUCENE-9236
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9236
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index
>            Reporter: juan camilo rodriguez duran
>            Priority: Minor
>              Labels: docValues
>
>  Today, DocValues Consumer/Producer implementations have to override 5 
> different methods, even if you only want to use one, given that a field can 
> only support one doc values type at a time.
>  
> In the attached PR I’ve implemented a new modular version of those classes 
> (consumer/producer), each one having a single responsibility and all writing 
> to the same single file.
> This is mainly a refactor of the existing format that opens the possibility 
> to override or implement only the sub-format you need.
>  
> I’ll do it in 3 steps:
>  # Create a CompositeDocValuesFormat and move the code of 
> Lucene80DocValuesFormat into separate classes, without modifying the inner 
> code. At the same time, I created a Lucene85CompositeDocValuesFormat based on 
> these changes.
>  # I’ll introduce some basic components for writing doc values in general, 
> such as:
>  ## DocumentIdSetIterator Serializer: used by every field type, based on 
> IndexedDISI.
>  ## Document Ordinals Serializer: used by Sorted and SortedSet to 
> deduplicate values using a dictionary.
>  ## Document Boundaries Serializer (optional, used only for multivalued 
> fields: SortedNumeric and SortedSet).
>  ## TermsEnum Serializer: used to write and read the terms dictionary for 
> sorted and sorted set doc values.
>  # I’ll create the new sub-DocValues formats using the previous components.
>  
> PR: [https://github.com/apache/lucene-solr/pull/1282]
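
The DocumentIdSetIterator Serializer in step 2 of the plan describes a kind of shared component. A simplified "docs with a value" codec, with a writer and a matching `advance` reader, illustrates it. This sketch rests on several assumptions. `DocIdSetCodec` is hypothetical, and it borrows only the general idea of choosing a sparse or dense encoding by density. Lucene's IndexedDISI works per block of documents, with its own thresholds and on-disk layout, and none of that is modeled here.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, simplified "docs with field" codec that any sub-format could
// reuse. It is NOT IndexedDISI: there are no blocks and no on-disk layout, only
// a whole-segment choice between a sparse and a dense encoding.
final class DocIdSetCodec {
  // Same sentinel value that Lucene's DocIdSetIterator uses for exhaustion.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  static final class Encoded {
    final int[] sparse; // sorted doc IDs, or null when dense
    final BitSet dense; // one bit per doc, or null when sparse

    Encoded(int[] sparse, BitSet dense) {
      this.sparse = sparse;
      this.dense = dense;
    }
  }

  /** Writer side. {@code docs} must be strictly increasing and below maxDoc. */
  static Encoded write(int[] docs, int maxDoc) {
    // Illustrative cost model: sparse costs 32 bits per doc that has a value,
    // dense costs 1 bit per doc in the segment.
    if (32L * docs.length <= maxDoc) {
      return new Encoded(docs.clone(), null);
    }
    BitSet bits = new BitSet(maxDoc);
    for (int d : docs) bits.set(d);
    return new Encoded(null, bits);
  }

  /** Reader side: first doc >= target (target >= 0), or NO_MORE_DOCS. */
  static int advance(Encoded e, int target) {
    if (e.sparse != null) {
      int i = Arrays.binarySearch(e.sparse, target);
      if (i < 0) i = -i - 1; // insertion point = index of first doc > target
      return i < e.sparse.length ? e.sparse[i] : NO_MORE_DOCS;
    }
    int next = e.dense.nextSetBit(target);
    return next >= 0 ? next : NO_MORE_DOCS;
  }

  /** Reader side: all doc IDs in increasing order, whatever the encoding. */
  static int[] docs(Encoded e) {
    return e.sparse != null ? e.sparse.clone() : e.dense.stream().toArray();
  }
}
```

Because the reader exposes the same `advance` behavior for both encodings, each sub-format could call it without knowing which encoding was written.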

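The Document Ordinals and Document Boundaries serializers from step 2 of the plan can be illustrated together on a multivalued, sorted-set-like field. Each distinct term is stored once in a sorted dictionary, each value becomes an ordinal into that dictionary, and a boundaries array records where each document's ordinals start. `SortedSetEncoding` and `termForOrd` are hypothetical names. The sketch also simplifies ordering: it compares Java Strings, whereas Lucene orders terms as unsigned bytes (BytesRef).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of dictionary + ordinals + boundaries for a multivalued
// field. Simplification: terms are Strings ordered by String.compareTo.
final class SortedSetEncoding {
  final String[] dictionary; // sorted unique terms; array index = ordinal
  final long[] ords;         // every doc's ordinals, concatenated doc after doc
  final int[] docStart;      // boundaries: doc i owns ords[docStart[i] .. docStart[i + 1])

  private SortedSetEncoding(String[] dictionary, long[] ords, int[] docStart) {
    this.dictionary = dictionary;
    this.ords = ords;
    this.docStart = docStart;
  }

  static SortedSetEncoding encode(List<List<String>> docs) {
    // Dictionary: each distinct term is stored once, in sorted order.
    TreeSet<String> unique = new TreeSet<>();
    for (List<String> d : docs) unique.addAll(d);
    String[] dict = unique.toArray(new String[0]);

    int[] start = new int[docs.size() + 1];
    List<Long> flat = new ArrayList<>();
    for (int i = 0; i < docs.size(); i++) {
      start[i] = flat.size();
      // Set semantics: drop duplicates within a doc and emit ordinals ascending.
      for (String term : new TreeSet<>(docs.get(i))) {
        flat.add((long) Arrays.binarySearch(dict, term));
      }
    }
    start[docs.size()] = flat.size();

    long[] ords = new long[flat.size()];
    for (int i = 0; i < ords.length; i++) ords[i] = flat.get(i);
    return new SortedSetEncoding(dict, ords, start);
  }

  long[] ordsForDoc(int doc) {
    return Arrays.copyOfRange(ords, docStart[doc], docStart[doc + 1]);
  }

  String termForOrd(long ord) { return dictionary[(int) ord]; }
}
```

For a single-valued Sorted field each document has at most one ordinal, so the `docStart` boundaries would not be needed. This is consistent with the plan marking the boundaries serializer as optional and only for multivalued fields.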


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
