[ 
https://issues.apache.org/jira/browse/LUCENENET-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shad Storhaug resolved LUCENENET-612.
-------------------------------------
    Fix Version/s: Lucene.Net 4.8.0
       Resolution: Fixed

This has now been resolved in Lucene.NET 4.8.0-beta00007.

> SERIOUS issues with PerFieldAnalyzerWrapper in 4.8
> --------------------------------------------------
>
>                 Key: LUCENENET-612
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-612
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net.Analysis.Common
>    Affects Versions: Lucene.Net 4.8.0
>            Reporter: Shad Storhaug
>            Priority: Major
>             Fix For: Lucene.Net 4.8.0
>
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
> This came in on the user mailing list on 15-July-2019 and was originally 
> reported by Bryan Rojo ([email protected])
>  
> {quote}Not necessarily a bug, but for some people who use 
> PerFieldAnalyzerWrapper like I do this might be worth noting.
> PerFieldAnalyzerWrapper has been "improved" in 4.8 and now uses a 
> PER_FIELD_REUSE_STRATEGY which means that the tokenized fields will be stored 
> in a dictionary, so if you have multiple fields with the same name in your 
> document, then you will only be able to index the very first one that makes 
> it into that dictionary.
> So the problem with this is that you can potentially lose thousands of terms 
> in your index, which could cause your searches to be of very low quality.
> BEWARE.
> {quote}
>  
> There are 2 issues that need to be resolved to address this:
> 1. The documentation for {{PerFieldAnalyzerWrapper}} should be updated to 
> inform users that if they need to use multiple dictionary keys with the same 
> name, they should use {{TreeDictionary<K, V>}}.
> 2. {{TreeDictionary<K, V>}} does not currently implement 
> {{System.Collections.Generic.IDictionary<TKey, TValue>}}, as it was brought 
> over from C5 as-is.
> Another thing of note is that C5 has added support for .NET Standard 1.0 
> since this was brought over.
> However, there still seem to be a few problems that make the C5 types 
> incompatible with Lucene.Net, most notably the lack of support for 
> {{System.Collections.Generic.IDictionary<TKey, TValue>}} in 
> {{TreeDictionary}} and {{System.Collections.Generic.ISet<T>}} in {{TreeSet}} 
> (the latter of which has already been patched in 
> {{Lucene.Net.Support.TreeSet}}).
> I [reported|https://github.com/sestoft/C5/issues/53] the lack of support for 
> {{ISet<T>}} on 6-Nov-2016, but although the maintainers agree this should be 
> done, it still hasn't been. Perhaps a PR to the C5 project is the way to get 
> this done, which would allow us to finally remove these collection copies 
> from Lucene.Net.Support and add a package dependency on C5.
> Another option is to shop around to see if there are any other generic 
> TreeSet/TreeDictionary implementations that have popped up since late 2016 
> that we can check for compatibility.
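
For readers unfamiliar with the symptom in the quoted report, here is a
minimal Java sketch of a "first wins" cache keyed by field name. It is an
illustration only and not the Lucene.NET implementation: FirstWinsSketch,
Field, and cacheFirstPerName are hypothetical names, and the real reuse
strategy caches analysis components rather than field text.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstWinsSketch {
    /** Hypothetical stand-in for a document field: a name plus its text. */
    static final class Field {
        final String name;
        final String text;

        Field(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /**
     * Keeps one entry per field name. A later field with a name that is
     * already present is silently ignored, which is the kind of data loss
     * the report describes.
     */
    static Map<String, String> cacheFirstPerName(List<Field> fields) {
        Map<String, String> cache = new LinkedHashMap<>();
        for (Field f : fields) {
            // putIfAbsent leaves an existing entry untouched.
            cache.putIfAbsent(f.name, f.text);
        }
        return cache;
    }

    public static void main(String[] args) {
        List<Field> doc = Arrays.asList(
            new Field("tag", "lucene"),
            new Field("tag", "analyzer"),
            new Field("title", "Per-field analysis"));

        // Three fields go in, but only two entries survive.
        System.out.println(cacheFirstPerName(doc));
        // prints {tag=lucene, title=Per-field analysis}
    }
}
```

The second "tag" field never reaches the cache, so its terms would be
missing from anything built from that cache alone.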



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
