[ 
https://issues.apache.org/jira/browse/LUCENENET-414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033031#comment-13033031
 ] 

Van Den Berghe, Vincent commented on LUCENENET-414:
---------------------------------------------------

Hello Digy,

Thanks for your response.
I don't want to sound overly pedantic (but please tell me if I do), but this 
changed implementation solves only part of the problem.
Now, CharArraySet derives from Set<T>, which itself derives from List<T>. Items 
are now stored both in this base class, as in the private HashSet<string> _Set.
However, because List<T> doesn't define its modifiers Add(T), Clear() and 
Remove(T) as virtual, the derived implementation defines them as "new". 
This violates a variant of the Liskov substitution principle: an operation on 
the derived type has not the same effect as the same operation on the base type.
In this case, it means that the following code will cause the items in the 
List<T> base type and in the _Set to be desynchronized:

        CharArraySet set=...
        List<string> same=set;
        same.Add("whatever");
        // at this point, same.Contains("whatever")==true but 
set.Contains("whatever")==false even though it's the same instance.

You might rightfully retort that this never happens and I should mind my own 
business, but I know at least one poor soul who did just that: me :-(.

On a completely unrelated matter, the new implementation has 2 methods:

                        public void Add(System.Collections.Generic.IList<T> 
items)
                        public void Add(Support.Set<T> items)

.. which can be collapsed into one, since the only thing used in both cases is 
the enumerator:

                        public void Add(IEnumerable<T> items)

I don't recall the design rule, but it's something like "to increase reuse, 
make your function parameters are general as possible, but their return value 
as specific as possible".
I am unable to get 2.9.4g to investigate further, but if you are moving towards 
the Generic collections in Lucene, the following implementation should be a 
drop-in replacement, without suffering from the aforementioned quirks:

                [Serializable]
                public class Set<T> : ICollection<T>
                {
                        private readonly System.Collections.Generic.HashSet<T> 
_Set = new System.Collections.Generic.HashSet<T>();
                        bool _ReadOnly = false;

                        public Set()
                        {
                        }

                        public Set(bool readOnly)
                        {
                                this._ReadOnly = readOnly;
                        }

                        public bool ReadOnly
                        {
                                set
                                {
                                        _ReadOnly = value;
                                }
                                get
                                {
                                        return _ReadOnly;
                                }
                        }

                        public virtual void Add(T item)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                                if (_Set.Contains(item)) return;
                                _Set.Add(item);
                        }

                        public void Add(IEnumerable<T> items)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                                foreach (T item in items)
                                {
                                        if (_Set.Contains(item)) continue;
                                        _Set.Add(item);
                                }
                        }


                        public void Clear()
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                                _Set.Clear();
                        }

                        public bool Contains(T item)
                        {
                                return _Set.Contains(item);
                        }

                        public void CopyTo(T[] array, int arrayIndex)
                        {
                                _Set.CopyTo(array, arrayIndex);
                        }

                        public int Count
                        {
                                get { return _Set.Count; }
                        }

                        public bool IsReadOnly
                        {
                                get { return _ReadOnly; }
                        }

                        public bool Remove(T item)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                                return _Set.Remove(item);
                        }

                        public IEnumerator<T> GetEnumerator()
                        {
                                return _Set.GetEnumerator();
                        }

                        System.Collections.IEnumerator 
System.Collections.IEnumerable.GetEnumerator()
                        {
                                return _Set.GetEnumerator();
                        }


                        public void 
RemoveAll(System.Collections.Generic.IList<T> list)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                        }

                        public void 
RetainAll(System.Collections.Generic.IList<T> list)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                        }

                        public void RemoveAll(System.Collections.ArrayList list)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                        }

                        public void RetainAll(System.Collections.ArrayList list)
                        {
                                if (_ReadOnly) throw new 
NotSupportedException();
                        }
                }

I've copied RemoveAll/RetainAll verbatim, since I don't know what they are for. 
I also left them nonvirtual.
Note that it implements ICollection<T> and not IList<T>: from what I can 
determine, IList<T> is not necessary. It's also weird to have an indexed access 
on a type that by definition cannot have one.
My medicine won't work if non-generic collection interfaces are still lurking 
in other parts of the code, because List<T>, for historical reasons, also 
implements the non-generic IList and ICollection interfaces.
Alas, this opens another can of worms (IList<T> doesn't implement IList, 
ICollection<T> doesn't implement ICollection, but IEnumerable<T> implements 
IEnumerable), but the margin of this e-mail is too small to write down the cure.
I'll stop rambling now and will try my own medicine as soon as I'm able to get 
the 2.9.4g from the ASF SVN.

In the meantime, have a nice week-end and thank you for being the cornerstone 
of .NET's premier search engine.


Vincent 



> The definition of CharArraySet is dangerously confusing and leads to bugs 
> when used.
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENENET-414
>                 URL: https://issues.apache.org/jira/browse/LUCENENET-414
>             Project: Lucene.Net
>          Issue Type: Bug
>          Components: Lucene.Net Core
>    Affects Versions: Lucene.Net 2.9.2
>         Environment: Irrelevant
>            Reporter: Vincent Van Den Berghe
>            Priority: Minor
>             Fix For: Lucene.Net 2.9.2
>
>
> Right now, CharArraySet derives from System.Collections.Hashtable, but 
> doesn't actually use this base type for storing elements.
> However, the StandardAnalyzer.STOP_WORDS_SET is exposed as a 
> System.Collections.Hashtable. The trivial code to build your own stopword set 
> using the StandardAnalyzer.STOP_WORDS_SET and adding your own set of 
> stopwords like this:
> CharArraySet myStopWords = new CharArraySet(StandardAnalyzer.STOP_WORDS_SET, 
> ignoreCase: false);
> foreach (string domainSpecificStopWord in DomainSpecificStopWords)
>     stopWords.Add(domainSpecificStopWord);
> ... will fail because the CharArraySet accepts an ICollection, which will be 
> passed the Hashtable instance of STOP_WORDS_SET: the resulting myStopWords 
> will only contain the DomainSpecificStopWords, and not those from 
> STOP_WORDS_SET.
> One workaround would be to replace the first line with this:
> CharArraySet stopWords = new 
> CharArraySet(StandardAnalyzer.STOP_WORDS_SET.Count + 
> DomainSpecificStopWords.Length, ignoreCase: false);
> foreach (string domainSpecificStopWord in 
> (CharArraySet)StandardAnalyzer.STOP_WORDS_SET)
>     stopWords.Add(domainSpecificStopWord);
> ... but this makes use of the implementation detail (the STOP_WORDS_SET is 
> really an UnmodifiableCharArraySet which is itself a CharArraySet). It works 
> because it forces the foreach() to use the correct 
> CharArraySet.GetEnumerator(), which is defined as a "new" method (this has a 
> bad code smell to it)
> At least 2 possibilities exist to solve this problem:
> - Make CharArraySet use the Hashtable instance and a custom comparator, 
> instead of its own implementation.
> - Make CharArraySet use HashSet<char[]>, defined in .NET 4.0.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to