Hello Shad,
Please find the files attached.
Have fun with them and let me know what develops!
Vincent
-----Original Message-----
From: Shad Storhaug [mailto:[email protected]]
Sent: Tuesday, March 21, 2017 3:42 PM
To: Van Den Berghe, Vincent <[email protected]>
Cc: [email protected]
Subject: RE: Bug in Lucene static initialization with multiple threads.
Hi Vincent,
Sure, you can email the files to me directly.
For a quick start on Git/GitHub, there is a fairly short book called Pragmatic
Version Control Using Git
(https://pragprog.com/book/tsgit/pragmatic-version-control-using-git) that gets
you up and running quickly. I think you might be attempting to push to the main
repo - and you won't have permission unless it is explicitly granted. What you
need to do is to fork the repo to your own GitHub account, then you can
read/write it as much as necessary. Once you get it to a point where you want
to submit something, you can do a pull request (either through GitHub or just
manually email a request) and someone else can then review and merge the
changes.
Update
I found the source of the
Lucene.Net.Tests.Index.TestIndexReaderWriter.TestDuringAddIndexes() problem -
it always occurs when you call an overload of the IndexSearcher constructor
that takes a TaskScheduler as a parameter and pass a non-null value. This is
built into the test framework
(https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.TestFramework/Util/LuceneTestCase.cs#L1778)
to happen rarely, which explains many of the random failures we are seeing. If
you change the random code to never use a TaskScheduler, the test will always
pass; change it to always use a TaskScheduler and it will always fail.
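To take the test-framework randomness out of the equation, something along these lines should make the failure deterministic. This is only a sketch: the two-argument IndexSearcher constructor is the overload mentioned above, but the namespace and constructor argument of LimitedConcurrencyLevelTaskScheduler are assumptions based on the MSDN sample it was copied from.

using System.Threading.Tasks;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Support; // assumed location of LimitedConcurrencyLevelTaskScheduler

internal static class SearcherRepro
{
    // Passing useScheduler = false keeps the sequential code path (tests pass);
    // passing true forces the TaskScheduler-based path that is currently failing.
    internal static IndexSearcher NewSearcher(IndexReader reader, bool useScheduler)
    {
        if (!useScheduler)
        {
            return new IndexSearcher(reader);
        }
        // The max-degree-of-parallelism argument is an assumption taken from the MSDN sample.
        TaskScheduler scheduler = new LimitedConcurrencyLevelTaskScheduler(4);
        return new IndexSearcher(reader, scheduler);
    }
}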
The implementation of TaskScheduler we are using for testing
(https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Support/LimitedConcurrencyLevelTaskScheduler.cs)
was copied directly from MSDN, so I doubt that is the issue. In fact, there is
a good chance that the issue is similar to the WeakIdentityMap issue, in that
there is an enumerator (or a call to an enumerator) that is not thread-safe; see
(https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Search/IndexSearcher.cs#L474-L500)
and
(https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Core/Search/IndexSearcher.cs#L569-L590).
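For what it's worth, this kind of bug usually has the same general shape as the fix below. This is only a generic illustration of the "snapshot before enumerating" pattern, not the actual IndexSearcher code:

using System.Collections.Generic;
using System.Threading.Tasks;

internal class TaskTracker
{
    private readonly object syncLock = new object();
    private readonly List<Task> runningTasks = new List<Task>();

    public void Add(Task task)
    {
        lock (syncLock)
        {
            runningTasks.Add(task);
        }
    }

    public void WaitForAll()
    {
        Task[] snapshot;
        lock (syncLock)
        {
            // Enumerating runningTasks directly while another thread calls Add()
            // throws "Collection was modified; enumeration operation may not execute."
            // Copying under the lock gives the enumerating thread a private, stable array.
            snapshot = runningTasks.ToArray();
        }
        foreach (Task task in snapshot)
        {
            task.Wait();
        }
    }
}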
Anyway, I know of at least 3 tests that are failing as a result of this, so
fixing it would be a big prize.
Thanks,
Shad Storhaug (NightOwl888)
-----Original Message-----
From: Van Den Berghe, Vincent [mailto:[email protected]]
Sent: Tuesday, March 21, 2017 8:51 PM
To: Shad Storhaug
Cc: [email protected]
Subject: RE: Bug in Lucene static initialization with multiple threads.
Hello Shad,
I had a little time on my hands and looked into this WeakIdentityMap issue,
more specifically TestConcurrentHashMap, which fails for me as well (in 100% of
the cases). Maybe I have something to contribute; I have three observations.
First, notice that in TestConcurrentHashMap, 8 threads are created and then
all joined by doing the following:
finally
{
    foreach (var w in workers)
    {
        w.Join(1000L);
    }
}
This gives the first thread 1 second to end, the second one at most 2 seconds
(1 second plus whatever time the first thread needed to end), and so on. Given
the amount of work each test thread does, this is far too little time even on a
fast machine: here it takes about 13 seconds for all threads to end.
The corresponding Java test has the following:
while (!exec.awaitTermination(1000L, TimeUnit.MILLISECONDS));
... which in effect just loops until the execution of each thread is finished,
in units of 1 second.
In TPL, the threads would be tasks and we would just be able to call Task.WaitAll.
Since we're dealing with "real" threads here, I would suggest just calling
w.Join() and being done with it.
This would align the test with the Java behavior.
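Concretely, the finally block above would then just be (this matches what the attached TestWeakIdentityMap.cs now does):

finally
{
    foreach (var w in workers)
    {
        // no timeout: block until each worker has actually finished,
        // like the Java awaitTermination loop
        w.Join();
    }
}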
Second, there are various weaknesses in the WeakIdentityMap:
1) the implementation of the Keys enumerator
(IteratorAnonymousInnerClassHelper) relies on the order of the elements in the
keys collection (outerInstance.backingStore.Keys.ElementAt(position)). This is
bad for two reasons:
- it is extremely slow (there is no indexed access on
outerInstance.backingStore.Keys in any current implementation, so ElementAt
needs to skip "position" elements to get at the correct one)
- it relies on the assumption that removing (or adding) a key in a dictionary
doesn't change the previous relative key order, which is incorrect in any
current .NET implementation I am aware of (dictionaries are hash tables with
collision resolution through chaining and reuse of slots through a free list:
it's just asking for trouble).
It turns out that you can use the backing store enumerator to implement the
keys enumerator directly. The main loop simply becomes:
public bool MoveNext()
{
    while (enumerator.MoveNext())
    {
        next = enumerator.Current.Key.Target;
        if (next != null)
        {
            // unfold "null" special value:
            if (next == NULL)
                next = null;
            return true;
        }
    }
    return false;
}
This works in the non-concurrent case (because we don't touch the collection
while the enumerator is running), and in the concurrent case as well (because
the ConcurrentDictionary<K,V> enumerator works by design and handles concurrent
modifications without a problem).
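If you want to convince yourself of that last claim, here is a tiny stand-alone sketch (not part of the patch) that enumerates a ConcurrentDictionary while another thread keeps writing to it:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

internal static class ConcurrentEnumerationDemo
{
    internal static void Main()
    {
        var map = new ConcurrentDictionary<int, int>();
        for (int i = 0; i < 1000; i++) map[i] = i;

        // keep mutating the dictionary while the main thread enumerates it
        Task writer = Task.Run(() =>
        {
            for (int i = 1000; i < 2000; i++) map[i] = i;
        });

        int seen = 0;
        foreach (KeyValuePair<int, int> pair in map)
        {
            seen++; // no "Collection was modified" exception is ever thrown here
        }
        writer.Wait();

        // the enumerator is not a snapshot: it may or may not have observed the new entries
        Console.WriteLine("Enumerated " + seen + " entries without an exception.");
    }
}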
2) calling Reap() creates objects on the heap, even when there are no elements
to be removed. Sadly, not all of these allocations can be eliminated, but you
can delay the creation of the keysToRemove list until it's really needed:
List<IdentityWeakReference> keysToRemove = null;
foreach (var item in backingStore)
{
    if (!item.Key.IsAlive)
    {
        // create the list of keys to remove only if there are keys to remove.
        // this reduces heap pressure
        if (keysToRemove == null)
            keysToRemove = new List<IdentityWeakReference>();
        keysToRemove.Add(item.Key);
    }
}
if (keysToRemove != null)
{
    foreach (var key in keysToRemove)
    {
        backingStore.Remove(key);
    }
}
Note that I don't iterate the Keys collection, but use the dictionary
enumerator. Believe it or not, this is slightly more efficient, for reasons
I won't explain here since this e-mail is already long enough.
It's sad but inevitable that a heap object is created for the dictionary
enumerator, because we call it through an interface (IDictionary<K,V>): if we
had the actual object, no enumerator object would be created on the heap.
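To make the allocation point concrete: Dictionary<TKey, TValue> exposes a public struct enumerator, but once the dictionary is only reachable through IDictionary<K,V> (which is all backingStore is typed as), GetEnumerator() hands back the boxed interface. A small illustration (the names here are mine, not from the Lucene.Net code):

using System.Collections.Generic;

internal static class EnumeratorAllocationDemo
{
    internal static void Demo()
    {
        var concrete = new Dictionary<string, int> { { "a", 1 } };
        IDictionary<string, int> viaInterface = concrete;

        // Struct enumerator: returned by value, no heap allocation.
        Dictionary<string, int>.Enumerator e1 = concrete.GetEnumerator();
        e1.Dispose();

        // Same dictionary through the interface: the struct is boxed into
        // IEnumerator<KeyValuePair<string, int>>, which is a heap object.
        IEnumerator<KeyValuePair<string, int>> e2 = viaInterface.GetEnumerator();
        e2.Dispose();
    }
}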
3) Equality of weak identity references can be done using only one cast (using
"as" instead of "is"), which is more efficient.
Third, the test itself uses enumerators in a nonstandard manner. The two
witness cases are:
IEnumerator<string> iter = map.Keys.GetEnumerator();
Assert.IsTrue(iter.MoveNext());
Assert.IsNull(iter.Current);
Assert.IsFalse(iter.MoveNext());
Assert.IsFalse(iter.MoveNext());
And
for (IEnumerator<string> iter = map.Keys.GetEnumerator(); iter.MoveNext();)
{
    //Assert.IsTrue(iter.hasNext()); // try again, should return same result!
    string k = iter.Current;
    ...
}
All the other instances are variants of these witnesses.
The correct way of using IEnumerator<T> is to call IEnumerator<T>.Dispose()
after you're finished with the instance. Note that in Lucene.Net itself, foreach
is used, which does this correctly (ByteBufferIndexInput.cs):
foreach (ByteBufferIndexInput clone in clones.Keys)
{
    clone.UnsetBuffers();
}
All usages of enumerators in TestWeakIdentityMap.cs must be rewritten
accordingly. For example:
using (IEnumerator<string> iter = map.Keys.GetEnumerator())
{
Assert.IsTrue(iter.MoveNext());
Assert.IsNull(iter.Current);
Assert.IsFalse(iter.MoveNext());
Assert.IsFalse(iter.MoveNext());
}
And
foreach (object k in map.Keys)
{
...
}
In case you are wondering why this is so important: you cannot guarantee that a
future implementation of an enumerator (especially one over a concurrent
collection) won't have cleanup to do to get rid of various synchronization
objects. Right now this isn't the case, but you never know what the future will
bring. And besides, nice guys Dispose() after their enumeration <g>.
The test passes now. Every time.
I've made the changes to api-work in my local repository, but when I try to
"push" or "sync" them, I get:
    Error encountered while pushing to the remote repository: Response
    status code does not indicate success: 403 (Forbidden).
I know next to nothing about GitHub. Can I e-mail the changed files to someone?
Vincent
-----Original Message-----
From: Shad Storhaug [mailto:[email protected]]
Sent: Monday, March 20, 2017 9:19 PM
To: Van Den Berghe, Vincent <[email protected]>
Cc: [email protected]
Subject: RE: Bug in Lucene static initialization with multiple threads.
Hi Vincent,
Thanks for reporting this. In fact, thank you for all of your assistance
tracking down bugs.
This issue boils down to being a failed attempt to replace Lucene's
WeakIdentityMap with a new data structure called WeakDictionary. Since there
are already tests to verify concurrency on WeakIdentityMap and it is used in a
couple of other places in Lucene, it would be far better to get it working
right than to try to fix this alternative version. I guess for the time being
your workaround should suffice (though, a fix rather than a hack would be
preferred).
I have spent quite a bit of time on this, but the best I have been able to do
is to get the Lucene.Net.Tests.Util.TestWeakIdentityMap.TestConcurrentHashMap()
test to pass about 50% of the time (and I can't seem to even get it back into
that state).
Here are a couple of attempts I have made:
https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-1 -
using a port of the original Java backing classes
https://github.com/NightOwl888/lucenenet/commits/api-work-weak-identity-map-2 -
using the .NET WeakIdentity class
And here is the original Java version:
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/util/WeakIdentityMap.java
The complicated part is getting it to "reap" the elements in a thread-safe way
so the counts are right on several concurrent enumerators. Any assistance you
could provide to make WeakIdentityMap thread-safe would be much appreciated. Do
note that the lead branch is now at
https://github.com/apache/lucenenet/tree/api-work, so please do any work from
that branch.
Also note that there are currently a few other concurrency tests failing:
Lucene.Net.Tests.Index.TestIndexReaderWriter.TestDuringAddIndexes()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestControlledRealTimeReopenThread_Mem()
Lucene.Net.Tests.Search.TestControlledRealTimeReopenThread.TestCRTReopen()
I am sure that getting to the bottom of these issues will probably fix most of
the issues you are seeing. If you have any spare time, your help would be
appreciated on these as well.
Thanks,
Shad Storhaug (NightOwl888)
-----Original Message-----
From: Van Den Berghe, Vincent [mailto:[email protected]]
Sent: Tuesday, February 7, 2017 6:00 PM
To: [email protected]
Subject: Bug in Lucene static initialization with multiple threads.
Hello,
Every once in a while, I get an error when using Lucene in a multithreaded
scenario (meaning: using a single IndexWriter in multiple threads, or using a
distinct IndexWriter in each thread: it doesn't matter).
The exception chain thrown is:
Unhandled Exception: System.ArgumentException: Could not instantiate implementing class for Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute
---> System.ArgumentException: Could not find implementing class for ICharTermAttribute
---> System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
I could not understand what was going on, especially because it only occurred
"sometimes". It took me a while to figure out, but I think it's a bug.
Here's the stack trace of the exception when it occurs:
[External Code]
>  Lucene.Net.dll!Lucene.Net.Support.HashMap<Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.WeakKey<System.Type>, System.WeakReference>.GetEnumerator() Line 229  C#
[External Code]
   Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean() Line 59  C#
   Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.CleanIfNeeded() Line 71  C#
   Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Add(System.Type key, System.WeakReference value) Line 134  C#
   Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.GetClassForInterface<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 90  C#
   Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AttributeFactory.DefaultAttributeFactory.CreateAttributeInstance<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 70  C#
   Lucene.Net.dll!Lucene.Net.Util.AttributeSource.AddAttribute<Lucene.Net.Analysis.Tokenattributes.ICharTermAttribute>() Line 350  C#
   Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.InitializeInstanceFields() Line 658  C#
   Lucene.Net.dll!Lucene.Net.Documents.Field.StringTokenStream.StringTokenStream() Line 676  C#
   Lucene.Net.dll!Lucene.Net.Documents.Field.GetTokenStream(Lucene.Net.Analysis.Analyzer analyzer) Line 629  C#
   Lucene.Net.dll!Lucene.Net.Index.DocInverterPerField.ProcessFields(Lucene.Net.Index.IndexableField[] fields, int count) Line 105  C#
   Lucene.Net.dll!Lucene.Net.Index.DocFieldProcessor.ProcessDocument(Lucene.Net.Index.FieldInfos.Builder fieldInfos) Line 279  C#
   Lucene.Net.dll!Lucene.Net.Index.DocumentsWriterPerThread.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 287  C#
   Lucene.Net.dll!Lucene.Net.Index.DocumentsWriter.UpdateDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer, Lucene.Net.Index.Term delTerm) Line 574  C#
   Lucene.Net.dll!Lucene.Net.Index.IndexWriter.UpdateDocument(Lucene.Net.Index.Term term, System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer) Line 1830  C#
   Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc, Lucene.Net.Analysis.Analyzer analyzer) Line 1455  C#
   Lucene.Net.dll!Lucene.Net.Index.IndexWriter.AddDocument(System.Collections.Generic.IEnumerable<Lucene.Net.Index.IndexableField> doc) Line 1436  C#
... and to wit, here are the threads just rushing in to do the same:
Not Flagged     35428  17  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     35444  11  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     44124  12  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged  >  44140  13  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     47700  14  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     28168  15  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     30988  16  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
Not Flagged     21828   6  Worker Thread  <No Name>  Lucene.Net.dll!Lucene.Net.Support.WeakDictionary<System.Type, System.WeakReference>.Clean  Normal
The reason why it only reproduces "sometimes" is because of this little nugget
of code:
private void CleanIfNeeded()
{
    int currentColCount = GC.CollectionCount(0);
    if (currentColCount > _gcCollections)
    {
        Clean();
        _gcCollections = currentColCount;
    }
}
If one thread does a Clean() operation in the middle of another Clean()
operation on the same collection that replaces the object being enumerated on,
you get the exception. Always.
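One possible direction (this is not my correction, just a generic sketch of the kind of change that removes the race) is to serialize Clean() and the lookups on the same lock, so no thread can ever enumerate the backing dictionary while another thread rebuilds it:

using System;
using System.Collections.Generic;

// Generic illustration only (not the actual WeakDictionary code).
internal class WeakValueCache<TKey>
{
    private readonly object syncLock = new object();
    private Dictionary<TKey, WeakReference> backing = new Dictionary<TKey, WeakReference>();
    private int gcCollections = 0;

    public void Add(TKey key, object value)
    {
        lock (syncLock)
        {
            CleanIfNeeded();
            backing[key] = new WeakReference(value);
        }
    }

    public bool TryGetValue(TKey key, out object value)
    {
        lock (syncLock)
        {
            WeakReference wr;
            if (backing.TryGetValue(key, out wr))
            {
                object target = wr.Target; // read once; may have died since Add
                if (target != null)
                {
                    value = target;
                    return true;
                }
            }
            value = null;
            return false;
        }
    }

    private void CleanIfNeeded()
    {
        // same GC-generation trigger as WeakDictionary.CleanIfNeeded above
        int currentColCount = GC.CollectionCount(0);
        if (currentColCount > gcCollections)
        {
            Clean();
            gcCollections = currentColCount;
        }
    }

    private void Clean()
    {
        // rebuild without dead entries; safe because every caller holds syncLock
        var survivors = new Dictionary<TKey, WeakReference>();
        foreach (KeyValuePair<TKey, WeakReference> pair in backing)
        {
            if (pair.Value.IsAlive)
            {
                survivors.Add(pair.Key, pair.Value);
            }
        }
        backing = survivors;
    }
}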
To remove the intermittence, create a bunch of threads like this and eliminate
the test "if (currentColCount > _gcCollections)" so that the Clean() code is
always executed. You'll get the exception every time.
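For illustration, a hypothetical repro harness might look like this (it assumes the CleanIfNeeded guard has been removed so Clean() runs on every Add; analyzer can be any Analyzer instance):

using System.Collections.Generic;
using System.Threading;
using Lucene.Net.Analysis;
using Lucene.Net.Documents;

internal static class StaticInitRaceRepro
{
    internal static void Run(Analyzer analyzer)
    {
        var threads = new List<Thread>();
        for (int i = 0; i < 8; i++)
        {
            threads.Add(new Thread(() =>
            {
                // Field.GetTokenStream -> AttributeSource.AddAttribute<ICharTermAttribute>
                // -> WeakDictionary.Add -> CleanIfNeeded -> Clean (see the stack trace above)
                new TextField("dummy", "dummyvalue", Field.Store.NO).GetTokenStream(analyzer);
            }));
        }
        threads.ForEach(t => t.Start());
        threads.ForEach(t => t.Join());
    }
}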
I will not post the correction, but there's a simple workaround: just make sure
the static initializers are performed in a single thread.
I.e. before creating your threads, do something like this:

new global::Lucene.Net.Documents.TextField("dummy", "dummyvalue", global::Lucene.Net.Documents.Field.Store.NO).GetTokenStream(someAnalyzer);

Replace someAnalyzer with an instance of an Analyzer object; it doesn't matter
which one. The call itself is meaningless, but it has the side effect of
initializing the static fields without problems.
Vincent
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Collections;
namespace Lucene.Net.Util
{
    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */
    /// <summary>
    /// Implements a combination of <seealso cref="java.util.WeakHashMap"/> and
    /// <seealso cref="java.util.IdentityHashMap"/>.
    /// Useful for caches that need to key off of a {@code ==} comparison
    /// instead of a {@code .equals}.
    ///
    /// <p>this class is not a general-purpose <seealso cref="java.util.Map"/>
    /// implementation! It intentionally violates
    /// Map's general contract, which mandates the use of the equals method
    /// when comparing objects. this class is designed for use only in the
    /// rare cases wherein reference-equality semantics are required.
    ///
    /// <p>this implementation was forked from <a href="http://cxf.apache.org/">Apache CXF</a>
    /// but modified to <b>not</b> implement the <seealso cref="java.util.Map"/> interface and
    /// without any set views on it, as those are error-prone and inefficient,
    /// if not implemented carefully. The map only contains <seealso cref="Iterator"/> implementations
    /// on the values and not-GCed keys. Lucene's implementation also supports {@code null}
    /// keys, but those are never weak!
    ///
    /// <p><a name="reapInfo" />The map supports two modes of operation:
    /// <ul>
    /// <li>{@code reapOnRead = true}: this behaves identical to a <seealso cref="java.util.WeakHashMap"/>
    /// where it also cleans up the reference queue on every read operation (<seealso cref="#get(Object)"/>,
    /// <seealso cref="#containsKey(Object)"/>, <seealso cref="#size()"/>, <seealso cref="#valueIterator()"/>), freeing map entries
    /// of already GCed keys.</li>
    /// <li>{@code reapOnRead = false}: this mode does not call <seealso cref="#reap()"/> on every read
    /// operation. In this case, the reference queue is only cleaned up on write operations
    /// (like <seealso cref="#put(Object, Object)"/>). this is ideal for maps with few entries where
    /// the keys are unlikely be garbage collected, but there are lots of <seealso cref="#get(Object)"/>
    /// operations. The code can still call <seealso cref="#reap()"/> to manually clean up the queue without
    /// doing a write operation.</li>
    /// </ul>
    ///
    /// @lucene.internal
    /// </summary>
    public sealed class WeakIdentityMap<TKey, TValue>
        where TKey : class
    {
        private readonly IDictionary<IdentityWeakReference, TValue> backingStore;

        private readonly bool reapOnRead;
        /// <summary>
        /// Creates a new {@code WeakIdentityMap} based on a non-synchronized <seealso cref="HashMap"/>.
        /// The map <a href="#reapInfo">cleans up the reference queue on every read operation</a>.
        /// </summary>
        public static WeakIdentityMap<TKey, TValue> NewHashMap()
        {
            return NewHashMap(false);
        }

        /// <summary>
        /// Creates a new {@code WeakIdentityMap} based on a non-synchronized <seealso cref="HashMap"/>. </summary>
        /// <param name="reapOnRead"> controls if the map <a href="#reapInfo">cleans up the reference queue on every read operation</a>. </param>
        public static WeakIdentityMap<TKey, TValue> NewHashMap(bool reapOnRead)
        {
            return new WeakIdentityMap<TKey, TValue>(new Dictionary<IdentityWeakReference, TValue>(), reapOnRead);
        }
        /// <summary>
        /// Creates a new {@code WeakIdentityMap} based on a <seealso cref="ConcurrentHashMap"/>.
        /// The map <a href="#reapInfo">cleans up the reference queue on every read operation</a>.
        /// </summary>
        public static WeakIdentityMap<TKey, TValue> NewConcurrentHashMap()
        {
            return NewConcurrentHashMap(true);
        }

        /// <summary>
        /// Creates a new {@code WeakIdentityMap} based on a <seealso cref="ConcurrentHashMap"/>. </summary>
        /// <param name="reapOnRead"> controls if the map <a href="#reapInfo">cleans up the reference queue on every read operation</a>. </param>
        public static WeakIdentityMap<TKey, TValue> NewConcurrentHashMap(bool reapOnRead)
        {
            return new WeakIdentityMap<TKey, TValue>(new ConcurrentDictionary<IdentityWeakReference, TValue>(), reapOnRead);
        }
        /// <summary>
        /// Private only constructor, to create use the static factory methods. </summary>
        private WeakIdentityMap(IDictionary<IdentityWeakReference, TValue> backingStore, bool reapOnRead)
        {
            this.backingStore = backingStore;
            this.reapOnRead = reapOnRead;
        }

        /// <summary>
        /// Removes all of the mappings from this map. </summary>
        public void Clear()
        {
            backingStore.Clear();
            Reap();
        }

        /// <summary>
        /// Returns {@code true} if this map contains a mapping for the specified key. </summary>
        public bool ContainsKey(object key)
        {
            if (reapOnRead)
            {
                Reap();
            }
            return backingStore.ContainsKey(new IdentityWeakReference(key));
        }
        /// <summary>
        /// Returns the value to which the specified key is mapped. </summary>
        public TValue Get(object key)
        {
            if (reapOnRead)
            {
                Reap();
            }
            TValue val;
            if (backingStore.TryGetValue(new IdentityWeakReference(key), out val))
            {
                return val;
            }
            else
            {
                return default(TValue);
            }
        }

        /// <summary>
        /// Associates the specified value with the specified key in this map.
        /// If the map previously contained a mapping for this key, the old value
        /// is replaced.
        /// </summary>
        public TValue Put(TKey key, TValue value)
        {
            Reap();
            return backingStore[new IdentityWeakReference(key)] = value;
        }

        public IEnumerable<TKey> Keys
        {
            get
            {
                return new KeyWrapper(this);
            }
        }
        /// <summary>
        /// LUCENENET specific class to allow the
        /// GetEnumerator() method to be overridden
        /// for the keys so we can return an enumerator
        /// that is smart enough to clean up the dead keys
        /// and also so that MoveNext() returns false in the
        /// event there are no more values left (instead of returning
        /// a null value in an extra enumeration).
        /// </summary>
        private class KeyWrapper : IEnumerable<TKey>
        {
            private readonly WeakIdentityMap<TKey, TValue> outerInstance;

            public KeyWrapper(WeakIdentityMap<TKey, TValue> outerInstance)
            {
                this.outerInstance = outerInstance;
            }

            public IEnumerator<TKey> GetEnumerator()
            {
                outerInstance.Reap();
                return new IteratorAnonymousInnerClassHelper(outerInstance);
            }

            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }
        public IEnumerable<TValue> Values
        {
            get
            {
                if (reapOnRead) Reap();
                return backingStore.Values;
            }
        }

        /// <summary>
        /// Returns {@code true} if this map contains no key-value mappings. </summary>
        public bool IsEmpty
        {
            get
            {
                return Count == 0;
            }
        }
        /// <summary>
        /// Removes the mapping for a key from this weak hash map if it is present.
        /// Returns the value to which this map previously associated the key,
        /// or {@code null} if the map contained no mapping for the key.
        /// A return value of {@code null} does not necessarily indicate that
        /// the map contained no mapping for the key.
        /// </summary>
        public bool Remove(object key)
        {
            Reap();
            return backingStore.Remove(new IdentityWeakReference(key));
        }
        /// <summary>
        /// Returns the number of key-value mappings in this map. this result is a snapshot,
        /// and may not reflect unprocessed entries that will be removed before next
        /// attempted access because they are no longer referenced.
        /// NOTE: This was size() in Lucene.
        /// </summary>
        public int Count
        {
            get
            {
                if (backingStore.Count == 0)
                {
                    return 0;
                }
                if (reapOnRead)
                {
                    Reap();
                }
                return backingStore.Count;
            }
        }
        private class IteratorAnonymousInnerClassHelper : IEnumerator<TKey>
        {
            private readonly WeakIdentityMap<TKey, TValue> outerInstance;

            private readonly IEnumerator<KeyValuePair<IdentityWeakReference, TValue>> enumerator;

            public IteratorAnonymousInnerClassHelper(WeakIdentityMap<TKey, TValue> outerInstance)
            {
                this.outerInstance = outerInstance;
                enumerator = outerInstance.backingStore.GetEnumerator();
            }

            // holds strong reference to next element in backing iterator:
            private object next = null;

            public TKey Current
            {
                get
                {
                    return (TKey)next;
                }
            }

            object IEnumerator.Current
            {
                get
                {
                    return Current;
                }
            }

            public void Dispose()
            {
                enumerator.Dispose();
            }

            public bool MoveNext()
            {
                while (enumerator.MoveNext())
                {
                    next = enumerator.Current.Key.Target;
                    if (next != null)
                    {
                        // unfold "null" special value:
                        if (next == NULL)
                            next = null;
                        return true;
                    }
                }
                return false;
            }

            public void Reset()
            {
                enumerator.Reset();
            }
        }
        /// <summary>
        /// Returns an iterator over all values of this map.
        /// this iterator may return values whose key is already
        /// garbage collected while iterator is consumed,
        /// especially if {@code reapOnRead} is {@code false}.
        /// <para/>
        /// NOTE: This was valueIterator() in Lucene.
        /// </summary>
        public IEnumerator<TValue> GetValueEnumerator()
        {
            if (reapOnRead)
            {
                Reap();
            }
            return backingStore.Values.GetEnumerator();
        }
        /// <summary>
        /// this method manually cleans up the reference queue to remove all garbage
        /// collected key/value pairs from the map. Calling this method is not needed
        /// if {@code reapOnRead = true}. Otherwise it might be a good idea
        /// to call this method when there is spare time (e.g. from a background thread). </summary>
        /// <seealso cref= <a href="#reapInfo">Information about the <code>reapOnRead</code> setting</a> </seealso>
        public void Reap()
        {
            List<IdentityWeakReference> keysToRemove = null;
            foreach (var item in backingStore)
            {
                if (!item.Key.IsAlive)
                {
                    // create the list of keys to remove only if there are keys to remove.
                    // this reduces heap pressure
                    if (keysToRemove == null)
                        keysToRemove = new List<IdentityWeakReference>();
                    keysToRemove.Add(item.Key);
                }
            }
            if (keysToRemove != null)
            {
                foreach (var key in keysToRemove)
                {
                    backingStore.Remove(key);
                }
            }
        }
        // we keep a hard reference to our NULL key, so map supports null keys that never get GCed:
        internal static readonly object NULL = new object();

        private sealed class IdentityWeakReference : WeakReference
        {
            private readonly int hash;

            internal IdentityWeakReference(object obj/*, ReferenceQueue<object> queue*/)
                : base(obj == null ? NULL : obj/*, queue*/)
            {
                hash = RuntimeHelpers.GetHashCode(obj);
            }

            public override int GetHashCode()
            {
                return hash;
            }

            public override bool Equals(object o)
            {
                if (this == o)
                {
                    return true;
                }
                IdentityWeakReference @ref = o as IdentityWeakReference;
                if (@ref != null)
                {
                    if (this.Target == @ref.Target)
                    {
                        return true;
                    }
                }
                return false;
            }
        }
    }
}

using Lucene.Net.Randomized.Generators;
using Lucene.Net.Support;
using NUnit.Framework;
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Globalization;
using System.Threading;
namespace Lucene.Net.Util
{
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
[TestFixture]
public class TestWeakIdentityMap : LuceneTestCase
{
[Test]
public virtual void TestSimpleHashMap()
{
            WeakIdentityMap<string, string> map = WeakIdentityMap<string, string>.NewHashMap(Random().NextBoolean());
// we keep strong references to the keys,
// so WeakIdentityMap will not forget about them:
string key1 = "foo";
string key2 = "test1 foo".Split(' ')[1];
string key3 = "test2 foo".Split(' ')[1];
            // LUCENENET NOTE: As per http://stackoverflow.com/a/543329/181087,
            // the above hack is required in order to ensure the AreNotSame
            // check will work. If you assign the same string to 3 different variables
            // without doing some kind of manipulation from the original string, the
            // AreNotSame test will fail because the references will be the same.
Assert.AreNotSame(key1, key2);
Assert.AreEqual(key1, key2);
Assert.AreNotSame(key1, key3);
Assert.AreEqual(key1, key3);
Assert.AreNotSame(key2, key3);
Assert.AreEqual(key2, key3);
// try null key & check its iterator also return null:
map.Put(null, "null");
            {
                using (IEnumerator<string> iter = map.Keys.GetEnumerator())
                {
                    Assert.IsTrue(iter.MoveNext());
                    Assert.IsNull(iter.Current);
                    Assert.IsFalse(iter.MoveNext());
                    Assert.IsFalse(iter.MoveNext());
                }
            }
// 2 more keys:
map.Put(key1, "bar1");
map.Put(key2, "bar2");
Assert.AreEqual(3, map.Count);
Assert.AreEqual("bar1", map.Get(key1));
Assert.AreEqual("bar2", map.Get(key2));
Assert.AreEqual(null, map.Get(key3));
Assert.AreEqual("null", map.Get(null));
Assert.IsTrue(map.ContainsKey(key1));
Assert.IsTrue(map.ContainsKey(key2));
Assert.IsFalse(map.ContainsKey(key3));
Assert.IsTrue(map.ContainsKey(null));
// repeat and check that we have no double entries
map.Put(key1, "bar1");
map.Put(key2, "bar2");
map.Put(null, "null");
Assert.AreEqual(3, map.Count);
Assert.AreEqual("bar1", map.Get(key1));
Assert.AreEqual("bar2", map.Get(key2));
Assert.AreEqual(null, map.Get(key3));
Assert.AreEqual("null", map.Get(null));
Assert.IsTrue(map.ContainsKey(key1));
Assert.IsTrue(map.ContainsKey(key2));
Assert.IsFalse(map.ContainsKey(key3));
Assert.IsTrue(map.ContainsKey(null));
map.Remove(null);
Assert.AreEqual(2, map.Count);
map.Remove(key1);
Assert.AreEqual(1, map.Count);
map.Put(key1, "bar1");
map.Put(key2, "bar2");
map.Put(key3, "bar3");
Assert.AreEqual(3, map.Count);
            int c = 0, keysAssigned = 0;
            foreach (object k in map.Keys)
            {
                //Assert.IsTrue(iter.hasNext()); // try again, should return same result!
                // LUCENENET NOTE: Need object.ReferenceEquals here because the
                // == operator does more than check reference equality
                Assert.IsTrue(object.ReferenceEquals(k, key1) || object.ReferenceEquals(k, key2) || object.ReferenceEquals(k, key3));
                keysAssigned += object.ReferenceEquals(k, key1) ? 1 : (object.ReferenceEquals(k, key2) ? 2 : 4);
                c++;
            }
            Assert.AreEqual(3, c);
            Assert.AreEqual(1 + 2 + 4, keysAssigned, "all keys must have been seen");
c = 0;
            for (IEnumerator<string> iter = map.Values.GetEnumerator(); iter.MoveNext();)
{
string v = iter.Current;
Assert.IsTrue(v.StartsWith("bar", StringComparison.Ordinal));
c++;
}
Assert.AreEqual(3, c);
// clear strong refs
key1 = key2 = key3 = null;
            // check that GC does not cause problems in reap() method, wait 1 second and let GC work:
int size = map.Count;
for (int i = 0; size > 0 && i < 10; i++)
{
#if !NETSTANDARD
try
{
#endif
GC.Collect();
int newSize = map.Count;
                Assert.IsTrue(size >= newSize, "previousSize(" + size + ")>=newSize(" + newSize + ")");
size = newSize;
Thread.Sleep(TimeSpan.FromSeconds(1));
c = 0;
foreach (object k in map.Keys)
{
Assert.IsNotNull(k);
c++;
}
newSize = map.Count;
                Assert.IsTrue(size >= c, "previousSize(" + size + ")>=iteratorSize(" + c + ")");
                Assert.IsTrue(c >= newSize, "iteratorSize(" + c + ")>=newSize(" + newSize + ")");
size = newSize;
#if !NETSTANDARD
}
#pragma warning disable 168
catch (ThreadInterruptedException ie)
#pragma warning restore 168
{
}
#endif
}
map.Clear();
Assert.AreEqual(0, map.Count);
Assert.IsTrue(map.IsEmpty);
            using (IEnumerator<string> it = map.Keys.GetEnumerator())
                Assert.IsFalse(it.MoveNext());
/*try
{
it.Next();
Assert.Fail("Should throw NoSuchElementException");
}
catch (NoSuchElementException nse)
{
}*/
            // LUCENENET NOTE: As per http://stackoverflow.com/a/543329/181087,
            // the following hack is required in order to ensure the string references
            // are different. If you assign the same string to 2 different variables
            // without doing some kind of manipulation from the original string, the
            // references will be the same.
key1 = "test3 foo".Split(' ')[1];
key2 = "test4 foo".Split(' ')[1];
map.Put(key1, "bar1");
map.Put(key2, "bar2");
Assert.AreEqual(2, map.Count);
map.Clear();
Assert.AreEqual(0, map.Count);
Assert.IsTrue(map.IsEmpty);
}
[Test]
public virtual void TestConcurrentHashMap()
{
            // don't make threadCount and keyCount random, otherwise easily OOMs or fails otherwise:
            const int threadCount = 8, keyCount = 1024;
            RunnableAnonymousInnerClassHelper[] workers = new RunnableAnonymousInnerClassHelper[threadCount];
            WeakIdentityMap<object, int?> map = WeakIdentityMap<object, int?>.NewConcurrentHashMap(Random().NextBoolean());
            // we keep strong references to the keys,
            // so WeakIdentityMap will not forget about them:
            AtomicReferenceArray<object> keys = new AtomicReferenceArray<object>(keyCount);
for (int j = 0; j < keyCount; j++)
{
keys[j] = new object();
}
try
{
for (int t = 0; t < threadCount; t++)
{
Random rnd = new Random(Random().Next());
                    var worker = new RunnableAnonymousInnerClassHelper(this, keyCount, map, keys, rnd);
workers[t] = worker;
worker.Start();
}
}
finally
{
foreach (var w in workers)
{
w.Join();
}
}
            // LUCENENET: Since assertions were done on the other threads, we need to check the
            // results here.
            for (int i = 0; i < workers.Length; i++)
            {
                assertTrue(string.Format(CultureInfo.InvariantCulture,
                    "worker thread {0} of {1} failed \n" + workers[i].Error, i, workers.Length),
                    workers[i].Error == null);
            }
// clear strong refs
for (int j = 0; j < keyCount; j++)
{
keys[j] = null;
}
// check that GC does not cause problems in reap() method:
int size = map.Count;
for (int i = 0; size > 0 && i < 10; i++)
{
#if !NETSTANDARD
try
{
#endif
GC.Collect();
int newSize = map.Count;
                Assert.IsTrue(size >= newSize, "previousSize(" + size + ")>=newSize(" + newSize + ")");
size = newSize;
Thread.Sleep(new TimeSpan(100L));
int c = 0;
foreach (object k in map.Keys)
{
Assert.IsNotNull(k);
c++;
}
newSize = map.Count;
                Assert.IsTrue(size >= c, "previousSize(" + size + ")>=iteratorSize(" + c + ")");
                Assert.IsTrue(c >= newSize, "iteratorSize(" + c + ")>=newSize(" + newSize + ")");
size = newSize;
#if !NETSTANDARD
}
#pragma warning disable 168
catch (ThreadInterruptedException ie)
#pragma warning restore 168
{
}
#endif
}
}
private class RunnableAnonymousInnerClassHelper : ThreadClass
{
private readonly TestWeakIdentityMap outerInstance;
private readonly int keyCount;
private readonly WeakIdentityMap<object, int?> map;
private AtomicReferenceArray<object> keys;
private readonly Random rnd;
private volatile Exception error;
            public RunnableAnonymousInnerClassHelper(TestWeakIdentityMap outerInstance, int keyCount, WeakIdentityMap<object, int?> map, AtomicReferenceArray<object> keys, Random rnd)
{
this.outerInstance = outerInstance;
this.keyCount = keyCount;
this.map = map;
this.keys = keys;
this.rnd = rnd;
}
public Exception Error
{
get { return error; }
}
public override void Run()
{
int count = AtLeast(rnd, 10000);
try
{
for (int i = 0; i < count; i++)
{
int j = rnd.Next(keyCount);
switch (rnd.Next(5))
{
case 0:
map.Put(keys[j], Convert.ToInt32(j));
break;
case 1:
int? v = map.Get(keys[j]);
if (v != null)
{
Assert.AreEqual(j, (int)v);
}
break;
case 2:
map.Remove(keys[j]);
break;
case 3:
                                // renew key, the old one will be GCed at some time:
keys[j] = new object();
break;
case 4:
// check iterator still working
foreach (object k in map.Keys)
{
Assert.IsNotNull(k);
}
break;
default:
Assert.Fail("Should not get here.");
break;
}
}
}
catch (Exception e)
{
e.printStackTrace();
this.error = e;
}
}
}
}
}