Hello Shad,
I have a theory about TestCRTReopen: if you look at the java code
(https://github.com/apache/lucene-solr/blob/releases/lucene-solr/4.8.0/lucene/core/src/java/org/apache/lucene/search/ControlledRealTimeReopenThread.java),
you see there's a relation between the reentrant lock and its condition
variable:
ReentrantLock reopenLock = new ReentrantLock();
Condition reopenCond = reopenLock.newCondition();
Maybe there's some subtlety in there that we miss. The lock is used only as a
guard around the reopen condition, which maybe is how they rule in the Java
Shire, but no such concepts exist as such in C#.
The closest thing to a "real" ReentrantLock implementation I have ever seen in
.NET (complete with condition variables, fair locking, and so on) is
https://github.com/spring-projects/spring-net-threading/blob/master/src/Spring/Spring.Threading/Threading/Locks/ReentrantLock.cs
But that's a gorilla. All we really want is a banana, without the gorilla
attached to it.
So that got me thinking: we know what ControlledRealTImeReopenThread does. Why
don't we implement it in "pure" C# instead of trying to translate it from Java
using synchronization primitives that are almost but not quite totally unlike
those in .NET?
You can find the result in the file attached. I restrained myself and didn't
replace Environment.TickCount with Stopwatch (which would be more correct and
guard against overflows that occur in TickCount every 24.9 days).
In a fit of altruism and insight, I even let all the related unit tests run,
and they all pass. And finish in time!
But that's in my alternate universe, of course <g>.
Vincent
-----Original Message-----
From: Shad Storhaug [mailto:[email protected]]
Sent: Friday, March 24, 2017 3:49 AM
To: Van Den Berghe, Vincent <[email protected]>
Cc: [email protected]
Subject: RE: TestSearcherManager_Mem
Vincent,
FYI - TestSearcherManager_Mem() succeeds much more frequently, but still
randomly fails.
Also, although I was able to make the error message change for TestCRTReopen(),
I didn't manage to get it working. The problem is pretty obvious - the
WaitForGeneration() method
(https://github.com/apache/lucenenet/blob/api-work/src/Lucene.Net.Tests/Search/TestControlledRealTimeReopenThread.cs#L680)
is WAY too slow. Even if I increase the wait period from 20 to 60 seconds it
still doesn't finish in time. I played with a few of the variables in
ControlledRealTimeReopenThread, but couldn't get the behavior to change. I
verified that PulseAll() gets called, but it doesn't seem like it is having any
effect on the Wait().
For TestEarlyTerminationDifferentSorter(), I reviewed all of the code under
test in the Misc project line by line, but there was nothing significant to
fix. So, still broken (sometimes).
Thanks,
Shad Storhaug (NightOwl888)
-----Original Message-----
From: Shad Storhaug [mailto:[email protected]]
Sent: Thursday, March 23, 2017 8:25 PM
To: Van Den Berghe, Vincent; [email protected]
Cc: [email protected]
Subject: RE: TestSearcherManager_Mem
Parallel universe or not, I think you are making progress. I found a similar
IncrementAndGet issue in the ThreadedIndexingAndSearchingTestCase that I have
already corrected. However, it only mattered in one case, in all other cases
the result of IncrementAndGet was not being utilized.
It is like someone intentionally wanted to make it impossible to get all of the
bugs out of this code...
Anyway, stupid is as stupid does...I went through and scanned the entire
codebase for IncrementAndGet and compared it against Lucene. Sure enough, the
Core.Util.RefCount class was refactored from its original. I changed it back to
the original code (backed by an AtomicInteger/AtomicInt32), and the
TestCRTReopen() test no longer fails almost immediately - after a couple of
minutes it now fails with the message "waited too long for commit generation".
I don't know if I fixed it or broke it more, but it is definitely behaving
differently now.
Now, let me see if I can bring your other changes into my universe...perhaps
the new failure has something to do with the reset event.
Thanks,
Shad Storhaug (NightOwl888)
From: Van Den Berghe, Vincent [mailto:[email protected]]
Sent: Thursday, March 23, 2017 7:04 PM
To: [email protected]
Cc: Shad Storhaug
Subject: TestSearcherManager_Mem
Even though I seem to live in a parallel universe where 42 isn't 42 and 4.8
isn't 4.8, I'll have a stab at resolving TestSearcherManager_Mem.
First, there is a method in TrackingIndexWriter:
public virtual long GetAndIncrementGeneration()
{
return indexingGen.IncrementAndGet();
}
The implementation calls the wrong indexGen method: it should call
GetAndIncrement(), which doesn't exist in the .NET version. You can add the
method to the AtomicLong class.
Too bad there's no Interlocked.PostIncrement, but it's easy enough:
public long GetAndIncrement()
{
return Interlocked.Increment(ref value) - 1;
}
And adjust the call accordingly:
public virtual long GetAndIncrementGeneration()
{
return indexingGen.GetAndIncrement();
}
Next we turn our attention to ControlledRealTimeReopenThread<T>.
There's an event defined as follows:
private ManualResetEvent reopenCond = new ManualResetEvent(false);
This is not correct, since the remainder of the implementation only Sets and
Waits, but never resets. Once the event Set, the wait will never ... uh... wait
the second time around. Change this as follows:
private AutoResetEvent reopenCond = new AutoResetEvent(false);
Next, for some reason, time is counted in nanoseconds, but since
Environment.TickCount is in milliseconds, we need to convert it by multiplying
by 1000000.
Unfortunately, this is done by multiplication:
Environment.TickCount * 1000000
Since Environment.TickCount is an int and 1000000 is an int, the result is
negative unless you just rebooted your computer in a Tardis doing a polka
backwards.
Define:
private const long MS_IN_NS = 1000000;
... and change all other references to 1000000 except one (see below) with
MS_IN_NS: this should solve the overflow problem using C#'s promotion rules.
Next, notice that 64-bit integers are sometimes read outside locks:
searchingGen = refreshStartGen;
if (targetGen > searchingGen)
while (targetGen > searchingGen)
This isn't guaranteed to be atomic, and I'm a curmudgeon when it comes to
parallelism and atomicity. Change all these lines by:
Interlocked.Exchange(ref searchingGen, refreshStartGen);
if (targetGen > Interlocked.Read(ref searchingGen))
while (targetGen > Interlocked.Read(ref searchingGen))
In my own spacetime continuum, the test now passes.
Added bonus points: dispose of the waitable event in
ControlledRealTimeReopenThread<T> Dispose method by adding:
reopenCond.Dispose();
(after the Monitor.PulseAll(this); statement)
Extreme added bonus points: the following statement is incorrect, but it works
anyway:
reopenCond.WaitOne(new TimeSpan(sleepNS / 1000000));//Convert
NS to Ticks
(the 1000000 should not be replaced by MS_IN_NS in the new version) The reason
why it's incorrect is because the argument to TimeSpan in the call accepts
100-nanoseconds units, and dividing nanoseconds by 1000000 yields milliseconds
instead. So you will wait a delay of a factor 10000 shorter. It turns out that
the correction (divide by 100) will cause timeouts in the tests, so I left it
as-is.
But all of this might be wrong, I may not even exists.
Vincent
using Lucene.Net.Support;
using System;
using System.Threading;
namespace Lucene.Net.Search
{
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version
2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
using TrackingIndexWriter = Lucene.Net.Index.TrackingIndexWriter;
/// <summary>
/// Utility class that runs a thread to manage periodicc
/// reopens of a <seealso cref="ReferenceManager"/>, with methods to
wait for a specific
/// index changes to become visible. To use this class you
/// must first wrap your <seealso cref="Index.IndexWriter"/> with a
{@link
/// TrackingIndexWriter} and always use it to make changes
/// to the index, saving the returned generation. Then,
/// when a given search request needs to see a specific
/// index change, call the {#waitForGeneration} to wait for
/// that change to be visible. Note that this will only
/// scale well if most searches do not need to wait for a
/// specific index generation.
///
/// @lucene.experimental
/// </summary>
public class ControlledRealTimeReopenThread<T> : ThreadClass,
IDisposable
where T : class
{
private readonly ReferenceManager<T> manager;
private readonly long targetMaxStaleNS;
private readonly long targetMinStaleNS;
private readonly TrackingIndexWriter writer;
private volatile bool finish;
private long waitingGen;
private long searchingGen;
private long refreshStartGen;
private EventWaitHandle reopenCond = new AutoResetEvent(false);
private EventWaitHandle available = new AutoResetEvent(false);
private const long MILLISECONDS_PER_NANOSECOND = 1000000;
/// <summary>
/// Create ControlledRealTimeReopenThread, to periodically
/// reopen the a <seealso cref="ReferenceManager"/>.
/// </summary>
/// <param name="targetMaxStaleSec"> Maximum time until a new
/// reader must be opened; this sets the upper bound
/// on how slowly reopens may occur, when no
/// caller is waiting for a specific generation to
/// become visible.
/// </param>
/// <param name="targetMinStaleSec"> Mininum time until a new
/// reader can be opened; this sets the lower bound
/// on how quickly reopens may occur, when a caller
/// is waiting for a specific generation to
/// become visible. </param>
public ControlledRealTimeReopenThread(TrackingIndexWriter
writer, ReferenceManager<T> manager, double targetMaxStaleSec, double
targetMinStaleSec)
{
if (targetMaxStaleSec < targetMinStaleSec)
{
throw new
System.ArgumentException("targetMaxScaleSec (= " + targetMaxStaleSec + ") <
targetMinStaleSec (=" + targetMinStaleSec + ")");
}
this.writer = writer;
this.manager = manager;
this.targetMaxStaleNS = (long)(1000000000 *
targetMaxStaleSec);
this.targetMinStaleNS = (long)(1000000000 *
targetMinStaleSec);
manager.AddListener(new HandleRefresh(this));
}
private class HandleRefresh : ReferenceManager.IRefreshListener
{
private readonly ControlledRealTimeReopenThread<T>
outerInstance;
public HandleRefresh(ControlledRealTimeReopenThread<T>
outerInstance)
{
this.outerInstance = outerInstance;
}
public virtual void BeforeRefresh()
{
}
public virtual void AfterRefresh(bool didRefresh)
{
outerInstance.RefreshDone();
}
}
private void RefreshDone()
{
lock (this)
{
// if we're finishing, , make it out so that
all waiting search threads will return
searchingGen = finish ? long.MaxValue:
refreshStartGen;
available.Set();
}
reopenCond.Reset();
}
public void Dispose() // LUCENENET TODO: Implement disposable
pattern
{
finish = true;
reopenCond.Set();
#if !NETSTANDARD
try
{
#endif
Join();
#if !NETSTANDARD
}
catch (ThreadInterruptedException ie)
{
throw new
ThreadInterruptedException(ie.ToString(), ie);
}
#endif
// LUCENENET specific: dispose reset event
reopenCond.Dispose();
available.Dispose();
}
/// <summary>
/// Waits for the target generation to become visible in
/// the searcher.
/// If the current searcher is older than the
/// target generation, this method will block
/// until the searcher is reopened, by another via
/// <seealso cref="ReferenceManager#maybeRefresh"/> or until
the <seealso cref="ReferenceManager"/> is closed.
/// </summary>
/// <param name="targetGen"> the generation to wait for </param>
public virtual void WaitForGeneration(long targetGen)
{
WaitForGeneration(targetGen, -1);
}
/// <summary>
/// Waits for the target generation to become visible in
/// the searcher, up to a maximum specified milli-seconds.
/// If the current searcher is older than the target
/// generation, this method will block until the
/// searcher has been reopened by another thread via
/// <seealso cref="ReferenceManager#maybeRefresh"/>, the given
waiting time has elapsed, or until
/// the <seealso cref="ReferenceManager"/> is closed.
/// <p>
/// NOTE: if the waiting time elapses before the requested
target generation is
/// available the current <seealso cref="SearcherManager"/> is
returned instead.
/// </summary>
/// <param name="targetGen">
/// the generation to wait for </param>
/// <param name="maxMS">
/// maximum milliseconds to wait, or -1 to wait
indefinitely </param>
/// <returns> true if the targetGeneration is now available,
/// or false if maxMS wait time was exceeded </returns>
public virtual bool WaitForGeneration(long targetGen, int maxMS)
{
long curGen = writer.Generation;
if (targetGen > curGen)
{
throw new System.ArgumentException("targetGen="
+ targetGen + " was never returned by the ReferenceManager instance (current
gen=" + curGen + ")");
}
lock (this)
if (targetGen <= searchingGen)
return true;
else
{
waitingGen = Math.Max(waitingGen,
targetGen);
reopenCond.Set();
available.Reset();
}
long startMS =
Environment.TickCount;//System.nanoTime() / 1000000;
// LUCENENET specific - reading searchingGen not thread
safe, so use Interlocked.Read()
while (targetGen > Interlocked.Read(ref searchingGen))
{
if (maxMS < 0)
{
available.WaitOne();
}
else
{
long msLeft = (startMS + maxMS) -
Environment.TickCount;//(System.nanoTime()) / 1000000;
if (msLeft <= 0)
{
return false;
}
else
{
available.WaitOne(TimeSpan.FromMilliseconds(msLeft));
}
}
}
return true;
}
public override void Run()
{
// TODO: maybe use private thread ticktock timer, in
// case clock shift messes up nanoTime?
long lastReopenStartNS = Environment.TickCount *
MILLISECONDS_PER_NANOSECOND;
//System.out.println("reopen: start");
while (!finish)
{
bool hasWaiting;
lock (this)
hasWaiting = waitingGen > searchingGen;
long nextReopenStartNS = lastReopenStartNS +
(hasWaiting ? targetMinStaleNS : targetMaxStaleNS);
long sleepNS = nextReopenStartNS -
(Environment.TickCount * MILLISECONDS_PER_NANOSECOND);
if (sleepNS > 0)
#if !NETSTANDARD
try
{
#endif
reopenCond.WaitOne(TimeSpan.FromMilliseconds(sleepNS /
MILLISECONDS_PER_NANOSECOND));//Convert NS to Ticks
#if !NETSTANDARD
}
#pragma warning disable 168
catch (ThreadInterruptedException ie)
#pragma warning restore 168
{
Thread.CurrentThread.Interrupt();
return;
}
#endif
if (finish)
{
break;
}
lastReopenStartNS = Environment.TickCount *
MILLISECONDS_PER_NANOSECOND;
// Save the gen as of when we started the
reopen; the
// listener (HandleRefresh above) copies this to
// searchingGen once the reopen completes:
refreshStartGen =
writer.GetAndIncrementGeneration();
try
{
manager.MaybeRefreshBlocking();
}
catch (System.IO.IOException ioe)
{
throw new Exception(ioe.ToString(),
ioe);
}
}
// this will set the searchingGen so that all waiting
threads will exit
RefreshDone();
}
}
}