Hello Shad,
There are two reasons why the tests TestOneDocument/TestTwoDocuments never
terminate:
Cause 1: the Parser.Run() method is never called. In the Java code, this type
implements Runnable, but here it doesn't. The thread is supposed to be started
at the first call to Parser.Next(), but the thread object is created without a
reference to the parser, so starting it does absolutely nothing:
    if (t == null)
    {
        threadDone = false;
        t = new ThreadClass(/*this*/);
        t.SetDaemon(true);
        t.Start();
    }
The minimal solution is to define a new class:
    private class MyThreadClass : ThreadClass
    {
        private readonly Action m_Run;

        public MyThreadClass(Action run)
        {
            m_Run = run;
        }

        public override void Run()
        {
            m_Run();
        }
    }
And change the above code to:
    if (t == null)
    {
        threadDone = false;
        t = new MyThreadClass(Run);
        t.SetDaemon(true);
        t.Start();
    }
This gets things moving, but the tests will still fail. Cause 2: the code that
creates the XmlReader:
    Sax.Net.IXmlReader reader = XmlReaderFactory.Current.CreateXmlReader();
    //XMLReaderFactory.createXMLReader();
... fails because XmlReaderFactory.Current expects the reader type to be
loaded from configuration files. Alas, something happens on its way to the
forum and you get a "null reference exception" preceded by a "thread abort
exception", causing the tests to fail because the reader is never created.
I had half a mind to replace the SAX parser (which is an idiom that is not
implemented in .NET) with something more lightweight, but since I'm feeling a
bit under the weather, I just changed the line to:
    Sax.Net.IXmlReader reader =
        new TagSoup.Net.XmlReaderFactory().CreateXmlReader();
...and left it at that. On my machine, the tests now pass. I hope they do on
your special machine too <g>
The test TestForever() works as well, but ends with an exception (which is
swallowed):
    System.ObjectDisposedException: Cannot access a closed Stream.
       at System.IO.__Error.StreamIsClosed()
       at System.IO.MemoryStream.Read(Byte[] buffer, Int32 offset, Int32 count)
       at System.IO.StreamReader.ReadBuffer()
       at System.IO.StreamReader.Read()
       at TagSoup.Net.HTMLScanner.Scan(TextReader r, IScanHandler h)
The reason is that the parse call:
    reader.Parse(new InputSource(
        IOUtils.GetDecodingReader(localFileIS, Encoding.UTF8)));
... seems to want the StreamReader (and, by extension, the memory stream) after
source.Dispose() has been called.
Since the test passes, I'll pretend the problem doesn't exist.
Vincent
From: Shad Storhaug [mailto:[email protected]]
Sent: Monday, July 31, 2017 10:34 AM
To: Van Den Berghe, Vincent <[email protected]>
Cc: [email protected]
Subject: Benchmark Concurrency Bug
Vincent,
I have pushed Benchmark to my branch here:
https://github.com/NightOwl888/lucenenet/tree/benchmark. There are 106/109
tests passing, but there are 3 tests here that never finish:
https://github.com/NightOwl888/lucenenet/blob/benchmark/src/Lucene.Net.Tests.Benchmark/ByTask/Feeds/EnwikiContentSourceTest.cs#L29
There is also still one unfinished matter in that TagSoup/Sax.Net doesn't
support .NET Standard. It is a close match for Java's SAX parser, but so far
the owner of the project has not replied to my query whether he would be open
to a PR. So, I have my eye on using the HTML Agility Pack instead:
https://www.nuget.org/packages/HtmlAgilityPack. If the concurrency bug happens
to have something to do with Sax.Net, feel free to replace it with the HTML
Agility Pack.
I would appreciate it if you could have a look at this when you have a chance.
Thanks,
Shad Storhaug (NightOwl888)