Solr1.4 and threads ....

2012-06-13 Thread Benson Margulies
We've got a tokenizer which is quite explicitly coded on the
assumption that it will only be called from one thread at a time.
After all, what would it mean for two threads to make interleaved
calls to the hasNext() function?

Yet a customer of ours with a gigantic instance of Solr 1.4 reports
incidents in which we throw an exception that indicates (we think)
that two different threads made interleaved calls.
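One way to test the interleaving theory, assuming each tokenizer instance is meant to stay on a single thread, is to record the first thread that touches the instance and fail loudly if any other thread shows up later. A minimal Java sketch; `ThreadConfinementCheck` is a hypothetical diagnostic helper, not part of Lucene or Solr:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical diagnostic (not a Lucene/Solr API): remembers the first thread
// that touches an object and throws if any other thread touches it later.
final class ThreadConfinementCheck {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    void check() {
        Thread current = Thread.currentThread();
        // The first caller becomes the owner; the owner is never reset, so
        // every later caller must be that same thread.
        if (!owner.compareAndSet(null, current) && owner.get() != current) {
            throw new IllegalStateException("Used from thread " + current.getName()
                + " but owned by thread " + owner.get().getName());
        }
    }
}
```

Calling `check()` at the top of `hasNext()` (and the tokenizer's other entry points) would turn a vague state-corruption exception into one that names both threads, which separates "two threads share this instance" from "our single-threaded logic is wrong".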

Does this suggest anything to anyone? Other than that we've
misanalyzed the logic in the tokenizer and there's a way to make it
burp on one thread?


Re: Solr1.4 and threads ....

2012-06-13 Thread Robert Muir
On Wed, Jun 13, 2012 at 4:38 PM, Benson Margulies bimargul...@gmail.com wrote:

> Does this suggest anything to anyone? Other than that we've
> misanalyzed the logic in the tokenizer and there's a way to make it
> burp on one thread?

It might suggest that the different TokenStream instances refer to some
shared object that is not thread-safe. We've had bugs like this before
(e.g. sharing a JDK Collator is OK, but ICU Collators are not thread-safe,
so you must clone them).
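A sketch of the clone-per-instance pattern in Java. `ScratchHelper` is a hypothetical stand-in for a non-thread-safe object such as an ICU Collator, and `PerStreamHelperFactory` is a hypothetical factory; neither is a Lucene or ICU API:

```java
import java.util.Arrays;

// Hypothetical stand-in for a helper that is NOT thread-safe (in this thread,
// an ICU Collator plays that role): it keeps mutable scratch state.
final class ScratchHelper implements Cloneable {
    private char[] scratch = new char[16];

    String normalize(String term) {
        // Writes into the shared scratch array, so two threads calling this on
        // the same instance could corrupt each other's output.
        if (scratch.length < term.length()) {
            scratch = new char[term.length()];
        }
        int n = 0;
        for (char c : term.toCharArray()) {
            scratch[n++] = Character.toLowerCase(c);
        }
        return new String(scratch, 0, n);
    }

    @Override
    public ScratchHelper clone() {
        try {
            ScratchHelper copy = (ScratchHelper) super.clone();
            copy.scratch = Arrays.copyOf(scratch, scratch.length); // deep-copy mutable state
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: this class implements Cloneable
        }
    }
}

// Hypothetical factory: holds one prototype but hands each new token stream its
// own clone, so no two streams (and hence no two threads) share the helper.
final class PerStreamHelperFactory {
    private final ScratchHelper prototype = new ScratchHelper();

    ScratchHelper helperForNewStream() {
        return prototype.clone();
    }
}
```

The factory itself can be shared freely because it never hands out the prototype; only the clones reach token streams.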

Because of this, we beefed up our base analysis test class, BaseTokenStreamTestCase
(http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_3_6/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java),
to find thread-safety bugs like this.

I recommend just grabbing test-framework.jar (we release it as an
artifact), extending that class, and writing a test like:
  public void testRandomStrings() throws Exception {
    // 'analyzer' is the Analyzer under test; 10 is the number of iterations
    checkRandomData(random, analyzer, 10);
  }

(or use the one in the branch; it's even been improved since 3.6)
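Roughly the idea behind such a check, as a plain-JDK sketch rather than Lucene's actual implementation: compute each input's expected tokens on one thread, then re-analyze the same inputs on several threads at once and compare. `ConcurrentAnalysisCheck` and its `analyze` function (text in, token list out, building a fresh token stream per call) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical harness (not BaseTokenStreamTestCase): 'analyze' stands in for
// "run the analyzer on this text and collect its tokens".
final class ConcurrentAnalysisCheck {
    static void check(Function<String, List<String>> analyze,
                      int numInputs, int numThreads, long seed) throws Exception {
        Random random = new Random(seed);
        List<String> inputs = new ArrayList<>();
        for (int i = 0; i < numInputs; i++) {
            inputs.add(randomText(random));
        }
        // 1. Expected output, computed on a single thread.
        List<List<String>> expected = new ArrayList<>();
        for (String s : inputs) {
            expected.add(analyze.apply(s));
        }
        // 2. Re-run the same inputs on several threads at once and compare.
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                futures.add(pool.submit((Callable<Void>) () -> {
                    for (int i = 0; i < inputs.size(); i++) {
                        if (!analyze.apply(inputs.get(i)).equals(expected.get(i))) {
                            throw new AssertionError("Mismatch for input: " + inputs.get(i));
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows a worker's failure wrapped in ExecutionException
            }
        } finally {
            pool.shutdown();
        }
    }

    // Random lowercase "words" separated by single spaces.
    private static String randomText(Random random) {
        StringBuilder sb = new StringBuilder();
        int words = 1 + random.nextInt(8);
        for (int w = 0; w < words; w++) {
            if (w > 0) sb.append(' ');
            int len = 1 + random.nextInt(6);
            for (int c = 0; c < len; c++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
        }
        return sb.toString();
    }
}
```

Because a shared non-thread-safe object may only misbehave under real contention, a passing run is evidence rather than proof; repeating the check with several seeds and thread counts makes a race more likely to surface.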

-- 
lucidimagination.com