I have had both things happen: tests that run fine in IntelliJ fail with ant, and vice versa. Not often, but occasionally. If a test passes when run from ant, I consider it done. I've never dug too far into that anomaly, but my guess is that it's related to temp directory handling.
FWIW

On Fri, Jun 19, 2015 at 2:43 PM, Michael McCandless <[email protected]> wrote:
> Ahh, thanks for bringing closure.
>
> Others do successfully run tests from Intellij, I think, so I'm not
> sure why you see intermittent issues...
>
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Jun 19, 2015 at 5:10 PM, Ian <[email protected]> wrote:
>> The problem with the tests was actually because of the IDE (Intellij).
>> Running the tests with ant directly works just fine. Just thought I would
>> have this registered for the record.
>>
>> ________________________________
>> From: [email protected]
>> To: [email protected]
>> Subject: RE: Initial work on multi word synonyms and phrase queries
>> Date: Thu, 18 Jun 2015 11:53:23 +0000
>>
>>
>> Issue opened: https://issues.apache.org/jira/browse/LUCENE-6582.
>>
>> @rcmuir, that change on the test is actually a leftover from one of my
>> previous solutions while exploring the problem. It is no longer necessary
>> and I removed it from the patch added to the issue above.
>>
>> To explain a little, in an earlier solution, the current inputs were always
>> the first tokens on the output, even if there were longer synonyms (in
>> number of terms). That created an inconsistency between position increments
>> and position lengths, as I wasn't sure I could have a position increment
>> greater than 1. So I changed it to have the first tokens, the ones that
>> actually increment the positions, come from the longer synonym. In this way,
>> the token stream has the same behavior as before: whenever the position
>> increment is 1, the position length is also 1. But that means that, when
>> keepOriginal = true and there are synonyms with more terms than the input,
>> the original input (tokens with type="word") will come, on the output
>> stream, "stacked" on top of synonym tokens. This seemed to me less likely
>> to impact elsewhere.
>>
>> Glad to hear you also deem that code complicated. I was assuming it was hard
>> for me because I'm a beginner on the code base ;-)
>>
>> About the failing tests, in my setup, they are flaky. Sometimes passing,
>> sometimes failing, and not always the same ones. But always complaining of
>> missing postings formats (last time it was 'FST50'). I'll look around a
>> little more to see if I can figure out what's wrong.
>>
>> Ian
>>
>>> From: [email protected]
>>> Date: Thu, 18 Jun 2015 06:02:09 -0400
>>> Subject: Re: Initial work on multi word synonyms and phrase queries
>>> To: [email protected]; [email protected]
>>>
>>> +1 to opening an issue, thanks for exploring this! It's hairy :)
>>>
>>> Your windows test failures complaining about FSTOrd50 missing is
>>> curious ... I don't run Windows but maybe someone who does has an
>>> idea? That postings format comes from lucene/codecs which should be
>>> on the class path during tests...
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Wed, Jun 17, 2015 at 10:21 PM, Robert Muir <[email protected]> wrote:
>>> > Hey, thanks for tackling this! That synonymfilter is a beast...
>>> >
>>> > Can you open a JIRA issue with your patch?
>>> >
>>> > To me the interesting part is this change in the test:
>>> >
>>> > if (posInc > 0) {
>>> >   // This token increments position, so it is starting a new position.
>>> >   // Its position is the last position plus the posLength of the
>>> >   // last token that started a position.
>>> >   pos += lastPosLength;
>>> >   lastPosLength = posLength;
>>> > }
>>> >
>>> > This currently implies some change to how posInc/posLen are treated on
>>> > the consumer side: it would need changes to queryparsers and
>>> > indexwriter to work (which is fine, we could figure out those
>>> > semantics). But it's my understanding this logic might be based on some
>>> > properties specific to synonymfilter being greedy, and not really
>>> > general to all streams. So maybe synonymfilter or some other filter
>>> > needs to do this adjustment internally instead.
>>> >
>>> > Anyway, I think we should make an issue and investigate it.
>>> >
>>> > On Wed, Jun 17, 2015 at 9:56 PM, Ian <[email protected]> wrote:
>>> >> Hello,
>>> >>
>>> >> Some time ago, I had a problem with synonyms and phrase type queries
>>> >> (actually, it was elasticsearch and I was using a match query with
>>> >> multiple terms and the "and" operator, as better explained here:
>>> >> https://github.com/elastic/elasticsearch/issues/10394).
>>> >>
>>> >> That issue led to some work on Lucene:
>>> >> https://issues.apache.org/jira/browse/LUCENE-6400 (where I helped a
>>> >> little with tests) and https://issues.apache.org/jira/browse/LUCENE-6401.
>>> >> This issue is also related to
>>> >> https://issues.apache.org/jira/browse/LUCENE-3843.
>>> >>
>>> >> Starting from the discussion on LUCENE-6400, I'm attempting to
>>> >> implement a solution. Here is a patch with a first step - the
>>> >> implementation to fix "SynFilter to be able to 'make positions'" (as
>>> >> was mentioned on the issue). In this way, the synonym filter generates
>>> >> a correct (or, at least, better) graph.
>>> >>
>>> >> As the synonym matching is greedy, I only had to worry about fixing the
>>> >> position length of the rules of the current match, no future or past
>>> >> synonyms would "span" over this match (please correct me if I'm
>>> >> wrong!). It did require more buffering, twice as much.
>>> >>
>>> >> The new behavior I added is not active by default, a new parameter has
>>> >> to be passed in a new constructor for SynonymFilter. The changes I made
>>> >> do change the token stream generated by the synonym filter, and I
>>> >> thought it would be better to let that be a voluntary decision for now.
>>> >>
>>> >> I did some refactoring on the code, but mostly on what I had to change
>>> >> for my implementation, so that the patch was not too hard to read. I
>>> >> created specific unit tests for the new implementation
>>> >> (TestMultiWordSynonymFilter) that should show how things will be with
>>> >> the new behavior.
>>> >>
>>> >> Speaking of tests, I ran "analysis-common" tests locally (windows 8,
>>> >> java 8), and had only 2 unrelated failures (as far as I can tell)
>>> >> complaining of missing PostingsFormat "FSTOrd50".
>>> >>
>>> >> Thanks for any help, comment, adjustment on the patch. I'll do my best
>>> >> to make the necessary adjustments.
>>> >>
>>> >> Please forgive me if I did not follow any rule, of the code or of the
>>> >> list, and I would be grateful to be able to learn from my mistakes.
>>> >>
>>> >> Regards,
>>> >> Ian
>>> >>
>>> >>
>>> >> ---------------------------------------------------------------------
>>> >> To unsubscribe, e-mail: [email protected]
>>> >> For additional commands, e-mail: [email protected]
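
For readers following the posInc/posLength discussion quoted above, here is a minimal sketch of how the position rule from the quoted test snippet differs from the usual "add posInc" rule. It is not code from the patch or from Lucene: the class name `PositionWalk`, both method names, the starting values (`pos = -1`, `lastPosLength = 1`), and the two sample token streams are all assumptions made for illustration. Each token is modelled as a plain `{posInc, posLength}` pair.

```java
import java.util.Arrays;

// Illustrative only: hypothetical helper, not part of Lucene or the LUCENE-6582 patch.
public class PositionWalk {

    // Conventional rule: a token's position is the previous position plus its posInc.
    public static int[] byPosInc(int[][] tokens) {
        int[] positions = new int[tokens.length];
        int pos = -1; // assumed start value, so the first posInc=1 token lands on 0
        for (int i = 0; i < tokens.length; i++) {
            pos += tokens[i][0];
            positions[i] = pos;
        }
        return positions;
    }

    // Rule from the quoted test snippet: a token that starts a new position
    // advances by the posLength of the last token that started a position.
    public static int[] byLastPosLength(int[][] tokens) {
        int[] positions = new int[tokens.length];
        int pos = -1;          // assumed start value
        int lastPosLength = 1; // assumed start value
        for (int i = 0; i < tokens.length; i++) {
            int posInc = tokens[i][0];
            int posLength = tokens[i][1];
            if (posInc > 0) {
                pos += lastPosLength;
                lastPosLength = posLength;
            }
            positions[i] = pos;
        }
        return positions;
    }

    public static void main(String[] args) {
        // Made-up stream: a token with posLength 3 starts a position,
        // and the next token follows with posInc 1.
        int[][] originalFirst = {{1, 3}, {1, 1}};
        // Made-up stream: every position-starting token has posLength 1,
        // and a posLength-3 token is stacked on the first one with posInc 0.
        int[][] synonymFirst = {{1, 1}, {0, 3}, {1, 1}, {1, 1}, {1, 1}};

        System.out.println(Arrays.toString(byPosInc(originalFirst)));        // [0, 1]
        System.out.println(Arrays.toString(byLastPosLength(originalFirst))); // [0, 3]
        System.out.println(Arrays.toString(byPosInc(synonymFirst)));         // [0, 0, 1, 2, 3]
        System.out.println(Arrays.toString(byLastPosLength(synonymFirst)));  // [0, 0, 1, 2, 3]
    }
}
```

In the first stream the two rules disagree about where the second token sits, which is the kind of consumer-side change Robert points out. In the second stream, where a position increment of 1 always comes with a position length of 1, both rules give the same positions, which is consistent with Ian's remark that the test change was no longer needed.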
