Ahh, thanks for bringing closure. Others do successfully run tests from IntelliJ, I think, so I'm not sure why you see intermittent issues...

Mike McCandless

http://blog.mikemccandless.com

On Fri, Jun 19, 2015 at 5:10 PM, Ian <ianri...@hotmail.com> wrote:
> The problem with the tests was actually caused by the IDE (IntelliJ).
> Running the tests with ant directly works just fine. Just thought I would
> have this registered for the record.
>
> ________________________________
> From: ianri...@hotmail.com
> To: dev@lucene.apache.org
> Subject: RE: Initial work on multi word synonyms and phrase queries
> Date: Thu, 18 Jun 2015 11:53:23 +0000
>
> Issue opened: https://issues.apache.org/jira/browse/LUCENE-6582.
>
> @rcmuir, that change on the test is actually a leftover from one of my
> previous solutions while exploring the problem. It is no longer necessary
> and I removed it from the patch added to the issue above.
>
> To explain a little, in an earlier solution, the current inputs were always
> the first tokens on the output, even if there were longer synonyms (in
> number of terms). That created an inconsistency between position increments
> and position lengths, as I wasn't sure I could have a position increment
> greater than 1. So I changed it to have the first tokens, the ones that
> actually increment the positions, come from the longer synonym. In this way,
> the token stream has the same behavior as before: whenever the position
> increment is 1, the position length is also 1. But that means that, when
> keepOriginal = true and there are synonyms with more terms than the input,
> the original input (tokens with type="word") will come, on the output
> stream, "stacked" on top of synonym tokens. This seemed to me less likely
> to impact elsewhere.
>
> Glad to hear you also deem that code complicated. I was assuming it was hard
> for me because I'm a beginner on the code base ;-)
>
> About the failing tests, in my setup, they are flaky: sometimes passing,
> sometimes failing, and not always the same ones, but always complaining of
> missing postings formats (last time it was 'FST50').
> I'll look around a little more to see if I can figure out what's wrong.
>
> Ian
>
>> From: luc...@mikemccandless.com
>> Date: Thu, 18 Jun 2015 06:02:09 -0400
>> Subject: Re: Initial work on multi word synonyms and phrase queries
>> To: dev@lucene.apache.org; ianri...@hotmail.com
>>
>> +1 to opening an issue, thanks for exploring this! It's hairy :)
>>
>> Your Windows test failures complaining about FSTOrd50 missing are
>> curious ... I don't run Windows but maybe someone who does has an
>> idea? That postings format comes from lucene/codecs which should be
>> on the class path during tests...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Wed, Jun 17, 2015 at 10:21 PM, Robert Muir <rcm...@gmail.com> wrote:
>> > Hey, thanks for tackling this! That synonymfilter is a beast...
>> >
>> > Can you open a JIRA issue with your patch?
>> >
>> > To me the interesting part is this change in the test:
>> >
>> >   if (posInc > 0) {
>> >     // This token increments position, so it is starting a new position.
>> >     // Its position is the last position plus the posLength of the
>> >     // last token that started a position.
>> >     pos += lastPosLength;
>> >     lastPosLength = posLength;
>> >   }
>> >
>> > This currently implies some change to how posInc/posLen are treated on
>> > the consumer side: it would need changes to queryparsers and
>> > indexwriter to work (which is fine, we could figure out those
>> > semantics). But it's my understanding this logic might be based on some
>> > properties specific to synonymfilter being greedy, and not really
>> > general to all streams. So maybe synonymfilter or some other filter
>> > needs to do this adjustment internally instead.
>> >
>> > Anyway, I think we should make an issue and investigate it.
>> >
>> > On Wed, Jun 17, 2015 at 9:56 PM, Ian <ianri...@hotmail.com> wrote:
>> >> Hello,
>> >>
>> >> Some time ago, I had a problem with synonyms and phrase type queries
>> >> (actually, it was elasticsearch and I was using a match query with
>> >> multiple terms and the "and" operator, as better explained here:
>> >> https://github.com/elastic/elasticsearch/issues/10394).
>> >>
>> >> That issue led to some work on Lucene:
>> >> https://issues.apache.org/jira/browse/LUCENE-6400 (where I helped a
>> >> little with tests) and
>> >> https://issues.apache.org/jira/browse/LUCENE-6401. This issue is also
>> >> related to https://issues.apache.org/jira/browse/LUCENE-3843.
>> >>
>> >> Starting from the discussion on LUCENE-6400, I'm attempting to
>> >> implement a solution. Here is a patch with a first step - the
>> >> implementation to fix "SynFilter to be able to 'make positions'" (as
>> >> was mentioned on the issue). In this way, the synonym filter generates
>> >> a correct (or, at least, better) graph.
>> >>
>> >> As the synonym matching is greedy, I only had to worry about fixing the
>> >> position length of the rules of the current match; no future or past
>> >> synonyms would "span" over this match (please correct me if I'm
>> >> wrong!). It did require more buffering, twice as much.
>> >>
>> >> The new behavior I added is not active by default; a new parameter has
>> >> to be passed in a new constructor for SynonymFilter. The changes I made
>> >> do change the token stream generated by the synonym filter, and I
>> >> thought it would be better to let that be a voluntary decision for now.
>> >>
>> >> I did some refactoring on the code, but mostly on what I had to change
>> >> for my implementation, so that the patch was not too hard to read.
>> >> I created specific unit tests for the new implementation
>> >> (TestMultiWordSynonymFilter) that should show how things will be with
>> >> the new behavior.
>> >>
>> >> Speaking of tests, I ran "analysis-common" tests locally (windows 8,
>> >> java 8), and had only 2 unrelated failures (as far as I can tell)
>> >> complaining of missing PostingsFormat "FSTOrd50".
>> >>
>> >> Thanks for any help, comment, adjustment on the patch. I'll do my best
>> >> to make the necessary adjustments.
>> >>
>> >> Please forgive me if I did not follow any rule, of the code or of the
>> >> list, and I would be grateful to be able to learn from my mistakes.
>> >>
>> >> Regards,
>> >> Ian
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> >> For additional commands, e-mail: dev-h...@lucene.apache.org
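
For readers following the posInc/posLength discussion in this thread, here is a minimal standalone sketch of the usual bookkeeping: a token's start position is the running sum of position increments, and its end position is the start plus its position length. It does not use Lucene classes and is not taken from the patch; TokenGraphSketch, Token, Arc and toArcs are hypothetical names, and the "dns" / "domain name system" tokens are made-up illustration values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenGraphSketch {

    /** Hypothetical stand-in for a token's term, position increment and position length. */
    static final class Token {
        final String term;
        final int posInc;
        final int posLength;

        Token(String term, int posInc, int posLength) {
            this.term = term;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /** One arc of the token graph: the term plus its start and end positions. */
    static final class Arc {
        final String term;
        final int from;
        final int to;

        Arc(String term, int from, int to) {
            this.term = term;
            this.from = from;
            this.to = to;
        }
    }

    /** Turns a stream of (posInc, posLength) pairs into explicit start/end positions. */
    static List<Arc> toArcs(List<Token> tokens) {
        List<Arc> arcs = new ArrayList<>();
        int pos = -1; // the first token with posInc = 1 then lands on position 0
        for (Token t : tokens) {
            pos += t.posInc;                              // posInc = 0 stacks on the previous token
            arcs.add(new Arc(t.term, pos, pos + t.posLength));
        }
        return arcs;
    }

    public static void main(String[] args) {
        // Assumed example: a one-term synonym spanning a three-term phrase.
        List<Token> tokens = Arrays.asList(
            new Token("dns", 1, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("system", 1, 1));
        for (Arc a : toArcs(tokens)) {
            System.out.println(a.term + ": " + a.from + " -> " + a.to);
        }
    }
}
```

In this sketch "dns" and "system" end at the same position, which is what would let a consumer treat the single term and the three-term phrase as alternative paths through the same span.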