Thanks for the information. That workflow seems good: if it works from ant, it's ok.
> Date: Sat, 20 Jun 2015 09:19:35 -0700
> Subject: Re: Initial work on multi word synonyms and phrase queries
> From: [email protected]
> To: [email protected]
>
> I have had both things happen, tests that run fine in IntelliJ fail
> with ant and vice-versa. Not often but occasionally. If it passes when
> run from ant I consider it done. I've never dug too far into that
> anomaly though, but I've guessed it may be related to temp directory
> handling.
>
> FWIW
>
> On Fri, Jun 19, 2015 at 2:43 PM, Michael McCandless
> <[email protected]> wrote:
> > Ahh, thanks for bringing closure.
> >
> > Others do successfully run tests from Intellij, I think, so I'm not
> > sure why you see intermittent issues...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > On Fri, Jun 19, 2015 at 5:10 PM, Ian <[email protected]> wrote:
> >> The problem with the tests was actually because of the IDE (Intellij).
> >> Running the tests with ant directly works just fine. Just thought I would
> >> have this registered for the record.
> >>
> >> ________________________________
> >> From: [email protected]
> >> To: [email protected]
> >> Subject: RE: Initial work on multi word synonyms and phrase queries
> >> Date: Thu, 18 Jun 2015 11:53:23 +0000
> >>
> >> Issue opened: https://issues.apache.org/jira/browse/LUCENE-6582.
> >>
> >> @rcmuir, that change on the test is actually a leftover from one of my
> >> previous solutions while exploring the problem. It is no longer necessary
> >> and I removed it from the patch added to the issue above.
> >>
> >> To explain a little, in an earlier solution, the current inputs were always
> >> the first tokens on the output, even if there were longer synonyms (in
> >> number of terms). That created an inconsistency between position increments
> >> and position lengths, as I wasn't sure I could have a position increment
> >> greater than 1. So I changed it to have the first tokens, the ones that
> >> actually increment the positions, come from the longer synonym. In this way,
> >> the token stream has the same behavior as before: whenever the position
> >> increment is 1, the position length is also 1. But that means that, when
> >> keepOriginal = true and there are synonyms with more terms than the input,
> >> the original input (tokens with type="word") will come, on the output
> >> stream, "stacked" on top of synonym tokens. This seemed to me less likely
> >> to impact elsewhere.
> >>
> >> Glad to hear you also deem that code complicated. I was assuming it was
> >> hard for me because I'm a beginner on the code base ;-)
> >>
> >> About the failing tests, in my setup, they are flaky. Sometimes passing
> >> sometimes failing, and not always the same. But always complaining of
> >> missing postings formats (last time it was 'FST50'). I'll look around a
> >> little more to see if I can figure out what's wrong.
> >>
> >> Ian
> >>
> >>> From: [email protected]
> >>> Date: Thu, 18 Jun 2015 06:02:09 -0400
> >>> Subject: Re: Initial work on multi word synonyms and phrase queries
> >>> To: [email protected]; [email protected]
> >>>
> >>> +1 to opening an issue, thanks for exploring this! It's hairy :)
> >>>
> >>> Your windows test failures complaining about FSTOrd50 missing is
> >>> curious ... I don't run Windows but maybe someone who does has an
> >>> idea? That postings format comes from lucene/codecs which should be
> >>> on the class path during tests...
> >>>
> >>> Mike McCandless
> >>>
> >>> http://blog.mikemccandless.com
> >>>
> >>> On Wed, Jun 17, 2015 at 10:21 PM, Robert Muir <[email protected]> wrote:
> >>> > Hey, thanks for tackling this! That synonymfilter is a beast...
> >>> >
> >>> > Can you open a JIRA issue with your patch?
> >>> >
> >>> > To me the interesting part is this change in the test:
> >>> >
> >>> > if (posInc > 0) {
> >>> >   // This token increments position, so it is starting a new position.
> >>> >   // Its position is the last position plus the posLength of the
> >>> >   // last token that started a position.
> >>> >   pos += lastPosLength;
> >>> >   lastPosLength = posLength;
> >>> > }
> >>> >
> >>> > This currently implies some change to how posInc/posLen are treated on
> >>> > the consumer side: it would need changes to queryparsers and
> >>> > indexwriter to work (which is fine, we could figure out those
> >>> > semantics). But it's my understanding this logic might be based on some
> >>> > properties specific to synonymfilter being greedy, and not really
> >>> > general to all streams. So maybe synonymfilter or some other filter
> >>> > needs to do this adjustment internally instead.
> >>> >
> >>> > Anyway, I think we should make an issue and investigate it.
> >>> >
> >>> > On Wed, Jun 17, 2015 at 9:56 PM, Ian <[email protected]> wrote:
> >>> >> Hello,
> >>> >>
> >>> >> Some time ago, I had a problem with synonyms and phrase type queries
> >>> >> (actually, it was elasticsearch and I was using a match query with
> >>> >> multiple terms and the "and" operator, as better explained here:
> >>> >> https://github.com/elastic/elasticsearch/issues/10394).
> >>> >>
> >>> >> That issue led to some work on Lucene:
> >>> >> https://issues.apache.org/jira/browse/LUCENE-6400 (where I helped a
> >>> >> little with tests) and https://issues.apache.org/jira/browse/LUCENE-6401.
> >>> >> This issue is also related to
> >>> >> https://issues.apache.org/jira/browse/LUCENE-3843.
> >>> >>
> >>> >> Starting from the discussion on LUCENE-6400, I'm attempting to
> >>> >> implement a solution. Here is a patch with a first step - the
> >>> >> implementation to fix "SynFilter to be able to 'make positions'" (as
> >>> >> was mentioned on the issue). In this way, the synonym filter generates
> >>> >> a correct (or, at least, better) graph.
> >>> >>
> >>> >> As the synonym matching is greedy, I only had to worry about fixing the
> >>> >> position length of the rules of the current match, no future or past
> >>> >> synonyms would "span" over this match (please correct me if I'm
> >>> >> wrong!). It did require more buffering, twice as much.
> >>> >>
> >>> >> The new behavior I added is not active by default, a new parameter has
> >>> >> to be passed in a new constructor for SynonymFilter. The changes I made
> >>> >> do change the token stream generated by the synonym filter, and I
> >>> >> thought it would be better to let that be a voluntary decision for now.
> >>> >>
> >>> >> I did some refactoring on the code, but mostly on what I had to change
> >>> >> for my implementation, so that the patch was not too hard to read. I
> >>> >> created specific unit tests for the new implementation
> >>> >> (TestMultiWordSynonymFilter) that should show how things will be with
> >>> >> the new behavior.
> >>> >>
> >>> >> Speaking of tests, I ran "analysis-common" tests locally (windows 8,
> >>> >> java 8), and had only 2 unrelated failures (as far as I can tell)
> >>> >> complaining of missing PostingsFormat "FSTOrd50".
> >>> >>
> >>> >> Thanks for any help, comment, adjustment on the patch. I'll do my best
> >>> >> to make the necessary adjustments.
> >>> >>
> >>> >> Please forgive me if I did not follow any rule, of the code or of the
> >>> >> list, and I would be grateful to be able to learn from my mistakes.
> >>> >>
> >>> >> Regards,
> >>> >> Ian
> >>> >>
> >>> >> ---------------------------------------------------------------------
> >>> >> To unsubscribe, e-mail: [email protected]
> >>> >> For additional commands, e-mail: [email protected]
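
For anyone reading the thread later, here is a minimal sketch of the two position-bookkeeping conventions discussed in the quoted messages. It is not code from the patch or from Lucene. The class PositionSketch, the Tok holder, both method names, and the starting values (pos = -1, lastPosLength = 1) are assumptions made for illustration. The first method uses the usual convention, where a token's position is the previous position plus its own position increment. The second mirrors the snippet from the test quoted above, where a token that starts a new position advances by the position length of the last token that started one.

```java
import java.util.ArrayList;
import java.util.List;

public class PositionSketch {

    /** Hypothetical stand-in for a token's term, position increment and position length. */
    public static final class Tok {
        final String term;
        final int posInc;
        final int posLength;

        public Tok(String term, int posInc, int posLength) {
            this.term = term;
            this.posInc = posInc;
            this.posLength = posLength;
        }
    }

    /** Usual convention: each position is the previous position plus the token's posInc. */
    public static List<Integer> standardPositions(List<Tok> toks) {
        List<Integer> out = new ArrayList<>();
        int pos = -1; // assumed start, so the first token with posInc=1 lands on 0
        for (Tok t : toks) {
            pos += t.posInc;
            out.add(pos);
        }
        return out;
    }

    /** Variant from the quoted test snippet: advance by the posLength of the last position starter. */
    public static List<Integer> lastLengthPositions(List<Tok> toks) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;          // assumed start
        int lastPosLength = 1; // assumed start, so the first token lands on 0
        for (Tok t : toks) {
            if (t.posInc > 0) {
                // This token starts a new position: skip over the span of the
                // last token that started a position.
                pos += lastPosLength;
                lastPosLength = t.posLength;
            }
            out.add(pos);
        }
        return out;
    }
}
```

With a made-up two-token stream, "dns" (posInc=1, posLength=3) followed by "server" (posInc=1, posLength=1), standardPositions gives [0, 1] while lastLengthPositions gives [0, 3]. The same attributes are read differently, which is why the second convention would need matching changes on the consumer side.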
