Re: Single string automaton causes NPE on Terms.intersect( CompiledAutomaton, BytesRef term )

2016-03-25 Thread Michael McCandless
Hi José,

Can you please open a Jira issue about this, and add a test case as a
patch, if you can?  I think it's bad you hit an NPE!  Not sure how
best to fix it, but we can iterate on the issue.

Thanks!

Mike McCandless

http://blog.mikemccandless.com


On Fri, Mar 25, 2016 at 7:11 PM, José Tomás Atria  wrote:
> Ok, digging a little more, I found that the problem mentioned above seems
> to be caused by FieldReader overriding the intersect( CompiledAutomaton,
> BytesRef ) method in Terms.
>
> The overridden method checks whether the compiled automaton is
> AUTOMATON_TYPE.NORMAL and, if it isn't, throws an IllegalArgumentException
> and instructs one to use CompiledAutomaton.getTermsEnum( Terms ) instead:
> if (compiled.type != CompiledAutomaton.AUTOMATON_TYPE.NORMAL) {
>   throw new IllegalArgumentException("please use CompiledAutomaton.getTermsEnum instead");
> }
>
> which, of course, works perfectly, so I'm doing that now and the problem is
> no more.
>
> However, the method in FieldReader just assumes that the compiled automaton
> is AUTOMATON_TYPE.NORMAL, which causes the above NPE, because the
> runAutomaton of a non-normal CompiledAutomaton is set to null in the
> constructor, lines 191 to 209:
>
> IntsRef singleton = Operations.getSingleton(automaton);
>
> if (singleton != null) {
>   // matches a fixed string
>   type = AUTOMATON_TYPE.SINGLE;
>   commonSuffixRef = null;
>   runAutomaton = null; // <- HERE!
>   this.automaton = null;
>   this.finite = null;
>
>   if (isBinary) {
>     term = StringHelper.intsRefToBytesRef(singleton);
>   } else {
>     term = new BytesRef(UnicodeUtil.newString(singleton.ints,
>         singleton.offset, singleton.length));
>   }
>   sinkState = -1;
>   return;
> }
>
> Not to pretend I have any idea of what I'm talking about, but given that
> the user has relatively little control over which implementation of Terms we
> get at runtime (this user at least), shouldn't the overriding method in
> FieldReader also check the AUTOMATON_TYPE and throw an equally informative
> IllegalArgumentException instead, just for the sake of consistency?
>
> Sorry if all of the above is a little off topic for this list :)
>
> Best,
> jta
>
>
> On Fri, Mar 25, 2016 at 4:33 PM, José Tomás Atria  wrote:
>
>> Hello again!
>>
>> I'm playing around some more with Lucene's automata, and I've bumped into
>> something unexpected but can't figure out if it's a bug or an error on my
>> part.
>>
>> Briefly: is it possible to use a single string automaton (i.e. the result
>> of Automata.makeString( String ) ) to intersect a Terms instance? I keep
>> getting NPEs on every attempt at doing this, e.g. with this code:
>>
>> // where "term" is a term known to exist in someField
>> CompiledAutomaton ca = new CompiledAutomaton( Automata.makeString( "term" ) );
>> Terms terms = leafReader.terms( someField );
>> TermsEnum tEnum = terms.intersect( ca, null );
>>
>> results in:
>> Exception in thread "main" java.lang.NullPointerException
>>   at org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
>>   at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
>>
>> I assume I'm doing something wrong (I am aware that using an automaton for
>> a single term may be a bad idea, but bear with me), but the fact that it's
>> throwing an NPE prompted me to come and ask...
>>
>> Maybe there's a problem with encodings?
>>
>> Any help greatly appreciated.
>> jta.
>>
>> --
>> entia non sunt multiplicanda praeter necessitatem
>>
>
>
>
> --
> entia non sunt multiplicanda praeter necessitatem

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Single string automaton causes NPE on Terms.intersect( CompiledAutomaton, BytesRef term )

2016-03-25 Thread José Tomás Atria
Ok, digging a little more, I found that the problem mentioned above seems
to be caused by FieldReader overriding the intersect( CompiledAutomaton,
BytesRef ) method in Terms.

The overridden method checks whether the compiled automaton is
AUTOMATON_TYPE.NORMAL and, if it isn't, throws an IllegalArgumentException
and instructs one to use CompiledAutomaton.getTermsEnum( Terms ) instead:
if (compiled.type != CompiledAutomaton.AUTOMATON_TYPE.NORMAL) {
  throw new IllegalArgumentException("please use CompiledAutomaton.getTermsEnum instead");
}

which, of course, works perfectly, so I'm doing that now and the problem is
no more.

However, the method in FieldReader just assumes that the compiled automaton
is AUTOMATON_TYPE.NORMAL, which causes the above NPE, because the
runAutomaton of a non-normal CompiledAutomaton is set to null in the
constructor, lines 191 to 209:

IntsRef singleton = Operations.getSingleton(automaton);

if (singleton != null) {
  // matches a fixed string
  type = AUTOMATON_TYPE.SINGLE;
  commonSuffixRef = null;
  runAutomaton = null; // <- HERE!
  this.automaton = null;
  this.finite = null;

  if (isBinary) {
    term = StringHelper.intsRefToBytesRef(singleton);
  } else {
    term = new BytesRef(UnicodeUtil.newString(singleton.ints,
        singleton.offset, singleton.length));
  }
  sinkState = -1;
  return;
}

Not to pretend I have any idea of what I'm talking about, but given that
the user has relatively little control over which implementation of Terms we
get at runtime (this user at least), shouldn't the overriding method in
FieldReader also check the AUTOMATON_TYPE and throw an equally informative
IllegalArgumentException instead, just for the sake of consistency?
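
A minimal sketch of the kind of guard being suggested, written against plain-Java stand-ins rather than the real Lucene classes. The names GuardSketch, AutomatonType, and Compiled are hypothetical and exist only for illustration; the only things taken from the thread are the type check and the exception message quoted above.

```java
public class GuardSketch {
    // Hypothetical stand-in for CompiledAutomaton.AUTOMATON_TYPE (only two values modelled).
    enum AutomatonType { NORMAL, SINGLE }

    // Hypothetical stand-in for CompiledAutomaton: as in the constructor quoted
    // above, runAutomaton is left null when the automaton is not NORMAL.
    static final class Compiled {
        final AutomatonType type;
        final Object runAutomaton;

        Compiled(AutomatonType type) {
            this.type = type;
            this.runAutomaton = (type == AutomatonType.NORMAL) ? new Object() : null;
        }
    }

    // Checks the type up front, so a non-NORMAL automaton produces an informative
    // IllegalArgumentException instead of a NullPointerException further down.
    static String intersect(Compiled compiled) {
        if (compiled.type != AutomatonType.NORMAL) {
            throw new IllegalArgumentException(
                "please use CompiledAutomaton.getTermsEnum instead");
        }
        // Safe to dereference here: runAutomaton is non-null for NORMAL.
        return "intersecting with " + compiled.runAutomaton.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        try {
            intersect(new Compiled(AutomatonType.SINGLE));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without the check, the SINGLE case would fail at the dereference of runAutomaton, which is the situation described for FieldReader.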

Sorry if all of the above is a little off topic for this list :)

Best,
jta


On Fri, Mar 25, 2016 at 4:33 PM, José Tomás Atria  wrote:

> Hello again!
>
> I'm playing around some more with Lucene's automata, and I've bumped into
> something unexpected but can't figure out if it's a bug or an error on my
> part.
>
> Briefly: is it possible to use a single string automaton (i.e. the result
> of Automata.makeString( String ) ) to intersect a Terms instance? I keep
> getting NPEs on every attempt at doing this, e.g. with this code:
>
> // where "term" is a term known to exist in someField
> CompiledAutomaton ca = new CompiledAutomaton( Automata.makeString( "term" ) );
> Terms terms = leafReader.terms( someField );
> TermsEnum tEnum = terms.intersect( ca, null );
>
> results in:
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.lucene.codecs.blocktree.IntersectTermsEnum.<init>(IntersectTermsEnum.java:127)
>   at org.apache.lucene.codecs.blocktree.FieldReader.intersect(FieldReader.java:185)
>
> I assume I'm doing something wrong (I am aware that using an automaton for
> a single term may be a bad idea, but bear with me), but the fact that it's
> throwing an NPE prompted me to come and ask...
>
> Maybe there's a problem with encodings?
>
> Any help greatly appreciated.
> jta.
>
> --
> entia non sunt multiplicanda praeter necessitatem
>



-- 
entia non sunt multiplicanda praeter necessitatem


Re: Subset Matching

2016-03-25 Thread Jack Krupansky
There is no simple, direct way to do this "Boolean Reverse Query" in
Lucene, but I suggest filing a Jira to request this as a feature
improvement/new feature.

-- Jack Krupansky

On Fri, Mar 25, 2016 at 11:43 AM, Ahmet Arslan 
wrote:

> Hi Otmar,
>
> For this requirement, you need to create an additional field containing
> the number of words/terms in the field.
>
>
> For example.
>
> field : blue pill
> length = 2
>
>
> query : if you take the blue pill
> length  : 6
>
>
> Please see my previous responses on the same topic:
>
> http://search-lucene.com/m/eHNluYPa11VSxlf1=Re+search+for+documents+where+all+words+of+field+present+in+the+query
>
>
> http://search-lucene.com/m/eHNl9Yu6V1xx3rp=Re+Match+All+terms+in+indexed+field+value
>
> I know they are Solr responses, but function queries exist in Lucene as
> far as I know.
>
> Ahmet
> On Friday, March 25, 2016 11:20 AM, Otmar Caduff 
> wrote:
>
>
>
> Hi all
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the
> “minimum should match” setting on the boolean query.
>
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the
> query terms, that document should match as well.
>
> Any clue on how to accomplish this?
>
> Otmar
>
>


Re: Subset Matching

2016-03-25 Thread Ahmet Arslan
Hi Otmar,

For this requirement, you need to create an additional field containing the 
number of words/terms in the field.


For example.

field : blue pill
length = 2


query : if you take the blue pill
length  : 6
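
One way to read the length-field idea, as a plain-Java sketch rather than actual Lucene query code: count how many of the field's terms occur in the query and compare that count with the stored length. The class and method names are hypothetical, and the sketch assumes the extra field stores the number of distinct terms.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubsetMatchSketch {
    // True when every distinct term of the field also occurs in the query,
    // i.e. the number of matched terms equals the stored term count.
    static boolean fieldIsSubsetOfQuery(List<String> fieldTerms, List<String> queryTerms) {
        Set<String> distinctField = new HashSet<>(fieldTerms);
        int storedLength = distinctField.size(); // what the extra field would hold (assumption)
        Set<String> query = new HashSet<>(queryTerms);

        int matched = 0;
        for (String term : distinctField) {
            if (query.contains(term)) {
                matched++;
            }
        }
        return matched == storedLength;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("if", "you", "take", "the", "blue", "pill");
        // "blue pill": both terms are in the query, matched == 2 == length
        System.out.println(fieldIsSubsetOfQuery(Arrays.asList("blue", "pill"), query)); // true
        // "red pill": only "pill" is in the query, matched == 1 != 2
        System.out.println(fieldIsSubsetOfQuery(Arrays.asList("red", "pill"), query)); // false
    }
}
```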


Please see my previous responses on the same topic:
http://search-lucene.com/m/eHNluYPa11VSxlf1=Re+search+for+documents+where+all+words+of+field+present+in+the+query

http://search-lucene.com/m/eHNl9Yu6V1xx3rp=Re+Match+All+terms+in+indexed+field+value

I know they are Solr responses, but function queries exist in Lucene as far as
I know.

Ahmet
On Friday, March 25, 2016 11:20 AM, Otmar Caduff  wrote:



Hi all
In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the
“minimum should match” setting on the boolean query.

Now, when querying, I want to
- (1)  match the documents which either contain all the terms of the query
(Occur.MUST for all terms would do that) or,
- (2)  if all terms for a given field of a document are a subset of the
query terms, that document should match as well.

Any clue on how to accomplish this?

Otmar




Re: Subset Matching

2016-03-25 Thread Sujit Pal
Hi Otmar,

Shouldn't Occur.SHOULD alone do what you ask? Documents that match all
terms in the query would be scored higher than documents that match fewer
than all terms.

-sujit

On Fri, Mar 25, 2016 at 2:20 AM, Otmar Caduff  wrote:

> Hi all
> In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the
> “minimum should match” setting on the boolean query.
>
> Now, when querying, I want to
> - (1)  match the documents which either contain all the terms of the query
> (Occur.MUST for all terms would do that) or,
> - (2)  if all terms for a given field of a document are a subset of the
> query terms, that document should match as well.
>
> Any clue on how to accomplish this?
>
> Otmar
>


Subset Matching

2016-03-25 Thread Otmar Caduff
Hi all
In Lucene, I know of the possibility of Occur.SHOULD, Occur.MUST and the
“minimum should match” setting on the boolean query.

Now, when querying, I want to
- (1)  match the documents which either contain all the terms of the query
(Occur.MUST for all terms would do that) or,
- (2)  if all terms for a given field of a document are a subset of the
query terms, that document should match as well.

Any clue on how to accomplish this?

Otmar