Re: Simple Query String Builder for SolrJ

Geoffrey Slinker Tue, 20 Aug 2024 12:23:27 -0700

Your feedback is greatly appreciated.

The implementation that I have is the result of my very specific use cases 
applied over several years.
My use cases may be so specialized that this isn't meaningful outside of my 
domain. And that's okay.


I have use cases where a search request is converted to a Solr Boolean query.

Then the results of that conversion may be altered in many ways. Suppose the 
weights and scores are to be altered because a special ranking model is 
applied.

Here is a very simplified and contrived example of some of the things I deal 
with in generating queries.


  public TermGroup generate(SearchRequest request) {
    TermGroup group = new TermGroup().withLabel("FULL_REQUEST");
    TermGroup groupA = new TermGroup().withLabel("FIRST_NAMES");
    group.addGroup(groupA);
    groupA.addTerm(new Term("firstName", "jeff"));
    groupA.addTerm(new Term("firstName", "jeffrey"));
    groupA.addTerm(new Term("firstName", "geoffrey"));

    TermGroup groupB = new TermGroup().withLabel("MIDDLE_NAMES");
    group.addGroup(groupB);
    groupB.addTerm(new Term("middleName", "john"));
    groupB.addTerm(new Term("middleName", "jon"));
    groupB.addTerm(new Term("middleName", "sean"));


    TermGroup groupC = new TermGroup().withLabel("LAST_NAMES");
    group.addGroup(groupC);

    groupC.addTerm(new Term("lastName", "smith"));
    groupC.addTerm(new Term("lastName", "smythe"));
    groupC.addTerm(new Term("lastName", "schmidt"));

    return group;
  }


  public void contrivedExample() {
    SearchRequest request = new SearchRequest();
    TermGroup requestGroup = generate(request);

    String prettyQuery = requestGroup.prettyPrint(true, "", "  ", "\n");

    //At least one match on the middle names is wanted
    List<TermGroup> middleNames = requestGroup.findByLabel("MIDDLE_NAMES");
    middleNames.get(0).setOccur(Occur.MUST);

    prettyQuery = requestGroup.prettyPrint(true, "", "  ", "\n");

    //We want to boost last names
    List<TermGroup> lastNames = requestGroup.findByLabel("LAST_NAMES");
    lastNames.get(0).setBoost(2.0f);

    prettyQuery = requestGroup.prettyPrint(true, "", "  ", "\n");

    //We want to weight the first names
    List<TermGroup> firstNames = requestGroup.findByLabel("FIRST_NAMES");
    firstNames.get(0).setBoost( 1.0f / firstNames.get(0).getTerms().size());

    prettyQuery = requestGroup.prettyPrint(true, "", "  ", "\n");


  }

Here are the "prettyQuery" values, in the order the code produces them:

The original:

/* FULL_REQUEST */
(
  /* FIRST_NAMES */
  (
    firstName:jeff
    firstName:jeffrey
    firstName:geoffrey
  )
  /* MIDDLE_NAMES */
  (
    middleName:john
    middleName:jon
    middleName:sean
  )
  /* LAST_NAMES */
  (
    lastName:smith
    lastName:smythe
    lastName:schmidt
  )
)



Must have at least one match on Middle Name:


/* FULL_REQUEST */
(
  /* FIRST_NAMES */
  (
    firstName:jeff
    firstName:jeffrey
    firstName:geoffrey
  )
  /* MIDDLE_NAMES */
  +(
    middleName:john
    middleName:jon
    middleName:sean
  )
  /* LAST_NAMES */
  (
    lastName:smith
    lastName:smythe
    lastName:schmidt
  )
)

Boost the score of the last names:

/* FULL_REQUEST */
(
  /* FIRST_NAMES */
  (
    firstName:jeff
    firstName:jeffrey
    firstName:geoffrey
  )
  /* MIDDLE_NAMES */
  +(
    middleName:john
    middleName:jon
    middleName:sean
  )
  /* LAST_NAMES */
  (
    lastName:smith
    lastName:smythe
    lastName:schmidt
  )^2
)


Add a weight to the first names:

/* FULL_REQUEST */
(
  /* FIRST_NAMES */
  (
    firstName:jeff
    firstName:jeffrey
    firstName:geoffrey
  )^0.3333
  /* MIDDLE_NAMES */
  +(
    middleName:john
    middleName:jon
    middleName:sean
  )
  /* LAST_NAMES */
  (
    lastName:smith
    lastName:smythe
    lastName:schmidt
  )^2
)


Sometimes I add more terms. Sometimes I remove a group (sub query). Sometimes I 
wrap a group inside another group and then add sibling groups to that group.

I have generators that generate all types of groups (sub queries) and then I 
arrange them and place modifiers on them in a myriad of ways.
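
A minimal sketch of that kind of rearrangement follows. It is not the 
implementation described in this message: SketchGroup, wrapChild, and the 
field names are hypothetical stand-ins, terms are stored as plain strings, 
and no query escaping is attempted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a labeled group of clauses; not the real TermGroup. */
final class SketchGroup {
    final String label;
    final List<String> terms = new ArrayList<>();        // e.g. "lastName:smith"
    final List<SketchGroup> groups = new ArrayList<>();  // nested sub queries
    boolean must;   // true renders a leading '+'
    Float boost;    // null means no boost suffix

    SketchGroup(String label) { this.label = label; }

    SketchGroup term(String field, String value) {
        terms.add(field + ":" + value);
        return this;
    }

    SketchGroup group(SketchGroup child) {
        groups.add(child);
        return this;
    }

    /** Depth-first search for every group carrying the given label. */
    List<SketchGroup> findByLabel(String wanted) {
        List<SketchGroup> found = new ArrayList<>();
        if (label.equals(wanted)) found.add(this);
        for (SketchGroup g : groups) found.addAll(g.findByLabel(wanted));
        return found;
    }

    /** Replaces a direct child with a new group that wraps it; returns the wrapper. */
    SketchGroup wrapChild(SketchGroup child, String wrapperLabel) {
        int i = groups.indexOf(child);
        if (i < 0) throw new IllegalArgumentException("not a direct child: " + child.label);
        SketchGroup wrapper = new SketchGroup(wrapperLabel).group(child);
        groups.set(i, wrapper);
        return wrapper;
    }

    /** Renders a compact, single-line query string. */
    @Override
    public String toString() {
        List<String> parts = new ArrayList<>(terms);
        for (SketchGroup g : groups) parts.add(g.toString());
        StringBuilder sb = new StringBuilder(must ? "+(" : "(");
        sb.append(String.join(" ", parts)).append(')');
        if (boost != null) sb.append('^').append(boost);
        return sb.toString();
    }
}

final class SketchGroupDemo {
    public static void main(String[] args) {
        SketchGroup last = new SketchGroup("LAST_NAMES")
            .term("lastName", "smith").term("lastName", "smythe");
        SketchGroup middle = new SketchGroup("MIDDLE_NAMES").term("middleName", "john");
        SketchGroup full = new SketchGroup("FULL_REQUEST").group(middle).group(last);

        // Modifiers are applied after generation, located by label.
        full.findByLabel("MIDDLE_NAMES").get(0).must = true;
        full.findByLabel("LAST_NAMES").get(0).boost = 2.0f;

        // Wrap LAST_NAMES in a new group, then give the wrapped group a sibling.
        SketchGroup surnames = full.wrapChild(last, "SURNAMES");
        surnames.group(new SketchGroup("MAIDEN_NAMES").term("maidenName", "jones"));

        System.out.println(full);
    }
}
```

The point of the sketch is only that each change is a small edit to a tree, 
and the string is produced once at the end.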


The semantics of using a Builder pattern, or a fluent pattern, or just getters 
and setters are implementation details. The style of how things are done in 
SolrJ should prevail if this is ever added to SolrJ.

The use cases that caused me to create this type of code are all based on 
complex query generation.

Often I do not know how many Terms will be in the TermGroup because of the 
request. After I generate the group I might apply a boost like this:

firstNameGroup.setBoost( 1.0f / firstNameGroup.getTerms().size());
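
One caveat with that expression: if a request produces no terms, dividing by 
a size of zero yields an infinite float rather than an exception. A small 
guard could look like the sketch below. The name evenWeight and the choice 
of 1.0f as the fallback for an empty group are assumptions for illustration, 
not part of the code described here.

```java
final class BoostSketch {
    /**
     * Returns 1/termCount so that a group's total weight stays roughly constant
     * regardless of how many variants it holds. An empty group falls back to
     * 1.0f (an assumed default) instead of producing Infinity.
     */
    static float evenWeight(int termCount) {
        if (termCount < 0) {
            throw new IllegalArgumentException("termCount must not be negative");
        }
        return termCount == 0 ? 1.0f : 1.0f / termCount;
    }

    public static void main(String[] args) {
        System.out.println(evenWeight(3));  // prints 0.33333334
        System.out.println(evenWeight(0));  // prints 1.0
        System.out.println(1.0f / 0);       // prints Infinity (the unguarded case)
    }
}
```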

I might have a FirstNameQueryGenerator, MiddleNameQueryGenerator, and 
LastNameQueryGenerator. Then I might have a FullNameQueryGenerator that 
encapsulates the first, middle, and last name generators and places them all 
in a group with custom modifiers to get the score and ranking I desire.
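
The composition described above can be sketched roughly as follows. Every 
class here (Clause, NameGenerator, FullNameGenerator) is a hypothetical, 
simplified stand-in, the middle-name generator is left out for brevity, and 
the boosts are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sub-query node: either a leaf clause or a group of children. */
final class Clause {
    final String leaf;                               // non-null for a leaf such as "firstName:jeff"
    final List<Clause> children = new ArrayList<>(); // used only by groups
    float boost = 1.0f;                              // 1.0f renders no suffix

    Clause(String leaf) { this.leaf = leaf; }

    static Clause groupOf(List<Clause> kids) {
        Clause g = new Clause(null);
        g.children.addAll(kids);
        return g;
    }

    @Override
    public String toString() {
        if (leaf != null) return leaf;
        List<String> parts = new ArrayList<>();
        for (Clause c : children) parts.add(c.toString());
        String body = "(" + String.join(" ", parts) + ")";
        return boost == 1.0f ? body : body + "^" + boost;
    }
}

/** Hypothetical generator: turns name variants into a group over one field. */
final class NameGenerator {
    private final String field;

    NameGenerator(String field) { this.field = field; }

    Clause generate(List<String> variants) {
        List<Clause> leaves = new ArrayList<>();
        for (String v : variants) leaves.add(new Clause(field + ":" + v));
        return Clause.groupOf(leaves);
    }
}

/** Composes the per-part generators and applies its own modifiers on top. */
final class FullNameGenerator {
    private final NameGenerator first = new NameGenerator("firstName");
    private final NameGenerator last = new NameGenerator("lastName");

    Clause generate(List<String> firstNames, List<String> lastNames) {
        Clause f = first.generate(firstNames);
        if (!firstNames.isEmpty()) f.boost = 1.0f / firstNames.size();
        Clause l = last.generate(lastNames);
        l.boost = 2.0f;
        return Clause.groupOf(Arrays.asList(f, l));
    }
}

final class GeneratorDemo {
    public static void main(String[] args) {
        Clause q = new FullNameGenerator().generate(
            Arrays.asList("jeff", "geoffrey"), Arrays.asList("smith"));
        // prints ((firstName:jeff firstName:geoffrey)^0.5 (lastName:smith)^2.0)
        System.out.println(q);
    }
}
```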


I agree with your advice about growing attached to an implementation. I am 
completely attached to the broader implementation inside our proprietary code 
and will not be moving away from it; I only pulled out part of it to share. 
I have put in about 6 hours of my time to date on sharing this, and I do not 
feel that time was ill used.

Again, thanks for your time and thoughtful responses.

Geoffrey



> On Aug 20, 2024, at 10:07 AM, David Smiley <[email protected]> wrote:
> 
> Let's bikeshed before you write code, okay?  Otherwise you potentially
> waste time and/or grow attached to sunk costs.
> 
> Feedback:
> * avoid the word "term"; it already has Lucene definition and a Solr
> query parser but you're using it in a way that isn't either.  I
> recommend simply  for "fieldQuery" -- these queries target specific
> fields after all.
> * Can we avoid top level classes that the user must know about;
> instead having one class -- QueryBuilder (or named QueryStringBuilder)
> with factory methods that are easily discoverable?  Not a huge deal.
> * Instead of "Group", lets acknowledge these map to a BooleanQuery so
> I think "bool" in some way should be used instead.  Some bool builder
> can then have must() should() filter() methods without needing an
> enum.
> * Can't import any Lucene things
> 
> I'll add examples below of my feedback ideas.
> 
> On Tue, Aug 20, 2024 at 11:04 AM Geoffrey Slinker
> <[email protected]> wrote:
> 
>> Instantiate a Term and set the values and call toString to get a string that 
>> can be used in a Standard Solr Query.
>>       Term term = new Term("pink panther").withBoost(1.5f);
>>       term.toString()
>> 
>>       Output: "pink panther"^1.5
>> 
>>       Term term = new Term("title", "pink panther").withBoost(1.5f);
>>       term.toString()
>> 
>>       Output: title:"pink panther"^1.5
> 
> final QueryStringBuilder B = new QueryStringBuilder(potential
> options); // immutable
> B.field("title", "pink panther").withBoost(1.5f).toString();
> 
> 
>>          TermGroup group = new TermGroup().with(Occur.MUST).withBoost(1.4f);
>>          group.addTerm(new Term("foo", "bar").withProximity(1));
>> 
>>          String query = group.toString();
>> 
>>          Output: +( foo:bar~1 )^1.4
> 
> the outer MUST is pointless but I'll recreate anyway:
> 
> final QueryStringBuilder B = new QueryStringBuilder(potential
> options); // immutable
> B.bool().must(B.fieldFuzzy("foo", "bar", 1).withBoost(1.4)).toString();
> 
>> Example:
>>          TermGroup group = new TermGroup().withConstantScore(5.0f);
>>          group.addTerm(new Term("foo", "bar").withProximity(1));
>> 
>>          String query = group.toString();
>> 
>>          Output: ( foo:bar~1 )^=5
> 
> final QueryStringBuilder B = new QueryStringBuilder(potential
> options); // immutable
> B.fieldFuzzy("foo", "bar", 1).withConstantScore(5.0f).toString();
> // no "group" terminology necessary
> 
>> Instead of using string manipulation to create complex query strings the 
>> TermGroup allows complex queries to be built inside an object model that can 
>> be more easily changed.
>> If you need to generate a query like this:
>>  +(
>>        (
>>                title:"Grand Illusion"~1
>>                title:"Paradise Theatre"~1
>>        )^0.3
>>        (
>>                title:"Night At The Opera"~1
>>                title:"News Of The World"~1
>>        )^0.3
>>        (
>>                title:"Van Halen"~1
>>                title:1984~1
>>        )^0.3
>>  )
>> 
>> 
>>  The code to do so is as simple as this:
>> 
>>      TermGroup group = new TermGroup().with(Occur.MUST);
>> 
>>      TermGroup favoriteStyx = group.addGroup().withBoost(0.3f);
>>      TermGroup favoriteQueen = group.addGroup().withBoost(0.3f);
>>      TermGroup favoriteVanHalen = group.addGroup().withBoost(0.3f);
>> 
>>      favoriteStyx.addTerm(new Term("title","Grand Illusion")
>>          .with(Occur.SHOULD).withProximity(1));
>>      favoriteStyx.addTerm(new Term("title","Paradise Theatre")
>>          .with(Occur.SHOULD).withProximity(1));
>> 
>>      favoriteQueen.addTerm(new Term("title","Night At The Opera")
>>          .with(Occur.SHOULD).withProximity(1));
>>      favoriteQueen.addTerm(new Term("title","News Of The World")
>>          .with(Occur.SHOULD).withProximity(1));
>> 
>>      favoriteVanHalen.addTerm(new Term("title","Van Halen")
>>          .with(Occur.SHOULD).withProximity(1));
>>      favoriteVanHalen.addTerm(new Term("title","1984")
>>          .with(Occur.SHOULD).withProximity(1));
>> 
> 
> // again, the outer bool MUST is pointless but will recreate your example
> 
> final QueryStringBuilder B = new QueryStringBuilder(potential
> options); // immutable
> 
> var favoriteStyx = B.bool();
> favoriteStyx.should(B.field("title", "Grand Illusion").withProximity(1));
> favoriteStyx.should(B.field("title", "Paradise Theatre").withProximity(1));
> 
> var favoriteQueen = B.bool();
> favoriteQueen.should(B.field("title", "Night At The Opera").withProximity(1));
> favoriteQueen.should(B.field("title", "News Of The World").withProximity(1));
> 
> var favoriteVanHalen = B.bool();
> favoriteVanHalen.should(B.field("title", "Van Halen").withProximity(1));
> favoriteVanHalen.should(B.field("title", "1984").withProximity(1));
> 
> B.bool().must( // pointless wrap
>  B.bool().should(favoriteStyx.withBoost(0.3f))
>               .should(favoriteQueen.withBoost(0.3f))
>               .should(favoriteVanHalen.withBoost(0.3f))
> ).toString();
> 
> ---
> If we imagine plausibly expanding support to write Solr JSON as an
> alternative, then it could affect the code choices.  Like
> toSolrLuceneSyntax() and toSolrQueryDsl().
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

