[ https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792021#comment-15792021 ]
Joel Bernstein edited comment on SOLR-8530 at 1/2/17 2:19 AM:
--------------------------------------------------------------
I returned to the HavingStream as part of SOLR-8593.
What I found during the implementation is that both approaches described
in this ticket can coexist in the same HavingStream implementation.
What [~dpgove] originally described was indexing a document on the fly and then
using a Lucene/Solr query to implement the boolean logic.
What I described was implementing the boolean logic as stream operations that
handle the typical SQL HAVING comparisons (=, <, >, <>, >=, <=).
I have implemented the HavingStream I described as part of SOLR-8593 with
syntax that looks like this:
{code}
having(expr, booleanOp)
{code}
Here booleanOp is a new type of operation that returns *TRUE* or *FALSE* for
each tuple. The basic boolean operations have been implemented, for example:
{code}
having(expr, and(gt(field1, 5), lt(field1, 10)))
{code}
This would emit tuples from the underlying expr where field1 is greater than 5
and less than 10.
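A minimal sketch of the per-tuple evaluation idea in plain Java, assuming tuples are modeled as Map<String, Object>. The names BooleanOp, gt, lt, and, and having below are hypothetical illustrations, not Solr's actual streaming classes; the sketch only shows that each operation returns true or false for a single tuple and that and(...) composes them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical sketch of having(expr, booleanOp); not Solr's streaming API.
class HavingSketch {
    // A "boolean operation": returns true or false for each tuple.
    // Tuples are modeled as Map<String, Object> (an assumption, not Solr's Tuple class).
    interface BooleanOp extends Predicate<Map<String, Object>> {}

    // gt(field, value): true when the field holds a number greater than value.
    static BooleanOp gt(String field, double value) {
        return t -> {
            Object v = t.get(field);
            return v instanceof Number && ((Number) v).doubleValue() > value;
        };
    }

    // lt(field, value): true when the field holds a number less than value.
    static BooleanOp lt(String field, double value) {
        return t -> {
            Object v = t.get(field);
            return v instanceof Number && ((Number) v).doubleValue() < value;
        };
    }

    // and(...): true only when every nested operation is true for the tuple.
    static BooleanOp and(BooleanOp... ops) {
        return t -> Arrays.stream(ops).allMatch(op -> op.test(t));
    }

    // having(tuples, op): keep only the tuples for which op evaluates to true.
    static List<Map<String, Object>> having(List<Map<String, Object>> tuples, BooleanOp op) {
        return tuples.stream().filter(op).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = List.of(
                Map.of("field1", 3), Map.of("field1", 7),
                Map.of("field1", 10), Map.of("field1", 12));
        // Mirrors having(expr, and(gt(field1, 5), lt(field1, 10)))
        System.out.println(having(tuples, and(gt("field1", 5), lt("field1", 10)))); // [{field1=7}]
    }
}
```

Tuples missing the field, or holding a non-numeric value, simply evaluate to false in this sketch.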
To implement what [~dpgove] had in mind, we can add a new boolean operation
called *match*. The match operation will index the tuple in an in-memory index
and then match a Lucene/Solr query against it. Here is the sample syntax:
{code}
having(expr, match("field1:[5 TO 10]"))
{code}
The match boolean operation could then be intermingled with other boolean
operations, for example:
{code}
having(expr, and(gt(field2, 8), match("body:(hello world)")))
{code}
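A toy illustration of intermingling a text-matching operation with a numeric comparison, in plain Java. This is NOT the proposed in-memory index approach: as a simplifying assumption, the hypothetical matchAnyTerm just lowercases the field, splits it on non-alphanumeric characters, and returns true if any query term is present. That roughly mimics body:(hello world) under a default OR operator, whereas a real match operation would index the tuple and run a parsed Lucene/Solr query against it.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical toy stand-in for match(...); no Lucene/Solr query parsing is done here.
class MatchSketch {
    // True if the field contains any of the given terms (simple OR semantics).
    static Predicate<Map<String, Object>> matchAnyTerm(String field, String terms) {
        Set<String> wanted = tokenize(terms);
        return t -> {
            Object v = t.get(field);
            if (v == null) return false;
            Set<String> present = tokenize(v.toString());
            return wanted.stream().anyMatch(present::contains);
        };
    }

    // Naive analyzer stand-in: lowercase, split on non-alphanumeric runs, drop empties.
    static Set<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^\\p{Alnum}]+"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Numeric comparison, as in the earlier gt(field, value) operation.
    static Predicate<Map<String, Object>> gt(String field, double value) {
        return t -> {
            Object v = t.get(field);
            return v instanceof Number && ((Number) v).doubleValue() > value;
        };
    }

    public static void main(String[] args) {
        // In the spirit of and(gt(field2, 8), match("body:(hello world)"))
        Predicate<Map<String, Object>> op = gt("field2", 8).and(matchAnyTerm("body", "hello world"));
        System.out.println(op.test(Map.of("field2", 9, "body", "Hello, Solr")));  // true
        System.out.println(op.test(Map.of("field2", 9, "body", "goodbye")));      // false
        System.out.println(op.test(Map.of("field2", 3, "body", "hello world")));  // false
    }
}
```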
Depending on the progress of SOLR-8593, I may strip out the HavingStream
implementation and commit it with this ticket so it can be ready for Solr 6.4.
> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
> Key: SOLR-8530
> URL: https://issues.apache.org/jira/browse/SOLR-8530
> Project: Solr
> Issue Type: Improvement
> Components: SolrJ
> Affects Versions: 6.0
> Reporter: Dennis Gove
> Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where
> one can filter documents based on data that is not available in the index.
> For example, filter the output of a reduce(....) based on the calculated
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all tuples where the total spent by each distinct
> customer is >= 500. The total spent is calculated via the sum(cost) metric in
> the reduce stream.
> The intent is to support the full query syntax of a search(...) clause as the
> filter in the having(...) clause. I see two possible ways to do this.
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying
> stream, create an instance of MemoryIndex from it and apply the query. If the
> resulting score is > 0, the tuple should be returned from the HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all
> tuples into that in-memory index using the UpdateStream, and then stream out
> all the tuples that match the query.
> There are benefits to each approach, but I think the easiest and most direct
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read
> all incoming tuples before returning the first one. MemoryIndex does require
> parsing the Solr query parameters into a valid Lucene query, but I suspect
> that can be done using existing QParser implementations.
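The streaming benefit of approach 1 can be sketched in plain Java with a lazy iterator. As an assumption, the per-tuple MemoryIndex query is replaced by an ordinary predicate (here sum(cost) >= 500, applied to tuples like those a reduce might emit), and LazyHavingIterator is a hypothetical name. The pulled counter shows that the first matching tuple is returned after reading only as many input tuples as needed, rather than after reading the whole stream as approach 2 would require.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: filters tuples one at a time, never buffering the whole stream.
class LazyHavingIterator implements Iterator<Map<String, Object>> {
    private final Iterator<Map<String, Object>> source;
    // Stand-in for "index this tuple in a MemoryIndex and check that the score is > 0".
    private final Predicate<Map<String, Object>> matches;
    private Map<String, Object> next; // next matching tuple, or null if not yet found
    int pulled = 0;                   // number of tuples read from the source so far

    LazyHavingIterator(Iterator<Map<String, Object>> source, Predicate<Map<String, Object>> matches) {
        this.source = source;
        this.matches = matches;
    }

    @Override
    public boolean hasNext() {
        // Read from the source only until the next match is found.
        while (next == null && source.hasNext()) {
            Map<String, Object> candidate = source.next();
            pulled++;
            if (matches.test(candidate)) next = candidate;
        }
        return next != null;
    }

    @Override
    public Map<String, Object> next() {
        if (!hasNext()) throw new NoSuchElementException();
        Map<String, Object> result = next;
        next = null;
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> reduced = List.of(
                Map.of("customerId", "a", "sum(cost)", 120.0),
                Map.of("customerId", "b", "sum(cost)", 750.0),
                Map.of("customerId", "c", "sum(cost)", 500.0),
                Map.of("customerId", "d", "sum(cost)", 80.0));
        LazyHavingIterator it = new LazyHavingIterator(reduced.iterator(),
                t -> ((Number) t.get("sum(cost)")).doubleValue() >= 500);
        Map<String, Object> first = it.next();
        System.out.println(first.get("customerId") + " after pulling " + it.pulled + " tuples"); // b after pulling 2 tuples
        while (it.hasNext()) System.out.println(it.next().get("customerId")); // c
    }
}
```

The same wrapper would work with any per-tuple predicate, which is what makes the MemoryIndex approach compatible with streaming.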