[jira] [Comment Edited] (SOLR-8530) Add HavingStream to Streaming API and StreamingExpressions

Joel Bernstein (JIRA) Sun, 01 Jan 2017 18:20:24 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-8530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792021#comment-15792021
 ]


Joel Bernstein edited comment on SOLR-8530 at 1/2/17 2:19 AM:
--------------------------------------------------------------

I returned to the HavingStream as part of SOLR-8593.

While implementing it, I found that both approaches described in this ticket 
can coexist in a single HavingStream implementation. 

What [~dpgove] originally described was indexing a document on the fly and 
then using a Lucene/Solr query to implement the boolean logic.

What I described is implementing the boolean logic as stream operations that 
would handle typical SQL HAVING comparisons (=, <, >, <>, >=, <=). 

I have implemented the HavingStream I described as part of SOLR-8593 with 
syntax that looks like this:

{code}
having(expr, booleanOp)
{code}

Where booleanOp is a new type of operation that returns *TRUE* or *FALSE* for 
each tuple. The basic boolean operations have been implemented, such as:

{code}
having(expr, and(gt(field1, 5), lt(field1, 10)))
{code}

This would emit tuples from the underlying expr where field1 is greater than 5 
and less than 10.
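
As a rough illustration of the idea (not the SOLR-8593 code), boolean operations can be modeled as predicates that return true or false per tuple, with having(...) acting as a filter over them. Everything below is an assumption made for the sketch: the class name HavingSketch, the helper names, and the use of a plain Map as the tuple type are hypothetical and do not mirror Solr's classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: names follow the expression syntax, not Solr's API.
class HavingSketch {

    // gt(field, value): true when the tuple's numeric field is greater than value.
    static Predicate<Map<String, Object>> gt(String field, double value) {
        return t -> t.get(field) instanceof Number
                && ((Number) t.get(field)).doubleValue() > value;
    }

    // lt(field, value): true when the tuple's numeric field is less than value.
    static Predicate<Map<String, Object>> lt(String field, double value) {
        return t -> t.get(field) instanceof Number
                && ((Number) t.get(field)).doubleValue() < value;
    }

    // and(a, b): true only when both boolean operations are true for the tuple.
    static Predicate<Map<String, Object>> and(Predicate<Map<String, Object>> a,
                                              Predicate<Map<String, Object>> b) {
        return a.and(b);
    }

    // having(tuples, op): keep only the tuples for which op returns true.
    static List<Map<String, Object>> having(List<Map<String, Object>> tuples,
                                            Predicate<Map<String, Object>> op) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> t : tuples) {
            if (op.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // Convenience builder for a single-field tuple.
    static Map<String, Object> tuple(String field, Object value) {
        Map<String, Object> t = new LinkedHashMap<>();
        t.put(field, value);
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> tuples = Arrays.asList(
                tuple("field1", 3), tuple("field1", 7), tuple("field1", 12));
        // Equivalent in spirit to: having(expr, and(gt(field1, 5), lt(field1, 10)))
        System.out.println(having(tuples, and(gt("field1", 5), lt("field1", 10))));
        // prints [{field1=7}]
    }
}
```

Tuples with a missing or non-numeric field1 evaluate to false here, which is a choice made for the sketch rather than documented behavior.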

To implement what [~dpgove] had in mind, we can add a new boolean operation 
called *match*. The match operation will index the tuple in an in-memory index 
and then match a Lucene/Solr query against it. Here is the sample syntax:

{code}
having(expr, match("field1:[5 TO 10]"))
{code}

The match boolean operation could then be intermingled with other boolean 
operations, for example:

{code}
having(expr, and(gt(field2, 8), match("body:(hello world)")))
{code}
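
To show how match could sit alongside the other boolean operations, here is a minimal sketch in which match is just another per-tuple predicate. The Lucene part is deliberately swapped out: instead of indexing the tuple in memory and running a parsed query, the stand-in matchAnyTerm simply checks whether any of the given terms occurs in the field's whitespace-separated, lowercased text. The class and method names are hypothetical, and treating the terms as an OR is an assumption.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: a term-lookup stand-in replaces the in-memory Lucene index.
class MatchSketch {

    // Numeric comparison, as in gt(field2, 8).
    static Predicate<Map<String, Object>> gt(String field, double value) {
        return t -> t.get(field) instanceof Number
                && ((Number) t.get(field)).doubleValue() > value;
    }

    // Stand-in for match("body:(hello world)"): true when ANY term occurs in the field.
    // A real implementation would index the tuple and evaluate a Lucene/Solr query.
    static Predicate<Map<String, Object>> matchAnyTerm(String field, String... terms) {
        Set<String> wanted = new HashSet<>();
        for (String term : terms) {
            wanted.add(term.toLowerCase(Locale.ROOT));
        }
        return t -> {
            Object value = t.get(field);
            if (value == null) {
                return false;
            }
            for (String token : value.toString().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (wanted.contains(token)) {
                    return true;
                }
            }
            return false;
        };
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new LinkedHashMap<>();
        tuple.put("field2", 9);
        tuple.put("body", "Hello there");

        // Equivalent in spirit to: and(gt(field2, 8), match("body:(hello world)"))
        Predicate<Map<String, Object>> op =
                gt("field2", 8).and(matchAnyTerm("body", "hello", "world"));
        System.out.println(op.test(tuple)); // prints true
    }
}
```

Because both kinds of operation share one interface, the having filter does not need to know whether a given condition is a plain comparison or a query match.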

Depending on the progress of SOLR-8593, I may strip out the HavingStream 
implementation and commit it with this ticket, so it can be ready for Solr 6.4.







> Add HavingStream to Streaming API and StreamingExpressions
> ----------------------------------------------------------
>
>                 Key: SOLR-8530
>                 URL: https://issues.apache.org/jira/browse/SOLR-8530
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>    Affects Versions: 6.0
>            Reporter: Dennis Gove
>            Priority: Minor
>
> The goal here is to support something similar to SQL's HAVING clause where 
> one can filter documents based on data that is not available in the index. 
> For example, filter the output of a reduce(....) based on the calculated 
> metrics.
> {code}
> having(
>   reduce(
>     search(.....),
>     sum(cost),
>     on=customerId
>   ),
>   q="sum(cost):[500 TO *]"
> )
> {code}
> This example would return all tuples where the total spent by each distinct 
> customer is >= 500. The total spent is calculated via the sum(cost) metric in 
> the reduce stream.
> The intent is for the filters in the having(...) clause to support the full 
> query syntax of a search(...) clause. I see this being possible in one of two 
> ways. 
> 1. Use Lucene's MemoryIndex: as each tuple is read out of the underlying 
> stream, create an instance of MemoryIndex and apply the query to it. If the 
> resulting score is >0 then the tuple should be returned from the HavingStream.
> 2. Create an in-memory Solr index via something like RAMDirectory, read all 
> tuples into that in-memory index using the UpdateStream, and then stream out 
> of that all the matching tuples from the query.
> There are benefits to each approach but I think the easiest and most direct 
> one is the MemoryIndex approach. With MemoryIndex it isn't necessary to read 
> all incoming tuples before returning a single tuple. With a MemoryIndex there 
> is a need to parse the Solr query parameters and create a valid Lucene query 
> but I suspect that can be done using existing QParser implementations.
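
The streaming benefit of the per-tuple approach described in the issue can be sketched as a lazy iterator that tests one tuple at a time and emits a match before the rest of the input has been read. This is only an illustration under assumptions: LazyHaving is a hypothetical name, the tuple type is left generic, and the per-tuple test is an arbitrary predicate rather than a MemoryIndex query.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch: filters an underlying stream one tuple at a time.
class LazyHaving<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> op;
    private T next;
    private boolean ready;

    LazyHaving(Iterator<T> source, Predicate<T> op) {
        this.source = source;
        this.op = op;
    }

    @Override
    public boolean hasNext() {
        // Pull from the source only until the next matching tuple is found.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (op.test(candidate)) {
                next = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        ready = false;
        T result = next;
        next = null;
        return result;
    }

    public static void main(String[] args) {
        // Each value stands for a customer's sum(cost); keep totals >= 500.
        Iterator<Integer> totals = Arrays.asList(400, 600, 300, 700).iterator();
        LazyHaving<Integer> having = new LazyHaving<>(totals, total -> total >= 500);
        while (having.hasNext()) {
            System.out.println(having.next());
        }
    }
}
```

By contrast, the second approach (load everything into an in-memory index, then query it) would have to consume the whole input before emitting the first tuple.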



