[jira] [Resolved] (SOLR-12271) Analytics Component reads negative float and double field values incorrectly

2018-05-30 Thread Dennis Gove (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove resolved SOLR-12271.

Resolution: Fixed

> Analytics Component reads negative float and double field values incorrectly
> 
>
> Key: SOLR-12271
> URL: https://issues.apache.org/jira/browse/SOLR-12271
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.4, master (8.0)
>Reporter: Houston Putman
>Assignee: Dennis Gove
>Priority: Major
> Fix For: 7.4, master (8.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the Analytics Component converts the long values stored in numeric 
> doc values to doubles and floats incorrectly.
> The fix is straightforward, and the tests now cover this case.
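One plausible account of why only negative values were affected (an illustration of the encoding mismatch, not code from the actual patch): numeric doc values store a double field as a long, and two different long encodings exist side by side in Lucene, the raw IEEE-754 bits (Double.doubleToLongBits) and the sortable encoding (NumericUtils.sortableDoubleBits). The two agree for non-negative values and differ for negative ones, so decoding with the wrong transform corrupts exactly the negative values:

```java
// Demonstrates that the raw IEEE-754 bits of a double and its Lucene-style
// sortable encoding agree for non-negative values but differ for negatives.
public class SortableBitsDemo {

    // Same transform as Lucene's NumericUtils.sortableDoubleBits: flip the
    // low 63 bits when the sign bit is set.
    static long sortableDoubleBits(long bits) {
        return bits ^ (bits >> 63) & 0x7fffffffffffffffL;
    }

    public static void main(String[] args) {
        for (double v : new double[] {1.5, 0.0, -1.5}) {
            long raw = Double.doubleToLongBits(v);
            boolean agree = raw == sortableDoubleBits(raw);
            System.out.println(v + " -> encodings agree: " + agree); // true, true, false
        }
    }
}
```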



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-11914) Remove/move questionable SolrParams methods

2018-04-16 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440179#comment-16440179
 ] 

Dennis Gove commented on SOLR-11914:


I agree with [~dsmiley] - the code in the streaming classes appears to be an 
oddly round-about way of doing things. The changes you've made here appear to 
be a much better approach.

> Remove/move questionable SolrParams methods
> ---
>
> Key: SOLR-11914
> URL: https://issues.apache.org/jira/browse/SOLR-11914
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Reporter: David Smiley
>Priority: Minor
>  Labels: newdev
> Attachments: SOLR-11914.patch
>
>
> {{Map<String, Object> getAll(Map<String, Object> sink, Collection<String> 
> params)}} 
> Is only used by the CollectionsHandler, and has particular rules about how it 
> handles multi-valued data that make it not very generic, and thus I think 
> doesn't belong here.  Furthermore, the existence of this method is confusing 
> in that it gives the user another choice between its use and toMap (there 
> are two overloaded variants).
> {{SolrParams toFilteredSolrParams(List<String> names)}}
> Is only called in one place, and something about it bothers me; perhaps just 
> the name, or that it ought to be a view.
> {{static Map<String, String> toMap(NamedList params)}}
> Isn't used and I don't like it; it doesn't even involve a SolrParams!  Legacy 
> of 2006.
> {{static Map<String, String[]> toMultiMap(NamedList params)}}
> It doesn't even involve a SolrParams! Legacy of 2006 with some updates since. 
> Used in some places. Perhaps it should be moved to NamedList as an instance 
> method.






[jira] [Resolved] (SOLR-11924) Add the ability to watch collection set changes in ZkStateReader

2018-04-17 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove resolved SOLR-11924.

   Resolution: Fixed
 Assignee: Dennis Gove
Fix Version/s: (was: master (8.0))

> Add the ability to watch collection set changes in ZkStateReader
> 
>
> Key: SOLR-11924
> URL: https://issues.apache.org/jira/browse/SOLR-11924
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.4, master (8.0)
>Reporter: Houston Putman
>Assignee: Dennis Gove
>Priority: Minor
> Fix For: 7.4
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Allow users to watch when the set of collections for a cluster is changed. 
> This is useful if a user is trying to discover collections within a cloud.






[jira] [Created] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-14 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-12355:
--

 Summary: HashJoinStream's use of String::hashCode results in 
non-matching tuples being considered matches
 Key: SOLR-12355
 URL: https://issues.apache.org/jira/browse/SOLR-12355
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Affects Versions: 6.0
Reporter: Dennis Gove
Assignee: Dennis Gove


The following strings have been found to have hashCode collisions; as a 
result, HashJoinStream can consider two tuples with these field values to be 
the same.


{code:java}
"MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
This means these two tuples are the same if we're comparing on field "foo"
{code:java}
{
  "foo":"MG!!00TNGP::Mtge::"
}
{
  "foo":"MG!!00TNH1::Mtge::"
}
{code}
and these two tuples are the same if we're comparing on fields "foo,bar"
{code:java}
{
  "foo":"MG!!00TNGP"
  "bar":"Mtge"
}
{
  "foo":"MG!!00TNH1"
  "bar":"Mtge"
}{code}
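The collision itself is easy to verify. Java's String.hashCode folds characters as h = 31 * h + c, and the two keys differ only in the adjacent pair "GP" vs "H1"; since 'G' * 31 + 'P' and 'H' * 31 + '1' both equal 2281, the full hash codes collide:

```java
public class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "MG!!00TNGP::Mtge::";
        String b = "MG!!00TNH1::Mtge::";
        // The strings are identical outside the adjacent pair "GP" / "H1",
        // and 'G' * 31 + 'P' == 'H' * 31 + '1' == 2281, so the polynomial
        // hash produces the same value for both.
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
    }
}
```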






[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475055#comment-16475055
 ] 

Dennis Gove commented on SOLR-12355:


I have a fix for this: instead of calculating the string value's hashCode, we 
use the string value itself as the key in the hashed set of tuples. I'm 
creating a few test cases to verify this gives us what we want.
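A minimal sketch of that idea in plain Java (hypothetical names; not the actual Solr patch): bucket tuples in a map keyed by the concatenated field values themselves, so that map lookups fall back to String.equals() rather than trusting hashCode() alone:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValueKeyedJoin {

    // Build a composite key from the tuple's join fields; the "::" separator
    // mirrors the key format shown in the issue description.
    static String key(Map<String, String> tuple, List<String> onFields) {
        StringBuilder sb = new StringBuilder();
        for (String f : onFields) {
            sb.append(tuple.get(f)).append("::");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> on = List.of("foo", "bar");
        Map<String, List<Map<String, String>>> buckets = new HashMap<>();

        Map<String, String> left = Map.of("foo", "MG!!00TNGP", "bar", "Mtge");
        buckets.computeIfAbsent(key(left, on), k -> new ArrayList<>()).add(left);

        // The colliding tuple from the report no longer matches: the keys
        // share a hashCode but fail String.equals() inside the HashMap.
        Map<String, String> right = Map.of("foo", "MG!!00TNH1", "bar", "Mtge");
        System.out.println(buckets.containsKey(key(right, on))); // false
    }
}
```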

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475060#comment-16475060
 ] 

Dennis Gove commented on SOLR-12355:


This also impacts OuterHashJoinStream.

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Updated] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-15 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-12355:
---
Attachment: SOLR-12355.patch

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
> Attachments: SOLR-12355.patch
>
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475973#comment-16475973
 ] 

Dennis Gove commented on SOLR-12355:


Initial patch attached. I have not yet run the full suite of tests against this.

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
> Attachments: SOLR-12355.patch
>
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Updated] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-18 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-12355:
---
Attachment: SOLR-12355.patch

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
> Attachments: SOLR-12355.patch, SOLR-12355.patch
>
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Resolved] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches

2018-05-18 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove resolved SOLR-12355.

   Resolution: Fixed
Fix Version/s: 7.4

> HashJoinStream's use of String::hashCode results in non-matching tuples being 
> considered matches
> 
>
> Key: SOLR-12355
> URL: https://issues.apache.org/jira/browse/SOLR-12355
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 6.0
>Reporter: Dennis Gove
>Assignee: Dennis Gove
>Priority: Major
> Fix For: 7.4
>
> Attachments: SOLR-12355.patch, SOLR-12355.patch
>
>
> The following strings have been found to have hashCode collisions; as a 
> result, HashJoinStream can consider two tuples with these field values to be 
> the same.
> {code:java}
> "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code}
> This means these two tuples are the same if we're comparing on field "foo"
> {code:java}
> {
>   "foo":"MG!!00TNGP::Mtge::"
> }
> {
>   "foo":"MG!!00TNH1::Mtge::"
> }
> {code}
> and these two tuples are the same if we're comparing on fields "foo,bar"
> {code:java}
> {
>   "foo":"MG!!00TNGP"
>   "bar":"Mtge"
> }
> {
>   "foo":"MG!!00TNH1"
>   "bar":"Mtge"
> }{code}






[jira] [Assigned] (SOLR-12271) Analytics Component reads negative float and double field values incorrectly

2018-05-25 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-12271:
--

Assignee: Dennis Gove

> Analytics Component reads negative float and double field values incorrectly
> 
>
> Key: SOLR-12271
> URL: https://issues.apache.org/jira/browse/SOLR-12271
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.3.1, 7.4, master (8.0)
>Reporter: Houston Putman
>Assignee: Dennis Gove
>Priority: Major
> Fix For: 7.4, master (8.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the Analytics Component converts the long values stored in numeric 
> doc values to doubles and floats incorrectly.
> The fix is straightforward, and the tests now cover this case.






[jira] [Commented] (SOLR-10512) Innerjoin streaming expressions - Invalid JoinStream error

2018-03-09 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392851#comment-16392851
 ] 

Dennis Gove commented on SOLR-10512:


It was certainly designed such that the left field in the on clause is the 
field from the first incoming stream and the right field in the on clause is 
the field from the second incoming stream. If that is not occurring then this 
is a very clear bug.

> Innerjoin streaming expressions - Invalid JoinStream error
> --
>
> Key: SOLR-10512
> URL: https://issues.apache.org/jira/browse/SOLR-10512
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: search
>Affects Versions: 6.4.2, 6.5
> Environment: Debian Jessie
>Reporter: Dominique Béjean
>Priority: Major
>
> It looks like the innerJoin streaming expression does not work as explained 
> in the documentation. An "Invalid JoinStream" error occurs.
> {noformat}
> curl --data-urlencode 'expr=innerJoin(
> search(books, 
>q="*:*", 
>fl="id", 
>sort="id asc"),
> search(reviews, 
>q="*:*", 
>fl="id_book_s", 
>sort="id_book_s asc"), 
> on="id=id_books_s"
> )' http://localhost:8983/solr/books/stream
>   
> {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming stream 
> comparators (sort) must be a superset of this stream's 
> equalitor.","EOF":true}]}}   
> {noformat}
> It is totally similar to the documentation example
> 
> {noformat}
> innerJoin(
>   search(people, q=*:*, fl="personId,name", sort="personId asc"),
>   search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"),
>   on="personId=ownerId"
> )
> {noformat}
> Queries on each collection give :
> {noformat}
> $ curl --data-urlencode 'expr=search(books, 
>q="*:*", 
>fl="id, title_s, pubyear_i", 
>sort="pubyear_i asc", 
>qt="/export")' 
> http://localhost:8983/solr/books/stream
> {
>   "result-set": {
> "docs": [
>   {
> "title_s": "Friends",
> "pubyear_i": 1994,
> "id": "book2"
>   },
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "id": "book1"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 16
>   }
> ]
>   }
> }
> $ curl --data-urlencode 'expr=search(reviews, 
>q="author_s:d*", 
>fl="id, id_book_s, stars_i, review_dt", 
>sort="id_book_s asc", 
>qt="/export")' 
> http://localhost:8983/solr/reviews/stream
>  
> {
>   "result-set": {
> "docs": [
>   {
> "stars_i": 3,
> "id": "book1_c2",
> "id_book_s": "book1",
> "review_dt": "2014-03-15T12:00:00Z"
>   },
>   {
> "stars_i": 4,
> "id": "book1_c3",
> "id_book_s": "book1",
> "review_dt": "2014-12-15T12:00:00Z"
>   },
>   {
> "stars_i": 3,
> "id": "book2_c2",
> "id_book_s": "book2",
> "review_dt": "1994-03-15T12:00:00Z"
>   },
>   {
> "stars_i": 4,
> "id": "book2_c3",
> "id_book_s": "book2",
> "review_dt": "1994-12-15T12:00:00Z"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 47
>   }
> ]
>   }
> }
> {noformat}
> After more tests, I just had to invert the "on" clause to make it work
> {noformat}
> curl --data-urlencode 'expr=innerJoin(
> search(books, 
>q="*:*", 
>fl="id", 
>sort="id asc"),
> search(reviews, 
>q="*:*", 
>fl="id_book_s", 
>sort="id_book_s asc"), 
> on="id_books_s=id"
> )' http://localhost:8983/solr/books/stream
> 
> {
>   "result-set": {
> "docs": [
>   {
> "title_s": "The Way of Kings",
> "pubyear_i": 2010,
> "stars_i": 5,
> "id": "book1",
>

[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069053#comment-15069053
 ] 

Dennis Gove commented on SOLR-7535:
---

For the original mapping take a look at SolrStream, particular the 
{code}mapFields(...){code} function and where it is called from. 

It might make sense to require a SelectStream as the inner stream so that one 
can select the fields they want to insert. Or perhaps supporting a way to 
select fields as part of this stream's expression and it can internally use a 
SelectStream to implement that feature. 

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.
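The wrapping behavior described here follows a simple decorator shape, sketched below in plain Java (hypothetical stand-ins for TupleStream and the indexing client; not SolrJ code): each tuple read from the inner stream is forwarded to an indexing sink and then passed through unchanged.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class UpdateStreamSketch {

    // Wrap an inner tuple stream; every tuple is sent to the sink (the
    // destination collection in the real feature) and returned to the caller.
    static Iterator<Map<String, Object>> updateStream(
            Iterator<Map<String, Object>> inner,
            Consumer<Map<String, Object>> indexSink) {
        return new Iterator<Map<String, Object>>() {
            public boolean hasNext() { return inner.hasNext(); }
            public Map<String, Object> next() {
                Map<String, Object> tuple = inner.next();
                indexSink.accept(tuple); // indexing side effect
                return tuple;            // pass-through to the caller
            }
        };
    }

    public static void main(String[] args) {
        List<Map<String, Object>> indexed = new ArrayList<>();
        Iterator<Map<String, Object>> wrapped = updateStream(
                List.<Map<String, Object>>of(Map.of("id", "doc1"), Map.of("id", "doc2")).iterator(),
                indexed::add);
        while (wrapped.hasNext()) {
            wrapped.next();
        }
        System.out.println(indexed.size()); // 2
    }
}
```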






[jira] [Comment Edited] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069053#comment-15069053
 ] 

Dennis Gove edited comment on SOLR-7535 at 12/23/15 2:30 AM:
-

For the original mapping take a look at SolrStream, particularly the 
{code}mapFields(...){code} function and where it is called from. 

It might make sense to require a SelectStream as the inner stream so that one 
can select the fields they want to insert. Or perhaps supporting a way to 
select fields as part of this stream's expression and it can internally use a 
SelectStream to implement that feature. 


was (Author: dpgove):
For the original mapping take a look at SolrStream, particular the 
{code}mapFields(...){code} function and where it is called from. 

It might make sense to require a SelectStream as the inner stream so that one 
can select the fields they want to insert. Or perhaps supporting a way to 
select fields as part of this stream's expression and it can internally use a 
SelectStream to implement that feature. 

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions

2015-12-27 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072395#comment-15072395
 ] 

Dennis Gove commented on SOLR-8458:
---

What if we were to make substitution parameters first class citizens similar to 
named parameters? During the parsing in ExpressionParser we could create 
instances of StreamExpressionSubstitutionParameters which exist as first class 
citizens of a StreamExpression object. This would allow us to send (in the 
example in the description) "expr", "left", and "right" through the 
ExpressionParser. Then, a simple method can be added to the StreamFactory which 
accepts a main expression and a map of names => expressions. It could then 
iterate over parameters of the main expression doing replacements until there 
are no more instances of StreamExpressionSubstitutionParameter in the main 
expression. Some checks for infinite loops would have to be added but those are 
relatively simple. 

This approach would allow the logic to exist outside of the StreamHandler which 
I think would be beneficial for the SQL Handler. 

It might also allow for some type of prepared statements with "pre-compiled" 
pieces (similar to what one might see in a DBMS). For example, this might be 
beneficial in a situation where some very expensive part of the expression is 
static which you want to perform different rollups or joins or whatever with. 
An optimizer could hang onto the static results in a RepeatableStream (doesn't 
exist yet) and substitute that into some other expression.
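The replacement loop sketched in this comment might look like the following (a hypothetical illustration; Substitutor is not an actual Solr class, and it assumes no parameter name is a prefix of another):

```java
import java.util.Map;

public class Substitutor {

    // Repeatedly replace $name tokens with their expressions. Each pass can
    // resolve one level of nesting, so needing more passes than there are
    // parameters indicates a substitution cycle.
    static String substitute(String expr, Map<String, String> params) {
        for (int pass = 0; pass <= params.size(); pass++) {
            boolean replaced = false;
            for (Map.Entry<String, String> e : params.entrySet()) {
                String token = "$" + e.getKey();
                if (expr.contains(token)) {
                    expr = expr.replace(token, e.getValue());
                    replaced = true;
                }
            }
            if (!replaced) {
                return expr; // fixed point: no substitution parameters remain
            }
        }
        throw new IllegalStateException("cyclic substitution among parameters");
    }

    public static void main(String[] args) {
        System.out.println(substitute(
                "merge($left, $right, on=\"id asc\")",
                Map.of("left", "search(col, q=\"a\")",
                       "right", "search(col, q=\"b\")")));
    }
}
```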

> Parameter substitution for Streaming Expressions
> 
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch
>
>
> As Streaming Expressions become more complicated it would be nice to support 
> parameter substitution. For example:
> {code}
> http://localhost:8983/col/stream?expr=merge($left, $right, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Commented] (SOLR-8467) CloudSolrStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072706#comment-15072706
 ] 

Dennis Gove commented on SOLR-8467:
---

I can't think of any reason why accepting a [Modifiable]SolrParams object 
instead of a Map would be a bad idea. I like this change.

> CloudSolrStream should take a SolrParams object rather than a Map<String, 
> String> to allow more complex Solr queries to be specified
> 
>
> Key: SOLR-8467
> URL: https://issues.apache.org/jira/browse/SOLR-8467
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Assignee: Erick Erickson
> Attachments: SOLR-8647.patch
>
>
> Currently, it's impossible to, say, specify multiple "fq" clauses when using 
> Streaming Aggregation due to the fact that the c'tors take a Map<String, 
> String> of params.
> Opening to discuss whether we should
> 1> deprecate the current c'tor
> and/or
> 2> add a c'tor that takes a SolrParams object instead.
> and/or
> 3> ???
> I don't see a clean way to go from a Map<String, String> to a 
> (Modifiable)SolrParams, so existing code would need a significant change. I 
> hacked together a PoC, just to see if I could make CloudSolrStream take a 
> ModifiableSolrParams object instead and it passes tests, but it's so bad that 
> I'm not going to even post it. There's _got_ to be a better way to do this, 
> but at least it's possible.






[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072812#comment-15072812
 ] 

Dennis Gove commented on SOLR-8458:
---

As I see it there are 2 pieces here, both related but separate.

First, adding support for parameter substitution in an expression. This would 
be handled with the changes I discussed above to StreamFactory, StreamParser, 
and the addition of a new type StreamExpressionSubstitutionParameter. Note that 
this doesn't necessarily care how the expressions come in.

And second, adding support for parameter substitution in StreamHandler and in 
an http request. I like the syntax Joel uses in the description. What this 
would mean is that StreamHandler would see http params like "expr", "left" and 
"right", would know that these are expressions (can call into StreamFactory to 
check if something is a valid expression), and would pass them off 
independently to be parsed and then together to be pieced together. 

This approach modularizes the implementation such that how an expression with 
substitution comes in via http is independent to how it is handled within the 
Streaming API. 

For example, the following comes into StreamHandler
{code}
http://localhost:8983/col/stream?expr=merge($left, $right, 
...)&baz=jaz&left=search(...)&right=search(...)&foo=bar
{code}

The StreamHandler will see five parameters, expr, baz, left, right, and foo. It 
would then determine that expr, left, and right are valid expressions and pass 
them off to be parsed into three expression objects. It would then pass all 
three into the factory to be combined into a single Stream object. The factory 
would then iterate (recursively?) until there aren't any more instances of a 
StreamExpressionSubstitutionParameter at any level (considering the possibility 
of infinite loops, of course). At this point it'd then just be passed off to 
create a Stream object as any other expression would be.

Another possibility would be to parse out the substitution expressions and then 
register them in the factory for use during Stream object creation. This would 
negate the need to do that pre-processing of the N substitution expression and 
would give a place to register "pre-compiled" expressions. I'm not a huge fan 
of this approach as it would add more state to the factory and I'm not a huge 
fan of the state it already contains.

> Parameter substitution for Streaming Expressions
> 
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch
>
>
> As Streaming Expressions become more complicated it would be nice to support 
> parameter substitution. For example:
> {code}
> http://localhost:8983/col/stream?expr=merge($left, $right, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Comment Edited] (SOLR-8458) Parameter substitution for Streaming Expressions

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072812#comment-15072812
 ] 

Dennis Gove edited comment on SOLR-8458 at 12/28/15 3:27 PM:
-

As I see it there are 2 pieces here, both related but separate.

First, adding support for parameter substitution in an expression. This would 
be handled with the changes I discussed above to StreamFactory, StreamParser, 
and the addition of a new type StreamExpressionSubstitutionParameter. Note that 
this doesn't necessarily care how the expressions come in.

And second, adding support for parameter substitution in StreamHandler and in 
an http request. I like the syntax Joel uses in the description. What this 
would mean is that StreamHandler would see http params like "expr", "left", and 
"right", would know that these are expressions (it can call into StreamFactory 
to check whether something is a valid expression), and would pass them off to 
be parsed independently and then pieced together. 

This approach modularizes the implementation such that how an expression with 
substitution comes in via http is independent to how it is handled within the 
Streaming API. 

For example, the following comes into StreamHandler
{code}
http://localhost:8983/col/stream?expr=merge($left, $right, 
...)&baz=jaz&left=search(...)&right=search(...)&foo=bar
{code}

The StreamHandler will see five parameters, expr, baz, left, right, and foo. It 
would then determine that expr, left, and right are valid expressions and pass 
them off to be parsed into three expression objects. It would then pass all 
three into the factory to be combined into a single Stream object. The factory 
would then iterate (recursively?) until there aren't any more instances of a 
StreamExpressionSubstitutionParameter at any level (considering the possibility 
of infinite loops, of course). At this point it'd then just be passed off to 
create a Stream object as any other expression would be.
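A minimal sketch of that iterate-until-fixed-point substitution (hypothetical helper, simplified to plain string replacement over $name placeholders rather than parsed StreamExpression objects, which is what the factory would really operate on):

```java
import java.util.Map;

// Simplified sketch of iterative parameter substitution: keep replacing
// $name placeholders until a fixed point is reached, with an iteration cap
// guarding against the infinite-loop possibility mentioned above.
public class SubstitutionSketch {
    public static String substitute(String expr, Map<String, String> params) {
        for (int i = 0; i < 100; i++) {            // cap guards against cycles
            String next = expr;
            for (Map.Entry<String, String> e : params.entrySet()) {
                next = next.replace("$" + e.getKey(), e.getValue());
            }
            if (next.equals(expr)) {
                return next;                       // fixed point: nothing left to substitute
            }
            expr = next;
        }
        throw new IllegalStateException("possible substitution cycle in: " + expr);
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
            "left", "search(people, q=\"*:*\")",
            "right", "search(address, q=\"*:*\")");
        // prints merge(search(people, q="*:*"), search(address, q="*:*"), on="personId asc")
        System.out.println(substitute("merge($left, $right, on=\"personId asc\")", params));
    }
}
```

Registering "pre-compiled" expressions in the factory, as the alternative below describes, would instead move this parameter map into factory state.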

Another possibility would be to parse out the substitution expressions and then 
register them in the factory for use during Stream object creation. This would 
negate the need to do that pre-processing of the N substitution expression and 
would give a place to register "pre-compiled" expressions. I'm not a huge fan 
of this approach as it would add more state to the factory and I'm not a huge 
fan of the state it already contains.

I'm happy to take this on unless, [~caomanhdat], you want to continue your work 
on it.



[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072871#comment-15072871
 ] 

Dennis Gove commented on SOLR-8458:
---

I agree. There's no reason to reinvent and I'm always a fan of keeping things 
consistent. If preprocessing substitution is already implemented for all 
incoming requests then we should absolutely make use of it.

> Parameter substitution for Streaming Expressions
> 
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch
>
>
> As Streaming Expressions become more complicated it would be nice to support 
> parameter substitution. For example:
> {code}
> http://localhost:8983/col/stream?expr=merge($left, $right, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072874#comment-15072874
 ] 

Dennis Gove commented on SOLR-8458:
---

This is great news. I'm all for continuing to make use of this feature. Thanks!

> Parameter substitution for Streaming Expressions
> 
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch
>
>
> As Streaming Expressions become more complicated it would be nice to support 
> parameter substitution. For example:
> {code}
> http://localhost:8983/col/stream?expr=merge($left, $right, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Commented] (SOLR-8458) Add Streaming Expressions tests for parameter substitution

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073389#comment-15073389
 ] 

Dennis Gove commented on SOLR-8458:
---

It appears from the thread below that substitution is already supported (see 
Yonik's comment below). At this point the action item would be to add streaming 
expression tests for parameter substitution.

> Add Streaming Expressions tests for parameter substitution
> --
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch
>
>
> This ticket is to add Streaming Expression tests that exercise the existing 
> macro expansion feature described here:  
> http://yonik.com/solr-query-parameter-substitution/
> Sample syntax below:
> {code}
> http://localhost:8983/col/stream?expr=merge(${left}, ${right}, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-28 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073435#comment-15073435
 ] 

Dennis Gove commented on SOLR-7535:
---

I haven't looked at the patch yet but to answer your questions, 

1. The name of the collection in the URL path and the collection in any part of 
the expression can absolutely be different. There are a couple of cases where 
this difference will most likely appear. First, during a join or merge of 
multiple collections only one of the collection names can be contained in the 
URL. For example
{code}
innerJoin(
  search(people, fl="personId,name", q="*:*", sort="personId asc"),
  search(address, fl="personId,city", q="state:ny", sort="personId asc"),
  on="personId"
)
{code}
Two collections are being hit but only a single one can be included in the URL. 
There aren't any hard and fast rules about which one should be used in the URL 
and that decision could depend on a lot of different things, especially if the 
collections live in different clouds or on different hardware. 

There is also the possibility that the http request is being sent to what is 
effectively an empty collection which only exists to perform parallel work 
using the streaming api. For example, imagine you want to do some heavy metric 
processing but you don't want to use more resources than necessary on the 
servers where the collections live. You could set up an empty collection on 
totally different hardware, intending that hardware to act solely as workers 
on the real collection. This would allow you to do the heavy lifting on 
separate hardware from where the collection actually lives. 

For these reasons the collection name is a required parameter in the base 
streams (SolrCloudStream and FacetStream).

2. There are three types of parameters: positional, unnamed, and named. 
*Positional parameters* are those which must exist in some specific location in 
the expression. IIRC, the only positional parameters are the collection names 
in the base streams. This is done because the collection name is critical and 
as such we can say it is the first parameter, regardless of anything else 
included. 

*Unnamed parameters* are those whose meaning can be determined by the content 
of the parameter. For example, 
{code}
rollup(
  search(people, fl="personId,name,age", q="*:*", sort="personId asc"),
  max(age),
  min(age),
  avg(age)
)
{code}
In this example we know that search(...) is a stream and max(...), min(...), 
and avg(...) are metrics. Unnamed parameters are also very useful in situations 
where the number of parameters of that type is non-deterministic. In the 
example above one could provide any number of metrics, and by keeping them 
unnamed the user can just keep adding new metrics without worrying about names. 
Another example of this is the MergeStream, where one can merge 2 or more 
streams together.

*Named parameters* are used when you want to be very clear about what a 
particular parameter is being used for. For example, the "on" parameter in a 
join clause is to indicate that the join should be done on some field (or 
fields). The HashJoinStream is an interesting one because we have a named 
parameter "hashed" whose parameter needs to be a stream. In this case the 
decision to use a named parameter was made so as to be very clear to the user 
which stream is being hashed and which one is not. Generally it comes down to 
whether a parameter name would make things clearer for the user.
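As an illustration only (this is not how StreamFactory actually parses; it works on parsed expression trees, not raw strings), the three kinds could be told apart roughly like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy classifier for the three parameter kinds described above. Position 0 is
// treated as the positional collection name; "k=v" parameters are named;
// everything else (streams, metrics) is unnamed.
public class ParamKinds {
    public static Map<String, List<String>> classify(List<String> params) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("positional", new ArrayList<>());
        out.put("unnamed", new ArrayList<>());
        out.put("named", new ArrayList<>());
        for (int i = 0; i < params.size(); i++) {
            String p = params.get(i);
            if (i == 0) out.get("positional").add(p);            // collection name
            else if (p.matches("\\w+=.*")) out.get("named").add(p);
            else out.get("unnamed").add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {positional=[people], unnamed=[max(age), min(age)], named=[on=personId]}
        System.out.println(classify(List.of(
            "people", "max(age)", "min(age)", "on=personId")));
    }
}
```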

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843
 ] 

Dennis Gove commented on SOLR-8176:
---

I've been thinking about this a little bit and one thing I keep coming back to 
is that there are different kinds of graph traversals and I think our model 
should take that into account. There are lots of types but I think the two 
major categories are node traversing graphs and edge traversing graphs. 

h3. Node Traversing Graphs
These are graphs where you have some set of root nodes and you want to find 
connected nodes with some set of criteria. For example, given a collection of 
geographic locations (city, county, state, country) with fields "id", "type", 
"parentId", "name" find all cities in NY. As a hiccup the data is not 
completely normalized and some cities have their county listed as their parent 
while some have their state listed as their parent. Ie, you do not know how 
many nodes are between any given city and any given state.
{code}
graph(
  geography,
  root(q="type=state AND name:ny", fl="id"),
  leaf(q="type=city", fl="id,parentId,name"),
  edge("id=parentId")
)
{code}
In this example you're starting with a set of nodes in the geography 
collection, all which have some relationship to each other. You select your 
starting (root) nodes as all states named "ny" (there could be more than one). 
You then define what constitutes an ending (leaf) node as all cities. And 
finally, you say that all edges where nodeA.id == nodeB.parentId should be 
followed.

This traversal can be implemented as a relatively simple iterative search 
following the form
{code}
frontier := search for all root nodes
leaves := empty list

while frontier is not empty
  frontierIds := list of ids of all nodes in frontier list
  leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
  frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter

{code}
In each iteration the leaves list can grow and the frontier list is replaced 
with the next set of nodes to consider. In the end you have a list of all leaf 
nodes which in some way connect to the original root nodes following the 
defined edge. Note that for simplicity I've left a couple of things out, 
including checking for already traversed nodes to avoid loops. Also, the leaf 
nodes are not added to the frontier but they can be. This would be useful in a 
situation where leaves are connected to leaves.
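The same iteration can be sketched against an in-memory stand-in for the collection (hypothetical data and types; each "search" in the pseudocode becomes a filter over a list here):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the iterative node traversal described above, including
// the visited-set loop protection that the pseudocode leaves out.
public class TraversalSketch {
    record Node(String id, String parentId, String type) {}

    // Returns ids of all "city" nodes reachable from the root node by
    // following child.parentId == parent.id edges, however many hops away.
    public static List<String> citiesUnder(List<Node> all, String rootId) {
        Set<String> frontier = new HashSet<>(Set.of(rootId));
        Set<String> visited = new HashSet<>(frontier);        // avoids loops
        List<String> leaves = new ArrayList<>();
        while (!frontier.isEmpty()) {
            Set<String> next = new HashSet<>();
            for (Node n : all) {
                if (frontier.contains(n.parentId()) && visited.add(n.id())) {
                    if (n.type().equals("city")) leaves.add(n.id()); // leaf filter
                    else next.add(n.id());                           // keep expanding
                }
            }
            frontier = next;                                  // replace the frontier
        }
        return leaves;
    }

    public static void main(String[] args) {
        List<Node> geo = List.of(
            new Node("ny", null, "state"),
            new Node("westchester", "ny", "county"),
            new Node("yonkers", "westchester", "city"),  // city under a county
            new Node("albany", "ny", "city"));           // city directly under the state
        System.out.println(citiesUnder(geo, "ny"));      // prints [albany, yonkers]
    }
}
```

Note how "albany" is found one hop from the root and "yonkers" two hops away, matching the not-completely-normalized data described above.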

> Model distributed graph traversals with Streaming Expressions
> -
>
> Key: SOLR-8176
> URL: https://issues.apache.org/jira/browse/SOLR-8176
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrCloud, SolrJ
>Affects Versions: Trunk
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>  Labels: Graph
> Fix For: Trunk
>
>
> I think it would be useful to model a few *distributed graph traversal* use 
> cases with Solr's *Streaming Expression* language. This ticket will explore 
> different approaches with a goal of implementing two or three common graph 
> traversal use cases.






[jira] [Comment Edited] (SOLR-8176) Model distributed graph traversals with Streaming Expressions

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843
 ] 

Dennis Gove edited comment on SOLR-8176 at 12/29/15 12:10 PM:
--

I've been thinking about this a little bit and one thing I keep coming back to 
is that there are different kinds of graph traversals and I think our model 
should take that into account. There are lots of types but I think the two 
major categories are node traversing graphs and edge traversing graphs. 

h3. Node Traversing Graphs
These are graphs where you have some set of root nodes and you want to find 
connected nodes with some set of criteria. For example, given a collection of 
geographic locations (city, county, state, country) with fields "id", "type", 
"parentId", "name" find all cities in NY. As a hiccup the data is not 
completely normalized and some cities have their county listed as their parent 
while some have their state listed as their parent. Ie, you do not know how 
many nodes are between any given city and any given state.
{code}
graph(
  geography,
  root(q="type=state AND name:ny", fl="id"),
  leaf(q="type=city", fl="id,parentId,name"),
  edge("id=parentId")
)
{code}
In this example you're starting with a set of nodes in the geography 
collection, all which have some relationship to each other. You select your 
starting (root) nodes as all states named "ny" (there could be more than one). 
You then define what constitutes an ending (leaf) node as all cities. And 
finally, you say that all edges where nodeA.id == nodeB.parentId should be 
followed.

This traversal can be implemented as a relatively simple iterative search 
following the form
{code}
frontier := search for all root nodes
leaves := empty list

while frontier is not empty
  frontierIds := list of ids of all nodes in frontier list
  leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
  frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter

{code}
In each iteration the leaves list can grow and the frontier list is replaced 
with the next set of nodes to consider. In the end you have a list of all leaf 
nodes which in some way connect to the original root nodes following the 
defined edge. Note that for simplicity I've left a couple of things out, 
including checking for already traversed nodes to avoid loops. Also, the leaf 
nodes are not added to the frontier but they can be. This would be useful in a 
situation where leaves are connected to leaves.

h3. Edge Traversal Graphs
These are graphs where you have some set of edges but the nodes themselves are 
relatively unimportant for traversal. For example, finding the shortest path 
between two nodes, or finding the minimum spanning tree for some set of nodes, 
or finding loops.



[jira] [Commented] (SOLR-8458) Add Streaming Expressions tests for parameter substitution

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073898#comment-15073898
 ] 

Dennis Gove commented on SOLR-8458:
---

Cao,

What's the purpose of ClientTupleStream? It appears it's only used in the tests 
and doesn't add any value as a Stream object.

I'd rather not replace all existing stream creations with a randomized choice 
between doing substitution and not. I think it'd be better to have explicit 
tests which exercise substitution. I don't think it'd be necessary to test 
substitution on each and every stream class because the implementation is 
outside of the stream classes. Also, it appears that the randomization of the 
choice is non-repeatable. I.e., if I rerun the tests with a -Dtests.seed value, 
would the random choices be the same?

It appears that the substitution is just picking some substring in the 
expression and marking it as being a parameter. I think this should test 
substituting entire expression clauses, like 
{code}
http://localhost:8983/col/stream?expr=merge($left, $right, 
...)&left=search(...)&right=search(...)
{code}
where left and right are entire clauses. The tests you've provided appear to do 
something like this
{code}
http://localhost:8983/col/stream?expr=merge(sear$left, se$right..), 
...)&left=ch(...)&right=arch(.
{code}
which I don't think makes much sense. Technically the substitution should 
handle that, but I think the tests should codify that one would want to 
substitute entire expressions.

> Add Streaming Expressions tests for parameter substitution
> --
>
> Key: SOLR-8458
> URL: https://issues.apache.org/jira/browse/SOLR-8458
> Project: Solr
>  Issue Type: Improvement
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-8458.patch, SOLR-8458.patch
>
>
> This ticket is to add Streaming Expression tests that exercise the existing 
> macro expansion feature described here:  
> http://yonik.com/solr-query-parameter-substitution/
> Sample syntax below:
> {code}
> http://localhost:8983/col/stream?expr=merge(${left}, ${right}, 
> ...)&left=search(...)&right=search(...)
> {code}






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073907#comment-15073907
 ] 

Dennis Gove commented on SOLR-7535:
---

In the Streaming API, read() is called until an EOF tuple is seen. This means 
that, even with an UpdateStream, one would have this code

{code}
while(true){
  tuple = updateStream.read()

  // if # of records is some size, do a commit

  if(tuple.EOF){
break
  }
}
{code}

I think it's the correct thing for an UpdateStream to swallow the individual 
tuples. The use-case you described isn't one I see existing. But if it did then 
I could see it being dealt with using a TeeStream. A TeeStream would work 
exactly like the unix command tee and take a single input stream and tee it out 
into multiple output streams. In this use-case, one would Tee the underlying 
searches. But again, I don't see this need actually existing.
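A toy version of that swallowing behavior (hypothetical classes, with tuples as plain maps; the real UpdateStream would send each tuple to a SolrCloud collection rather than a list):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Sketch of an update stream that drains its source, "indexes" every tuple,
// and swallows them: the caller only ever sees the EOF tuple, decorated with
// metadata about how many documents were written.
public class StreamRoles {
    interface ReadStream { Map<String, Object> read(); }

    static class ListStream implements ReadStream {
        private final Iterator<Map<String, Object>> it;
        ListStream(List<Map<String, Object>> tuples) { this.it = tuples.iterator(); }
        public Map<String, Object> read() {
            return it.hasNext() ? it.next() : Map.<String, Object>of("EOF", true);
        }
    }

    static class UpdateStreamSketch implements ReadStream {
        private final ReadStream source;
        private final List<Map<String, Object>> indexed = new ArrayList<>();
        UpdateStreamSketch(ReadStream source) { this.source = source; }
        public Map<String, Object> read() {
            Map<String, Object> t = source.read();
            if (t.containsKey("EOF")) {
                // EOF tuple carries metadata known only at the end
                return Map.<String, Object>of("EOF", true, "batchIndexed", indexed.size());
            }
            indexed.add(t);        // stand-in for sending the doc to a collection
            return read();         // swallow the tuple and keep draining
        }
    }

    public static void main(String[] args) {
        ReadStream s = new UpdateStreamSketch(new ListStream(List.of(
            Map.<String, Object>of("id", 1), Map.<String, Object>of("id", 2))));
        Map<String, Object> eof = s.read();
        System.out.println(eof.get("batchIndexed"));   // prints 2
    }
}
```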

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073920#comment-15073920
 ] 

Dennis Gove commented on SOLR-7535:
---

I had an interesting thought related to the call to read().

Should there be some distinction between a ReadStream and a WriteStream? A 
ReadStream is one which reads tuples out while a WriteStream is one which 
writes tuples in. Up until this point we've only ever had ReadStreams, and the 
read() method has always made sense. But the UpdateStream is a WriteStream, and 
maybe it should have a different function, maybe write(). Also, it might be 
nice to be able to say in a stream that its direct incoming stream must be a 
WriteStream (for example, a CommitStream would only work on a WriteStream while 
a RollupStream would only work on a ReadStream). (Though maybe it'd be 
interesting to do rollups over the output tuples of an UpdateStream.)

Thoughts?
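One way such a constraint could look at the type level (purely hypothetical interfaces, not the actual SolrJ TupleStream hierarchy): by making the hypothetical CommitStream's constructor accept only a WriteStream, wiring it over a plain read stream would fail at compile time.

```java
// Hypothetical sketch of the read/write distinction discussed above.
public class StreamKinds {
    interface ReadStream  { Object read(); }
    interface WriteStream extends ReadStream { }   // also writes tuples as it is read

    static class SearchStream implements ReadStream  { public Object read() { return "EOF"; } }
    static class UpdateStream implements WriteStream { public Object read() { return "EOF"; } }

    static class CommitStream implements ReadStream {
        private final WriteStream source;          // only write streams accepted
        CommitStream(WriteStream source) { this.source = source; }
        public Object read() { return source.read(); }
    }

    public static void main(String[] args) {
        CommitStream ok = new CommitStream(new UpdateStream());  // compiles
        // new CommitStream(new SearchStream());  // would be a compile error
        System.out.println(ok.read());            // prints EOF
    }
}
```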

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2015-12-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073952#comment-15073952
 ] 

Dennis Gove commented on SOLR-7535:
---

I agree. It needs to be fleshed out some more.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-01 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076403#comment-15076403
 ] 

Dennis Gove commented on SOLR-7535:
---

+1 on fault tolerance as well.

1) I think the expected behavior of all streams is that the EOF tuple could 
contain extra metadata about the stream that is only known at the end. This 
allows clients (or other streams) to know that this metadata didn't come 
from a real document but is just EOF metadata. If there are streams which don't 
handle a non-empty EOF tuple I think those streams should be corrected. 

2) I think you're correct about the ParallelStream and how it operates. I don't 
see a way for the ParallelStream, as currently implemented, to interact with 
the raw tuples coming out from a call to another streams read() method. Ie, it 
does depend on doing the partitioning at the source and cannot do it in the 
middle of a data pipeline. It'd be a nice feature to be able to take a single 
stream of data and split it out onto N streams across N workers.

Here's an example of a pipeline I'd like to be able to create with a 
ParallelStream but currently cannot seem to. Essentially, do something with the 
data, then split it off to workers to perform the expensive operations, and 
then bring them back together (I hope the ascii art shows properly). 

{code}
sourceA ---\            /--- worker1 --- rollup --- sort ---\
            >--- join --<--- worker2 --- rollup --- sort ---->--- mergesort ---\
sourceB ---/            \--- worker3 --- rollup --- sort ---/                   >--- join --- output
                         \-- worker4 --- rollup --- sort --/        sourceC ---/
                          \- worker5 --- rollup --- sort -/
{code}

My understanding is that the parallelization must be done at the start of the 
pipeline and cannot be done in the middle of the pipeline.

Maybe a new stream is required that can split streams off to workers.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.






[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-01 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076418#comment-15076418
 ] 

Dennis Gove commented on SOLR-7535:
---

Clever. I like it.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-01 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8479:
-

 Summary: Add JDBCStream for integration with external data sources
 Key: SOLR-8479
 URL: https://issues.apache.org/jira/browse/SOLR-8479
 Project: Solr
  Issue Type: New Feature
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor


Given that the Streaming API can merge and join multiple incoming SolrStreams 
to perform complex operations on the resulting combined datasets I think it 
would be beneficial to also support incoming streams from other data sources. 

The JDBCStream will provide a Streaming API interface to any data source which 
provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-01 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8479:
--
Attachment: SOLR-8479.patch

This is a first pass at the JDBCStream. There are still open questions and 
unimplemented pieces but I'm putting this out there to start the conversation. 
No tests are included.

1. Currently it's handling the loading of JDBC driver classes by requiring that 
the driver class name be provided; it will then call 
{code}
Class.forName(driverClassName);
{code}
during open(). I'm wondering if there's a better way to handle this, 
particularly if we can do the loading via config file handling.
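The driver-loading approach described above can be sketched as follows. This is a minimal illustration, not the patch itself; the class and method names are hypothetical, but the `Class.forName` registration and the `DriverManager.getDriver` fail-fast check are the two steps the comment describes.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

// Sketch of open()-time driver resolution: optionally load a driver class by
// name, then verify DriverManager can resolve a driver for the URL before
// doing any work, so a missing driver fails fast with a clear exception.
public class DriverLoading {
  public static Connection open(String driverClassName, String connectionUrl)
      throws SQLException {
    if (driverClassName != null) {
      try {
        // Loading the class registers the driver with DriverManager.
        Class.forName(driverClassName);
      } catch (ClassNotFoundException e) {
        throw new SQLException("JDBC driver class not found: " + driverClassName, e);
      }
    }
    // Throws SQLException if no registered driver accepts this URL.
    Driver driver = DriverManager.getDriver(connectionUrl);
    return driver.connect(connectionUrl, new Properties());
  }
}
```

With this shape, a bad driver class name or an unresolvable connection URL both surface as a `SQLException` during `open()` rather than later during iteration.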

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-02 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8479:
--
Attachment: SOLR-8479.patch

Adds some simple tests for the raw stream and as embedded inside a SelectStream 
and MergeStream where it is being merged with a CloudSolrStream. 

Still doesn't implement Expressible interface (next on my list). 

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076655#comment-15076655
 ] 

Dennis Gove edited comment on SOLR-8479 at 1/2/16 8:28 PM:
---

Adds some simple tests for the raw stream and as embedded inside a SelectStream 
and MergeStream where it is being merged with a CloudSolrStream. 

The tests are using the in-memory database hsqldb with driver 
"org.hsqldb.jdbcDriver". I chose this as it's already being used in a contrib 
module. I'm open to other options as I'm not a huge fan of this particular 
in-memory database.

Still doesn't implement Expressible interface (next on my list). 


was (Author: dpgove):
Adds some simple tests for the raw stream and as embedded inside a SelectStream 
and MergeStream where it is being merged with a CloudSolrStream. 

Still doesn't implement Expressible interface (next on my list). 

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076668#comment-15076668
 ] 

Dennis Gove commented on SOLR-8479:
---

I considered that but I wanted to be sure the test covered non-Solr code bases. 
I think there's value in showing that a non-Solr external source can be used 
and functions as expected.

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-6938) Convert build to work with Git rather than SVN.

2016-01-02 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076730#comment-15076730
 ] 

Dennis Gove commented on LUCENE-6938:
-

You can get the current sha1 with the command

{code}
$> git rev-parse HEAD
{code}
And you can replace HEAD with the name of a branch/tag to get the sha1 of that. See

{code}
$> git help rev-parse
{code}

for all the options.

> Convert build to work with Git rather than SVN.
> ---
>
> Key: LUCENE-6938
> URL: https://issues.apache.org/jira/browse/LUCENE-6938
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Mark Miller
>Assignee: Mark Miller
> Attachments: LUCENE-6938.patch
>
>
> We assume an SVN checkout in parts of our build and will need to move to 
> assuming a Git checkout.
> Patches against https://github.com/dweiss/lucene-solr-svn2git from 
> LUCENE-6933.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8432) Split StreamExpressionTest into separate tests

2016-01-03 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080429#comment-15080429
 ] 

Dennis Gove commented on SOLR-8432:
---

The reason for the single @test calling out to multiple functions has to do 
with the test setup code in AbstractFullDistribZkTestBase. Each test method 
will go through relatively expensive test setup / teardown. By only having a 
single test method with calls out to individual methods we can avoid that 
repeated setup/teardown code. 

I ran both the original test class and these changes and the runtime difference 
is significant. The original version completes in 36s while the separated 
version completes in 490s. 

I think for this change to be accepted it would have to include changes in the 
base classes to move some of the setup work from test method setup to test 
class setup. (might actually require new base classes so as not to impact other 
tests using these base test classes).
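The setup-cost trade-off described above can be illustrated with a small sketch. This is not the Solr test framework; the names are hypothetical stand-ins that just count how often the expensive setup runs under each style.

```java
// Style 1 mirrors the current single-@Test-calling-helpers layout: setup runs
// once for all scenarios. Style 2 mirrors one @Test per scenario: setup runs
// once per method, which is what makes the separated version so much slower.
public class SetupCost {
  static int setupCalls = 0;

  static void expensiveSetup() {
    setupCalls++; // stands in for cluster spin-up / teardown work
  }

  // Single test method delegating to N scenario helpers: one setup total.
  static void singleTestMethodStyle(int scenarios) {
    expensiveSetup();
    for (int i = 0; i < scenarios; i++) {
      // run scenario i against the shared fixture
    }
  }

  // One test method per scenario: setup repeated N times.
  static void perMethodStyle(int scenarios) {
    for (int i = 0; i < scenarios; i++) {
      expensiveSetup();
      // run scenario i against a fresh fixture
    }
  }
}
```

Moving the expensive work to class-level setup (e.g. JUnit's `@BeforeClass`) would give separate test methods the cost profile of style 1.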

> Split StreamExpressionTest into separate tests
> --
>
> Key: SOLR-8432
> URL: https://issues.apache.org/jira/browse/SOLR-8432
> Project: Solr
>  Issue Type: Test
>Affects Versions: Trunk
>Reporter: Jason Gerlowski
>Priority: Trivial
> Fix For: Trunk
>
> Attachments: SOLR-8432.patch
>
>
> Currently, {{StreamExpressionTest}} consists of a single JUnit test that 
> calls 10 or 15 methods, each targeting a particular type of stream or 
> scenario.
> Each of these scenario's would benefit being split into its own separate 
> JUnit test.  This would allow each scenario to pass/fail independently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8432) Split StreamExpressionTest into separate tests

2016-01-03 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080429#comment-15080429
 ] 

Dennis Gove edited comment on SOLR-8432 at 1/3/16 1:28 PM:
---

The reason for the single @test calling out to multiple functions has to do 
with the test setup code in AbstractFullDistribZkTestBase. Each test method 
will go through relatively expensive test setup / teardown. By only having a 
single test method with calls out to individual methods we can avoid that 
repeated setup/teardown code. 

I ran both the original test class and these changes and the runtime difference 
is significant. The original version completes in 36s while the separated 
version completes in 490s. 

I think for this change to be accepted it would have to include changes in the 
base classes to move some of the setup work from test method setup to test 
class setup. (might actually require new base classes so as not to impact other 
tests using these base test classes).


was (Author: dpgove):
The reason for the single @test calling out to multiple functions has to do 
with the test setup code in AbstractFullDistribZkTestBase. Each test method 
will go through relatively expensive test setup / teardown. By only having a 
single test method with calls out to individual methods we can avoid that 
repeated setup/teardown code. 

I ran both the original test class and these changes and the runtime difference 
is significant. The original version completes in 36s while the separated 
versions takes 490s. 

I think for this change to be accepted it would have to include changes in the 
base classes to move some of the setup work from test method setup to test 
class setup. (might actually require new base classes so as not to impact other 
tests using these base test classes).

> Split StreamExpressionTest into separate tests
> --
>
> Key: SOLR-8432
> URL: https://issues.apache.org/jira/browse/SOLR-8432
> Project: Solr
>  Issue Type: Test
>Affects Versions: Trunk
>Reporter: Jason Gerlowski
>Priority: Trivial
> Fix For: Trunk
>
> Attachments: SOLR-8432.patch
>
>
> Currently, {{StreamExpressionTest}} consists of a single JUnit test that 
> calls 10 or 15 methods, each targeting a particular type of stream or 
> scenario.
> Each of these scenario's would benefit being split into its own separate 
> JUnit test.  This would allow each scenario to pass/fail independently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-03 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080592#comment-15080592
 ] 

Dennis Gove commented on SOLR-7535:
---

It seems like a reasonable approach to limit the read rate to the maximum 
possible write rate. Let's add a buffering option at a later point, if it ends 
up being necessary.

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-04 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8479:
--
Attachment: SOLR-8479.patch

New patch with a few changes.

1. Added some new tests
2. Made driverClassName an optional property. If provided then we will call 
Class.forName(driverClassName); during open(). Also added a call to 
DriverManager.getDriver(connectionUrl) during open() to validate that the 
driver can be found. If not then an exception is thrown. This will prevent us 
from continuing if the jdbc driver is not loaded.
3. Changed the default handling types so that Double is handled as a direct 
class while Float is converted to a Double. This keeps in line with the rest of 
the Streaming API. 
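The type handling in item 3 can be sketched as a small normalization step. This is an illustration, not the patch; the class and method names are hypothetical, but the rule matches the comment: Float column values are widened to Double so tuples line up with the rest of the Streaming API, while Double (and other types) pass through directly.

```java
// Normalize a value read from a JDBC ResultSet column before it is placed in
// a tuple: Float is widened to Double; everything else is kept as-is.
public class ValueNormalizer {
  public static Object normalize(Object jdbcValue) {
    if (jdbcValue instanceof Float) {
      // Widen so downstream comparisons and merges see a single numeric type.
      return ((Float) jdbcValue).doubleValue();
    }
    return jdbcValue; // Double, Long, String, etc. unchanged
  }
}
```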

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-04 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8479:
--
Attachment: SOLR-8479.patch

Previous patch was a diff between the wrong hashes in the repo. This one is 
correct.

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch, 
> SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8479) Add JDBCStream for integration with external data sources

2016-01-04 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081873#comment-15081873
 ] 

Dennis Gove commented on SOLR-8479:
---

I intend to add a few more tests for failure scenarios and for setting 
connection properties. Barring any issues found with that, I think this will be 
ready to go.

> Add JDBCStream for integration with external data sources
> -
>
> Key: SOLR-8479
> URL: https://issues.apache.org/jira/browse/SOLR-8479
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Dennis Gove
>Priority: Minor
> Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch, 
> SOLR-8479.patch
>
>
> Given that the Streaming API can merge and join multiple incoming SolrStreams 
> to perform complex operations on the resulting combined datasets I think it 
> would be beneficial to also support incoming streams from other data sources. 
> The JDBCStream will provide a Streaming API interface to any data source 
> which provides a JDBC driver.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields

2016-01-04 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8485:
-

 Summary: SelectStream only works with all lowercase field names 
and doesn't handle quoted selected fields
 Key: SOLR-8485
 URL: https://issues.apache.org/jira/browse/SOLR-8485
 Project: Solr
  Issue Type: Bug
Reporter: Dennis Gove
Priority: Minor


Three issues exist if one creates a SelectStream with an expression.

{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  personId_i as personId,
  rating_f as rating
)
{code}

"personId_i as personId" will be parsed as "personid_i as personid"

1. The incoming tuple will contain a field "personId_i" but the selection will 
be looking for a field "personid_i". This field won't be found in the incoming 
tuple (notice the case difference) and as such no field personId will exist in 
the outgoing tuple.

2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" 
and not the expected "personId" (notice the case difference). This can lead to 
other down-the-road issues.

Also, if one were to quote the selected fields such as in
{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  "personId_i as personId",
  "rating_f as rating"
)
{code}
then the quotes would be included in the field name. Wrapping quotes should be 
handled properly such that they are removed from the parameters before they are 
parsed.
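The quote handling suggested above can be sketched as a small pre-parsing step. This is an illustration under assumed names (not the SelectStream code): strip one pair of wrapping quotes from each selection parameter, then split the alias clause case-insensitively without lowercasing the field names themselves.

```java
// Parse a selection parameter such as "personId_i as personId", tolerating
// wrapping double quotes and preserving the original field-name casing.
public class SelectFieldParser {
  public static String stripWrappingQuotes(String param) {
    String s = param.trim();
    if (s.length() >= 2 && s.charAt(0) == '"' && s.charAt(s.length() - 1) == '"') {
      return s.substring(1, s.length() - 1);
    }
    return s;
  }

  // Returns {sourceField, alias}; alias == sourceField when no "as" clause.
  public static String[] parseAlias(String param) {
    String s = stripWrappingQuotes(param);
    // (?i) matches " as " case-insensitively without lowercasing the fields.
    String[] parts = s.split("(?i)\\s+as\\s+");
    return parts.length == 2 ? parts : new String[] { s, s };
  }
}
```

Splitting with a case-insensitive pattern sidesteps issues (1) and (2), since the field names are never lowercased, and stripping the quotes up front sidesteps issue (3).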



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields

2016-01-04 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8485:
--
Description: 
Three issues exist if one creates a SelectStream with an expression.

{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  personId_i as personId,
  rating_f as rating
)
{code}

"personId_i as personId" will be parsed as "personid_i as personid"

1. The incoming tuple will contain a field "personId_i" but the selection will 
be looking for a field "personid_i". This field won't be found in the incoming 
tuple (notice the case difference) and as such no field personId will exist in 
the outgoing tuple.

2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" 
and not the expected "personId" (notice the case difference). This can lead to 
other down-the-road issues.

3. Also, if one were to quote the selected fields such as in
{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  "personId_i as personId",
  "rating_f as rating"
)
{code}
then the quotes would be included in the field name. Wrapping quotes should be 
handled properly such that they are removed from the parameters before they are 
parsed.

  was:
Three issues exist if one creates a SelectStream with an expression.

{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  personId_i as personId,
  rating_f as rating
)
{code}

"personId_i as personId" will be parsed as "personid_i as personid"

1. The incoming tuple will contain a field "personId_i" but the selection will 
be looking for a field "personid_i". This field won't be found in the incoming 
tuple (notice the case difference) and as such no field personId will exist in 
the outgoing tuple.

2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" 
and not the expected "personId" (notice the case difference). This can lead to 
other down-the-road issues.

Also, if one were to quote the selected fields such as in
{code}
select(
  search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
sort="personId_i asc"),
  "personId_i as personId",
  "rating_f as rating"
)
{code}
then the quotes would be included in the field name. Wrapping quotes should be 
handled properly such that they are removed from the parameters before they are 
parsed.


> SelectStream only works with all lowercase field names and doesn't handle 
> quoted selected fields
> 
>
> Key: SOLR-8485
> URL: https://issues.apache.org/jira/browse/SOLR-8485
> Project: Solr
>  Issue Type: Bug
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming
>
> Three issues exist if one creates a SelectStream with an expression.
> {code}
> select(
>   search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
> sort="personId_i asc"),
>   personId_i as personId,
>   rating_f as rating
> )
> {code}
> "personId_i as personId" will be parsed as "personid_i as personid"
> 1. The incoming tuple will contain a field "personId_i" but the selection 
> will be looking for a field "personid_i". This field won't be found in the 
> incoming tuple (notice the case difference) and as such no field personId 
> will exist in the outgoing tuple.
> 2. If (1) wasn't an issue, the outgoing tuple would have a field 
> "personid" and not the expected "personId" (notice the case difference). This 
> can lead to other down-the-road issues.
> 3. Also, if one were to quote the selected fields such as in
> {code}
> select(
>   search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
> sort="personId_i asc"),
>   "personId_i as personId",
>   "rating_f as rating"
> )
> {code}
> then the quotes would be included in the field name. Wrapping quotes should 
> be handled properly such that they are removed from the parameters before 
> they are parsed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields

2016-01-04 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8485:
--
Attachment: SOLR-8485.patch

This patch corrects issues (1) and (2). 

> SelectStream only works with all lowercase field names and doesn't handle 
> quoted selected fields
> 
>
> Key: SOLR-8485
> URL: https://issues.apache.org/jira/browse/SOLR-8485
> Project: Solr
>  Issue Type: Bug
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming
> Attachments: SOLR-8485.patch
>
>
> Three issues exist if one creates a SelectStream with an expression.
> {code}
> select(
>   search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
> sort="personId_i asc"),
>   personId_i as personId,
>   rating_f as rating
> )
> {code}
> "personId_i as personId" will be parsed as "personid_i as personid"
> 1. The incoming tuple will contain a field "personId_i" but the selection 
> will be looking for a field "personid_i". This field won't be found in the 
> incoming tuple (notice the case difference) and as such no field personId 
> will exist in the outgoing tuple.
> 2. If (1) wasn't an issue, the outgoing tuple would have a field 
> "personid" and not the expected "personId" (notice the case difference). This 
> can lead to other down-the-road issues.
> 3. Also, if one were to quote the selected fields such as in
> {code}
> select(
>   search(collection1, fl="personId_i,rating_f", q="rating_f:*", 
> sort="personId_i asc"),
>   "personId_i as personId",
>   "rating_f as rating"
> )
> {code}
> then the quotes would be included in the field name. Wrapping quotes should 
> be handled properly such that they are removed from the parameters before 
> they are parsed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression

2016-01-04 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082109#comment-15082109
 ] 

Dennis Gove commented on SOLR-7535:
---

+1 on that. I'm real excited about this!

> Add UpdateStream to Streaming API and Streaming Expression
> --
>
> Key: SOLR-7535
> URL: https://issues.apache.org/jira/browse/SOLR-7535
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrJ
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, 
> SOLR-7535.patch
>
>
> The ticket adds an UpdateStream implementation to the Streaming API and 
> streaming expressions. The UpdateStream will wrap a TupleStream and send the 
> Tuples it reads to a SolrCloud collection to be indexed.
> This will allow users to pull data from different Solr Cloud collections, 
> merge and transform the streams and send the transformed data to another Solr 
> Cloud collection.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106833#comment-15106833
 ] 

Dennis Gove edited comment on SOLR-8559 at 1/19/16 2:57 PM:


Are you able to create a test for this specific enhancement? Or if not, are 
there existing tests covering this code I can specifically check after applying 
the patch?


was (Author: dpgove):
Are you able to create a test for this specific feature? Or if not, are there 
existing tests covering this code I can specifically check after applying the 
patch?

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>  Labels: optimization, performance
> Attachments: solr-8559.patch
>
>
> While profiling a large collection (multi-sharded billions of documents), I 
> found that a fast (5-10ms query) which had no matches would take 20-30 
> seconds when doing facets even when {{facet.mincount=1}}
> Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case being when every term is in every segment. This formula doesn't take into account 
> whether or not any of the terms have a positive count with respect to the 
> docset.
> These optimizations are aimed to do two things:
> # When mincount>0 don't include segments which all terms have zero counts. 
> This should significantly speed up processing when terms are high cardinality 
> and the matching docset is small
> # FIXED TODO optimization: when mincount>0, move the segment position to the 
> next non-zero term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.
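Optimization (1) above can be sketched in simplified form. This is not the patch itself; the types are stand-ins (each segment reduced to its per-term count array), but it shows the idea: with mincount>0, segments whose terms all have zero counts for the matching docset are dropped before the per-term merge, so the priority queue is never fed entries that cannot contribute.

```java
import java.util.ArrayList;
import java.util.List;

// Filter out segments with no positive term counts before the facet merge.
public class SegmentFilter {
  public static List<int[]> nonZeroSegments(List<int[]> segmentCounts, int mincount) {
    if (mincount <= 0) return segmentCounts; // zero counts may still be wanted
    List<int[]> kept = new ArrayList<>();
    for (int[] counts : segmentCounts) {
      for (int c : counts) {
        if (c > 0) { // any positive count means the segment can contribute
          kept.add(counts);
          break;
        }
      }
    }
    return kept;
  }
}
```

For the no-match query described in the issue, every segment is filtered out, so the expensive merge loop never runs at all.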



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106833#comment-15106833
 ] 

Dennis Gove commented on SOLR-8559:
---

Are you able to create a test for this specific feature? Or if not, are there 
existing tests covering this code I can specifically check after applying the 
patch?

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>  Labels: optimization, performance
> Attachments: solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Assigned] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-8559:
-

Assignee: Dennis Gove

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Attachments: solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Attachment: SOLR-8559-trunk.patch

Rebased off trunk. Keith will upload a 5x backport.

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Attachments: SOLR-8559-trunk.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106932#comment-15106932
 ] 

Dennis Gove commented on SOLR-8556:
---

{code}
expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr));
{code}

If the ConcatOperation was created using the non-expression constructor then 
fieldsStr will be unset and, as such, this won't produce the expected result. 
Instead, I'd iterate over the fields array and create a comma-separated list. 
This would allow the removal of the global fieldsStr.
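A minimal sketch of that suggestion, under hypothetical names (the joined string would then be passed to {{new StreamExpressionNamedParameter("fields", ...)}}):

```java
import java.util.StringJoiner;

/**
 * Sketch of the fix suggested above: build the comma-separated "fields"
 * expression parameter from the fields array itself, so the separate
 * fieldsStr member (only populated by the expression constructor) can be
 * removed. Class and method names here are hypothetical.
 */
public class FieldsParamSketch {
    public static String joinFields(String[] fields) {
        StringJoiner joiner = new StringJoiner(",");
        for (String field : fields) {
            joiner.add(field);
        }
        return joiner.toString();
    }
}
```

Because the string is derived from the array on demand, both constructors serialize back to the same expression.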

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106944#comment-15106944
 ] 

Dennis Gove commented on SOLR-8556:
---

I'm going through and creating tests so I'll correct these issues as I go.

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106941#comment-15106941
 ] 

Dennis Gove commented on SOLR-8556:
---

{code}
buf.append(field);
{code}

This concatenates the field names together instead of the field values.
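A sketch of the bug and its fix, using a plain Map as a stand-in for a Tuple (names here are hypothetical):

```java
import java.util.Map;

/**
 * Sketch of the bug noted above, with a Map standing in for a Tuple.
 * The buggy version appended the field name itself, producing e.g. "a_b";
 * the fix is to look up and append the value stored under that field.
 */
public class ConcatBugSketch {
    public static String concatValues(Map<String, Object> tuple,
                                      String[] fields, String delim) {
        StringBuilder buf = new StringBuilder();
        for (String field : fields) {
            if (buf.length() > 0) {
                buf.append(delim);
            }
            // Correct: append the tuple's value for the field...
            buf.append(tuple.get(field));
            // ...whereas the buggy code did buf.append(field), joining
            // the field names themselves.
        }
        return buf.toString();
    }
}
```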

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Comment Edited] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106932#comment-15106932
 ] 

Dennis Gove edited comment on SOLR-8556 at 1/19/16 4:34 PM:


{code}
expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr));
{code}

If the ConcatOperation was created using the non-expression constructor then 
fieldsStr will be unset and as such this won't produce the expected result. 
Instead, I'd iterate over the fields array and create a comma-separated list. 
This would allow the removal of the global fieldsStr.


was (Author: dpgove):
{code}
expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr));
{code}

If the ConcatOperation was created using the non-expression constructor then 
fieldsStr will unset and as such this won't produce the expected result. 
Instead, I'd iterate over the fields array and create a comma-separated list. 
This would allow the removal of the global fieldsStr.

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8556:
--
Attachment: SOLR-8556.patch

Adds ConcatOperation-specific tests. Corrects the issues mentioned above. I'd 
still like to add a test showing the usage of this inside a SelectStream. For 
example, there is a difference between these two clauses:
{code}
select(a,b,c, search(), replace(a,null,withValue=0f), concat(fields="a,b", 
as="ab", delim="-"))
{code}
{code}
select(a,b,c, search(), concat(fields="a,b", as="ab", delim="-"), 
replace(a,null,withValue=0f))
{code}

In the first one, a null value in field a will first be replaced with 0 and then 
concatenated with b, whereas in the second one, a and b will be concatenated 
first and only then would a null value in a be replaced with 0. I.e., the order 
of operations matters.

Also note, I added a feature which, for null values, will concatenate the 
string "null". If one wants to replace null with a different value then one can 
use the replace operation in conjunction with the concat operation.
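The order-of-operations point and the null handling can be modeled with plain Maps standing in for Tuples (all names below are hypothetical stand-ins, not Solr's operation classes):

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

/**
 * Toy model of applying select(...) operations in order. Whether replace(...)
 * runs before or after concat(...) changes the result, and a null value
 * concatenates as the string "null".
 */
public class OperationOrderSketch {

    /** Stand-in for replace(field, when, withValue=...). */
    static UnaryOperator<Map<String, Object>> replace(String field, Object when, Object with) {
        return t -> {
            if (Objects.equals(t.get(field), when)) {
                t.put(field, with);
            }
            return t;
        };
    }

    /** Stand-in for concat(fields=..., as=..., delim=...); nulls become "null". */
    static UnaryOperator<Map<String, Object>> concat(String[] fields, String as, String delim) {
        return t -> {
            StringJoiner j = new StringJoiner(delim);
            for (String f : fields) {
                j.add(String.valueOf(t.get(f)));
            }
            t.put(as, j.toString());
            return t;
        };
    }

    /** Apply the operations left to right, as SelectStream applies its operations. */
    public static Map<String, Object> apply(Map<String, Object> tuple,
                                            List<UnaryOperator<Map<String, Object>>> ops) {
        for (UnaryOperator<Map<String, Object>> op : ops) {
            tuple = op.apply(tuple);
        }
        return tuple;
    }
}
```

Running replace before concat yields "0-x" for a null field a, while concat before replace yields "null-x", which is exactly the difference described above.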


> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch, SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8556:
--
Attachment: SOLR-8556.patch

Adds additional tests. I think this is good to go.

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
> Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Attachment: (was: SOLR-8559-trunk.patch)

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Attachments: SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Attachment: SOLR-8559.patch

Patch applied to both trunk and branch_5x.

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Attachments: SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Affects Version/s: Trunk
   5.4

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.4, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Attachments: SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Commented] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107620#comment-15107620
 ] 

Dennis Gove commented on SOLR-8559:
---

Thanks for this performance optimization, Keith!

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.4, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: Trunk
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Closed] (SOLR-8559) FCS facet performance optimization

2016-01-19 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-8559.
-
   Resolution: Fixed
Fix Version/s: Trunk

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.4, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: Trunk
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Assigned] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-20 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-8556:
-

Assignee: Dennis Gove  (was: Joel Bernstein)

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-20 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8556:
--
Attachment: SOLR-8556.patch

Added "concat" to StreamHandler so it is a default operation.
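Registering a default operation amounts to adding it to the handler's function registry. A toy model of what that registration does (the actual Solr call is along the lines of {{streamFactory.withFunctionName("concat", ConcatOperation.class)}}; the class below is a hypothetical stand-in):

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of a stream-function registry: a map from function names to
 * implementation classes. Once "concat" is registered, concat(...) is
 * resolvable in any streaming expression the handler parses.
 */
public class OperationRegistrySketch {
    private final Map<String, Class<?>> functions = new HashMap<>();

    // Chainable registration, mirroring StreamFactory's fluent style.
    public OperationRegistrySketch withFunctionName(String name, Class<?> clazz) {
        functions.put(name, clazz);
        return this;
    }

    public boolean isRegistered(String name) {
        return functions.containsKey(name);
    }
}
```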

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch, 
> SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Closed] (SOLR-8556) Add ConcatOperation to be used with the SelectStream

2016-01-20 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-8556.
-
   Resolution: Implemented
Fix Version/s: Trunk

> Add ConcatOperation to be used with the SelectStream
> 
>
> Key: SOLR-8556
> URL: https://issues.apache.org/jira/browse/SOLR-8556
> Project: Solr
>  Issue Type: New Feature
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Fix For: Trunk
>
> Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch, 
> SOLR-8556.patch
>
>
> Now that we have the UpdateStream it would be nice to support the use case of 
> sending rolled-up aggregates for storage in another SolrCloud collection. To 
> support this we'll need to create ids for the aggregate records.
> The ConcatOperation would allow us to concatenate the bucket values into a 
> unique id. For example:
> {code}
> update(
> select( 
>  rollup(search(q="*:*", fl="a,b,c", ...)), 
>  concat(fields="a,b,c", delim="_", as="id")))
> {code}






[jira] [Reopened] (SOLR-8559) FCS facet performance optimization

2016-01-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reopened SOLR-8559:
---

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.4, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: Trunk
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded, billions of documents), I 
> found that a fast (5-10ms) query which had no matches would take 20-30 
> seconds when faceting, even with {{facet.mincount=1}}.
> Profiling made it apparent that with {{facet.method=fcs}}, 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times, with the 
> worst case occurring when every term is in every segment. This formula doesn't 
> take into account whether or not any of the terms have a positive count with 
> respect to the docset.
> These optimizations are aimed at doing two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high-cardinality 
> and the matching docset is small.
> # (Fixed) When mincount>0, move the segment position to the next non-zero 
> term value.
> Both of these changes will minimize the number of calls needed to the slow 
> {{updateTop}} call.






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Fix Version/s: (was: Trunk)
   5.5

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.4, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: 5.5
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded billions of documents), I 
> found that a fast (5-10ms query) which had no matches would take 20-30 
> seconds when doing facets even when {{facet.mincount=1}}
> Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times in the worst 
> case, when every term is in every segment. This formula doesn't take into 
> account whether or not any of the terms have a positive count with respect to 
> the docset.
> These optimizations aim to do two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high cardinality 
> and the matching docset is small.
> # FIXED: when mincount>0, move the segment position to the next non-zero term 
> value.
> Both of these changes will minimize the number of calls made to the slow 
> {{updateTop}} method.






[jira] [Updated] (SOLR-8559) FCS facet performance optimization

2016-01-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8559:
--
Affects Version/s: (was: 5.4)
   5.5

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.5, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: 5.5
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded billions of documents), I 
> found that a fast (5-10ms query) which had no matches would take 20-30 
> seconds when doing facets even when {{facet.mincount=1}}
> Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times in the worst 
> case, when every term is in every segment. This formula doesn't take into 
> account whether or not any of the terms have a positive count with respect to 
> the docset.
> These optimizations aim to do two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high cardinality 
> and the matching docset is small.
> # FIXED: when mincount>0, move the segment position to the next non-zero term 
> value.
> Both of these changes will minimize the number of calls made to the slow 
> {{updateTop}} method.






[jira] [Resolved] (SOLR-8559) FCS facet performance optimization

2016-01-22 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove resolved SOLR-8559.
---
Resolution: Implemented

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.5, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: 5.5
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded billions of documents), I 
> found that a fast (5-10ms query) which had no matches would take 20-30 
> seconds when doing facets even when {{facet.mincount=1}}
> Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times in the worst 
> case, when every term is in every segment. This formula doesn't take into 
> account whether or not any of the terms have a positive count with respect to 
> the docset.
> These optimizations aim to do two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high cardinality 
> and the matching docset is small.
> # FIXED: when mincount>0, move the segment position to the next non-zero term 
> value.
> Both of these changes will minimize the number of calls made to the slow 
> {{updateTop}} method.






[jira] [Commented] (SOLR-8559) FCS facet performance optimization

2016-01-22 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113160#comment-15113160
 ] 

Dennis Gove commented on SOLR-8559:
---

Thanks, Dave. I think I've been marking issues as closed. I'll keep this in 
mind going forward.

> FCS facet performance optimization
> --
>
> Key: SOLR-8559
> URL: https://issues.apache.org/jira/browse/SOLR-8559
> Project: Solr
>  Issue Type: Improvement
>  Components: faceting
>Affects Versions: 5.5, Trunk
>Reporter: Keith Laban
>Assignee: Dennis Gove
>  Labels: optimization, performance
> Fix For: 5.5
>
> Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch
>
>
> While profiling a large collection (multi-sharded billions of documents), I 
> found that a fast (5-10ms query) which had no matches would take 20-30 
> seconds when doing facets even when {{facet.mincount=1}}
> Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was 
> [spent 
> here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212].
> {{queue.updateTop}} gets called {{numOfSegments*numTerms}} times in the worst 
> case, when every term is in every segment. This formula doesn't take into 
> account whether or not any of the terms have a positive count with respect to 
> the docset.
> These optimizations aim to do two things:
> # When mincount>0, don't include segments in which all terms have zero counts. 
> This should significantly speed up processing when terms are high cardinality 
> and the matching docset is small.
> # FIXED: when mincount>0, move the segment position to the next non-zero term 
> value.
> Both of these changes will minimize the number of calls made to the slow 
> {{updateTop}} method.






[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions

2016-01-29 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124557#comment-15124557
 ] 

Dennis Gove commented on SOLR-8176:
---

I'm having trouble envisioning the expression for this.

> Model distributed graph traversals with Streaming Expressions
> -
>
> Key: SOLR-8176
> URL: https://issues.apache.org/jira/browse/SOLR-8176
> Project: Solr
>  Issue Type: New Feature
>  Components: clients - java, SolrCloud, SolrJ
>Affects Versions: Trunk
>Reporter: Joel Bernstein
>  Labels: Graph
> Fix For: Trunk
>
>
> I think it would be useful to model a few *distributed graph traversal* use 
> cases with Solr's *Streaming Expression* language. This ticket will explore 
> different approaches with a goal of implementing two or three common graph 
> traversal use cases.






[jira] [Commented] (SOLR-8125) Umbrella ticket for Streaming and SQL issues

2015-12-10 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052028#comment-15052028
 ] 

Dennis Gove commented on SOLR-8125:
---

I'm working on SOLR-7904 and should have a patch by tomorrow. I'd also like to 
get SOLR-8185 into Solr 6 if I can get it done. Will spend some time on it this 
weekend.

> Umbrella ticket for Streaming and SQL issues
> 
>
> Key: SOLR-8125
> URL: https://issues.apache.org/jira/browse/SOLR-8125
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>
> This is an umbrella ticket for tracking issues around the *Streaming API*, 
> *Streaming Expressions* and *Parallel SQL*.
> Issues can be linked to this ticket and discussions about the road map can 
> also happen on this ticket.






[jira] [Commented] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052745#comment-15052745
 ] 

Dennis Gove commented on SOLR-7904:
---

I'm finalizing some of the tests but so far everything is passing fine. The 
expression format is as follows:
{code}
facet(
  collection1,
  q="*:*",
  fl="a_s,a_i,a_f",
  sort="a_s asc",
  buckets="a_s",
  bucketSorts="sum(a_i) asc",
  bucketSizeLimit=10,
  sum(a_i), sum(a_f),
  min(a_i), min(a_f),
  max(a_i), max(a_f),
  avg(a_i), avg(a_f),
  count(*),
  zkHost="url:port"
)
{code}
It supports multiple buckets and multiple bucketSorts. All standard query 
properties (q, fl, sort, etc...) are also supported. The example above is only 
showing 3 of them. zkHost is optional.

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Commented] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052794#comment-15052794
 ] 

Dennis Gove commented on SOLR-7904:
---

I did consider an alternative format that would put the bucket options together 
and allow for different things in each bucket, but steered away from it because 
it would require larger changes to the FacetStream implementation and may not 
have a use case:

{code}
facet(
  collection1,
  q="*:*",
  fl="a_s,b_s,a_i,a_f",
  sort="a_s asc",
  bucket("a_s", sort="sum(a_i) asc", limit=5, sum(a_i), avg(a_i), count(*)),
  bucket("b_s", sort="max(a_i) desc, min(a_i) desc", limit=20, sum(a_i), 
min(a_i), max(a_i))
)
{code}

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Commented] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052809#comment-15052809
 ] 

Dennis Gove commented on SOLR-7904:
---

Alright. The expression parsing is similar to CloudSolrStream, whereby some 
named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the 
others are just passed down to the QueryRequest and are not considered 
explicitly. If fl and sort are not required then it'd just be a change in the 
documentation and not an implementation change (since the expression parsing 
doesn't explicitly look to ensure those were provided).
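The split described here, between required named parameters and everything else being passed straight down to the QueryRequest, can be sketched minimally. This is a hypothetical Python model (the real implementation is Java in SolrJ; `parse_facet_params` and the exact parameter handling are assumptions for illustration):

```python
def parse_facet_params(named_params,
                       required=("buckets", "bucketSorts", "bucketSizeLimit")):
    """Split expression named parameters into facet-specific settings and
    plain Solr query params (q, fl, sort, ...) passed through untouched.

    Only the required facet parameters are checked explicitly; anything
    else flows through to the query request, which is why making fl/sort
    optional would be a documentation change, not a parsing change.
    """
    missing = [k for k in required if k not in named_params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    facet_settings = {k: named_params[k] for k in required}
    solr_params = {k: v for k, v in named_params.items() if k not in required}
    return facet_settings, solr_params
```

With this shape, loosening the requirements is just a matter of shrinking the `required` tuple; the pass-through behavior is unchanged.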

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Comment Edited] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052809#comment-15052809
 ] 

Dennis Gove edited comment on SOLR-7904 at 12/11/15 2:15 PM:
-

Alright. The expression parsing is similar to CloudSolrStream whereby some 
named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the 
others are just passed down to the QueryRequest and are not considered 
explicitly. If fl and sort are not required then it'd just be a change in the 
documentation and not an implementation change (since the expression parsing 
doesn't explicitly look to ensure those were provided).


was (Author: dpgove):
Alright. The expression parsing in similar to CloudSolrStream whereby some 
named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the 
others are just passed down to the QueryRequest and are not considered 
explicitly. If fl and sort are not required then it'd just be a change in the 
documentation and not an implementation change (since the expression parsing 
doesn't explicitly look to ensure those were provided).

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Updated] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7904:
--
Attachment: SOLR-7904.patch

Fully implemented. All relevant tests pass.

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Updated] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7904:
--
Attachment: SOLR-7904.patch

Adds facet as a default function in the StreamHandler.

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch, SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Comment Edited] (SOLR-7904) Make FacetStream Expressible

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053908#comment-15053908
 ] 

Dennis Gove edited comment on SOLR-7904 at 12/12/15 1:18 AM:
-

Adds facet as a default function in the StreamHandler.


was (Author: dpgove):
Addes facet as a default function in the StreamHandler.

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch, SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Created] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)
Dennis Gove created SOLR-8409:
-

 Summary: Complex q param in Streaming Expression results in a bad 
query
 Key: SOLR-8409
 URL: https://issues.apache.org/jira/browse/SOLR-8409
 Project: Solr
  Issue Type: Bug
  Components: SolrJ
Reporter: Dennis Gove
Priority: Minor









[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8409:
--
Affects Version/s: 6.0
   Trunk
   Labels: streaming streaming_api  (was: )
  Description: 
When providing an expression like 
{code}
expression=search(people, fl="id,first", sort="first asc", 
q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
{code}
the following error is seen.
{code}
no field name specified in query and no default specified via 'df' param
{code}

I believe the issue is related to the \" (escaped quotes) and the spaces in the 
q field. If I remove the spaces then the query returns results as expected 
(though I've yet to validate if those results are accurate).

This requires some investigation to get down to the root cause. I would like to 
fix it before Solr 6 is cut.

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
>
> When providing an expression like 
> {code}
> expression=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8409:
--
Description: 
When providing an expression like 
{code}
stream=search(people, fl="id,first", sort="first asc", q="presentTitles:\"chief 
executive officer\" AND age:[36 TO *]")
{code}
the following error is seen.
{code}
no field name specified in query and no default specified via 'df' param
{code}

I believe the issue is related to the \" (escaped quotes) and the spaces in the 
q field. If I remove the spaces then the query returns results as expected 
(though I've yet to validate if those results are accurate).

This requires some investigation to get down to the root cause. I would like to 
fix it before Solr 6 is cut.

  was:
When providing an expression like 
{code}
expression=search(people, fl="id,first", sort="first asc", 
q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
{code}
the following error is seen.
{code}
no field name specified in query and no default specified via 'df' param
{code}

I believe the issue is related to the \" (escaped quotes) and the spaces in the 
q field. If I remove the spaces then the query returns results as expected 
(though I've yet to validate if those results are accurate).

This requires some investigation to get down to the root cause. I would like to 
fix it before Solr 6 is cut.


> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053981#comment-15053981
 ] 

Dennis Gove commented on SOLR-8409:
---

I've been unable to replicate this in a unit test but have seen it in a fully 
packaged version of trunk. (ant package was run and then the tarball was 
unpacked).

Differences between unit test and packaged version:
* unit test is using dynamic fields while packaged version is using static 
fields
* unit test is not going through the StreamHandler

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-8125) Umbrella ticket for Streaming and SQL issues

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053983#comment-15053983
 ] 

Dennis Gove commented on SOLR-8125:
---

SOLR-8409 is a bug I'd like to get fixed for Solr 6. I'd hate to see this go 
out in a major release.

> Umbrella ticket for Streaming and SQL issues
> 
>
> Key: SOLR-8125
> URL: https://issues.apache.org/jira/browse/SOLR-8125
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>
> This is an umbrella ticket for tracking issues around the *Streaming API*, 
> *Streaming Expressions* and *Parallel SQL*.
> Issues can be linked to this ticket and discussions about the road map can 
> also happen on this ticket.






[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053998#comment-15053998
 ] 

Dennis Gove commented on SOLR-8409:
---

It looks like this
{code}
presentTitles:\"chief executive officer\" AND age:[36 TO *]
{code}

I suspect that the \" is the culprit here because the streaming expression 
parser does not remove the \ before the quote. As such, and this is a hunch, I 
suspect that the query parser is seeing \" and not considering it a quote that 
is starting a phrase but instead a quote that is just part of the string being 
searched.

{code}
chief executive officer
{code}

I believe this can be fixed by adding logic to the expression parser that 
transforms \" into ". In fact I've written that code (very simple), but my 
inability to replicate the issue in a unit test is preventing me from ensuring 
it is actually fixed.
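The transformation described here can be illustrated with a small sketch. This is a hypothetical Python helper (`unescape_quotes`), not the code from the eventual patch; it assumes the parser hands over the raw parameter value with backslash-escaped quotes intact:

```python
def unescape_quotes(value):
    r"""Collapse backslash-escaped quotes (\" -> ") in a parameter value,
    also collapsing escaped backslashes (\\ -> \) so the query parser
    sees real phrase-delimiting quotes instead of literal \" sequences.
    """
    out = []
    i = 0
    while i < len(value):
        # a backslash escaping a quote or another backslash is consumed
        if value[i] == "\\" and i + 1 < len(value) and value[i + 1] in ('"', "\\"):
            out.append(value[i + 1])
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)
```

Applied to the failing q parameter, the query parser would then receive a proper phrase query instead of a bare string containing literal backslash-quote pairs.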

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-11 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054004#comment-15054004
 ] 

Dennis Gove commented on SOLR-8409:
---

Backing up my hunch: if I change the q to be
{code}
presentTitles:\"chief\" AND age:[36 TO *]
{code}
I get results back, but only a very small subset of the results I would expect 
to get back.

I've yet to visually verify the source data, but I would guess that there is a 
record containing the field value "chief".

I'll check for that the next time I'm looking into this (by Monday I suspect) 
but I'd wager that I'll find it. 

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Assigned] (SOLR-7904) Make FacetStream Expressible

2015-12-13 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove reassigned SOLR-7904:
-

Assignee: Dennis Gove

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch, SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Updated] (SOLR-7904) Make FacetStream Expressible

2015-12-13 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7904:
--
Attachment: SOLR-7904.patch

Rebased against trunk.

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch, SOLR-7904.patch, SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Closed] (SOLR-7904) Make FacetStream Expressible

2015-12-13 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove closed SOLR-7904.
-
Resolution: Fixed

> Make FacetStream Expressible
> 
>
> Key: SOLR-7904
> URL: https://issues.apache.org/jira/browse/SOLR-7904
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: Trunk
>Reporter: Joel Bernstein
>Assignee: Dennis Gove
> Fix For: Trunk
>
> Attachments: SOLR-7904.patch, SOLR-7904.patch, SOLR-7904.patch
>
>
> This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used 
> as a Streaming Expression.






[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-14 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-8409:
--
Attachment: SOLR-8409.patch

This patch **appears** to fix the issues. I am still unable to replicate it in 
a unit test, but I have confirmed that the issue I was seeing in a packaged 
setup is fixed with this patch. 

I'll want to wait until I have a test that replicates the issue before I 
commit this.

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
> Attachments: SOLR-8409.patch
>
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Comment Edited] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-14 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056147#comment-15056147
 ] 

Dennis Gove edited comment on SOLR-8409 at 12/14/15 3:48 PM:
-

This patch *appears* to fix the issues. I am still unable to replicate it in 
a unit test, but I have confirmed that the issue I was seeing in a packaged 
setup is fixed with this patch. 

I'll want to wait until I have a test that replicates the issue before I 
commit this.


was (Author: dpgove):
This patch **appears** to fix the issues. I am still unable to replicate it in 
a unit test, but I have confirmed that the issue I was seeing in a packaged 
setup is fixed with this patch. 

I'll want to wait until I have a test that replicates the issue before I 
commit this.

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk, 6.0
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
> Attachments: SOLR-8409.patch
>
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057921#comment-15057921
 ] 

Dennis Gove commented on SOLR-8409:
---

Interestingly, if I leave the q param out entirely I don't see any raised 
exception. Likewise, if I leave out a field to filter on I don't see any 
raised exception. I've confirmed that solrconfig-streaming.xml doesn't include 
default q or df settings, so I'd expect to see an exception in both of these 
cases.
{code}
search(collection1, fl="id,a_s,a_i,a_f", sort="a_f asc, a_i asc")
search(collection1, fl="id,a_s,a_i,a_f", sort="a_f asc, a_i asc", q="foo")
{code}

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
> Attachments: SOLR-8409.patch
>
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query

2015-12-15 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057983#comment-15057983
 ] 

Dennis Gove commented on SOLR-8409:
---

I take that back. The file schema-streaming.xml contains the default query field
{code}
text
{code}

If I comment out that setting then I am able to replicate the failure described 
in this ticket - finally. I will create a couple valid tests replicating the 
issue and will commit the fix as soon as I can.
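The error message itself comes from the query parser: once the phrase loses 
its quoting, "chief" keeps the field prefix but "executive" and "officer" 
become bare terms, and with no df default in the schema the parser gives up. 
A toy illustration of that failure mode in plain Java (this is not Solr's 
actual parser code, and `unescapeQuotes` is an invented helper name):

```java
public class QuoteUnescapeSketch {
    // A named parameter such as
    //   q="presentTitles:\"chief executive officer\" AND age:[36 TO *]"
    // reaches the expression machinery with the inner quotes still escaped.
    // If the \" sequences are not folded back into plain " before the value
    // is handed to the query parser, the phrase splits on whitespace into
    // bare terms with no field prefix, which is exactly when the
    // "no default specified via 'df'" error appears.
    public static String unescapeQuotes(String raw) {
        return raw.replace("\\\"", "\"");
    }

    public static void main(String[] args) {
        String raw = "presentTitles:\\\"chief executive officer\\\"";
        // prints: presentTitles:"chief executive officer"
        System.out.println(unescapeQuotes(raw));
    }
}
```

With a schema-level default field present, the bare terms still parse (against 
the default field), which is why the bug stayed hidden until that setting was 
commented out.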

> Complex q param in Streaming Expression results in a bad query
> --
>
> Key: SOLR-8409
> URL: https://issues.apache.org/jira/browse/SOLR-8409
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Affects Versions: Trunk
>Reporter: Dennis Gove
>Priority: Minor
>  Labels: streaming, streaming_api
> Attachments: SOLR-8409.patch
>
>
> When providing an expression like 
> {code}
> stream=search(people, fl="id,first", sort="first asc", 
> q="presentTitles:\"chief executive officer\" AND age:[36 TO *]")
> {code}
> the following error is seen.
> {code}
> no field name specified in query and no default specified via 'df' param
> {code}
> I believe the issue is related to the \" (escaped quotes) and the spaces in 
> the q field. If I remove the spaces then the query returns results as 
> expected (though I've yet to validate if those results are accurate).
> This requires some investigation to get down to the root cause. I would like 
> to fix it before Solr 6 is cut.






[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-17 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063456#comment-15063456
 ] 

Dennis Gove commented on SOLR-7525:
---

I'll rebase this off trunk so it is a little cleaner but I think the use of 
ReducerStream still holds. 

The purpose of Complement and Intersect is to return tuples in A which either 
do or do not exist in B. The tuples in B aren't used for anything and are 
dropped as soon as possible. The reason they make use of the ReducerStream is 
because B having 1 instance of some tuple found in A is the same as B having 
100 instances of some tuple found in A. Whether it's 1 or 100, the tuple exists 
in B so it can either be returned in A or not. For this reason the size of the 
ReducerStream can always just be 1 because we only care about the first one and 
all others can be dropped from B. The fieldName (or fieldNames because you can 
do an intersect on N fields) provided to the ReducerStream are the fields the 
Intersect or Complement streams are acting on. 

Essentially, the goal is to take all the tuples in B and reduce them down to a 
unique list of tuples where uniqueness is defined over the fields that the 
intersect or complement is being checked over. Given that B is a set of unique 
tuples it is much easier to know when to move onto the next tuple in B.

I'll take a look at the GroupOperation but I would suspect that it can use a 
StreamEqualitor instead of a StreamComparator. A comparator allows order while 
an equalitor just checks if they are equal. There may be a reason it allows for 
ordering, though.
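The idea above, reduce B to unique keys and emit only the A tuples with no 
match, can be sketched over two lists already sorted on the join key. This is 
a standalone toy under the streams' sort contract, not the actual 
ComplementStream or ReducerStream code, and the method name is made up for 
illustration:

```java
import java.util.*;

public class ComplementSketch {
    // Emit keys from sorted list A that have no equal key in sorted list B.
    // Duplicates in B behave exactly like a single occurrence, which is why
    // reducing B to a unique list of tuples first loses nothing.
    public static List<String> complement(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int j = 0;
        for (String key : a) {
            // advance B until its current key is >= the current A key
            while (j < b.size() && b.get(j).compareTo(key) < 0) {
                j++;
            }
            // keep the A tuple only when B holds no equal key
            if (j >= b.size() || !b.get(j).equals(key)) {
                out.add(key);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(complement(
                Arrays.asList("a", "b", "c", "d"),
                Arrays.asList("b", "b", "d")));   // prints [a, c]
    }
}
```

An intersect is the same walk with the keep condition inverted.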

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-17 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063456#comment-15063456
 ] 

Dennis Gove edited comment on SOLR-7525 at 12/18/15 4:56 AM:
-

I'll rebase this off trunk so it is a little cleaner but I think the use of 
ReducerStream still holds. 

The purpose of Complement and Intersect is to return tuples in A which either 
do or do not exist in B. The tuples in B aren't used for anything and are 
dropped as soon as possible. The reason they make use of the ReducerStream is 
because B having 1 instance of some tuple found in A is the same as B having 
100 instances of some tuple found in A. Whether it's 1 or 100, the tuple exists 
in B so its twin in A can either be returned from A or not. For this reason the 
size of the ReducerStream can always just be 1 because we only care about the 
first one and all others can be dropped from B. The fieldName (or fieldNames 
because you can do an intersect on N fields) provided to the ReducerStream are 
the fields the Intersect or Complement streams are acting on. 

Essentially, the goal is to take all the tuples in B and reduce them down to a 
unique list of tuples where uniqueness is defined over the fields that the 
intersect or complement is being checked over. Given that B is a set of unique 
tuples it is much easier to know when to move onto the next tuple in B.

I'll take a look at the GroupOperation but I would suspect that it can use a 
StreamEqualitor instead of a StreamComparator. A comparator allows order while 
an equalitor just checks if they are equal. There may be a reason it allows for 
ordering, though.


was (Author: dpgove):
I'll rebase this off trunk so it is a little cleaner but I think the use of 
ReducerStream still holds. 

The purpose of Complement and Intersect is to return tuples in A which either 
do or do not exist in B. The tuples in B aren't used for anything and are 
dropped as soon as possible. The reason they make use of the ReducerStream is 
because B having 1 instance of some tuple found in A is the same as B having 
100 instances of some tuple found in A. Whether it's 1 or 100, the tuple exists 
in B so it can either be returned in A or not. For this reason the size of the 
ReducerStream can always just be 1 because we only care about the first one and 
all others can be dropped from B. The fieldName (or fieldNames because you can 
do an intersect on N fields) provided to the ReducerStream are the fields the 
Intersect or Complement streams are acting on. 

Essentially, the goal is to take all the tuples in B and reduce them down to a 
unique list of tuples where uniqueness is defined over the fields that the 
intersect or complement is being checked over. Given that B is a set of unique 
tuples it is much easier to know when to move onto the next tuple in B.

I'll take a look at the GroupOperation but I would suspect that it can use a 
StreamEqualitor instead of a StreamComparator. A comparator allows order while 
an equalitor just checks if they are equal. There may be a reason it allows for 
ordering, though.

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Commented] (SOLR-8443) Change /stream handler http param from "stream" to "func"

2015-12-18 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064290#comment-15064290
 ] 

Dennis Gove commented on SOLR-8443:
---

If open to other suggestions, I find that I tend to refer to that parameter as 
the expression. Maybe expr=search()

> Change /stream handler http param from "stream" to "func"
> -
>
> Key: SOLR-8443
> URL: https://issues.apache.org/jira/browse/SOLR-8443
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
>
> When passing in a Streaming Expression to the /stream handler you currently 
> use the "stream" http parameter. This dates back to when serialized 
> TupleStream objects were passed in. Now that the /stream handler only accepts 
> Streaming Expressions it makes sense to rename this parameter to "func". 
> This syntax also helps to emphasize that Streaming Expressions are a function 
> language.
> For example:
> http://localhost:8983/collection1/stream?func=search(...)






[jira] [Comment Edited] (SOLR-8443) Change /stream handler http param from "stream" to "func"

2015-12-18 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064290#comment-15064290
 ] 

Dennis Gove edited comment on SOLR-8443 at 12/18/15 5:35 PM:
-

If open to other suggestions, I find that I tend to refer to that parameter as 
the expression. Maybe expr=search().

My thinking here is that one is providing a (potentially complex) expression 
made up of function calls.


was (Author: dpgove):
If open to other suggestions, I find that I tend to refer to that parameter as 
the expression. Maybe expr=search()

> Change /stream handler http param from "stream" to "func"
> -
>
> Key: SOLR-8443
> URL: https://issues.apache.org/jira/browse/SOLR-8443
> Project: Solr
>  Issue Type: Bug
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
>
> When passing in a Streaming Expression to the /stream handler you currently 
> use the "stream" http parameter. This dates back to when serialized 
> TupleStream objects were passed in. Now that the /stream handler only accepts 
> Streaming Expressions it makes sense to rename this parameter to "func". 
> This syntax also helps to emphasize that Streaming Expressions are a function 
> language.
> For example:
> http://localhost:8983/collection1/stream?func=search(...)






[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-18 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7525:
--
Attachment: SOLR-7525.patch

Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. 
The DistinctOperation ensures that for any given group only a single tuple will 
be returned. Currently it is implemented to return the first tuple in a group 
but a possible enhancement down the road could be to support a parameter asking 
for some other tuple in the group (such as the first in a sub-sorted list).

Also, while implementing this I realized that the UniqueStream can be 
refactored to be just a type of ReducerStream with DistinctOperation. That 
change is not included in this patch but will be done under a separate ticket.

Also of note, I'm not sure if the getChildren() function declared in 
TupleStream is necessary any longer. If I recall correctly that function was 
used by the StreamHandler when passing streams to workers but since all that 
has been changed to pass the result of toExpression(), I think we can get 
rid of the getChildren() function. I will explore that possibility.
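The DistinctOperation behaviour, keep the first tuple of each group coming out 
of a sorted stream, can be sketched standalone (toy code, not the actual 
patch; the two-field tuple layout is invented for illustration):

```java
import java.util.*;

public class DistinctReduceSketch {
    // A "tuple" here is just {groupKey, payload}. The input is assumed to be
    // sorted on the group key, as the ReducerStream requires, so a group is a
    // run of consecutive tuples sharing a key.
    public static List<String> firstPerGroup(List<String[]> sorted) {
        List<String> kept = new ArrayList<>();
        String prevKey = null;
        for (String[] tuple : sorted) {
            if (!tuple[0].equals(prevKey)) {
                kept.add(tuple[1]);   // first tuple of a new group is kept
                prevKey = tuple[0];
            }
            // every later tuple in the same group is silently dropped
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(firstPerGroup(Arrays.asList(
                new String[]{"x", "x1"},
                new String[]{"x", "x2"},
                new String[]{"y", "y1"})));   // prints [x1, y1]
    }
}
```

This also shows why UniqueStream reduces to ReducerStream plus 
DistinctOperation: uniqueness is just "first tuple per group" where the group 
key is the field list being deduplicated.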

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch, SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-18 Thread Dennis Gove (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Gove updated SOLR-7525:
--
Attachment: SOLR-7525.patch

As it turns out, IntersectStream and ComplementStream can both make use of a 
UniqueStream, which in turn is built on a ReducerStream. As such, this new 
patch implements Intersect and Complement with streamB wrapped in a 
UniqueStream, and UniqueStream is reimplemented as a type of ReducerStream.

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-18 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064830#comment-15064830
 ] 

Dennis Gove edited comment on SOLR-7525 at 12/18/15 10:06 PM:
--

Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. 
The DistinctOperation ensures that for any given group only a single tuple will 
be returned. Currently it is implemented to return the first tuple in a group 
but a possible enhancement down the road could be to support a parameter asking 
for some other tuple in the group (such as the first in a sub-sorted list).

Also, while implementing this I realized that the UniqueStream can be 
refactored to be just a type of ReducerStream with DistinctOperation. -That 
change is not included in this patch but will be done under a separate ticket.-

Also of note, I'm not sure if the getChildren() function declared in 
TupleStream is necessary any longer. If I recall correctly that function was 
used by the StreamHandler when passing streams to workers but since all that 
has been changed to pass the result of toExpression(), I think we can get 
rid of the getChildren() function. I will explore that possibility.


was (Author: dpgove):
Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. 
The DistinctOperation ensures that for any given group only a single tuple will 
be returned. Currently it is implemented to return the first tuple in a group 
but a possible enhancement down the road could be to support a parameter asking 
for some other tuple in the group (such as the first in a sub-sorted list).

Also, while implementing this I realized that the UniqueStream can be 
refactored to be just a type of ReducerStream with DistinctOperation. That 
change is not included in this patch but will be done under a separate ticket.

Also of note, I'm not sure if the getChildren() function declared in 
TupleStream is necessary any longer. If I recall correctly that function was 
used by the StreamHandler when passing streams to workers but since all that 
has been changed to pass the result of toExpression(), I think we can get 
rid of the getChildren() function. I will explore that possibility.

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.






[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions

2015-12-19 Thread Dennis Gove (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065536#comment-15065536
 ] 

Dennis Gove commented on SOLR-7525:
---

Yes, you hit that right on the head. It was for consistency in the structure of 
Expressible classes. Also, currently it's implemented to return the first seen 
tuple in a group. However, I could see an enhancement where one could provide a 
selector to choose maybe the last seen, or the first based on some alternative 
order. For example, were someone to use the DistinctOperation in an expression 
it would currently look like this 
{code}
distinct()
{code}
but I could also see it looking like one of these
{code}
distinct(first, sort="fieldA desc, fieldB desc")
distinct(first, having="fieldA != null")
{code}
Essentially, although not currently supported, it would be possible to expand 
the reducer operations to support complex selectors when a choice over which 
tuple to select is required.

All that said, for now it's just for consistency.
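The hypothetical selector variant could be prototyped like this. Pure 
illustration: `distinctBy` and its arguments are invented names, not anything 
in the patch. It picks the per-group winner with a comparator instead of 
always taking the first tuple seen:

```java
import java.util.*;

public class DistinctSelectorSketch {
    // Keep one tuple per value of groupField, chosen as the tuple that sorts
    // first under the supplied selector. A selector matching the input order
    // degenerates to today's "first seen" DistinctOperation behaviour.
    public static List<Map<String, String>> distinctBy(
            List<Map<String, String>> tuples,
            String groupField,
            Comparator<Map<String, String>> selector) {
        Map<String, Map<String, String>> winners = new LinkedHashMap<>();
        for (Map<String, String> t : tuples) {
            // merge(old, new): keep the incumbent unless the newcomer wins
            winners.merge(t.get(groupField), t,
                    (a, b) -> selector.compare(b, a) < 0 ? b : a);
        }
        return new ArrayList<>(winners.values());
    }

    public static void main(String[] args) {
        // the rough equivalent of distinct(first, sort="fieldA desc")
        List<Map<String, String>> out = distinctBy(
                Arrays.asList(
                        Map.of("id", "1", "fieldA", "5"),
                        Map.of("id", "1", "fieldA", "9"),
                        Map.of("id", "2", "fieldA", "3")),
                "id",
                Comparator.comparing(
                        (Map<String, String> m) -> m.get("fieldA"),
                        Comparator.reverseOrder()));
        for (Map<String, String> t : out) {
            System.out.println(t.get("id") + " -> " + t.get("fieldA"));
        }
        // prints:
        // 1 -> 9
        // 2 -> 3
    }
}
```

Note this version buffers one winner per key rather than streaming group by 
group; the real operation would stay streaming because its input is sorted.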

> Add ComplementStream to the Streaming API and Streaming Expressions
> ---
>
> Key: SOLR-7525
> URL: https://issues.apache.org/jira/browse/SOLR-7525
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrJ
>Reporter: Joel Bernstein
>Priority: Minor
> Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch
>
>
> This ticket adds a ComplementStream to the Streaming API and Streaming 
> Expression language.
> The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit 
> Tuples from StreamA that are not in StreamB.
> Streaming API Syntax:
> {code}
> ComplementStream cstream = new ComplementStream(streamA, streamB, comp);
> {code}
> Streaming Expression syntax:
> {code}
> complement(search(...), search(...), on(...))
> {code}
> Internal implementation will rely on the ReducerStream. The ComplementStream 
> can be parallelized using the ParallelStream.





