[jira] [Resolved] (SOLR-12271) Analytics Component reads negative float and double field values incorrectly
[ https://issues.apache.org/jira/browse/SOLR-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove resolved SOLR-12271. Resolution: Fixed > Analytics Component reads negative float and double field values incorrectly > > > Key: SOLR-12271 > URL: https://issues.apache.org/jira/browse/SOLR-12271 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.4, master (8.0) >Reporter: Houston Putman >Assignee: Dennis Gove >Priority: Major > Fix For: 7.4, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the analytics component uses the incorrect way of converting > numeric doc values longs to doubles and floats. > The fix is easy and the tests now cover this use case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
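For context on the misread values: doc values store doubles as sortable longs, and decoding them requires undoing the sortable-bits transform rather than reinterpreting the raw long as IEEE-754 bits. The sketch below mirrors Lucene's NumericUtils.sortableDoubleBits transform to show why only negative values were affected; it is an illustration of the encoding, not the analytics component's actual code.

```java
public class SortableDoubleDemo {
    // Sortable encoding: for negative values, flip every bit except the sign
    // bit so the resulting longs sort in the same order as the doubles.
    // Mirrors Lucene's NumericUtils.sortableDoubleBits; it is its own inverse.
    public static long sortableDoubleBits(long bits) {
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    public static void main(String[] args) {
        double value = -2.5;
        long sortable = sortableDoubleBits(Double.doubleToLongBits(value));

        // Wrong: treating the sortable long as raw IEEE-754 bits.
        double wrong = Double.longBitsToDouble(sortable);
        // Right: undo the sortable transform before converting back.
        double right = Double.longBitsToDouble(sortableDoubleBits(sortable));

        System.out.println("wrong=" + wrong + " right=" + right);
        // Positive values pass through the transform unchanged, which is why
        // only negative floats and doubles were read incorrectly.
    }
}
```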
[jira] [Commented] (SOLR-11914) Remove/move questionable SolrParams methods
[ https://issues.apache.org/jira/browse/SOLR-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440179#comment-16440179 ] Dennis Gove commented on SOLR-11914: I agree with [~dsmiley] - the code in the streaming classes appears to be an oddly round-about way of doing things. The changes you've made here appear to be a much better approach. > Remove/move questionable SolrParams methods > --- > > Key: SOLR-11914 > URL: https://issues.apache.org/jira/browse/SOLR-11914 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Reporter: David Smiley >Priority: Minor > Labels: newdev > Attachments: SOLR-11914.patch > > > {{Map getAll(Map sink, Collection > params)}} > Is only used by the CollectionsHandler, and has particular rules about how it > handles multi-valued data that make it not very generic, and thus I think > doesn't belong here. Furthermore the existence of this method is confusing > in that it gives the user another choice against it use versus toMap (there > are two overloaded variants). > {{SolrParams toFilteredSolrParams(List names)}} > Is only called in one place, and something about it bothers me, perhaps just > the name or that it ought to be a view maybe. > {{static Map toMap(NamedList params)}} > Isn't used and I don't like it; it doesn't even involve a SolrParams! Legacy > of 2006. > {{static Map toMultiMap(NamedList params)}} > It doesn't even involve a SolrParams! Legacy of 2006 with some updates since. > Used in some places. Perhaps should be moved to NamedList as an instance > method. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-11924) Add the ability to watch collection set changes in ZkStateReader
[ https://issues.apache.org/jira/browse/SOLR-11924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove resolved SOLR-11924. Resolution: Fixed Assignee: Dennis Gove Fix Version/s: (was: master (8.0)) > Add the ability to watch collection set changes in ZkStateReader > > > Key: SOLR-11924 > URL: https://issues.apache.org/jira/browse/SOLR-11924 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.4, master (8.0) >Reporter: Houston Putman >Assignee: Dennis Gove >Priority: Minor > Fix For: 7.4 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > Allow users to watch when the set of collections for a cluster is changed. > This is useful if a user is trying to discover collections within a cloud. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
Dennis Gove created SOLR-12355: -- Summary: HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches Key: SOLR-12355 URL: https://issues.apache.org/jira/browse/SOLR-12355 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Affects Versions: 6.0 Reporter: Dennis Gove Assignee: Dennis Gove The following strings have been found to have hashCode conflicts, and as such HashJoinStream can consider two tuples with these field values to be the same. {code:java} "MG!!00TNGP::Mtge::".hashCode() == "MG!!00TNH1::Mtge::".hashCode() {code} This means these two tuples are treated as the same if we're comparing on field "foo" {code:java} { "foo":"MG!!00TNGP::Mtge::" } { "foo":"MG!!00TNH1::Mtge::" } {code} and these two tuples are treated as the same if we're comparing on fields "foo,bar" {code:java} { "foo":"MG!!00TNGP" "bar":"Mtge" } { "foo":"MG!!00TNH1" "bar":"Mtge" }{code}
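The collision is easy to verify: the two strings differ only at two adjacent characters, and under Java's polynomial hash the differences cancel exactly (31 * ('G' - 'H') + ('P' - '1') == 0):

```java
public class HashCollisionDemo {
    public static void main(String[] args) {
        String a = "MG!!00TNGP::Mtge::";
        String b = "MG!!00TNH1::Mtge::";
        // Same 32-bit hash, different strings: any join keyed on
        // String::hashCode alone wrongly treats these as equal.
        System.out.println(a.hashCode() == b.hashCode()); // true
        System.out.println(a.equals(b));                  // false
    }
}
```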
[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475055#comment-16475055 ] Dennis Gove commented on SOLR-12355: I have a fix for this where instead of calculating the string value's hashCode we just use the string value as the key in the hashed set of tuples. I'm creating a few test cases to verify this gives us what we want.
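A minimal sketch of that approach (hypothetical names, not the actual HashJoinStream code): key the bucket map on the composite field-value string itself, so that HashMap falls back to String.equals when hashes collide and no false match can occur.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StringKeyJoinDemo {
    // Build the composite join key from the selected field values.
    public static String joinKey(Map<String, String> tuple, List<String> fields) {
        StringBuilder key = new StringBuilder();
        for (String field : fields) {
            key.append(tuple.get(field)).append("::");
        }
        return key.toString();
    }

    public static void main(String[] args) {
        List<String> on = List.of("foo");
        Map<String, String> left = Map.of("foo", "MG!!00TNGP::Mtge::");
        Map<String, String> right = Map.of("foo", "MG!!00TNH1::Mtge::");

        // Keyed on the string itself, the colliding values may land in the
        // same hash bucket but compare unequal, so no false match occurs.
        Map<String, List<Map<String, String>>> hashed = new HashMap<>();
        hashed.computeIfAbsent(joinKey(left, on), k -> new ArrayList<>()).add(left);

        System.out.println(hashed.containsKey(joinKey(right, on))); // false
    }
}
```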
[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475060#comment-16475060 ] Dennis Gove commented on SOLR-12355: This also impacts OuterHashJoinStream.
[jira] [Updated] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-12355: --- Attachment: SOLR-12355.patch
[jira] [Commented] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475973#comment-16475973 ] Dennis Gove commented on SOLR-12355: Initial patch attached. I have not yet run the full suite of tests against this.
[jira] [Updated] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-12355: --- Attachment: SOLR-12355.patch
[jira] [Resolved] (SOLR-12355) HashJoinStream's use of String::hashCode results in non-matching tuples being considered matches
[ https://issues.apache.org/jira/browse/SOLR-12355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove resolved SOLR-12355. Resolution: Fixed Fix Version/s: 7.4
[jira] [Assigned] (SOLR-12271) Analytics Component reads negative float and double field values incorrectly
[ https://issues.apache.org/jira/browse/SOLR-12271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-12271: -- Assignee: Dennis Gove > Analytics Component reads negative float and double field values incorrectly > > > Key: SOLR-12271 > URL: https://issues.apache.org/jira/browse/SOLR-12271 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.3.1, 7.4, master (8.0) >Reporter: Houston Putman >Assignee: Dennis Gove >Priority: Major > Fix For: 7.4, master (8.0) > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the analytics component uses the incorrect way of converting > numeric doc values longs to doubles and floats. > The fix is easy and the tests now cover this use case. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-10512) Innerjoin streaming expressions - Invalid JoinStream error
[ https://issues.apache.org/jira/browse/SOLR-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16392851#comment-16392851 ] Dennis Gove commented on SOLR-10512: It was certainly designed such that the left field in the on clause is the field from the first incoming stream and the right field in the on clause is the field from the second incoming stream. If that is not occurring then this is a very clear bug. > Innerjoin streaming expressions - Invalid JoinStream error > -- > > Key: SOLR-10512 > URL: https://issues.apache.org/jira/browse/SOLR-10512 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Affects Versions: 6.4.2, 6.5 > Environment: Debian Jessie >Reporter: Dominique Béjean >Priority: Major > > It looks like the innerJoin streaming expression does not work as explained in > the documentation. An Invalid JoinStream error occurs. > {noformat} > curl --data-urlencode 'expr=innerJoin( > search(books, >q="*:*", >fl="id", >sort="id asc"), > search(reviews, >q="*:*", >fl="id_book_s", >sort="id_book_s asc"), > on="id=id_books_s" > )' http://localhost:8983/solr/books/stream > > {"result-set":{"docs":[{"EXCEPTION":"Invalid JoinStream - all incoming stream > comparators (sort) must be a superset of this stream's > equalitor.","EOF":true}]}} > {noformat} > It is totally similar to the documentation example > {noformat} > innerJoin( > search(people, q=*:*, fl="personId,name", sort="personId asc"), > search(pets, q=type:cat, fl="ownerId,petName", sort="ownerId asc"), > on="personId=ownerId" > ) > {noformat} > Queries on each collection give: > {noformat} > $ curl --data-urlencode 'expr=search(books, >q="*:*", >fl="id, title_s, pubyear_i", >sort="pubyear_i asc", >qt="/export")' > http://localhost:8983/solr/books/stream > { > "result-set": { > "docs": [ > { > "title_s": "Friends", > "pubyear_i": 1994, > "id": "book2" > }, > { > "title_s": "The Way of Kings",
"pubyear_i": 2010, > "id": "book1" > }, > { > "EOF": true, > "RESPONSE_TIME": 16 > } > ] > } > } > $ curl --data-urlencode 'expr=search(reviews, >q="author_s:d*", >fl="id, id_book_s, stars_i, review_dt", >sort="id_book_s asc", >qt="/export")' > http://localhost:8983/solr/reviews/stream > > { > "result-set": { > "docs": [ > { > "stars_i": 3, > "id": "book1_c2", > "id_book_s": "book1", > "review_dt": "2014-03-15T12:00:00Z" > }, > { > "stars_i": 4, > "id": "book1_c3", > "id_book_s": "book1", > "review_dt": "2014-12-15T12:00:00Z" > }, > { > "stars_i": 3, > "id": "book2_c2", > "id_book_s": "book2", > "review_dt": "1994-03-15T12:00:00Z" > }, > { > "stars_i": 4, > "id": "book2_c3", > "id_book_s": "book2", > "review_dt": "1994-12-15T12:00:00Z" > }, > { > "EOF": true, > "RESPONSE_TIME": 47 > } > ] > } > } > {noformat} > After more tests, I just had to invert the "on" clause to make it work > {noformat} > curl --data-urlencode 'expr=innerJoin( > search(books, >q="*:*", >fl="id", >sort="id asc"), > search(reviews, >q="*:*", >fl="id_book_s", >sort="id_book_s asc"), > on="id_books_s=id" > )' http://localhost:8983/solr/books/stream > > { > "result-set": { > "docs": [ > { > "title_s": "The Way of Kings", > "pubyear_i": 2010, > "stars_i": 5, > "id": "book1", >
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069053#comment-15069053 ] Dennis Gove commented on SOLR-7535: --- For the original mapping take a look at SolrStream, particular the {code}mapFields(...){code} function and where it is called from. It might make sense to require a SelectStream as the inner stream so that one can select the fields they want to insert. Or perhaps supporting a way to select fields as part of this stream's expression and it can internally use a SelectStream to implement that feature. > Add UpdateStream to Streaming API and Streaming Expression > -- > > Key: SOLR-7535 > URL: https://issues.apache.org/jira/browse/SOLR-7535 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrJ >Reporter: Joel Bernstein >Priority: Minor > > The ticket adds an UpdateStream implementation to the Streaming API and > streaming expressions. The UpdateStream will wrap a TupleStream and send the > Tuples it reads to a SolrCloud collection to be indexed. > This will allow users to pull data from different Solr Cloud collections, > merge and transform the streams and send the transformed data to another Solr > Cloud collection. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
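For readers following along, the mapping step referred to above is essentially a per-tuple field rename. A rough standalone sketch of the idea (a hypothetical signature for illustration, not SolrStream's actual mapFields implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class FieldMappingDemo {
    // Rename tuple fields according to a mapping of source -> destination
    // names; fields without a mapping pass through unchanged.
    public static Map<String, Object> mapFields(Map<String, Object> tuple,
                                                Map<String, String> mappings) {
        Map<String, Object> mapped = new HashMap<>();
        for (Map.Entry<String, Object> e : tuple.entrySet()) {
            mapped.put(mappings.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("personId", 7);
        tuple.put("name", "alice");

        // e.g. a select(..., personId as id, ...) would rename personId -> id
        Map<String, Object> mapped = mapFields(tuple, Map.of("personId", "id"));
        System.out.println(mapped.get("id")); // 7
    }
}
```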
[jira] [Comment Edited] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15069053#comment-15069053 ] Dennis Gove edited comment on SOLR-7535 at 12/23/15 2:30 AM: - For the original mapping take a look at SolrStream, particularly the {code}mapFields(...){code} function and where it is called from. It might make sense to require a SelectStream as the inner stream so that one can select the fields they want to insert. Or perhaps supporting a way to select fields as part of this stream's expression and it can internally use a SelectStream to implement that feature.
[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072395#comment-15072395 ] Dennis Gove commented on SOLR-8458: --- What if we were to make substitution parameters first class citizens similar to named parameters? During the parsing in ExpressionParser we could create instances of StreamExpressionSubstitutionParameters which exist as first class citizens of an StreamExpression object. This would allow us to send (in the example in the description) "expr", "left", and "right" through the ExpressionParser. Then, a simple method can be added to the StreamFactory which accepts a main expression and a map of names => expressions. It could then iterate over parameters of the main expression doing replacements until there are no more instances of StreamExpressionSubstitutionParameter in the main expression. Some checks for infinite loops would have to be added but those are relatively simple. This approach would allow the logic to exist outside of the StreamHandler which I think would be beneficial for the SQL Handler. It might also allow for some type of prepared statements with "pre-compiled" pieces (similar to what one might see in a DBMS). For example, this might be beneficial in a situation where some very expensive part of the expression is static which you want to perform different rollups or joins or whatever with. An optimizer could hang onto the static results in a RepeatableStream (doesn't exist yet) and substitute that into some other expression. > Parameter substitution for Streaming Expressions > > > Key: SOLR-8458 > URL: https://issues.apache.org/jira/browse/SOLR-8458 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-8458.patch > > > As Streaming Expressions become more complicated it would be nice to support > parameter substitution. 
For example: > {code} > http://localhost:8983/col/stream?expr=merge($left, $right, > ...)&left=search(...)&right=search(...) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8467) CloudSolrStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified
[ https://issues.apache.org/jira/browse/SOLR-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072706#comment-15072706 ] Dennis Gove commented on SOLR-8467: --- I can't think of any reason why accepting a [Modifiable]SolrParams object instead of a Map would be a bad idea. I like this change. > CloudSolrStream should take a SolrParams object rather than a Map<String, String> to allow more complex Solr queries to be specified > > > Key: SOLR-8467 > URL: https://issues.apache.org/jira/browse/SOLR-8467 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Assignee: Erick Erickson > Attachments: SOLR-8647.patch > > > Currently, it's impossible to, say, specify multiple "fq" clauses when using > Streaming Aggregation due to the fact that the c'tors take a Map of params. > Opening to discuss whether we should > 1> deprecate the current c'tor > and/or > 2> add a c'tor that takes a SolrParams object instead. > and/or > 3> ??? > I don't see a clean way to go from a Map<String, String> to a > (Modifiable)SolrParams, so existing code would need a significant change. I > hacked together a PoC, just to see if I could make CloudSolrStream take a > ModifiableSolrParams object instead and it passes tests, but it's so bad that > I'm not going to even post it. There's _got_ to be a better way to do this, > but at least it's possible
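The limitation motivating the change is easy to demonstrate: a Map<String, String> holds one value per key, so a second "fq" clobbers the first, whereas multi-valued storage (what ModifiableSolrParams.add offers) keeps both. A plain-Java illustration of the difference, not the SolrJ classes themselves:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiValuedParamsDemo {
    // Append a value to a multi-valued parameter, keeping earlier values.
    public static Map<String, List<String>> addParam(
            Map<String, List<String>> params, String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return params;
    }

    public static void main(String[] args) {
        // With a Map<String, String>, the second fq overwrites the first.
        Map<String, String> flat = new HashMap<>();
        flat.put("fq", "type:cat");
        flat.put("fq", "age:[2 TO *]");
        System.out.println(flat.get("fq")); // only "age:[2 TO *]" survives

        // With multi-valued storage, both filter queries are retained.
        Map<String, List<String>> multi = new HashMap<>();
        addParam(multi, "fq", "type:cat");
        addParam(multi, "fq", "age:[2 TO *]");
        System.out.println(multi.get("fq")); // [type:cat, age:[2 TO *]]
    }
}
```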
[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072812#comment-15072812 ] Dennis Gove commented on SOLR-8458: --- As I see it there are 2 pieces here, both related but separate. First, adding support for parameter substitution in an expression. This would be handled with the changes I discussed above to StreamFactory, StreamParser, and the addition of a new type StreamExpressionSubstitutionParameter. Note that this doesn't necessarily care how the expressions come in. And second, adding support for parameter substitution in StreamHandler and in an http request. I like the syntax Joel uses in the description. What this would mean is that StreamHandler would see http params like "expr", "left" and "right", would know that these are expressions (can call into StreamFactory to check if something is a valid expression), and would pass them off independently to be parsed and then together to be pieced together. This approach modularizes the implementation such that how an expression with substitution comes in via http is independent to how it is handled within the Streaming API. For example, the following comes into StreamHandler {code} http://localhost:8983/col/stream?expr=merge($left, $right, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar {code} The StreamHandler will see five parameters, expr, baz, left, right, and foo. It would then determine that expr, left, and right are valid expressions and pass them off to be parsed into three expression objects. It would then pass all three into the factory to be combined into a single Stream object. The factory would then iterate (recursively?) until there aren't any more instances of a StreamExpressionSubstitutionParameter at any level (considering the possibility of infinite loops, of course). At this point it'd then just be passed off to create a Stream object as any other expression would be. 
Another possibility would be to parse out the substitution expressions and then register them in the factory for use during Stream object creation. This would negate the need to do that pre-processing of the N substitution expression and would give a place to register "pre-compiled" expressions. I'm not a huge fan of this approach as it would add more state to the factory and I'm not a huge fan of the state it already contains.
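The replace-until-no-substitutions-remain loop with a guard against cyclic definitions can be sketched as plain string substitution (the real implementation would operate on parsed StreamExpression objects; all names here are illustrative):

```java
import java.util.Map;

public class SubstitutionDemo {
    // Repeatedly replace $name tokens with their definitions until a pass
    // changes nothing; bound the passes to catch mutually recursive params.
    public static String substitute(String expr, Map<String, String> params) {
        for (int pass = 0; pass < 10; pass++) {
            String before = expr;
            for (Map.Entry<String, String> e : params.entrySet()) {
                expr = expr.replace("$" + e.getKey(), e.getValue());
            }
            if (expr.equals(before)) {
                return expr; // fixed point reached: nothing left to substitute
            }
        }
        // Still changing after many passes: definitions reference each other.
        throw new IllegalArgumentException("cycle detected in substitution parameters");
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of(
            "left", "search(books, q=\"*:*\", fl=\"id\", sort=\"id asc\")",
            "right", "search(reviews, q=\"*:*\", fl=\"id_book_s\", sort=\"id_book_s asc\")");
        System.out.println(substitute("merge($left, $right, on=\"id=id_book_s\")", params));
    }
}
```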
[jira] [Comment Edited] (SOLR-8458) Parameter substitution for Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072812#comment-15072812 ] Dennis Gove edited comment on SOLR-8458 at 12/28/15 3:27 PM: - As I see it there are 2 pieces here, both related but separate. First, adding support for parameter substitution in an expression. This would be handled with the changes I discussed above to StreamFactory, StreamParser, and the addition of a new type StreamExpressionSubstitutionParameter. Note that this doesn't necessarily care how the expressions come in. And second, adding support for parameter substitution in StreamHandler and in an http request. I like the syntax Joel uses in the description. What this would mean is that StreamHandler would see http params like "expr", "left" and "right", would know that these are expressions (can call into StreamFactory to check if something is a valid expression), and would pass them off independently to be parsed and then together to be pieced together. This approach modularizes the implementation such that how an expression with substitution comes in via http is independent to how it is handled within the Streaming API. For example, the following comes into StreamHandler {code} http://localhost:8983/col/stream?expr=merge($left, $right, ...)&baz=jaz&left=search(...)&right=search(...)&foo=bar {code} The StreamHandler will see five parameters, expr, baz, left, right, and foo. It would then determine that expr, left, and right are valid expressions and pass them off to be parsed into three expression objects. It would then pass all three into the factory to be combined into a single Stream object. The factory would then iterate (recursively?) until there aren't any more instances of a StreamExpressionSubstitutionParameter at any level (considering the possibility of infinite loops, of course). At this point it'd then just be passed off to create a Stream object as any other expression would be. 
Another possibility would be to parse out the substitution expressions and then register them in the factory for use during Stream object creation. This would negate the need to do the pre-processing of the N substitution expressions and would give a place to register "pre-compiled" expressions. I'm not a huge fan of this approach as it would add more state to the factory and I'm not a huge fan of the state it already contains. I'm happy to take this on unless, [~caomanhdat], you want to continue your work on it. > Parameter substitution for Streaming Expressions > > > Key: SOLR-8458
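To make the substitution idea concrete, here is a minimal sketch in plain Java. Everything in it — the `ExpressionSubstitutor` name, the `$name` token syntax, and the pass limit guarding the infinite-loop case — is an assumption for illustration, not Solr's actual StreamFactory/StreamParser code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of expression parameter substitution: repeatedly expand
// $name references from the request parameters, bounding the number of passes
// to guard against the infinite-loop case mentioned above (e.g. left=$left).
class ExpressionSubstitutor {
    private static final Pattern PARAM = Pattern.compile("\\$(\\w+)");

    static String substitute(String expr, Map<String, String> params) {
        for (int pass = 0; pass < 10; pass++) {
            Matcher m = PARAM.matcher(expr);
            if (!m.find()) {
                return expr; // no substitution parameters left at any level
            }
            StringBuffer sb = new StringBuffer();
            do {
                String value = params.get(m.group(1));
                if (value == null) {
                    throw new IllegalArgumentException("Unknown substitution parameter: $" + m.group(1));
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(value));
            } while (m.find());
            m.appendTail(sb);
            expr = sb.toString(); // re-scan: substituted text may itself contain parameters
        }
        throw new IllegalStateException("Possible substitution loop in expression: " + expr);
    }
}
```

A real implementation would presumably operate on parsed StreamExpression objects rather than raw strings, but the fixed-point-iteration-with-a-bound shape is the same.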
[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072871#comment-15072871 ] Dennis Gove commented on SOLR-8458: --- I agree. There's no reason to reinvent and I'm always a fan of keeping things consistent. If preprocessing substitution is already implemented for all incoming requests then we should absolutely make use of it. > Parameter substitution for Streaming Expressions > > > Key: SOLR-8458 > URL: https://issues.apache.org/jira/browse/SOLR-8458 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-8458.patch > > > As Streaming Expressions become more complicated it would be nice to support > parameter substitution. For example: > {code} > http://localhost:8983/col/stream?expr=merge($left, $right, > ...)&left=search(...)&right=search(...) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8458) Parameter substitution for Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15072874#comment-15072874 ] Dennis Gove commented on SOLR-8458: --- This is great news. I'm all for continuing to make use of this feature. Thanks!
[jira] [Commented] (SOLR-8458) Add Streaming Expressions tests for parameter substitution
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073389#comment-15073389 ] Dennis Gove commented on SOLR-8458: --- It appears from the thread below that substitution is already supported (see Yonik's comment). At this point the action item would be to add streaming expression tests for parameter substitution. > Add Streaming Expressions tests for parameter substitution > -- > > Key: SOLR-8458 > URL: https://issues.apache.org/jira/browse/SOLR-8458 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-8458.patch > > > This ticket is to add Streaming Expression tests that exercise the existing > macro expansion feature described here: > http://yonik.com/solr-query-parameter-substitution/ > Sample syntax below: > {code} > http://localhost:8983/col/stream?expr=merge(${left}, ${right}, > ...)&left=search(...)&right=search(...) > {code}
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073435#comment-15073435 ] Dennis Gove commented on SOLR-7535: --- I haven't looked at the patch yet but to answer your questions, 1. The name of the collection in the URL path and the collection in any part of the expression can absolutely be different. There are a couple of cases where this difference will most likely appear. First, during a join or merge of multiple collections only one of the collection names can be contained in the URL. For example {code}
innerJoin(
  search(people, fl="personId,name", q="*:*", sort="personId asc"),
  search(address, fl="personId,city", q="state:ny", sort="personId asc"),
  on="personId"
)
{code} Two collections are being hit but only a single one can be included in the URL. There aren't any hard and fast rules about which one should be used in the URL and that decision could depend on a lot of different things, especially if the collections live in different clouds or on different hardware. There is also the possibility that the http request is being sent to what is effectively an empty collection which only exists to perform parallel work using the streaming api. For example, imagine you want to do some heavy metric processing but you don't want to use more resources than necessary on the servers where the collections live. You could set up an empty collection on totally different hardware with the intent that that hardware act solely as workers against the real collection. This would allow you to do the heavy lifting on separate hardware from where the collection actually lives. For these reasons the collection name is a required parameter in the base streams (SolrCloudStream and FacetStream). 2. There are three types of parameters: positional, unnamed, and named. *Positional parameters* are those which must exist in some specific location in the expression.
IIRC, the only positional parameters are the collection names in the base streams. This is done because the collection name is critical and as such we can say it is the first parameter, regardless of anything else included. *Unnamed parameters* are those whose meaning can be determined by the content of the parameter. For example, {code}
rollup(
  search(people, fl="personId,name,age", q="*:*", sort="personId asc"),
  max(age),
  min(age),
  avg(age)
)
{code} in this example we know that search(...) is a stream and max(...), min(...), and avg(...) are metrics. Unnamed parameters are also very useful in situations where the number of parameters of that type is non-deterministic. In the example above one could provide any number of metrics and by keeping them unnamed the user can just keep adding new metrics without worrying about names. Another example of this is with the MergeStream where one can merge 2 or more streams together. *Named parameters* are used when you want to be very clear about what a particular parameter is being used for. For example, the "on" parameter in a join clause indicates that the join should be done on some field (or fields). The HashJoinStream is an interesting one because we have a named parameter "hashed" whose value needs to be a stream. In this case the decision to use a named parameter was made so as to be very clear to the user which stream is being hashed and which one is not. Generally it comes down to whether a parameter name would make things clearer for the user. > Add UpdateStream to Streaming API and Streaming Expression > -- > > Key: SOLR-7535 > URL: https://issues.apache.org/jira/browse/SOLR-7535 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7535.patch > > > The ticket adds an UpdateStream implementation to the Streaming API and > streaming expressions.
The UpdateStream will wrap a TupleStream and send the > Tuples it reads to a SolrCloud collection to be indexed. > This will allow users to pull data from different Solr Cloud collections, > merge and transform the streams and send the transformed data to another Solr > Cloud collection.
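As a rough illustration of why the "hashed" side of a HashJoinStream is worth naming explicitly, here is a hedged sketch of a hash join over tuples represented as plain Java maps. The class and method names are invented for the example; this is not the actual HashJoinStream implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hash-join sketch: the "hashed" side is read fully into an
// in-memory table keyed by the join field, then the other side is streamed
// past it, emitting merged tuples on key matches.
class HashJoinSketch {
    static List<Map<String, Object>> join(
            List<Map<String, Object>> stream,
            List<Map<String, Object>> hashed,
            String on) {
        // Build the hash table from the "hashed" stream.
        Map<Object, List<Map<String, Object>>> table = new HashMap<>();
        for (Map<String, Object> tuple : hashed) {
            table.computeIfAbsent(tuple.get(on), k -> new ArrayList<>()).add(tuple);
        }
        // Stream the other side against the table.
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map<String, Object> left : stream) {
            for (Map<String, Object> match : table.getOrDefault(left.get(on), List.of())) {
                Map<String, Object> merged = new HashMap<>(left);
                merged.putAll(match);
                results.add(merged);
            }
        }
        return results;
    }
}
```

The stream named by "hashed" is the one materialized into memory, which is a meaningful asymmetry — and one reason spelling it out in the expression helps the user.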
[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843 ] Dennis Gove commented on SOLR-8176: --- I've been thinking about this a little bit and one thing I keep coming back to is that there are different kinds of graph traversals and I think our model should take that into account. There are lots of types but I think the two major categories are node traversing graphs and edge traversing graphs. h3. Node Traversing Graphs These are graphs where you have some set of root nodes and you want to find connected nodes with some set of criteria. For example, given a collection of geographic locations (city, county, state, country) with fields "id", "type", "parentId", "name" find all cities in NY. As a hiccup the data is not completely normalized and some cities have their county listed as their parent while some have their state listed as their parent. Ie, you do not know how many nodes are between any given city and any given state. {code} graph( geography, root(q="type=state AND name:ny", fl="id"), leaf(q="type=city", fl="id,parentId,name"), edge("id=parentId") ) {code} In this example you're starting with a set of nodes in the geography collection, all which have some relationship to each other. You select your starting (root) nodes as all states named "ny" (there could be more than one). You then define what constitutes an ending (leaf) node as all cities. And finally, you say that all edges where nodeA.id == nodeB.parentId should be followed. 
This traversal can be implemented as a relatively simple iterative search following the form {code}
frontier := search for all root nodes
leaves := empty list
while frontier is not empty
    frontierIds := list of ids of all nodes in frontier list
    leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
    frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter
{code} In each iteration the leaves list can grow and the frontier list is replaced with the next set of nodes to consider. In the end you have a list of all leaf nodes which in some way connect to the original root nodes following the defined edge. Note that for simplicity I've left a couple of things out, including checking for already traversed nodes to avoid loops. Also, the leaf nodes are not added to the frontier but they can be. This would be useful in a situation where leaves are connected to leaves. > Model distributed graph traversals with Streaming Expressions > - > > Key: SOLR-8176 > URL: https://issues.apache.org/jira/browse/SOLR-8176 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrCloud, SolrJ >Affects Versions: Trunk >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Labels: Graph > Fix For: Trunk > > > I think it would be useful to model a few *distributed graph traversal* use > cases with Solr's *Streaming Expression* language. This ticket will explore > different approaches with a goal of implementing two or three common graph > traversal use cases.
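The iterative search above can be sketched concretely in plain Java. This is an in-memory illustration only — the node list stands in for the search calls, and it adds the visited-set loop protection that the pseudocode intentionally omits:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory sketch of the frontier-based node traversal: nodes have
// id/parentId/type, and the "search" calls are simulated by filtering a list.
class NodeTraversalSketch {
    record Node(String id, String parentId, String type, String name) {}

    static List<Node> traverse(List<Node> all, String rootType, String leafType) {
        List<Node> frontier = new ArrayList<>();
        for (Node n : all) if (n.type().equals(rootType)) frontier.add(n); // root search
        List<Node> leaves = new ArrayList<>();
        Set<String> visited = new HashSet<>(); // loop protection the pseudocode omits
        while (!frontier.isEmpty()) {
            Set<String> frontierIds = new HashSet<>();
            for (Node n : frontier) frontierIds.add(n.id());
            List<Node> next = new ArrayList<>();
            for (Node n : all) {
                if (n.parentId() != null && frontierIds.contains(n.parentId()) && visited.add(n.id())) {
                    if (n.type().equals(leafType)) leaves.add(n); // matches the leaf filter
                    else next.add(n);                            // keep expanding
                }
            }
            frontier = next; // frontier replaced with the next set of nodes to consider
        }
        return leaves;
    }
}
```

With the NY example, a city whose parent is a county and a city whose parent is the state both surface as leaves, regardless of how many hops sit between city and state.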
[jira] [Comment Edited] (SOLR-8176) Model distributed graph traversals with Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073843#comment-15073843 ] Dennis Gove edited comment on SOLR-8176 at 12/29/15 12:10 PM: -- I've been thinking about this a little bit and one thing I keep coming back to is that there are different kinds of graph traversals and I think our model should take that into account. There are lots of types but I think the two major categories are node traversing graphs and edge traversing graphs. h3. Node Traversing Graphs These are graphs where you have some set of root nodes and you want to find connected nodes with some set of criteria. For example, given a collection of geographic locations (city, county, state, country) with fields "id", "type", "parentId", "name" find all cities in NY. As a hiccup the data is not completely normalized and some cities have their county listed as their parent while some have their state listed as their parent. Ie, you do not know how many nodes are between any given city and any given state. {code} graph( geography, root(q="type=state AND name:ny", fl="id"), leaf(q="type=city", fl="id,parentId,name"), edge("id=parentId") ) {code} In this example you're starting with a set of nodes in the geography collection, all which have some relationship to each other. You select your starting (root) nodes as all states named "ny" (there could be more than one). You then define what constitutes an ending (leaf) node as all cities. And finally, you say that all edges where nodeA.id == nodeB.parentId should be followed. 
This traversal can be implemented as a relatively simple iterative search following the form {code}
frontier := search for all root nodes
leaves := empty list
while frontier is not empty
    frontierIds := list of ids of all nodes in frontier list
    leaves :append: search for all nodes whose parentId is in frontierIds and matches the leaf filter
    frontier := search for all nodes whose parentId is in frontierIds and does not match the leaf filter
{code} In each iteration the leaves list can grow and the frontier list is replaced with the next set of nodes to consider. In the end you have a list of all leaf nodes which in some way connect to the original root nodes following the defined edge. Note that for simplicity I've left a couple of things out, including checking for already traversed nodes to avoid loops. Also, the leaf nodes are not added to the frontier but they can be. This would be useful in a situation where leaves are connected to leaves. h3. Edge Traversal Graphs These are graphs where you have some set of edges but the nodes themselves are relatively unimportant for traversal. For example, finding the shortest path between two nodes, or finding the minimum spanning tree for some set of nodes, or finding loops.
[jira] [Commented] (SOLR-8458) Add Streaming Expressions tests for parameter substitution
[ https://issues.apache.org/jira/browse/SOLR-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073898#comment-15073898 ] Dennis Gove commented on SOLR-8458: --- Cao, What's the purpose of ClientTupleStream? It appears it's only used in the tests and doesn't add any value as a Stream object. I'd rather not replace all existing stream creations with a randomized choice between doing substitution and not. I think it'd be better to have explicit tests which exercise substitution. I don't think it'd be necessary to test substitution on each and every stream class because the implementation is outside of the stream classes. Also, it appears that the randomization of the choice is non-repeatable. Ie, if I rerun the tests with a -Dtests.seed value, would the random choices be the same? It appears that the substitution is just picking some substring in the expression and marking it as being a parameter. I think this should test substituting entire expression clauses, like {code} http://localhost:8983/col/stream?expr=merge($left, $right, ...)&left=search(...)&right=search(...) {code} where left and right are entire clauses. The tests you've provided appear to do something like this {code} http://localhost:8983/col/stream?expr=merge(sear$left, se$right..), ...)&left=ch(...)&right=arch(. {code} which I don't think makes much sense. Technically the substitution should handle that but I think the convention should be that one would want to substitute entire expressions.
> Add Streaming Expressions tests for parameter substitution > -- > > Key: SOLR-8458 > URL: https://issues.apache.org/jira/browse/SOLR-8458 > Project: Solr > Issue Type: Improvement >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-8458.patch, SOLR-8458.patch > > > This ticket is to add Streaming Expression tests that exercise the existing > macro expansion feature described here: > http://yonik.com/solr-query-parameter-substitution/ > Sample syntax below: > {code} > http://localhost:8983/col/stream?expr=merge(${left}, ${right}, > ...)&left=search(...)&right=search(...) > {code}
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073907#comment-15073907 ] Dennis Gove commented on SOLR-7535: --- In the Streaming API, read() is called until an EOF tuple is seen. This means that, even with an UpdateStream, one would have this code {code}
while (true) {
  tuple = updateStream.read();
  // if # of records is some size, do a commit
  if (tuple.EOF) {
    break;
  }
}
{code} I think it's the correct thing for an UpdateStream to swallow the individual tuples. The use-case you described isn't one I see existing. But if it did then I could see it being dealt with using a TeeStream. A TeeStream would work exactly like the unix command tee and take a single input stream and tee it out into multiple output streams. In this use-case, one would tee the underlying searches. But again, I don't see this need actually existing.
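A fleshed-out version of that read loop, with the periodic commit filled in, might look like the following. The `Tuple`/`TupleSource`/`drain` names are hypothetical stand-ins for the SolrJ types, used only to show the batching shape:

```java
import java.util.List;

// Hedged sketch of the read-until-EOF loop with a periodic commit. The commit
// is simulated by appending to a log so the batching behavior is observable.
class UpdateLoopSketch {
    record Tuple(boolean eof, String id) {}

    interface TupleSource { Tuple read(); }

    // Drains the stream, recording a "commit" every batchSize tuples and once at EOF.
    static int drain(TupleSource stream, int batchSize, List<String> commitLog) {
        int sinceCommit = 0, total = 0;
        while (true) {
            Tuple t = stream.read();
            if (t.eof()) break;          // EOF tuple ends the loop
            total++;
            if (++sinceCommit >= batchSize) {
                commitLog.add("commit@" + total); // batch boundary reached
                sinceCommit = 0;
            }
        }
        commitLog.add("final-commit@" + total);   // always commit at EOF
        return total;
    }
}
```

In the real API the caller would issue a SolrCloud commit at those points instead of logging; the point is only that the client driving read() decides the commit cadence.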
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073920#comment-15073920 ] Dennis Gove commented on SOLR-7535: --- I had an interesting thought related to the call to read(). Should there be some distinction between a ReadStream and a WriteStream? A ReadStream is one which reads tuples out while a WriteStream is one which writes tuples in. Up until this point we've only ever had ReadStreams and the read() method has always made sense. But the UpdateStream is a WriteStream and maybe it should have a different function, maybe write(). Also, it might be nice to be able to say in a stream that its direct incoming stream must be a WriteStream (for example, a CommitStream would only work on a WriteStream while a RollupStream would only work on a ReadStream). (Though maybe it'd be interesting to do rollups over the output tuples of an UpdateStream.) Thoughts?
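The distinction could be modeled at the type level along these lines. This is a speculative sketch of the idea, not anything in the Streaming API — the interface and class names are invented:

```java
import java.util.ArrayList;
import java.util.List;

// Speculative sketch of the read/write distinction: a ReadStream produces
// tuples, a WriteStream consumes them, and a stream like CommitStream could
// then require a WriteStream as its input at compile time.
class StreamKindsSketch {
    interface ReadStream<T> { T read(); }             // pulls tuples out
    interface WriteStream<T> { void write(T tuple); } // pushes tuples in

    // A toy WriteStream that collects what it is given, standing in for indexing.
    static class CollectingWriteStream<T> implements WriteStream<T> {
        final List<T> received = new ArrayList<>();
        public void write(T tuple) { received.add(tuple); }
    }

    // Pump a ReadStream into a WriteStream until the source signals end-of-stream (null here).
    static <T> int pump(ReadStream<T> source, WriteStream<T> sink) {
        int count = 0;
        for (T t = source.read(); t != null; t = source.read()) {
            sink.write(t);
            count++;
        }
        return count;
    }
}
```

A CommitStream constructor accepting only a WriteStream would turn the "only works on a WriteStream" rule into a compile-time constraint rather than a runtime check.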
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15073952#comment-15073952 ] Dennis Gove commented on SOLR-7535: --- I agree. It needs to be fleshed out some more.
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076403#comment-15076403 ] Dennis Gove commented on SOLR-7535: --- +1 on fault tolerance as well. 1) I think the expected behavior of all streams is that the EOF tuple could contain extra metadata about the stream that is only known at the end. This allows clients (or other streams) to know that this metadata didn't come from a real document but is just EOF metadata. If there are streams which don't handle a non-empty EOF tuple I think those streams should be corrected. 2) I think you're correct about the ParallelStream and how it operates. I don't see a way for the ParallelStream, as currently implemented, to interact with the raw tuples coming out from a call to another stream's read() method. Ie, it does depend on doing the partitioning at the source and cannot do it in the middle of a data pipeline. It'd be a nice feature to be able to take a single stream of data and split it out onto N streams across N workers. Here's an example of a pipeline I'd like to be able to create with a ParallelStream but currently cannot seem to. Essentially, do something with the data, then split it off to workers to perform the expensive operations, and then bring them back together (I hope the ascii art shows properly). {code}
sourceA ---\             /--- worker1 --- rollup --- sort ---\
            >--- join --<     ...                             >--- mergesort ---\
sourceB ---/             \--- worker5 --- rollup --- sort ---/                   >--- join --- output
sourceC ------------------------------------------------------------------------/
{code} My understanding is that the parallelization must be done at the start of the pipeline and cannot be done in the middle of the pipeline. Maybe a new stream is required that can split streams off to workers.
> Add UpdateStream to Streaming API and Streaming Expression > -- > > Key: SOLR-7535 > URL: https://issues.apache.org/jira/browse/SOLR-7535 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7535.patch, SOLR-7535.patch > > > The ticket adds an UpdateStream implementation to the Streaming API and > streaming expressions. The UpdateStream will wrap a TupleStream and send the > Tuples it reads to a SolrCloud collection to be indexed. > This will allow users to pull data from different Solr Cloud collections, > merge and transform the streams and send the transformed data to another Solr > Cloud collection.
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076418#comment-15076418 ] Dennis Gove commented on SOLR-7535: --- Clever. I like it.
[jira] [Created] (SOLR-8479) Add JDBCStream for integration with external data sources
Dennis Gove created SOLR-8479: - Summary: Add JDBCStream for integration with external data sources Key: SOLR-8479 URL: https://issues.apache.org/jira/browse/SOLR-8479 Project: Solr Issue Type: New Feature Components: SolrJ Reporter: Dennis Gove Priority: Minor Given that the Streaming API can merge and join multiple incoming SolrStreams to perform complex operations on the resulting combined datasets I think it would be beneficial to also support incoming streams from other data sources. The JDBCStream will provide a Streaming API interface to any data source which provides a JDBC driver.
[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8479: -- Attachment: SOLR-8479.patch This is a first pass at the JDBCStream. There are still open questions and unimplemented pieces but I'm putting this out there to start the conversation. No tests are included. 1. Currently it's handling the loading of JDBC driver classes by requiring that the driver class be provided; it will then call {code} Class.forName(driverClassName); {code} during open(). I'm wondering if there's a better way to handle this, particularly whether the loading could be driven by config file handling.
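The open()-time loading described here follows the classic pre-JDBC-4 driver-registration pattern. A hedged sketch of that pattern (illustrative names, not the patch's actual code), including turning a missing driver into a clear error:

```java
import java.io.IOException;

// Sketch of the driver-loading pattern: open() resolves the configured driver
// class by name and wraps a missing driver in a descriptive exception. A real
// stream would go on to call DriverManager.getConnection(url) afterwards.
class DriverLoadingSketch {
    private final String driverClassName;

    DriverLoadingSketch(String driverClassName) {
        this.driverClassName = driverClassName;
    }

    void open() throws IOException {
        try {
            // Pre-JDBC-4 style: class initialization registers the driver
            // with DriverManager as a side effect.
            Class.forName(driverClassName);
        } catch (ClassNotFoundException e) {
            throw new IOException("JDBC driver '" + driverClassName
                + "' not found on the classpath", e);
        }
    }
}
```

Worth noting on the config question: JDBC 4 drivers declare themselves in META-INF/services/java.sql.Driver, so DriverManager can discover any driver on the classpath without an explicit Class.forName at all.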
[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8479: -- Attachment: SOLR-8479.patch Adds some simple tests for the raw stream and as embedded inside a SelectStream and MergeStream where it is being merged with a CloudSolrStream. Still doesn't implement Expressible interface (next on my list). > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076655#comment-15076655 ] Dennis Gove edited comment on SOLR-8479 at 1/2/16 8:28 PM: --- Adds some simple tests for the raw stream and as embedded inside a SelectStream and MergeStream where it is being merged with a CloudSolrStream. The tests are using the in-memory database hsqldb with driver "org.hsqldb.jdbcDriver". I chose this as it's already being used in a contrib module. I'm open to other options as I'm not a huge fan of this particular in-memory database. Still doesn't implement Expressible interface (next on my list). was (Author: dpgove): Adds some simple tests for the raw stream and as embedded inside a SelectStream and MergeStream where it is being merged with a CloudSolrStream. Still doesn't implement Expressible interface (next on my list). > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076668#comment-15076668 ] Dennis Gove commented on SOLR-8479: --- I considered that but I wanted to be sure the test covered non-Solr code bases. I think there's value in showing that a non-Solr external source can be used and functions as expected. > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6938) Convert build to work with Git rather than SVN.
[ https://issues.apache.org/jira/browse/LUCENE-6938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076730#comment-15076730 ] Dennis Gove commented on LUCENE-6938: - You can get the current sha1 with the command {code} $> git rev-parse HEAD {code} And you can replace HEAD with the name of a branch or tag to get its sha1. See {{git help rev-parse}} for all of the options. > Convert build to work with Git rather than SVN. > --- > > Key: LUCENE-6938 > URL: https://issues.apache.org/jira/browse/LUCENE-6938 > Project: Lucene - Core > Issue Type: Task >Reporter: Mark Miller >Assignee: Mark Miller > Attachments: LUCENE-6938.patch > > > We assume an SVN checkout in parts of our build and will need to move to > assuming a Git checkout. > Patches against https://github.com/dweiss/lucene-solr-svn2git from > LUCENE-6933. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8432) Split StreamExpressionTest into separate tests
[ https://issues.apache.org/jira/browse/SOLR-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080429#comment-15080429 ] Dennis Gove commented on SOLR-8432: --- The reason for the single @Test calling out to multiple functions has to do with the test setup code in AbstractFullDistribZkTestBase. Each test method will go through relatively expensive test setup / teardown. By only having a single test method with calls out to individual methods we can avoid that repeated setup/teardown code. I ran both the original test class and these changes and the runtime difference is significant. The original version completes in 36s while the separated version takes 490s. I think for this change to be accepted it would have to include changes in the base classes to move some of the setup work from test method setup to test class setup. (might actually require new base classes so as not to impact other tests using these base test classes). > Split StreamExpressionTest into separate tests > -- > > Key: SOLR-8432 > URL: https://issues.apache.org/jira/browse/SOLR-8432 > Project: Solr > Issue Type: Test >Affects Versions: Trunk >Reporter: Jason Gerlowski >Priority: Trivial > Fix For: Trunk > > Attachments: SOLR-8432.patch > > > Currently, {{StreamExpressionTest}} consists of a single JUnit test that > calls 10 or 15 methods, each targeting a particular type of stream or > scenario. > Each of these scenario's would benefit being split into its own separate > JUnit test. This would allow each scenario to pass/fail independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
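The setup-amortization pattern described above can be sketched in plain Java; this is a simplified model with hypothetical names, not the Solr test code:

```java
// Illustrative sketch: a single entry point runs every scenario against one
// expensive setup, instead of the framework re-running setup per test method.
public class SingleSetupSuite {
    static int setupCount = 0;

    static void expensiveSetup() { setupCount++; /* e.g. start a mini cluster */ }
    static void teardown()       { /* e.g. stop the cluster */ }

    static void scenarioA() { /* assertions for one stream type */ }
    static void scenarioB() { /* assertions for another stream type */ }

    // The single "test method": setup and teardown run once for all scenarios,
    // at the cost of the scenarios no longer passing/failing independently.
    public static void runAll() {
        expensiveSetup();
        try {
            scenarioA();
            scenarioB();
        } finally {
            teardown();
        }
    }
}
```

The class-level alternative mentioned in the comment would move the expensive work into per-class fixtures (in JUnit 4 terms, @BeforeClass/@AfterClass), so separate @Test methods could share one setup.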
[jira] [Comment Edited] (SOLR-8432) Split StreamExpressionTest into separate tests
[ https://issues.apache.org/jira/browse/SOLR-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080429#comment-15080429 ] Dennis Gove edited comment on SOLR-8432 at 1/3/16 1:28 PM: --- The reason for the single @Test calling out to multiple functions has to do with the test setup code in AbstractFullDistribZkTestBase. Each test method will go through relatively expensive test setup / teardown. By only having a single test method with calls out to individual methods we can avoid that repeated setup/teardown code. I ran both the original test class and these changes and the runtime difference is significant. The original version completes in 36s while the separated version completes in 490s. I think for this change to be accepted it would have to include changes in the base classes to move some of the setup work from test method setup to test class setup. (might actually require new base classes so as not to impact other tests using these base test classes). was (Author: dpgove): The reason for the single @Test calling out to multiple functions has to do with the test setup code in AbstractFullDistribZkTestBase. Each test method will go through relatively expensive test setup / teardown. By only having a single test method with calls out to individual methods we can avoid that repeated setup/teardown code. I ran both the original test class and these changes and the runtime difference is significant. The original version completes in 36s while the separated versions takes 490s. I think for this change to be accepted it would have to include changes in the base classes to move some of the setup work from test method setup to test class setup. (might actually require new base classes so as not to impact other tests using these base test classes). 
> Split StreamExpressionTest into separate tests > -- > > Key: SOLR-8432 > URL: https://issues.apache.org/jira/browse/SOLR-8432 > Project: Solr > Issue Type: Test >Affects Versions: Trunk >Reporter: Jason Gerlowski >Priority: Trivial > Fix For: Trunk > > Attachments: SOLR-8432.patch > > > Currently, {{StreamExpressionTest}} consists of a single JUnit test that > calls 10 or 15 methods, each targeting a particular type of stream or > scenario. > Each of these scenario's would benefit being split into its own separate > JUnit test. This would allow each scenario to pass/fail independently. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15080592#comment-15080592 ] Dennis Gove commented on SOLR-7535: --- It seems like a reasonable approach to limit the read rate to the maximum possible write rate. Let's add a buffering option at a later point, if it ends up being necessary. > Add UpdateStream to Streaming API and Streaming Expression > -- > > Key: SOLR-7535 > URL: https://issues.apache.org/jira/browse/SOLR-7535 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrJ >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, > SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch > > > The ticket adds an UpdateStream implementation to the Streaming API and > streaming expressions. The UpdateStream will wrap a TupleStream and send the > Tuples it reads to a SolrCloud collection to be indexed. > This will allow users to pull data from different Solr Cloud collections, > merge and transform the streams and send the transformed data to another Solr > Cloud collection. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8479: -- Attachment: SOLR-8479.patch New patch with a few changes. 1. Added some new tests. 2. Made driverClassName an optional property. If provided, then we will call Class.forName(driverClassName); during open(). Also added a call to DriverManager.getDriver(connectionUrl) during open() to validate that the driver can be found. If not, then an exception is thrown. This will prevent us from continuing if the JDBC driver is not loaded. 3. Changed the default handling types so that Double is handled as a direct class while Float is converted to a Double. This keeps it in line with the rest of the Streaming API. > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
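Point 3 can be sketched as a small value-mapping helper. This is an illustrative assumption about the described behavior (Double passed through, Float widened to Double), not the attached patch:

```java
// Illustrative sketch of the type-handling rule described in point 3 above:
// Double is kept as-is while Float is widened to Double, matching the rest
// of the Streaming API. Hypothetical helper, not the SOLR-8479 patch.
public class JdbcValueMapper {
    public static Object mapValue(Object raw) {
        if (raw instanceof Float) {
            // Widen to double: the Streaming API deals in doubles, not floats.
            return ((Float) raw).doubleValue();
        }
        // Double, Long, String, ... pass through unchanged.
        return raw;
    }
}
```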
[jira] [Updated] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8479: -- Attachment: SOLR-8479.patch Previous patch was a diff between the wrong hashes in the repo. This one is correct. > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch, > SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8479) Add JDBCStream for integration with external data sources
[ https://issues.apache.org/jira/browse/SOLR-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081873#comment-15081873 ] Dennis Gove commented on SOLR-8479: --- I intend to add a few more tests for failure scenarios and for setting connection properties. Barring any issues found with that, I think this will be ready to go. > Add JDBCStream for integration with external data sources > - > > Key: SOLR-8479 > URL: https://issues.apache.org/jira/browse/SOLR-8479 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Dennis Gove >Priority: Minor > Attachments: SOLR-8479.patch, SOLR-8479.patch, SOLR-8479.patch, > SOLR-8479.patch > > > Given that the Streaming API can merge and join multiple incoming SolrStreams > to perform complex operations on the resulting combined datasets I think it > would be beneficial to also support incoming streams from other data sources. > The JDBCStream will provide a Streaming API interface to any data source > which provides a JDBC driver. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields
Dennis Gove created SOLR-8485: - Summary: SelectStream only works with all lowercase field names and doesn't handle quoted selected fields Key: SOLR-8485 URL: https://issues.apache.org/jira/browse/SOLR-8485 Project: Solr Issue Type: Bug Reporter: Dennis Gove Priority: Minor Three issues exist if one creates a SelectStream with an expression. {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), personId_i as personId, rating_f as rating ) {code} "personId_i as personId" will be parsed as "personid_i as personid" 1. The incoming tuple will contain a field "personId_i" but the selection will be looking for a field "personid_i". This field won't be found in the incoming tuple (notice the case difference) and as such no field personId will exist in the outgoing tuple. 2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" and not the expected "personId" (notice the case difference). This can lead to other down-the-road issues. Also, if one were to quote the selected fields such as in {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), "personId_i as personId", "rating_f as rating" ) {code} then the quotes would be included in the field name. Wrapping quotes should be handled properly such that they are removed from the parameters before they are parsed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
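For the quoting issue, stripping a matching pair of wrapping quotes before parsing could look roughly like the following. The helper name is hypothetical and this is not the eventual fix:

```java
// Illustrative sketch of removing wrapping quotes from a selection parameter
// before it is parsed, as suggested above. Hypothetical helper, not the
// SOLR-8485 patch.
public class ParamUtil {
    public static String stripWrappingQuotes(String param) {
        String s = param.trim();
        if (s.length() >= 2) {
            char first = s.charAt(0);
            char last = s.charAt(s.length() - 1);
            // Only strip when the quotes actually match as a pair.
            if ((first == '"' && last == '"') || (first == '\'' && last == '\'')) {
                return s.substring(1, s.length() - 1);
            }
        }
        return s;
    }
}
```

Note this leaves issues (1) and (2) untouched; those are about the parser lowercasing field names, which has to be fixed in the expression parser itself.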
[jira] [Updated] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields
[ https://issues.apache.org/jira/browse/SOLR-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8485: -- Description: Three issues exist if one creates a SelectStream with an expression. {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), personId_i as personId, rating_f as rating ) {code} "personId_i as personId" will be parsed as "personid_i as personid" 1. The incoming tuple will contain a field "personId_i" but the selection will be looking for a field "personid_i". This field won't be found in the incoming tuple (notice the case difference) and as such no field personId will exist in the outgoing tuple. 2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" and not the expected "personId" (notice the case difference). This can lead to other down-the-road issues. 3. Also, if one were to quote the selected fields such as in {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), "personId_i as personId", "rating_f as rating" ) {code} then the quotes would be included in the field name. Wrapping quotes should be handled properly such that they are removed from the parameters before they are parsed. was: Three issues exist if one creates a SelectStream with an expression. {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), personId_i as personId, rating_f as rating ) {code} "personId_i as personId" will be parsed as "personid_i as personid" 1. The incoming tuple will contain a field "personId_i" but the selection will be looking for a field "personid_i". This field won't be found in the incoming tuple (notice the case difference) and as such no field personId will exist in the outgoing tuple. 2. If (1) wasn't an issue, the outgoing tuple would have a field "personid" and not the expected "personId" (notice the case difference). 
This can lead to other down-the-road issues. Also, if one were to quote the selected fields such as in {code} select( search(collection1, fl="personId_i,rating_f", q="rating_f:*", sort="personId_i asc"), "personId_i as personId", "rating_f as rating" ) {code} then the quotes would be included in the field name. Wrapping quotes should be handled properly such that they are removed from the parameters before they are parsed. > SelectStream only works with all lowercase field names and doesn't handle > quoted selected fields > > > Key: SOLR-8485 > URL: https://issues.apache.org/jira/browse/SOLR-8485 > Project: Solr > Issue Type: Bug >Reporter: Dennis Gove >Priority: Minor > Labels: streaming > > Three issues exist if one creates a SelectStream with an expression. > {code} > select( > search(collection1, fl="personId_i,rating_f", q="rating_f:*", > sort="personId_i asc"), > personId_i as personId, > rating_f as rating > ) > {code} > "personId_i as personId" will be parsed as "personid_i as personid" > 1. The incoming tuple will contain a field "personId_i" but the selection > will be looking for a field "personid_i". This field won't be found in the > incoming tuple (notice the case difference) and as such no field personId > will exist in the outgoing tuple. > 2. If (1) wasn't an issue, the outgoing tuple would have in a field > "personid" and not the expected "personId" (notice the case difference). This > can lead to other down-the-road issues. > 3. Also, if one were to quote the selected fields such as in > {code} > select( > search(collection1, fl="personId_i,rating_f", q="rating_f:*", > sort="personId_i asc"), > "personId_i as personId", > "rating_f as rating" > ) > {code} > then the quotes would be included in the field name. Wrapping quotes should > be handled properly such that they are removed from the parameters before > they are parsed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8485) SelectStream only works with all lowercase field names and doesn't handle quoted selected fields
[ https://issues.apache.org/jira/browse/SOLR-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8485: -- Attachment: SOLR-8485.patch This patch corrects issues (1) and (2). > SelectStream only works with all lowercase field names and doesn't handle > quoted selected fields > > > Key: SOLR-8485 > URL: https://issues.apache.org/jira/browse/SOLR-8485 > Project: Solr > Issue Type: Bug >Reporter: Dennis Gove >Priority: Minor > Labels: streaming > Attachments: SOLR-8485.patch > > > Three issues exist if one creates a SelectStream with an expression. > {code} > select( > search(collection1, fl="personId_i,rating_f", q="rating_f:*", > sort="personId_i asc"), > personId_i as personId, > rating_f as rating > ) > {code} > "personId_i as personId" will be parsed as "personid_i as personid" > 1. The incoming tuple will contain a field "personId_i" but the selection > will be looking for a field "personid_i". This field won't be found in the > incoming tuple (notice the case difference) and as such no field personId > will exist in the outgoing tuple. > 2. If (1) wasn't an issue, the outgoing tuple would have in a field > "personid" and not the expected "personId" (notice the case difference). This > can lead to other down-the-road issues. > 3. Also, if one were to quote the selected fields such as in > {code} > select( > search(collection1, fl="personId_i,rating_f", q="rating_f:*", > sort="personId_i asc"), > "personId_i as personId", > "rating_f as rating" > ) > {code} > then the quotes would be included in the field name. Wrapping quotes should > be handled properly such that they are removed from the parameters before > they are parsed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7535) Add UpdateStream to Streaming API and Streaming Expression
[ https://issues.apache.org/jira/browse/SOLR-7535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15082109#comment-15082109 ] Dennis Gove commented on SOLR-7535: --- +1 on that. I'm really excited about this! > Add UpdateStream to Streaming API and Streaming Expression > -- > > Key: SOLR-7535 > URL: https://issues.apache.org/jira/browse/SOLR-7535 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrJ >Reporter: Joel Bernstein >Assignee: Joel Bernstein >Priority: Minor > Attachments: SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, > SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, SOLR-7535.patch, > SOLR-7535.patch > > > The ticket adds an UpdateStream implementation to the Streaming API and > streaming expressions. The UpdateStream will wrap a TupleStream and send the > Tuples it reads to a SolrCloud collection to be indexed. > This will allow users to pull data from different Solr Cloud collections, > merge and transform the streams and send the transformed data to another Solr > Cloud collection. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106833#comment-15106833 ] Dennis Gove edited comment on SOLR-8559 at 1/19/16 2:57 PM: Are you able to create a test for this specific enhancement? Or if not, are there existing tests covering this code I can specifically check after applying the patch? was (Author: dpgove): Are you able to create a test for this specific feature? Or if not, are there existing tests covering this code I can specifically check after applying the patch? > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban > Labels: optimization, performance > Attachments: solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
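Optimization (1) in the quoted SOLR-8559 description — with mincount > 0, skipping segments whose terms all have zero counts before they enter the priority-queue merge — can be modeled roughly as follows. This is a simplified sketch, not the PerSegmentSingleValuedFaceting code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the segment-skipping optimization described above.
// A segment whose per-term counts are all zero cannot contribute any term
// with count >= mincount (for mincount > 0), so it can be dropped before
// the expensive per-term priority-queue merge across segments.
public class SegmentFilter {
    // counts.get(i) = per-term counts for segment i against the matching docset
    public static List<int[]> segmentsToMerge(List<int[]> counts, int mincount) {
        List<int[]> keep = new ArrayList<>();
        for (int[] seg : counts) {
            boolean anyNonZero = false;
            for (int c : seg) {
                if (c > 0) { anyNonZero = true; break; }
            }
            // With mincount == 0, zero-count terms still have to be reported,
            // so every segment must be visited; otherwise skip all-zero ones.
            if (mincount == 0 || anyNonZero) {
                keep.add(seg);
            }
        }
        return keep;
    }
}
```

This directly targets the reported worst case (small matching docset against high-cardinality fields), where most segments contribute nothing yet each would otherwise cost numTerms queue updates.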
[jira] [Commented] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106833#comment-15106833 ] Dennis Gove commented on SOLR-8559: --- Are you able to create a test for this specific feature? Or if not, are there existing tests covering this code I can specifically check after applying the patch? > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban > Labels: optimization, performance > Attachments: solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-8559: - Assignee: Dennis Gove > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Attachments: solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Attachment: SOLR-8559-trunk.patch Rebased off trunk. Keith will upload a 5x backport. > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Attachments: SOLR-8559-trunk.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106932#comment-15106932 ] Dennis Gove commented on SOLR-8556: --- {code} expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr)); {code} If the ConcatOperation was created using the non-expression constructor, then fieldsStr will be unset and, as such, this won't produce the expected result. Instead, I'd iterate over the fields array and create a comma-separated list. This would allow the removal of the global fieldsStr. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch > > > Now that we have the UpdateStream, it would be nice to support the use case of > sending rolled-up aggregates for storage in another SolrCloud collection. To > support this we'll need to create ids for the aggregate records. > The ConcatOperation would allow us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*", fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
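A minimal sketch of the approach suggested in that comment, building the comma-separated "fields" parameter from the fields array itself rather than a stored string; the class and method names here are illustrative, not the actual patch:

```java
// Sketch only: derive the serialized parameter from the fields array at
// expression-build time, so it works no matter which constructor was used
// and the global fieldsStr can be removed.
public class ConcatFieldsSketch {
    private final String[] fields;

    public ConcatFieldsSketch(String[] fields) {
        this.fields = fields;
    }

    // Produces the value for the "fields" named parameter, e.g. "a,b,c".
    public String toFieldsParameter() {
        return String.join(",", fields);
    }
}
```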
[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106944#comment-15106944 ] Dennis Gove commented on SOLR-8556: --- I'm going through and creating tests so I'll correct these issues as I go. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106941#comment-15106941 ] Dennis Gove commented on SOLR-8556: --- {code} buf.append(field); {code} This concatenates the field names together instead of the field values. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
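An illustrative fix for the bug noted in that comment: append the value stored under each field name rather than the field name itself. A Map stands in for Solr's Tuple here, and the names are assumptions, not the actual patch:

```java
import java.util.Map;

// Sketch of the corrected loop: look up each field's value in the tuple and
// append that, with the delimiter between entries.
public class ConcatValuesSketch {
    public static String concat(Map<String, Object> tuple, String[] fields, String delim) {
        StringBuilder buf = new StringBuilder();
        for (String field : fields) {
            if (buf.length() > 0) {
                buf.append(delim);
            }
            Object value = tuple.get(field);
            // A missing value becomes the literal string "null", matching the
            // behavior described later in this thread.
            buf.append(value == null ? "null" : value.toString());
        }
        return buf.toString();
    }
}
```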
[jira] [Comment Edited] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106932#comment-15106932 ] Dennis Gove edited comment on SOLR-8556 at 1/19/16 4:34 PM: {code} expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr)); {code} If the ConcatOperation was created using the non-expression constructor then fieldsStr will be unset and as such this won't produce the expected result. Instead, I'd iterate over the fields array and create a comma-separated list. This would allow the removal of the global fieldsStr. was (Author: dpgove): {code} expression.addParameter(new StreamExpressionNamedParameter("fields",fieldsStr)); {code} If the ConcatOperation was created using the non-expression constructor then fieldsStr will unset and as such this won't produce the expected result. Instead, I'd iterate over the fields array and create a comma-separated list. This would allow the removal of the global fieldsStr. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8556: -- Attachment: SOLR-8556.patch Adds ConcatOperation-specific tests. Corrects the issues mentioned above. Would still like to add a test showing the usage of this inside a SelectStream. For example, there is a difference between these two clauses: {code} select(a,b,c, search(), replace(a,null,withValue=0f), concat(fields="a,b", as="ab", delim="-")) {code} {code} select(a,b,c, search(), concat(fields="a,b", as="ab", delim="-"), replace(a,null,withValue=0f)) {code} In the first one, a null value in field a will first be replaced with 0 and then concatenated with b, whereas in the second one, a and b will be concatenated first and then a null value in a would be replaced with 0. I.e., the order of operations matters. Also note, I added a feature which, for null values, will concatenate the string "null". If one wants to replace null with a different value, then one can use the replace operation in conjunction with the concat operation. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch, SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
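The ordering point above can be demonstrated with plain Java. A HashMap stands in for a Solr Tuple, and replaceNull/concat are simplified stand-ins for the replace and concat operations; this is not the actual operation API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: apply the same two operations in opposite orders and compare results.
public class OperationOrderSketch {
    // Stand-in for replace(field, null, withValue=...).
    static void replaceNull(Map<String, Object> tuple, String field, Object withValue) {
        if (tuple.get(field) == null) {
            tuple.put(field, withValue);
        }
    }

    // Stand-in for concat(fields=..., as=..., delim=...); nulls become "null".
    static void concat(Map<String, Object> tuple, String as, String delim, String... fields) {
        StringBuilder buf = new StringBuilder();
        for (String f : fields) {
            if (buf.length() > 0) {
                buf.append(delim);
            }
            Object v = tuple.get(f);
            buf.append(v == null ? "null" : v.toString());
        }
        tuple.put(as, buf.toString());
    }
}
```

With a tuple where a is null and b is "y", replace-then-concat produces ab = "0.0-y", while concat-then-replace produces ab = "null-y": the same operations, different results.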
[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8556: -- Attachment: SOLR-8556.patch Adds additional tests. I think this is good to go. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Joel Bernstein > Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Attachment: (was: SOLR-8559-trunk.patch) > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Attachments: SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Attachment: SOLR-8559.patch Patch applied to both trunk and branch_5x. > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Attachments: SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Affects Version/s: Trunk 5.4 > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.4, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Attachments: SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107620#comment-15107620 ] Dennis Gove commented on SOLR-8559: --- Thanks for this performance optimization, Keith! > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.4, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: Trunk > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-8559. - Resolution: Fixed Fix Version/s: Trunk > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.4, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: Trunk > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-8556: - Assignee: Dennis Gove (was: Joel Bernstein) > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Dennis Gove > Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8556: -- Attachment: SOLR-8556.patch Added "concat" to StreamHandler so it is a default operation. > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Dennis Gove > Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch, > SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-8556) Add ConcatOperation to be used with the SelectStream
[ https://issues.apache.org/jira/browse/SOLR-8556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-8556. - Resolution: Implemented Fix Version/s: Trunk > Add ConcatOperation to be used with the SelectStream > > > Key: SOLR-8556 > URL: https://issues.apache.org/jira/browse/SOLR-8556 > Project: Solr > Issue Type: New Feature >Reporter: Joel Bernstein >Assignee: Dennis Gove > Fix For: Trunk > > Attachments: SOLR-8556.patch, SOLR-8556.patch, SOLR-8556.patch, > SOLR-8556.patch > > > Now that we have the UpdateStream it would be nice to support the use case of > sending rolled up aggregates for storage in another SolrCloud collection. To > support this we'll need to create id's for the aggregate records. > The ConcatOperation would allows us to concatenate the bucket values into a > unique id. For example: > {code} > update( > select( > rollup(search(q="*:*, fl="a,b,c", ...)), > concat(fields="a,b,c", delim="_", as="id"))) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reopened SOLR-8559: --- > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.4, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: Trunk > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Fix Version/s: (was: Trunk) 5.5 > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.4, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: 5.5 > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8559: -- Affects Version/s: (was: 5.4) 5.5 > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.5, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: 5.5 > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove resolved SOLR-8559. --- Resolution: Implemented > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.5, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: 5.5 > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8559) FCS facet performance optimization
[ https://issues.apache.org/jira/browse/SOLR-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113160#comment-15113160 ] Dennis Gove commented on SOLR-8559: --- Thanks, Dave. I think I've been marking issues as closed. I'll keep this in mind going forward. > FCS facet performance optimization > -- > > Key: SOLR-8559 > URL: https://issues.apache.org/jira/browse/SOLR-8559 > Project: Solr > Issue Type: Improvement > Components: faceting >Affects Versions: 5.5, Trunk >Reporter: Keith Laban >Assignee: Dennis Gove > Labels: optimization, performance > Fix For: 5.5 > > Attachments: SOLR-8559-4-10-4.patch, SOLR-8559.patch, solr-8559.patch > > > While profiling a large collection (multi-sharded billions of documents), I > found that a fast (5-10ms query) which had no matches would take 20-30 > seconds when doing facets even when {{facet.mincount=1}} > Profiling made it apparent that with {{facet.method=fcs}} 99% of the time was > [spent > here|https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/request/PerSegmentSingleValuedFaceting.java#L212]. > {{queue.udpateTop}} gets called {{numOfSegments*numTerms}}, the worst case > when every term is in every segment. This formula doesn't take into account > whether or not any of the terms have a positive count with respect to the > docset. > These optimizations are aimed to do two things: > # When mincount>0 don't include segments which all terms have zero counts. > This should significantly speed up processing when terms are high cardinality > and the matching docset is small > # FIXED TODO optimization: when mincount>0 move segment position the next non > zero term value. > both of these changes will minimize the number of called needed to the slow > {{updateTop}} call. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8176) Model distributed graph traversals with Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-8176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124557#comment-15124557 ] Dennis Gove commented on SOLR-8176: --- I'm having trouble envisioning the expression for this. > Model distributed graph traversals with Streaming Expressions > - > > Key: SOLR-8176 > URL: https://issues.apache.org/jira/browse/SOLR-8176 > Project: Solr > Issue Type: New Feature > Components: clients - java, SolrCloud, SolrJ >Affects Versions: Trunk >Reporter: Joel Bernstein > Labels: Graph > Fix For: Trunk > > > I think it would be useful to model a few *distributed graph traversal* use > cases with Solr's *Streaming Expression* language. This ticket will explore > different approaches with a goal of implementing two or three common graph > traversal use cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8125) Umbrella ticket for Streaming and SQL issues
[ https://issues.apache.org/jira/browse/SOLR-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052028#comment-15052028 ] Dennis Gove commented on SOLR-8125: --- I'm working on SOLR-7904 and should have a patch by tomorrow. I'd also like to get SOLR-8185 into Solr 6 if I can get it done. Will spend some time on it this weekend. > Umbrella ticket for Streaming and SQL issues > > > Key: SOLR-8125 > URL: https://issues.apache.org/jira/browse/SOLR-8125 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein > > This is an umbrella ticket for tracking issues around the *Streaming API*, > *Streaming Expressions* and *Parallel SQL*. > Issues can be linked to this ticket and discussions about the road map can > also happen on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052745#comment-15052745 ] Dennis Gove commented on SOLR-7904: --- I'm finalizing some of the tests but so far everything is passing fine. The expression format is as follows {code} facet( collection1, q="*:*", fl="a_s,a_i,a_f", sort="a_s asc", buckets="a_s", bucketSorts="sum(a_i) asc", bucketSizeLimit=10, sum(a_i), sum(a_f), min(a_i), min(a_f), max(a_i), max(a_f), avg(a_i), avg(a_f), count(*), zkHost="url:port" ) {code} It supports multiple buckets and multiple bucketSorts. All standard query properties (q, fl, sort, etc...) are also supported. The example above is only showing 3 of them. zkHost is optional. > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
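[Editor's note: to illustrate the multiple-buckets and multiple-bucketSorts support mentioned in the comment above, a hypothetical sketch of such an expression in the same format — the field names and sort clauses here are made up for illustration, not taken from the patch.]
{code}
facet(
    collection1,
    q="*:*",
    fl="a_s,b_s,a_i",
    sort="a_s asc",
    buckets="a_s,b_s",
    bucketSorts="sum(a_i) desc, count(*) asc",
    bucketSizeLimit=10,
    sum(a_i), count(*)
)
{code}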
[jira] [Commented] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052794#comment-15052794 ] Dennis Gove commented on SOLR-7904: --- I did consider an alternative format that would put the bucket options together and allow for different things in each bucket but steered away from it because it would require larger changes to the FacetStream implementation and may not have a usecase {code} facet( collection1, q="*:*", fl="a_s,b_s,a_i,a_f", sort="a_s asc", bucket("a_s", sort="sum(a_i) asc", limit=5, sum(a_i), avg(a_i), count(*)), bucket("b_s", sort="max(a_i) desc, min(a_i) desc", limit=20, sum(a_i), min(a_i), max(a_i)), ) {code} > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052809#comment-15052809 ] Dennis Gove commented on SOLR-7904: --- Alright. The expression parsing is similar to CloudSolrStream whereby some named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the others are just passed down to the QueryRequest and are not considered explicitly. If fl and sort are not required then it'd just be a change in the documentation and not an implementation change (since the expression parsing doesn't explicitly look to ensure those were provided). > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15052809#comment-15052809 ] Dennis Gove edited comment on SOLR-7904 at 12/11/15 2:15 PM: - Alright. The expression parsing is similar to CloudSolrStream whereby some named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the others are just passed down to the QueryRequest and are not considered explicitly. If fl and sort are not required then it'd just be a change in the documentation and not an implementation change (since the expression parsing doesn't explicitly look to ensure those were provided). was (Author: dpgove): Alright. The expression parsing in similar to CloudSolrStream whereby some named parameters are required (buckets, bucketSorts, bucketSizeLimit) but the others are just passed down to the QueryRequest and are not considered explicitly. If fl and sort are not required then it'd just be a change in the documentation and not an implementation change (since the expression parsing doesn't explicitly look to ensure those were provided). > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7904: -- Attachment: SOLR-7904.patch Fully implemented. All relevant tests pass. > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > Attachments: SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7904: -- Attachment: SOLR-7904.patch Adds facet as a default function in the StreamHandler. > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > Attachments: SOLR-7904.patch, SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053908#comment-15053908 ] Dennis Gove edited comment on SOLR-7904 at 12/12/15 1:18 AM: - Adds facet as a default function in the StreamHandler. was (Author: dpgove): Addes facet as a default function in the StreamHandler. > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein > Fix For: Trunk > > Attachments: SOLR-7904.patch, SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
Dennis Gove created SOLR-8409: - Summary: Complex q param in Streaming Expression results in a bad query Key: SOLR-8409 URL: https://issues.apache.org/jira/browse/SOLR-8409 Project: Solr Issue Type: Bug Components: SolrJ Reporter: Dennis Gove Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8409: -- Affects Version/s: 6.0 Trunk Labels: streaming streaming_api (was: ) Description: When providing an expression like {code} expression=search(people, fl="id,first", sort="first asc", q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") {code} the following error is seen. {code} no field name specified in query and no default specified via 'df' param {code} I believe the issue is related to the \" (escaped quotes) and the spaces in the q field. If I remove the spaces then the query returns results as expected (though I've yet to validate if those results are accurate). This requires some investigation to get down to the root cause. I would like to fix it before Solr 6 is cut. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > > When providing an expression like > {code} > expression=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8409: -- Description: When providing an expression like {code} stream=search(people, fl="id,first", sort="first asc", q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") {code} the following error is seen. {code} no field name specified in query and no default specified via 'df' param {code} I believe the issue is related to the \" (escaped quotes) and the spaces in the q field. If I remove the spaces then the query returns results as expected (though I've yet to validate if those results are accurate). This requires some investigation to get down to the root cause. I would like to fix it before Solr 6 is cut. was: When providing an expression like {code} expression=search(people, fl="id,first", sort="first asc", q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") {code} the following error is seen. {code} no field name specified in query and no default specified via 'df' param {code} I believe the issue is related to the \" (escaped quotes) and the spaces in the q field. If I remove the spaces then the query returns results as expected (though I've yet to validate if those results are accurate). This requires some investigation to get down to the root cause. I would like to fix it before Solr 6 is cut. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. 
> {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053981#comment-15053981 ] Dennis Gove commented on SOLR-8409: --- I've been unable to replicate this in a unit test but have seen it in a fully packaged version of trunk. (ant package was run and then the tarball was unpacked). Differences between unit test and packaged version: * unit test is using dynamic fields while packaged version is using static fields * unit test is not going through the StreamHandler > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8125) Umbrella ticket for Streaming and SQL issues
[ https://issues.apache.org/jira/browse/SOLR-8125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053983#comment-15053983 ] Dennis Gove commented on SOLR-8125: --- SOLR-8409 is a bug I'd like to get fixed for Solr 6. I'd hate to see this go out in a major release. > Umbrella ticket for Streaming and SQL issues > > > Key: SOLR-8125 > URL: https://issues.apache.org/jira/browse/SOLR-8125 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein > > This is an umbrella ticket for tracking issues around the *Streaming API*, > *Streaming Expressions* and *Parallel SQL*. > Issues can be linked to this ticket and discussions about the road map can > also happen on this ticket. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053998#comment-15053998 ] Dennis Gove commented on SOLR-8409: --- It looks like this {code} presentTitles:\"chief executive officer\" AND age:[36 TO *] {code} I suspect that the \" is the culprit here because the streaming expression parser does not remove the \ before the quote. As such, and this is a hunch, I suspect that the query parser is seeing \" and not considering it a quote that is starting a phrase but instead a quote that is just part of the string being searched. {code} chief executive officer {code} I believe this can be fixed by adding logic into the expression parser that will transform \" into " and in fact I've written that code (very simple) but my lack of ability to replicate in a unit test is preventing me from ensuring the issue is actually fixed. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
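[Editor's note: to make the proposed fix concrete, a minimal sketch of the kind of unescaping step described in the comment above — replacing each \" pair with a bare quote before the value is handed to the query parser. The class and method names are hypothetical; the actual change would live inside the streaming expression parser, which may differ.]

```java
// Hypothetical sketch, not the actual Solr parser code.
public class QuoteUnescape {

    // Replace each backslash-quote pair with a bare quote so the downstream
    // query parser sees a phrase delimiter rather than a literal character.
    static String unescapeQuotes(String value) {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == '"') {
                out.append('"');
                i++; // skip the quote; it has already been emitted
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "presentTitles:\\\"chief executive officer\\\" AND age:[36 TO *]";
        System.out.println(unescapeQuotes(raw));
        // prints: presentTitles:"chief executive officer" AND age:[36 TO *]
    }
}
```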
[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15054004#comment-15054004 ] Dennis Gove commented on SOLR-8409: --- Backing up my hunch: if I change the q to be {code} presentTitles:\"chief\" AND age:[36 TO *] {code} I get results back, but a very small subset of the results I would expect to get back. I've yet to visually verify the source data but I would guess that there is a record containing a field value "chief". I'll check for that the next time I'm looking into this (by Monday I suspect) but I'd wager that I'll find it. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove reassigned SOLR-7904: - Assignee: Dennis Gove > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein >Assignee: Dennis Gove > Fix For: Trunk > > Attachments: SOLR-7904.patch, SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7904: -- Attachment: SOLR-7904.patch Rebased against trunk. > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein >Assignee: Dennis Gove > Fix For: Trunk > > Attachments: SOLR-7904.patch, SOLR-7904.patch, SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-7904) Make FacetStream Expressible
[ https://issues.apache.org/jira/browse/SOLR-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove closed SOLR-7904. - Resolution: Fixed > Make FacetStream Expressible > > > Key: SOLR-7904 > URL: https://issues.apache.org/jira/browse/SOLR-7904 > Project: Solr > Issue Type: New Feature >Affects Versions: Trunk >Reporter: Joel Bernstein >Assignee: Dennis Gove > Fix For: Trunk > > Attachments: SOLR-7904.patch, SOLR-7904.patch, SOLR-7904.patch > > > This ticket makes the FacetStream (SOLR-7903) expressible, so it can be used > as a Streaming Expression. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-8409: -- Attachment: SOLR-8409.patch This patch **appears** to fix the issues. Still am unable to replicate in a unit test but I have confirmed that the issue I was seeing in a packaged setup is fixed with this patch. I'll want to wait until I can get a replicated test before I commit this. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > Attachments: SOLR-8409.patch > > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15056147#comment-15056147 ] Dennis Gove edited comment on SOLR-8409 at 12/14/15 3:48 PM: - This patch *appears* to fix the issues. Still am unable to replicate in a unit test but I have confirmed that the issue I was seeing in a packaged setup is fixed with this patch. I'll want to wait until I can get a replicated test before I commit this. was (Author: dpgove): This patch **appears** to fix the issues. Still am unable to replicate in a unit test but I have confirmed that the issue I was seeing in a packaged setup is fixed with this patch. I'll want to wait until I can get a replicated test before I commit this. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk, 6.0 >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > Attachments: SOLR-8409.patch > > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057921#comment-15057921 ] Dennis Gove commented on SOLR-8409: --- Interestingly, if I leave the q param out entirely I don't see any raised exception. Also, if I leave out a field to filter on I also don't see any raised exception. I've confirmed the solrconfig-streaming.xml doesn't include either default q or df settings so I'd expect to see an exception in both of these cases. {code} search(collection1, fl="id,a_s,a_i,a_f", sort="a_f asc, a_i asc") search(collection1, fl="id,a_s,a_i,a_f", sort="a_f asc, a_i asc", q="foo") {code} > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > Attachments: SOLR-8409.patch > > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-8409) Complex q param in Streaming Expression results in a bad query
[ https://issues.apache.org/jira/browse/SOLR-8409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15057983#comment-15057983 ] Dennis Gove commented on SOLR-8409: --- I take that back. The file schema-streaming.xml contains the default query field {code} text {code} If I comment out that setting then I am able to replicate the failure described in this ticket - finally. I will create a couple valid tests replicating the issue and will commit the fix as soon as I can. > Complex q param in Streaming Expression results in a bad query > -- > > Key: SOLR-8409 > URL: https://issues.apache.org/jira/browse/SOLR-8409 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: Trunk >Reporter: Dennis Gove >Priority: Minor > Labels: streaming, streaming_api > Attachments: SOLR-8409.patch > > > When providing an expression like > {code} > stream=search(people, fl="id,first", sort="first asc", > q="presentTitles:\"chief executive officer\" AND age:[36 TO *]") > {code} > the following error is seen. > {code} > no field name specified in query and no default specified via 'df' param > {code} > I believe the issue is related to the \" (escaped quotes) and the spaces in > the q field. If I remove the spaces then the query returns results as > expected (though I've yet to validate if those results are accurate). > This requires some investigation to get down to the root cause. I would like > to fix it before Solr 6 is cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063456#comment-15063456 ] Dennis Gove commented on SOLR-7525: --- I'll rebase this off trunk so it is a little cleaner, but I think the use of ReducerStream still holds. The purpose of Complement and Intersect is to return tuples in A which either do or do not exist in B. The tuples in B aren't used for anything and are dropped as soon as possible. The reason they make use of the ReducerStream is because B having 1 instance of some tuple found in A is the same as B having 100 instances of some tuple found in A. Whether it's 1 or 100, the tuple exists in B, so it can either be returned in A or not. For this reason the size of the ReducerStream can always just be 1, because we only care about the first one and all others can be dropped from B. The fieldName (or fieldNames, because you can do an intersect on N fields) provided to the ReducerStream are the fields the Intersect or Complement streams are acting on. Essentially, the goal is to take all the tuples in B and reduce them down to a unique list of tuples, where uniqueness is defined over the fields that the intersect or complement is being checked over. Given that B is a set of unique tuples, it is much easier to know when to move on to the next tuple in B. I'll take a look at the GroupOperation, but I suspect that it can use a StreamEqualitor instead of a StreamComparator. A comparator allows ordering while an equalitor just checks if they are equal. There may be a reason it allows for ordering, though. 
> Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
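The reduction argument in the comment above can be sketched concretely. This is an illustrative sketch, not Solr's ComplementStream: both inputs are sorted on the comparison key, and duplicates in B are skipped while advancing, which is exactly what reducing B to one tuple per group buys the real stream.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the idea, not Solr's ComplementStream: with A and B sorted on the
// comparison key, emit the tuples of A whose key never appears in B. One copy
// or a hundred copies of a key in B has the same effect on A.
class ComplementSketch {
    static List<String> complement(List<String> sortedA, List<String> sortedB) {
        List<String> out = new ArrayList<>();
        Iterator<String> b = sortedB.iterator();
        String bCur = b.hasNext() ? b.next() : null;
        for (String a : sortedA) {
            // Advance B past keys smaller than a; duplicate keys in B fall out
            // here, which is what reducing B to unique tuples buys the stream.
            while (bCur != null && bCur.compareTo(a) < 0) {
                bCur = b.hasNext() ? b.next() : null;
            }
            if (bCur == null || !bCur.equals(a)) {
                out.add(a); // a has no match in B, so it is emitted
            }
        }
        return out;
    }
}
```

Intersect is the same walk with the condition flipped: emit `a` only when a matching key is found in B.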
[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15063456#comment-15063456 ] Dennis Gove edited comment on SOLR-7525 at 12/18/15 4:56 AM: - I'll rebase this off trunk so it is a little cleaner but I think the use of ReducerStream still holds. The purpose of Complement and Intersect is to return tuples in A which either do or do not exist in B. The tuples in B aren't used for anything and are dropped as soon as possible. The reason they make use of the ReducerStream is because B having 1 instance of some tuple found in A is the same as B having 100 instances of some tuple found in A. Whether its 1 or 100 the tuple exists in B so its twin in A can either be returned from A or not. For this reason the size of the ReducerStream can always just be 1 because we only care about the first one and all others can be dropped from B. The fieldName (or fieldNames because you can do an intersect on N fields) provided to the ReducerStream are the fields the Intersect or Complement streams are acting on. Essentially, the goal is to take all the tuples in B and reduce them down to a unique list of tuples where uniqueness is defined over the fields that the intersect or complement is being checked over. Given that B is a set of unique tuples it is much easier to know when to move onto the next tuple in B. I'll take a look at the GroupOperation but I would suspect that it can use a StreamEqualitor instead of a StreamComparator. A comparator allows order while an equalitor just checks if they are equal. There may be a reason it allows for ordering, though. was (Author: dpgove): I'll rebase this off trunk so it is a little cleaner but I think the use of ReducerStream still holds. The purpose of Complement and Intersect is to return tuples in A which either do or do not exist in B. The tuples in B aren't used for anything and are dropped as soon as possible. 
The reason they make use of the ReducerStream is because B having 1 instance of some tuple found in A is the same as B having 100 instances of some tuple found in A. Whether its 1 or 100 the tuple exists in B so it can either be returned in A or not. For this reason the size of the ReducerStream can always just be 1 because we only care about the first one and all others can be dropped from B. The fieldName (or fieldNames because you can do an intersect on N fields) provided to the ReducerStream are the fields the Intersect or Complement streams are acting on. Essentially, the goal is to take all the tuples in B and reduce them down to a unique list of tuples where uniqueness is defined over the fields that the intersect or complement is being checked over. Given that B is a set of unique tuples it is much easier to know when to move onto the next tuple in B. I'll take a look at the GroupOperation but I would suspect that it can use a StreamEqualitor instead of a StreamComparator. A comparator allows order while an equalitor just checks if they are equal. There may be a reason it allows for ordering, though. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. 
[jira] [Commented] (SOLR-8443) Change /stream handler http param from "stream" to "func"
[ https://issues.apache.org/jira/browse/SOLR-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064290#comment-15064290 ] Dennis Gove commented on SOLR-8443: --- If you're open to other suggestions: I tend to refer to that parameter as the expression. Maybe expr=search(). > Change /stream handler http param from "stream" to "func" > - > > Key: SOLR-8443 > URL: https://issues.apache.org/jira/browse/SOLR-8443 > Project: Solr > Issue Type: Bug > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > > When passing in a Streaming Expression to the /stream handler you currently > use the "stream" http parameter. This dates back to when serialized > TupleStream objects were passed in. Now that the /stream handler only accepts > Streaming Expressions it makes sense to rename this parameter to "func". > This syntax also helps to emphasize that Streaming Expressions are a function > language. > For example: > http://localhost:8983/collection1/stream?func=search(...) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
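Whichever name wins, building the request looks the same from a client. The sketch below uses `expr`, which is only the name suggested in the comment above, not a parameter the handler is known to accept; the point it shows is that the expression must be URL-encoded, since it contains quotes, spaces, parentheses, and equals signs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch only: "expr" is the suggestion from the comment, and the host,
// port, and collection name are placeholders. The expression is URL-encoded
// so its quotes, spaces, and parentheses survive the query string.
class StreamUrl {
    static String buildUrl(String expression) {
        return "http://localhost:8983/solr/collection1/stream?expr="
                + URLEncoder.encode(expression, StandardCharsets.UTF_8);
    }
}
```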
[jira] [Comment Edited] (SOLR-8443) Change /stream handler http param from "stream" to "func"
[ https://issues.apache.org/jira/browse/SOLR-8443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064290#comment-15064290 ] Dennis Gove edited comment on SOLR-8443 at 12/18/15 5:35 PM: - If open to other suggestions, I find that I tend to refer to that parameter as the expression. Maybe expr=search(). My thinking here is that one is providing a (potentially complex) expression made up of function calls. was (Author: dpgove): If open to other suggestions, I find that I tend to refer to that parameter as the expression. Maybe expr=search() > Change /stream handler http param from "stream" to "func" > - > > Key: SOLR-8443 > URL: https://issues.apache.org/jira/browse/SOLR-8443 > Project: Solr > Issue Type: Bug > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > > When passing in a Streaming Expression to the /stream handler you currently > use the "stream" http parameter. This dates back to when serialized > TupleStream objects were passed in. Now that the /stream handler only accepts > Streaming Expressions it makes sense to rename this parameter to "func". > This syntax also helps to emphasize that Streaming Expressions are a function > language. > For example: > http://localhost:8983/collection1/stream?func=search(...) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7525: -- Attachment: SOLR-7525.patch Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. The DistinctOperation ensures that for any given group only a single tuple will be returned. Currently it is implemented to return the first tuple in a group but a possible enhancement down the road could be to support a parameter asking for some other tuple in the group (such as the first in a sub-sorted list). Also, while implementing this I realized that the UniqueStream can be refactored to be just a type of ReducerStream with DistinctOperation. That change is not included in this patch but will be done under a separate ticket. Also of note, I'm not sure if the getChildren() function declared in TupleStream is necessary any longer. If I recall correctly that function was used by the StreamHandler when passing streams to workers but since all that has been changed to pass the result of toExpression() I think we can get rid of the getChildren() function. I will explore that possibility. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch, SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. 
The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
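The DistinctOperation described in the patch note can be sketched as follows. This is an assumption-laden illustration, not the class from the patch: over tuples already sorted by the group field, it keeps only the first tuple of each group and drops the rest, which is the "return the first tuple in a group" behavior the note describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustration of the distinct-per-group idea, not Solr's DistinctOperation:
// tuples are modeled as maps and arrive sorted by the group field, so a group
// ends exactly when the key changes.
class DistinctSketch {
    static List<Map<String, Object>> firstPerGroup(List<Map<String, Object>> sorted,
                                                   String field) {
        List<Map<String, Object>> out = new ArrayList<>();
        Object lastKey = null;
        for (Map<String, Object> tuple : sorted) {
            Object key = tuple.get(field);
            if (out.isEmpty() || !Objects.equals(key, lastKey)) {
                out.add(tuple); // first tuple of a new group is kept
                lastKey = key;
            }                   // later tuples in the same group are dropped
        }
        return out;
    }
}
```

The enhancement floated in the note (selecting some other tuple, such as the first of a sub-sorted list) would slot in where the first tuple of each group is chosen.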
[jira] [Updated] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Gove updated SOLR-7525: -- Attachment: SOLR-7525.patch As it turns out IntersectStream and ComplementStream can both make use of a UniqueStream which makes use of a ReducerStream. As such this new patch implements Intersect and Complement with streamB as an instance of UniqueStream. UniqueStream is changed to be implemented as a type of ReducerStream. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
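The refactor described above, with UniqueStream reduced to a ReducerStream plus DistinctOperation and Intersect/Complement consuming a unique view of stream B, can be sketched as plain functions. This is illustrative only, not the Solr classes:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative composition, not Solr's classes: "unique" collapses adjacent
// duplicates in a sorted stream (the reduce + distinct behavior), and
// intersect then walks A against the deduplicated B.
class ComposeSketch {
    static List<String> unique(List<String> sorted) {
        List<String> out = new ArrayList<>();
        for (String s : sorted) {
            if (out.isEmpty() || !out.get(out.size() - 1).equals(s)) out.add(s);
        }
        return out;
    }

    static List<String> intersect(List<String> sortedA, List<String> sortedB) {
        List<String> b = unique(sortedB); // B only matters as unique keys
        List<String> out = new ArrayList<>();
        int j = 0;
        for (String a : sortedA) {
            while (j < b.size() && b.get(j).compareTo(a) < 0) j++;
            if (j < b.size() && b.get(j).equals(a)) out.add(a); // match found in B
        }
        return out;
    }
}
```

Complement is the same composition with the final condition inverted, which is why both streams can share the unique view of B.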
[jira] [Comment Edited] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15064830#comment-15064830 ] Dennis Gove edited comment on SOLR-7525 at 12/18/15 10:06 PM: -- Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. The DistinctOperation ensures that for any given group only a single tuple will be returned. Currently it is implemented to return the first tuple in a group but a possible enhancement down the road could be to support a parameter asking for some other tuple in the group (such as the first in a sub-sorted list). Also, while implementing this I realized that the UniqueStream can be refactored to be just a type of ReducerStream with DistinctOperation. -That change is not included in this patch but will be done under a separate ticket.- Also of note, I'm not sure if the getChildren() function declared in TupleStream is necessary any longer. If I recall correctly that function was used by the StreamHandler when passing streams to workers but since all that has been changed to pass the result of toExpression() I think we can get rid of the getChildren() function. I will explore that possibility. was (Author: dpgove): Rebases off of trunk and adds a DistinctOperation for use in the ReducerStream. The DistinctOperation ensures that for any given group only a single tuple will be returned. Currently it is implemented to return the first tuple in a group but a possible enhancement down the road could be to support a parameter asking for some other tuple in the group (such as the first in a sub-sorted list). Also, while implementing this I realized that the UniqueStream can be refactored to be just a type of ReducerStream with DistinctOperation. That change is not included in this patch but will be done under a separate ticket. Also of note, I'm not sure if the getChildren() function declared in TupleStream is necessary any longer. 
If I recall correctly that function was used by the StreamHandler when passing streams to workers but since all that has been changed to pass the result of toExpression() I think we can get rid of the getChildren() function. I will explore that possibility. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-7525) Add ComplementStream to the Streaming API and Streaming Expressions
[ https://issues.apache.org/jira/browse/SOLR-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065536#comment-15065536 ] Dennis Gove commented on SOLR-7525: --- Yes, you hit that right on the head. It was for consistency in the structure of Expressible classes. Also, currently it's implemented to return the first seen tuple in a group. However, I could see an enhancement where one could provide a selector to choose maybe the last seen, or the first based on some alternative order. For example, were someone to use the DistinctOperation in an expression it would currently look like this {code} distinct() {code} but I could also see it looking like one of these {code} distinct(first, sort="fieldA desc, fieldB desc") distinct(first, having="fieldA != null") {code} Essentially, although not currently supported it would be possible to expand the reducer operations to support complex selectors when a choice over which tuple to select is required. All that said, for now it's just for consistency. > Add ComplementStream to the Streaming API and Streaming Expressions > --- > > Key: SOLR-7525 > URL: https://issues.apache.org/jira/browse/SOLR-7525 > Project: Solr > Issue Type: New Feature > Components: SolrJ >Reporter: Joel Bernstein >Priority: Minor > Attachments: SOLR-7525.patch, SOLR-7525.patch, SOLR-7525.patch > > > This ticket adds a ComplementStream to the Streaming API and Streaming > Expression language. > The ComplementStream will wrap two TupleStreams (StreamA, StreamB) and emit > Tuples from StreamA that are not in StreamB. > Streaming API Syntax: > {code} > ComplementStream cstream = new ComplementStream(streamA, streamB, comp); > {code} > Streaming Expression syntax: > {code} > complement(search(...), search(...), on(...)) > {code} > Internal implementation will rely on the ReducerStream. The ComplementStream > can be parallelized using the ParallelStream. 
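The selector idea floated in the comment above, distinct(first, ...) versus some other choice of tuple within a group, can be sketched as a pluggable pick function. The names and the `distinct(last, ...)` variant are hypothetical, not part of the actual patch:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sketch of a selectable distinct, not Solr's DistinctOperation:
// the pick function decides which value wins within a group, so (a, b) -> a
// models "first seen" and (a, b) -> b models a hypothetical "last seen".
class SelectableDistinct {
    static Map<String, Integer> distinct(List<String> keys, List<Integer> vals,
                                         BinaryOperator<Integer> pick) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            // merge applies pick(existing, incoming) when the key repeats
            out.merge(keys.get(i), vals.get(i), pick);
        }
        return out;
    }
}
```

A sort-based selector like the `distinct(first, sort="fieldA desc")` example would amount to choosing the pick function from the requested ordering.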