[jira] [Commented] (SOLR-8743) Data loss on shard recovery and leader election
[ https://issues.apache.org/jira/browse/SOLR-8743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170810#comment-15170810 ]

Matt Weber commented on SOLR-8743:
----------------------------------

Thank you for reopening. I think the bug is in step 6. In step 5, when NodeA comes back online, it stays down (I assume) because the state in ZK, which never went down, says NodeB should be the primary. In step 6, when NodeB comes back online, the cluster *should* pick NodeB as the primary and sync from there. Instead, it appears to pick the first node that came online, even if that node has stale data. If step 6 never happens, I think NodeA should stay down forever, or until the user forces it back online.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
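The election rule suggested in this comment can be sketched as a small toy model. This is not Solr's election code: the class `LeaderElectionSketch`, the `Replica` type, and the `forceElection` flag are hypothetical names introduced only to illustrate the idea that a replica other than the last known leader should not take over unless an operator forces it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Toy model of the election rule suggested in the comment; not Solr code. */
final class LeaderElectionSketch {

    /** Hypothetical stand-in for a replica and whether its node is up. */
    static final class Replica {
        final String node;
        final boolean online;

        Replica(String node, boolean online) {
            this.node = node;
            this.online = online;
        }
    }

    /**
     * Returns the node that may become leader, or empty if the shard should stay down.
     * lastKnownLeader models the leader recorded in ZK before the outage;
     * forceElection models an explicit operator override.
     */
    static Optional<String> electLeader(List<Replica> replicas, String lastKnownLeader, boolean forceElection) {
        // Prefer the last known leader: in the scenario from the issue it holds the newest data.
        for (Replica r : replicas) {
            if (r.online && r.node.equals(lastKnownLeader)) {
                return Optional.of(r.node);
            }
        }
        // Otherwise elect a possibly stale replica only when the operator forces it.
        if (forceElection) {
            for (Replica r : replicas) {
                if (r.online) {
                    return Optional.of(r.node);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Step 5: only NodeA is up, NodeB was the last leader.
        List<Replica> step5 = Arrays.asList(new Replica("NodeA", true), new Replica("NodeB", false));
        // Step 6: both nodes are up again.
        List<Replica> step6 = Arrays.asList(new Replica("NodeA", true), new Replica("NodeB", true));

        System.out.println("step 5: " + electLeader(step5, "NodeB", false));
        System.out.println("step 6: " + electLeader(step6, "NodeB", false));
        System.out.println("step 5, forced: " + electLeader(step5, "NodeB", true));
    }
}
```

Under these assumptions, step 5 yields no leader, step 6 yields NodeB, and only a forced election lets NodeA lead with its stale index.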
[jira] [Created] (SOLR-8743) Data loss on shard recovery and leader election
Matt Weber created SOLR-8743:
--------------------------------

             Summary: Data loss on shard recovery and leader election
                 Key: SOLR-8743
                 URL: https://issues.apache.org/jira/browse/SOLR-8743
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 5.5
            Reporter: Matt Weber

SolrCloud with 3 external ZK nodes in quorum, 2 data nodes (NodeA and NodeB). Single collection that has a single shard and a single replica (replication factor 2). The primary shard is initially on NodeA and the replica on NodeB.

1. Index Doc1 via NodeA.
./solr/bin/post -c people -d '<add><doc><field name="id">1</field></doc></add>'
2. Shut down NodeA; NodeB becomes the primary. Doc1 is searchable.
3. Delete Doc1 and index Doc2 via NodeB.
./solr/bin/post -c people -d "<delete><id>1</id></delete>"
./solr/bin/post -c people -d '<add><doc><field name="id">2</field></doc></add>'
4. Shut down NodeB; no nodes online. ZK still up.
5. Start NodeA; it remains in "down" status since NodeB is down and NodeB was last seen as the primary.
6. Start NodeB; it comes online in "down" status. NodeA is elected leader and sync starts. Doc2 is gone, Doc1 exists in both shards.
[jira] [Commented] (SOLR-7495) Unexpected docvalues type NUMERIC when grouping by a int facet
[ https://issues.apache.org/jira/browse/SOLR-7495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236395#comment-15236395 ]

Matt Weber commented on SOLR-7495:
----------------------------------

group.facet also breaks range queries. I'm finding v5 almost unusable. I tried setting docValues to true on the fields (and re-indexing), but it made no difference. The only workaround I found was using facet queries with multi-valued fields, but that doesn't help with everything: it doesn't apply to range facets.

For example, this works:
=rounded_price_price.facet.range.start=0_price.facet.range.end=100_price.facet.range.gap=10=on

But simply adding "group.facet" fails:
=rounded_price_price.facet.range.start=0_price.facet.range.end=100_price.facet.range.gap=10=on=true=anyNumericField

> Unexpected docvalues type NUMERIC when grouping by a int facet
> --------------------------------------------------------------
>
>                 Key: SOLR-7495
>                 URL: https://issues.apache.org/jira/browse/SOLR-7495
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 5.0, 5.1, 5.2, 5.3
>            Reporter: Fabio Batista da Silva
>         Attachments: SOLR-7495.patch
>
>
> Hey All,
> After upgrading from Solr 4.10 to 5.1 with SolrCloud,
> I'm getting an IllegalStateException when I try to facet on an int field.
> IllegalStateException: unexpected docvalues type NUMERIC for field 'year' (expected=SORTED). Use UninvertingReader or index with docvalues.
> schema.xml
> {code}
> multiValued="false" required="true"/>
> multiValued="false" required="true"/>
> stored="true"/>
> />
> sortMissingLast="true"/>
> positionIncrementGap="0"/>
> positionIncrementGap="0"/>
> positionIncrementGap="0"/>
> precisionStep="0" positionIncrementGap="0"/>
> positionIncrementGap="0"/>
> positionIncrementGap="100">
> words="stopwords.txt" />
> maxGramSize="15"/>
> words="stopwords.txt" />
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> positionIncrementGap="100">
> words="stopwords.txt" />
> maxGramSize="15"/>
> words="stopwords.txt" />
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> class="solr.SpatialRecursivePrefixTreeFieldType" geo="true"
> distErrPct="0.025" maxDistErr="0.09" units="degrees" />
> id
> name
> {code}
> query :
> {code}
> http://solr.dev:8983/solr/my_collection/select?wt=json=id=index_type:foobar=true=year_make_model=true=true=year
> {code}
> Exception :
> {code}
> ull:org.apache.solr.common.SolrException: Exception during facet.field: year
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:627)
>         at org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:612)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:566)
>         at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:637)
>         at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:280)
>         at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:106)
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
>         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>         at
[jira] [Commented] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15856188#comment-15856188 ]

Matt Weber commented on LUCENE-7638:
------------------------------------

[~jim.ferenczi] Sorry for the late reply, I've been swamped. Anyway, this is great! I really like this approach, awesome job man!

> Optimize graph query produced by QueryBuilder
> ---------------------------------------------
>
>                 Key: LUCENE-7638
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7638
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Jim Ferenczi
>         Attachments: LUCENE-7638.patch, LUCENE-7638.patch
>
>
> The QueryBuilder creates a graph query when the underlying TokenStream contains tokens with a PositionLengthAttribute greater than 1.
> These TokenStreams are in fact graphs (lattices, to be more precise) where synonyms can span multiple terms.
> Currently the graph query is built by visiting all the paths of the graph TokenStream. For instance, if you have a synonym like "ny, new york" and you search for "new york city", the query builder would produce two paths:
> "new york city", "ny city"
> This can quickly explode when the number of multi-term synonyms increases.
> The query "ny ny", for instance, would produce 4 paths, and so on.
> For boolean queries with should or must clauses it should be more efficient to build a boolean query that merges all the intersections in the graph. So instead of "new york city", "ny city" we could produce:
> "+((+new +york) ny) +city"
> The attached patch is a proposal to do that instead of the all-paths solution.
> The patch transforms multi-term synonyms into a graph query for each intersection in the graph. This is not done in this patch, but we could also create a specialized query that gives equivalent scores to multi-term synonyms, like the SynonymQuery does for single-term synonyms.
> For phrase queries this patch does not change the current behavior, but we could also use the new method to create an optimized graph SpanQuery.
> [~mattweber] I think this patch could optimize a lot of cases where multiple multi-term synonyms are present in a single request. Could you take a look?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
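The difference between the all-paths strategy and the merged boolean query described in LUCENE-7638 can be illustrated with a minimal, Lucene-free sketch. It assumes a simplified model in which the query is a sequence of segments and each segment lists its alternatives as plain strings; `GraphQuerySketch`, `allPaths`, and `mergedBoolean` are hypothetical names, and the real patch works on the token graph and builds Lucene query objects rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** String-based toy model of the two strategies; not the code from the patch. */
final class GraphQuerySketch {

    /** Enumerates every path through the graph: the "all paths" strategy. */
    static List<String> allPaths(List<List<String>> segments) {
        List<String> paths = new ArrayList<>();
        paths.add("");
        for (List<String> alternatives : segments) {
            List<String> extended = new ArrayList<>();
            for (String prefix : paths) {
                for (String alternative : alternatives) {
                    extended.add(prefix.isEmpty() ? alternative : prefix + " " + alternative);
                }
            }
            paths = extended; // the path count multiplies at every segment with synonyms
        }
        return paths;
    }

    /** Builds one required clause per segment instead of one query per path. */
    static String mergedBoolean(List<List<String>> segments) {
        StringBuilder query = new StringBuilder();
        for (List<String> alternatives : segments) {
            if (query.length() > 0) {
                query.append(' ');
            }
            boolean singleTerm = alternatives.size() == 1 && !alternatives.get(0).contains(" ");
            if (singleTerm) {
                query.append('+').append(alternatives.get(0));
            } else {
                List<String> clauses = new ArrayList<>();
                for (String alternative : alternatives) {
                    clauses.add(clause(alternative));
                }
                query.append("+(").append(String.join(" ", clauses)).append(')');
            }
        }
        return query.toString();
    }

    /** A multi-term alternative requires all of its terms; a single term stands alone. */
    private static String clause(String alternative) {
        String[] terms = alternative.split(" ");
        if (terms.length == 1) {
            return terms[0];
        }
        return "(+" + String.join(" +", terms) + ")";
    }

    public static void main(String[] args) {
        List<List<String>> newYorkCity = Arrays.asList(
            Arrays.asList("new york", "ny"),
            Arrays.asList("city"));
        System.out.println(allPaths(newYorkCity));
        System.out.println(mergedBoolean(newYorkCity));
    }
}
```

In this model the number of paths is the product of the alternatives per segment, while the merged form grows only with their sum, which is the saving the description argues for.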
[jira] [Commented] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876101#comment-15876101 ]

Matt Weber commented on LUCENE-7699:
------------------------------------

[~jim.ferenczi] That check was intended, but as you said, it is essentially pointless. I will remove it. Yes, I think {{GraphQuery}} should go as well. It was only needed when we had to detect the graph to apply minimum should match and phrase slop, which is no longer the case. Should that be a separate issue?
[jira] [Closed] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber closed LUCENE-7699.
------------------------------

    Resolution: Fixed

Thanks [~jim.ferenczi]!
[jira] [Closed] (LUCENE-7702) Remove GraphQuery
[ https://issues.apache.org/jira/browse/LUCENE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber closed LUCENE-7702.
------------------------------

    Resolution: Fixed

Thanks [~jim.ferenczi]!
[jira] [Comment Edited] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875182#comment-15875182 ]

Matt Weber edited comment on LUCENE-7638 at 2/21/17 12:38 AM:
--------------------------------------------------------------

[~jim.ferenczi] [~mikemccand] Could we apply the same articulation points logic in analyzeGraphPhrase but generate span queries to essentially act like a phrase?

{code}
SpanNear[
  SpanOr[
    SpanNear[SpanTerm[new], SpanTerm[york]]
    SpanTerm[ny]
  ],
  SpanTerm[city]
]
{code}

was (Author: mattweber):
[~jim.ferenczi] [~mikemccand] Could we apply the same articulation points logic in analyzeGraphPhrase but generate span queries to essentially act like a phrase?

SpanNear[ SpanOr[ SpanNear[SpanTerm[new], SpanTerm[york]] SpanTerm[ny] ], SpanTerm[city] ]
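The nested span structure proposed for LUCENE-7638 can be made concrete with a small matcher. This is a Lucene-free sketch under simplifying assumptions: in-order matching with zero slop over a whitespace-tokenized string. `SpanSketch`, `Node`, `term`, `or`, `near`, and `matches` are hypothetical names that only mirror the roles of `SpanTerm`, `SpanOr`, and `SpanNear` in the comment; they are not Lucene's span query classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy in-order, zero-slop span matcher; a model of the idea, not Lucene code. */
final class SpanSketch {

    /** Returns every position at which a match starting at 'start' can end. */
    interface Node {
        List<Integer> ends(List<String> tokens, int start);
    }

    /** Matches exactly one token. */
    static Node term(String text) {
        return (tokens, start) -> {
            List<Integer> ends = new ArrayList<>();
            if (start < tokens.size() && tokens.get(start).equals(text)) {
                ends.add(start + 1);
            }
            return ends;
        };
    }

    /** Matches if any alternative matches; alternatives may differ in length. */
    static Node or(Node... alternatives) {
        return (tokens, start) -> {
            List<Integer> ends = new ArrayList<>();
            for (Node alternative : alternatives) {
                for (Integer end : alternative.ends(tokens, start)) {
                    if (!ends.contains(end)) {
                        ends.add(end);
                    }
                }
            }
            return ends;
        };
    }

    /** Matches the children one after another with no gap between them. */
    static Node near(Node... children) {
        return (tokens, start) -> {
            List<Integer> current = new ArrayList<>();
            current.add(start);
            for (Node child : children) {
                List<Integer> next = new ArrayList<>();
                for (Integer position : current) {
                    for (Integer end : child.ends(tokens, position)) {
                        if (!next.contains(end)) {
                            next.add(end);
                        }
                    }
                }
                current = next;
            }
            return current;
        };
    }

    /** True if the query matches starting at any token of the text. */
    static boolean matches(Node query, String text) {
        List<String> tokens = Arrays.asList(text.split(" "));
        for (int start = 0; start < tokens.size(); start++) {
            if (!query.ends(tokens, start).isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same shape as the structure in the comment.
        Node query = near(or(near(term("new"), term("york")), term("ny")), term("city"));
        System.out.println(matches(query, "new york city"));
        System.out.println(matches(query, "ny city"));
        System.out.println(matches(query, "new city"));
    }
}
```

The single tree accepts both "new york city" and "ny city" as phrases without enumerating each path as a separate phrase query, which is the point of applying the articulation-point logic to phrases.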
[jira] [Commented] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875182#comment-15875182 ]

Matt Weber commented on LUCENE-7638:
------------------------------------

[~jim.ferenczi] [~mikemccand] Could we apply the same articulation points logic in analyzeGraphPhrase but generate span queries to essentially act like a phrase?

SpanNear[ SpanOr[ SpanNear[SpanTerm[new], SpanTerm[york]] SpanTerm[ny] ], SpanTerm[city] ]
[jira] [Created] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
Matt Weber created LUCENE-7699:
----------------------------------

             Summary: Apply graph articulation points optimization to phrase graph queries
                 Key: LUCENE-7699
                 URL: https://issues.apache.org/jira/browse/LUCENE-7699
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Matt Weber

Follow-up to LUCENE-7638 that applies the same articulation point logic to graph phrases using span queries.
[jira] [Updated] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated LUCENE-7699:
-------------------------------

    Attachment: LUCENE-7699.patch

WIP
[jira] [Commented] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15875325#comment-15875325 ]

Matt Weber commented on LUCENE-7699:
------------------------------------

[~jim.ferenczi] [~mikemccand] What do you think?
[jira] [Updated] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated LUCENE-7699:
-------------------------------

    Attachment: LUCENE-7699.patch

Updated patch with fixed tests.
[jira] [Created] (LUCENE-7702) Remove GraphQuery
Matt Weber created LUCENE-7702:
----------------------------------

             Summary: Remove GraphQuery
                 Key: LUCENE-7702
                 URL: https://issues.apache.org/jira/browse/LUCENE-7702
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Matt Weber

With LUCENE-7638 and LUCENE-7699 the {{GraphQuery}}wrapper is no longer needed and we can use standard queries.
[jira] [Updated] (LUCENE-7702) Remove GraphQuery
[ https://issues.apache.org/jira/browse/LUCENE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated LUCENE-7702:
-------------------------------

    Attachment: LUCENE-7702.patch

Patch. Assumes LUCENE-7699 is also applied.
[jira] [Commented] (LUCENE-7702) Remove GraphQuery
[ https://issues.apache.org/jira/browse/LUCENE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876540#comment-15876540 ]

Matt Weber commented on LUCENE-7702:
------------------------------------

[~jim.ferenczi] [~mikemccand] Patch to remove {{GraphQuery}}.
[jira] [Updated] (LUCENE-7702) Remove GraphQuery
[ https://issues.apache.org/jira/browse/LUCENE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated LUCENE-7702:
-------------------------------

    Description: With LUCENE-7638 and LUCENE-7699 the {{GraphQuery}} wrapper is no longer needed and we can use standard queries.  (was: With LUCENE-7638 and LUCENE-7699 the {{GraphQuery}}wrapper is no longer needed and we can use standard queries.)
[jira] [Commented] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15876559#comment-15876559 ]

Matt Weber commented on LUCENE-7699:
------------------------------------

Remove {{GraphQuery}} in LUCENE-7702.
[jira] [Commented] (LUCENE-7702) Remove GraphQuery
[ https://issues.apache.org/jira/browse/LUCENE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878281#comment-15878281 ]

Matt Weber commented on LUCENE-7702:
------------------------------------

[~jim.ferenczi] Would you like me to backport to 6x?
[jira] [Commented] (LUCENE-7699) Apply graph articulation points optimization to phrase graph queries
[ https://issues.apache.org/jira/browse/LUCENE-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15878280#comment-15878280 ]

Matt Weber commented on LUCENE-7699:
------------------------------------

Thanks [~jim.ferenczi]. Is this going to make it into 6.4.2 or wait until 6.5?
[jira] [Updated] (LUCENE-5012) Make graph-based TokenFilters easier
[ https://issues.apache.org/jira/browse/LUCENE-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt Weber updated LUCENE-5012:
-------------------------------

    Attachment: LUCENE-5012.patch

Patch for current master. {{testSynAfterDecompoundStageAnalyzer}} and {{testSynStageAnalyzer}} randomly fail so will need to dig into this when I find some more time.

[~mikemccand] Can you take a look and make sure I didn't miss anything from the original? The attached patch wasn't up to date since you were working from a branch. Here is what I ran to get the latest:

{code}
svn diff --ignore-properties --old https://svn.apache.org/repos/asf/lucene/dev/trunk@1597052 --new https://svn.apache.org/repos/asf/lucene/dev/branches/lucene5012@1694511 > LUCENE-5012.patch
{code}

> Make graph-based TokenFilters easier
> ------------------------------------
>
>                 Key: LUCENE-5012
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5012
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>         Attachments: LUCENE-5012.patch, LUCENE-5012.patch
>
>
> SynonymFilter has two limitations today:
>   * It cannot create positions, so eg dns -> domain name service creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and others).
>   * It cannot consume a graph, so e.g. if you try to apply synonyms after Kuromoji tokenizer I'm not sure what will happen.
> I've thought about how to fix these issues but it's really quite difficult with the current PosInc/PosLen graph representation, so I'd like to explore an alternative approach.
[jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier
[ https://issues.apache.org/jira/browse/LUCENE-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826257#comment-15826257 ]

Matt Weber commented on LUCENE-5012:
------------------------------------

[~mikemccand] So I was looking into supporting incoming graphs in {{SynonymGraphFilter}} and found this when you mentioned it in LUCENE-7638. What do you think the state of this patch is? Would it be best to look into advancing this instead of just {{SynonymGraphFilter}} itself?
[jira] [Commented] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15826311#comment-15826311 ] Matt Weber commented on LUCENE-7638: Ok this is great. So going forward we should assume that synonyms are to treated together (single token or multi-token) and ideally multi-token synonyms as a phrase. Would it be best to move this logic into {{GraphQuery}} itself? This would make it so we can still detect when we are working with graph related queries and be easier to make the various optimizations talked about here. Maybe make {{GraphQuery}} store the graph token stream instead of the processed queries and then do the graph processing / query generation when rewrite it called? > Optimize graph query produced by QueryBuilder > - > > Key: LUCENE-7638 > URL: https://issues.apache.org/jira/browse/LUCENE-7638 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jim Ferenczi > Attachments: LUCENE-7638.patch > > > The QueryBuilder creates a graph query when the underlying TokenStream > contains token with PositionLengthAttribute greater than 1. > These TokenStreams are in fact graphs (lattice to be more precise) where > synonyms can span on multiple terms. > Currently the graph query is built by visiting all the path of the graph > TokenStream. For instance if you have a synonym like "ny, new york" and you > search for "new york city", the query builder would produce two pathes: > "new york city", "ny city" > This can quickly explode when the number of multi terms synonyms increase. > The query "ny ny" for instance would produce 4 pathes and so on. > For boolean queries with should or must clauses it should be more efficient > to build a boolean query that merges all the intersections in the graph. So > instead of "new york city", "ny city" we could produce: > "+((+new +york) ny) +city" > The attached patch is a proposal to do that instead of the all path solution. 
> The patch transforms multi terms synonyms in graph query for each > intersection in the graph. This is not done in this patch but we could also > create a specialized query that gives equivalent scores to multi terms > synonyms like the SynonymQuery does for single term synonyms. > For phrase query this patch does not change the current behavior but we could > also use the new method to create optimized graph SpanQuery. > [~mattweber] I think this patch could optimize a lot of cases where multiple > muli-terms synonyms are present in a single request. Could you take a look ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
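The difference between the "all paths" expansion and the merged boolean form described in LUCENE-7638 can be illustrated with a small stand-alone sketch. This is not the attached patch and does not use Lucene classes; the class name `GraphQuerySketch`, its methods, and the nested-list representation of the graph are all hypothetical, chosen only to reproduce the "new york city" and "ny ny" examples from the issue description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Toy model (hypothetical): a query is a list of positions; each position offers alternative term sequences. */
final class GraphQuerySketch {

    /** The "new york city" example with the synonym rule "ny, new york". */
    static List<List<List<String>>> newYorkCity() {
        return List.of(
            List.of(List.of("new", "york"), List.of("ny")),
            List.of(List.of("city")));
    }

    /** The "ny ny" example: two synonym groups in a row. */
    static List<List<List<String>>> nyNy() {
        List<List<String>> group = List.of(List.of("new", "york"), List.of("ny"));
        return List.of(group, group);
    }

    /** Enumerates every path through the graph; the count is the product of the alternatives per position. */
    static List<String> allPaths(List<List<List<String>>> graph) {
        List<String> paths = new ArrayList<>();
        paths.add("");
        for (List<List<String>> alternatives : graph) {
            List<String> next = new ArrayList<>();
            for (String prefix : paths) {
                for (List<String> alt : alternatives) {
                    String joined = String.join(" ", alt);
                    next.add(prefix.isEmpty() ? joined : prefix + " " + joined);
                }
            }
            paths = next;
        }
        return paths;
    }

    /** Builds one required clause per position, so the size grows with the sum of alternatives, not the product. */
    static String merged(List<List<List<String>>> graph) {
        List<String> clauses = new ArrayList<>();
        for (List<List<String>> alternatives : graph) {
            if (alternatives.size() == 1 && alternatives.get(0).size() == 1) {
                clauses.add("+" + alternatives.get(0).get(0));
            } else {
                String inner = alternatives.stream()
                    .map(alt -> alt.size() == 1
                        ? alt.get(0)
                        : "(" + alt.stream().map(t -> "+" + t).collect(Collectors.joining(" ")) + ")")
                    .collect(Collectors.joining(" "));
                clauses.add("+(" + inner + ")");
            }
        }
        return String.join(" ", clauses);
    }
}
```

With these inputs, `allPaths(newYorkCity())` yields the two paths from the description, `allPaths(nyNy())` yields four, and `merged(newYorkCity())` yields the single string `+((+new +york) ny) +city`. Minimum-should-match and phrase slop, raised in the comments, are deliberately not modeled.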
[jira] [Commented] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824319#comment-15824319 ] Matt Weber commented on LUCENE-7638: I think the problem here is that we lose minimum should match support as that is applied AFTER query generation by building a new boolean query. Same thing for phrase slop even though that would not be affected by this patch. If we can move this logic into rewrite method of GraphQuery then we could take all that information into consideration to build a more efficient query. > Optimize graph query produced by QueryBuilder > - > > Key: LUCENE-7638 > URL: https://issues.apache.org/jira/browse/LUCENE-7638 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jim Ferenczi > Attachments: LUCENE-7638.patch > > > The QueryBuilder creates a graph query when the underlying TokenStream > contains token with PositionLengthAttribute greater than 1. > These TokenStreams are in fact graphs (lattice to be more precise) where > synonyms can span on multiple terms. > Currently the graph query is built by visiting all the path of the graph > TokenStream. For instance if you have a synonym like "ny, new york" and you > search for "new york city", the query builder would produce two pathes: > "new york city", "ny city" > This can quickly explode when the number of multi terms synonyms increase. > The query "ny ny" for instance would produce 4 pathes and so on. > For boolean queries with should or must clauses it should be more efficient > to build a boolean query that merges all the intersections in the graph. So > instead of "new york city", "ny city" we could produce: > "+((+new +york) ny) +city" > The attached patch is a proposal to do that instead of the all path solution. > The patch transforms multi terms synonyms in graph query for each > intersection in the graph. 
This is not done in this patch but we could also > create a specialized query that gives equivalent scores to multi terms > synonyms like the SynonymQuery does for single term synonyms. > For phrase query this patch does not change the current behavior but we could > also use the new method to create optimized graph SpanQuery. > [~mattweber] I think this patch could optimize a lot of cases where multiple > muli-terms synonyms are present in a single request. Could you take a look ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7638) Optimize graph query produced by QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15824462#comment-15824462 ] Matt Weber commented on LUCENE-7638: [~jim.ferenczi] I have mixed feelings about that as I can see plus and minus of both. When I was originally working on this I essentially decided that everything should be passed to each path as if it was the original query. What do you think [~mikemccand]? Also, there are additional use cases that we handle in elasticsearch that have not made their way into Lucene yet that might be affected by this. Boolean with cutoff frequency, prefix queries, etc. > Optimize graph query produced by QueryBuilder > - > > Key: LUCENE-7638 > URL: https://issues.apache.org/jira/browse/LUCENE-7638 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Jim Ferenczi > Attachments: LUCENE-7638.patch > > > The QueryBuilder creates a graph query when the underlying TokenStream > contains token with PositionLengthAttribute greater than 1. > These TokenStreams are in fact graphs (lattice to be more precise) where > synonyms can span on multiple terms. > Currently the graph query is built by visiting all the path of the graph > TokenStream. For instance if you have a synonym like "ny, new york" and you > search for "new york city", the query builder would produce two pathes: > "new york city", "ny city" > This can quickly explode when the number of multi terms synonyms increase. > The query "ny ny" for instance would produce 4 pathes and so on. > For boolean queries with should or must clauses it should be more efficient > to build a boolean query that merges all the intersections in the graph. So > instead of "new york city", "ny city" we could produce: > "+((+new +york) ny) +city" > The attached patch is a proposal to do that instead of the all path solution. > The patch transforms multi terms synonyms in graph query for each > intersection in the graph. 
This is not done in this patch but we could also > create a specialized query that gives equivalent scores to multi terms > synonyms like the SynonymQuery does for single term synonyms. > For phrase query this patch does not change the current behavior but we could > also use the new method to create optimized graph SpanQuery. > [~mattweber] I think this patch could optimize a lot of cases where multiple > muli-terms synonyms are present in a single request. Could you take a look ? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5012) Make graph-based TokenFilters easier
[ https://issues.apache.org/jira/browse/LUCENE-5012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15830136#comment-15830136 ] Matt Weber commented on LUCENE-5012: Thanks [~mikemccand], there were a lot of additional changes! I am going to start getting familiar with this and hopefully will be able to help move it forward as I get time. > Make graph-based TokenFilters easier > > > Key: LUCENE-5012 > URL: https://issues.apache.org/jira/browse/LUCENE-5012 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/analysis >Reporter: Michael McCandless >Assignee: Michael McCandless > Attachments: LUCENE-5012.patch, LUCENE-5012.patch > > > SynonymFilter has two limitations today: > * It cannot create positions, so eg dns -> domain name service > creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and > others). > * It cannot consume a graph, so e.g. if you try to apply synonyms > after Kuromoji tokenizer I'm not sure what will happen. > I've thought about how to fix these issues but it's really quite > difficult with the current PosInc/PosLen graph representation, so I'd > like to explore an alternative approach.
[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15789729#comment-15789729 ] Matt Weber commented on LUCENE-7603: Great, thank you both! I have updated the branch_6x PR with the latest changes as well as rebased+squashed both. Happy New Year! > Support Graph Token Streams in QueryBuilder > --- > > Key: LUCENE-7603 > URL: https://issues.apache.org/jira/browse/LUCENE-7603 > Project: Lucene - Core > Issue Type: Improvement > Components: core/queryparser, core/search >Reporter: Matt Weber > > With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can > use multi-term synonyms query time. A "graph token stream" will be created > which which is nothing more than using the position length attribute on > stacked tokens to indicate how many positions a token should span. Currently > the position length attribute on tokens is ignored during query parsing. > This issue will add support for handling these graph token streams inside the > QueryBuilder utility class used by query parsers.
[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786980#comment-15786980 ] Matt Weber commented on LUCENE-7603: [~dsmiley] . Thank you for the review! I was able to come up with a way to preserve position increment gaps. Can you please take another look? [~mikemccand] Can you please have another look as well? > Support Graph Token Streams in QueryBuilder > --- > > Key: LUCENE-7603 > URL: https://issues.apache.org/jira/browse/LUCENE-7603 > Project: Lucene - Core > Issue Type: Improvement > Components: core/queryparser, core/search >Reporter: Matt Weber > > With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can > use multi-term synonyms query time. A "graph token stream" will be created > which which is nothing more than using the position length attribute on > stacked tokens to indicate how many positions a token should span. Currently > the position length attribute on tokens is ignored during query parsing. > This issue will add support for handling these graph token streams inside the > QueryBuilder utility class used by query parsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-7603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15783462#comment-15783462 ] Matt Weber commented on LUCENE-7603: Thanks for reviewing [~mikemccand]! I have added the missing ASF header and moved the test into the proper package. I have also backported this to 6x and opened a new PR for that. The only difference in the 6x backport is disabling coord on the rewritten boolean query. Some of the tests are slightly different as well due to the fact that splitOnWhitespace defaults to true in 6x. Please let me know if you need me to change anything! > Support Graph Token Streams in QueryBuilder > --- > > Key: LUCENE-7603 > URL: https://issues.apache.org/jira/browse/LUCENE-7603 > Project: Lucene - Core > Issue Type: Improvement > Components: core/queryparser, core/search >Reporter: Matt Weber > > With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can > use multi-term synonyms query time. A "graph token stream" will be created > which which is nothing more than using the position length attribute on > stacked tokens to indicate how many positions a token should span. Currently > the position length attribute on tokens is ignored during query parsing. > This issue will add support for handling these graph token streams inside the > QueryBuilder utility class used by query parsers. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-7603) Support Graph Token Streams in QueryBuilder
Matt Weber created LUCENE-7603: -- Summary: Support Graph Token Streams in QueryBuilder Key: LUCENE-7603 URL: https://issues.apache.org/jira/browse/LUCENE-7603 Project: Lucene - Core Issue Type: Improvement Components: core/queryparser, core/search Reporter: Matt Weber With [LUCENE-6664|https://issues.apache.org/jira/browse/LUCENE-6664] we can use multi-term synonyms at query time. A "graph token stream" will be created, which is nothing more than using the position length attribute on stacked tokens to indicate how many positions a token should span. Currently the position length attribute on tokens is ignored during query parsing. This issue will add support for handling these graph token streams inside the QueryBuilder utility class used by query parsers.
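To make the "position length on stacked tokens" idea concrete, here is a minimal sketch in plain Java. It does not use Lucene's attribute classes; `GraphTokensSketch`, `Token`, and the specific token order and values for "new york city" with the rule "ny, new york" are assumptions made for illustration, and a real synonym filter may emit the tokens differently.

```java
import java.util.List;

/** Toy model (hypothetical): tokens carry a position increment and a position length. */
final class GraphTokensSketch {

    static final class Token {
        final String term;
        final int posInc; // how far this token advances from the previous token's position
        final int posLen; // how many positions this token spans

        Token(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Assumed token stream for "new york city" with "ny" stacked on top of "new york". */
    static List<Token> newYorkCity() {
        return List.of(
            new Token("ny", 1, 2),    // starts at position 0 and spans two positions
            new Token("new", 0, 1),   // stacked at the same start position as "ny"
            new Token("york", 1, 1),
            new Token("city", 1, 1));
    }

    /** A stream is treated as a graph as soon as any token spans more than one position. */
    static boolean isGraph(List<Token> tokens) {
        for (Token t : tokens) {
            if (t.posLen > 1) {
                return true;
            }
        }
        return false;
    }

    /** Returns {start, end} for the token at the given index, with end = start + posLen. */
    static int[] span(List<Token> tokens, int index) {
        int pos = -1;
        for (int i = 0; i <= index; i++) {
            pos += tokens.get(i).posInc;
        }
        return new int[] {pos, pos + tokens.get(index).posLen};
    }
}
```

In this model "ny" and "york" end at the same position, which is what lets "city" follow either alternative. Ignoring the position length, as the description says query parsing did before this issue, would make "ny" look like a plain one-position synonym of "new".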
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008209#comment-16008209 ] Matt Weber commented on LUCENE-7824: [~jim.ferenczi] Maybe use a {{BytesRefHash}} and maintain an id-to-hash map, so that we still keep only a single copy of each common term in memory and still have a unique id? > Multi-word synonyms rule with common terms at the same position are buggy > - > > Key: LUCENE-7824 > URL: https://issues.apache.org/jira/browse/LUCENE-7824 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7824.patch > > > The automaton built from the graph token stream tries to pack common terms in > multi word synonyms that appear at the same position. This means that some > states inside a multi word synonym can have multiple transitions. > As a result the intersection point of the graph are not computed correctly. > For example the synonym rule: "ny, new york city, new york" is not applied > correctly to the query "ny police". > In this case "police" is detected as part of the multi synonyms path and we > create the disjunction between: > "ny police", "new york police", ... > I pushed a patch that removes this optim (and creates a single transition > from each state) in order to ensure that the intersection points of the graph > always showed up at the end of the multi synonym paths. > [~mattweber] can you take a look ?
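The suggestion in the comment above (one stored copy per distinct term, but a unique id per occurrence) can be sketched as follows. This is a plain-Java stand-in that uses a `HashMap` where the comment proposes {{BytesRefHash}}; the class `TermIdPoolSketch` and its methods are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in (hypothetical): interns each distinct term once, but hands out a fresh id per occurrence. */
final class TermIdPoolSketch {
    private final Map<String, Integer> ordByTerm = new HashMap<>(); // term -> ordinal of its single stored copy
    private final List<String> termByOrd = new ArrayList<>();       // ordinal -> term
    private final List<Integer> ordById = new ArrayList<>();        // occurrence id -> ordinal

    /** Registers one occurrence of the term and returns an id unique to that occurrence. */
    int add(String term) {
        Integer ord = ordByTerm.get(term);
        if (ord == null) {
            ord = termByOrd.size();
            ordByTerm.put(term, ord);
            termByOrd.add(term);
        }
        ordById.add(ord);
        return ordById.size() - 1;
    }

    /** Resolves an occurrence id back to its term. */
    String term(int id) {
        return termByOrd.get(ordById.get(id));
    }

    /** Number of distinct terms actually stored. */
    int distinctTerms() {
        return termByOrd.size();
    }
}
```

Two calls to `add("new")` return different ids that both resolve to the same stored string, so each transition can have its own id without duplicating the term.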
[jira] [Commented] (LUCENE-7824) Multi-word synonyms rule with common terms at the same position are buggy
[ https://issues.apache.org/jira/browse/LUCENE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16008311#comment-16008311 ] Matt Weber commented on LUCENE-7824: Sure, looks good then! > Multi-word synonyms rule with common terms at the same position are buggy > - > > Key: LUCENE-7824 > URL: https://issues.apache.org/jira/browse/LUCENE-7824 > Project: Lucene - Core > Issue Type: Bug >Reporter: Jim Ferenczi > Attachments: LUCENE-7824.patch > > > The automaton built from the graph token stream tries to pack common terms in > multi word synonyms that appear at the same position. This means that some > states inside a multi word synonym can have multiple transitions. > As a result the intersection point of the graph are not computed correctly. > For example the synonym rule: "ny, new york city, new york" is not applied > correctly to the query "ny police". > In this case "police" is detected as part of the multi synonyms path and we > create the disjunction between: > "ny police", "new york police", ... > I pushed a patch that removes this optim (and creates a single transition > from each state) in order to ensure that the intersection points of the graph > always showed up at the end of the multi synonym paths. > [~mattweber] can you take a look ? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8300) Add unordered-distinct IntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16495389#comment-16495389 ] Matt Weber commented on LUCENE-8300: Thank you [~romseygeek]! > Add unordered-distinct IntervalsSource > -- > > Key: LUCENE-8300 > URL: https://issues.apache.org/jira/browse/LUCENE-8300 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8300.patch, LUCENE-8300.patch > > > [~mattweber] pointed out on LUCENE-8196 that {{Intervals.unordered()}} > doesn't check to see if its subintervals overlap, which means that for > example {{Intervals.unordered(Intervals.term("a"), Intervals.term("a"))}} > would match a document with {{a}} appearing only once. This ticket will > introduce a new function, {{Intervals.unordered_distinct()}}, that ensures > that all subintervals within an unordered interval do not overlap. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461092#comment-16461092 ] Matt Weber commented on LUCENE-8284: [~jpountz] [~jim.ferenczi] I disagree. There are many cases where they are not expensive and/or I, as a user, understand the consequences and am willing to live with them. Indexing techniques (ngrams, etc.) will only go so far, and there are many cases where they might actually introduce issues once you're not working on a tiny dataset. I feel the type of restrictions or optimizations you talk about should be added at the usage level, i.e. Solr or Elasticsearch. Is there anything I can do to move this forward? Add an expansion limit? Rewrite support? > Add MultiTermsIntervalsSource > - > > Key: LUCENE-8284 > URL: https://issues.apache.org/jira/browse/LUCENE-8284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Matt Weber >Priority: Minor > Attachments: LUCENE-8284.patch > > > Add support for creating an {{IntervalsSource}} from multi-term expansions > such as wildcards, regular expressions, etc.
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461774#comment-16461774 ] Matt Weber commented on LUCENE-8284: Attached a patch that adds an expansion limit per-segment and just gathers the first terms we come across. Not sure I like this, I am going to try a version that adds a rewrite method to {{IntervalsSource}} so we can use the existing rewrite methods including the one [~dsmiley] mentioned. > Add MultiTermsIntervalsSource > - > > Key: LUCENE-8284 > URL: https://issues.apache.org/jira/browse/LUCENE-8284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Matt Weber >Priority: Minor > Attachments: LUCENE-8284.patch, LUCENE-8284.patch > > > Add support for creating an {{IntervalsSource}} from multi-term expansions > such as wildcards, regular expressions, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
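The per-segment expansion limit mentioned in the comment above ("just gathers the first terms we come across") could look roughly like the sketch below. It is not the attached patch: `ExpansionLimitSketch`, `expandPrefix`, and the use of a sorted list of strings in place of a segment's term dictionary are assumptions for illustration, and only prefix expansion is modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical): expand a prefix against one segment's sorted terms, capped at a limit. */
final class ExpansionLimitSketch {

    /** Collects matching terms in dictionary order and stops once the limit is reached. */
    static List<String> expandPrefix(List<String> sortedSegmentTerms, String prefix, int limit) {
        List<String> expanded = new ArrayList<>();
        for (String term : sortedSegmentTerms) {
            if (expanded.size() >= limit) {
                break; // later matches are silently dropped
            }
            if (term.startsWith(prefix)) {
                expanded.add(term);
            }
        }
        return expanded;
    }
}
```

Because the cut-off depends only on dictionary order within each segment, different segments may keep different terms, which may be part of why the comment expresses doubt about this approach.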
[jira] [Updated] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated LUCENE-8284: --- Attachment: LUCENE-8284.patch > Add MultiTermsIntervalsSource > - > > Key: LUCENE-8284 > URL: https://issues.apache.org/jira/browse/LUCENE-8284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Matt Weber >Priority: Minor > Attachments: LUCENE-8284.patch, LUCENE-8284.patch > > > Add support for creating an {{IntervalsSource}} from multi-term expansions > such as wildcards, regular expressions, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-8284) Add MultiTermsIntervalsSource
Matt Weber created LUCENE-8284: -- Summary: Add MultiTermsIntervalsSource Key: LUCENE-8284 URL: https://issues.apache.org/jira/browse/LUCENE-8284 Project: Lucene - Core Issue Type: Improvement Reporter: Matt Weber Add support for creating an {{IntervalsSource}} from multi-term expansions such as wildcards, regular expressions, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated LUCENE-8284: --- Attachment: LUCENE-8284.patch > Add MultiTermsIntervalsSource > - > > Key: LUCENE-8284 > URL: https://issues.apache.org/jira/browse/LUCENE-8284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Matt Weber >Priority: Minor > Attachments: LUCENE-8284.patch > > > Add support for creating an {{IntervalsSource}} from multi-term expansions > such as wildcards, regular expressions, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8284) Add MultiTermsIntervalsSource
[ https://issues.apache.org/jira/browse/LUCENE-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458673#comment-16458673 ] Matt Weber commented on LUCENE-8284: [~romseygeek] [~jimczi] Since these expand terms per-segment the terms are not available when creating the {{IntervalWeight}} and thus result in a null {{simScorer}} if these are the only sources. I currently picked using a constant {{1.0f}} in this case. Not sure if this is the best approach or not. > Add MultiTermsIntervalsSource > - > > Key: LUCENE-8284 > URL: https://issues.apache.org/jira/browse/LUCENE-8284 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Matt Weber >Priority: Minor > Attachments: LUCENE-8284.patch > > > Add support for creating an {{IntervalsSource}} from multi-term expansions > such as wildcards, regular expressions, etc. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446939#comment-16446939 ] Matt Weber commented on LUCENE-8196: [~romseygeek] This is great! How would we prevent matching at the same interval? In {{TestIntervalQuery}}, I would expect this to pass but it matches every doc with {{w3}}.
{code:java}
public void testUnorderedQueryNoSelfMatch() throws IOException {
  // Expectation: a lone w3 should not satisfy both sub-intervals.
  Query q = new IntervalQuery(field, Intervals.maxwidth(2, Intervals.unordered(Intervals.term("w3"), Intervals.term("w3"))));
  checkHits(q, new int[]{1});
}
{code}
> Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. > These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined.
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450158#comment-16450158 ] Matt Weber commented on LUCENE-8196: I use these queries to build query parsers and I am specifically thinking of an unordered near and how I can prevent it from matching the same term. I can't think of any situation where a user would think {{NEAR(a, a)}} would match documents with a single {{a}} and if we can't get that by default I would like a way to explicitly prevent it myself. Spans have the same issue as well, see LUCENE-3120. > Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. > These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450408#comment-16450408 ] Matt Weber commented on LUCENE-8196: [~jim.ferenczi] [~romseygeek] So given a single document with the value {{a b}}. The following queries would both match this document: {code:java} Intervals.unordered(Intervals.term("b"), Intervals.term("a")) {code} {code:java} Intervals.unordered(Intervals.term("b"), Intervals.term("b")) {code} The first I think would have an interval width of {{1}} and the 2nd should have a width of {{0}}. So if we have a {{minwidth}} operator we could use that to set the minimum width to {{1}} preventing the 2nd from matching? If both of these queries result in an interval with the same width then that feels wrong to me. > Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. 
> These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
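The width argument in the comment above can be checked with a small model over term positions. This sketch is not Lucene code: `UnorderedWidthSketch` and its methods are hypothetical, and width is taken as end minus start, following the comment's own numbers, which is an assumption and not necessarily how Lucene measures interval width.

```java
/** Toy model (hypothetical): unordered matching of two single-term sources given their positions in one document. */
final class UnorderedWidthSketch {

    /** Width (end - start) of the narrowest interval covering one position from each source; -1 if either is absent. */
    static int narrowestWidth(int[] first, int[] second) {
        int best = -1;
        for (int p : first) {
            for (int q : second) {
                int width = Math.abs(p - q);
                if (best < 0 || width < best) {
                    best = width;
                }
            }
        }
        return best;
    }

    /** Applies a minwidth-style filter to the narrowest interval only. */
    static boolean passesMinWidth(int[] first, int[] second, int minWidth) {
        int width = narrowestWidth(first, second);
        return width >= 0 && width >= minWidth;
    }
}
```

For the document {{a b}} (a at position 0, b at position 1), the pair (b, a) has narrowest width 1 while (b, b) has width 0, so a minimum width of 1 would reject the self-match in this model. The filter looks only at the narrowest interval, so it is a sketch of the idea in the comment rather than a full treatment of minimal-interval semantics.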
[jira] [Comment Edited] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450014#comment-16450014 ] Matt Weber edited comment on LUCENE-8196 at 4/24/18 2:59 PM: - [~jim.ferenczi] [~romseygeek] I think rename to {{and}} makes sense, however, I would still like a way to explicitly prevent the scenario I described . Maybe a {{minwith}} operator? The width at the same position/interval should be {{0}} right? was (Author: mattweber): [~jim.ferenczi] [~romseygeek] I think rename to {{and}} makes sense, however, I would still live a way to explicitly prevent the scenario I described . Maybe a {{minwith}} operator? The width at the same position/interval should be {{0}} right? > Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. > These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8196) Add IntervalQuery and IntervalsSource to expose minimum interval semantics across term fields
[ https://issues.apache.org/jira/browse/LUCENE-8196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450014#comment-16450014 ] Matt Weber commented on LUCENE-8196: [~jim.ferenczi] [~romseygeek] I think rename to {{and}} makes sense, however, I would still like a way to explicitly prevent the scenario I described. Maybe a {{minwith}} operator? The width at the same position/interval should be {{0}} right? > Add IntervalQuery and IntervalsSource to expose minimum interval semantics > across term fields > - > > Key: LUCENE-8196 > URL: https://issues.apache.org/jira/browse/LUCENE-8196 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 7.4 > > Attachments: LUCENE-8196-debug.patch, LUCENE-8196.patch, > LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch, LUCENE-8196.patch > > Time Spent: 10m > Remaining Estimate: 0h > > This ticket proposes an alternative implementation of the SpanQuery family > that uses minimum-interval semantics from > [http://vigna.di.unimi.it/ftp/papers/EfficientAlgorithmsMinimalIntervalSemantics.pdf] > to implement positional queries across term-based fields. Rather than using > TermQueries to construct the interval operators, as in LUCENE-2878 or the > current Spans implementation, we instead use a new IntervalsSource object, > which will produce IntervalIterators over a particular segment and field. > These are constructed using various static helper methods, and can then be > passed to a new IntervalQuery which will return documents that contain one or > more intervals so defined.
[jira] [Commented] (LUCENE-8828) Fix Intervals.unordered() without overlaps
[ https://issues.apache.org/jira/browse/LUCENE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855897#comment-16855897 ] Matt Weber commented on LUCENE-8828: [~romseygeek] Yup that would work... would this be able to handle something like {{NO_OVERLAPS(OR(a,b), a)}}? I wouldn't want a single token {{a}} to match this. > Fix Intervals.unordered() without overlaps > -- > > Key: LUCENE-8828 > URL: https://issues.apache.org/jira/browse/LUCENE-8828 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Attachments: LUCENE-8828.patch > > > LUCENE-8300 added an option to Intervals.unordered() which would attempt to > find intervals that contained all of a set of subintervals where none of the > subintervals overlapped. Unfortunately, this implementation was buggy, and > could miss documents depending on the order in which the subintervals were > passed to the factory method. > After some digging around, I think that it is not in fact possible to > implement this in anything other than n! time, because of the need to > minimize the resulting intervals. My proposal is to remove the boolean flag, > and instead implement an Intervals.unorderedNoOverlaps() method that takes > only two subsources, and rewrites NO_OVERLAPS(a, b) to OR(ORDERED(a, b), > ORDERED(b, a)). The usual simplifications will apply here, so NO_OVERLAPS(a, > a) will end up as ORDERED(a, a) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
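The rewrite proposed in this ticket, NO_OVERLAPS(a, b) to OR(ORDERED(a, b), ORDERED(b, a)), can be illustrated with a toy model. The sketch below is not Lucene code: it assumes every source matches single tokens only, so an interval is just a token position, and the class and method names (NoOverlapsSketch, terms, positions, ordered, matches) are invented for illustration. Under those assumptions it suggests why a lone token {{a}} would not satisfy NO_OVERLAPS(OR(a,b), a): neither ordering can be formed from one position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: each source matches single tokens, so an interval is just a position.
// This is an illustration of the rewrite, not Lucene's implementation.
final class NoOverlapsSketch {

    // Builds an OR-style source from one or more terms (hypothetical helper).
    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    // Ascending positions at which any term of the source occurs in the token stream.
    static List<Integer> positions(String[] tokens, Set<String> source) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (source.contains(tokens[i])) {
                out.add(i);
            }
        }
        return out;
    }

    // ORDERED(x, y): some x position lies strictly before some y position.
    // Comparing the earliest x with the latest y is enough to decide that.
    static boolean ordered(List<Integer> x, List<Integer> y) {
        return !x.isEmpty() && !y.isEmpty() && x.get(0) < y.get(y.size() - 1);
    }

    // NO_OVERLAPS(x, y) rewritten as OR(ORDERED(x, y), ORDERED(y, x)).
    static boolean matches(String doc, Set<String> x, Set<String> y) {
        String[] tokens = doc.split(" ");
        List<Integer> px = positions(tokens, x);
        List<Integer> py = positions(tokens, y);
        return ordered(px, py) || ordered(py, px);
    }

    public static void main(String[] args) {
        // A single token "a" cannot appear on both sides of an ordering.
        System.out.println(matches("a", terms("a", "b"), terms("a")));
        // Two tokens give the rewrite something to order.
        System.out.println(matches("b a", terms("a", "b"), terms("a")));
    }
}
```

Real interval sources can span several positions and must also be minimized, which this sketch ignores.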
[jira] [Commented] (LUCENE-8828) Fix Intervals.unordered() without overlaps
[ https://issues.apache.org/jira/browse/LUCENE-8828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855924#comment-16855924 ] Matt Weber commented on LUCENE-8828: [~romseygeek] Sounds good thanks! > Fix Intervals.unordered() without overlaps > -- > > Key: LUCENE-8828 > URL: https://issues.apache.org/jira/browse/LUCENE-8828 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Attachments: LUCENE-8828.patch > > > LUCENE-8300 added an option to Intervals.unordered() which would attempt to > find intervals that contained all of a set of subintervals where none of the > subintervals overlapped. Unfortunately, this implementation was buggy, and > could miss documents depending on the order in which the subintervals were > passed to the factory method. > After some digging around, I think that it is not in fact possible to > implement this in anything other than n! time, because of the need to > minimize the resulting intervals. My proposal is to remove the boolean flag, > and instead implement an Intervals.unorderedNoOverlaps() method that takes > only two subsources, and rewrites NO_OVERLAPS(a, b) to OR(ORDERED(a, b), > ORDERED(b, a)). The usual simplifications will apply here, so NO_OVERLAPS(a, > a) will end up as ORDERED(a, a) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch This patch adds TermsComponent support to SolrJ. It adds a new response class, TermsResponse, and updates SolrQuery to support setting/getting of TermsComponent parameters. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch Added support so you can specify multiple source fields. In the previous patch I mistakenly assumed a single source field named spell. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1156) Sort TermsComponent results by frequency
Sort TermsComponent results by frequency Key: SOLR-1156 URL: https://issues.apache.org/jira/browse/SOLR-1156 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber TermsComponent should be able to return results sorted by frequency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1156) Sort TermsComponent results by frequency
[ https://issues.apache.org/jira/browse/SOLR-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1156: - Attachment: SOLR-1156.patch I have implemented TermsComponent sorting by frequency. I use the same technique as facet sorting. Enable sorting by the parameter terms.sort=true|false. Sort TermsComponent results by frequency Key: SOLR-1156 URL: https://issues.apache.org/jira/browse/SOLR-1156 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Attachments: SOLR-1156.patch TermsComponent should be able to return results sorted by frequency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-877) Access to Lucene's TermEnum capabilities
[ https://issues.apache.org/jira/browse/SOLR-877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707601#action_12707601 ] Matt Weber commented on SOLR-877: - I wrote a patch for freq. sorting that is attached to SOLR-1156. I will update that patch once you commit your latest changes. Access to Lucene's TermEnum capabilities Key: SOLR-877 URL: https://issues.apache.org/jira/browse/SOLR-877 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.4 Attachments: SOLR-877.patch, SOLR-877.patch, SOLR-877.patch, SOLR-877_2.patch I wrote a simple SearchComponent on the plane the other day that gives access to Lucene's TermEnum capabilities. I think this will be useful for doing auto-suggest and other term based operations. My first draft is not distributed, but it probably should be made to do so eventually. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1156) Sort TermsComponent results by frequency
[ https://issues.apache.org/jira/browse/SOLR-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1156: - Attachment: SOLR-1156.patch Updated patch to resolve conflicts with the recent changes to trunk (rev. 773446). Also to keep the sort parameter similar to the facet.sort parameter, you can specify terms.sort=count|index instead of true|false. Default is to sort by count. Sort TermsComponent results by frequency Key: SOLR-1156 URL: https://issues.apache.org/jira/browse/SOLR-1156 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Attachments: SOLR-1156.patch, SOLR-1156.patch TermsComponent should be able to return results sorted by frequency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
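To make the two terms.sort values concrete, here is a minimal stand-alone sketch of the ordering they describe. It is not the code from the patch: the class TermsSortSketch and its sortTerms method are hypothetical, ties under count are broken by index order as an assumption, and any value other than index is treated as count, mirroring the stated default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only; mimics the ordering implied by terms.sort=count|index.
final class TermsSortSketch {

    // Returns the terms in the order requested by a terms.sort-style value.
    static List<String> sortTerms(Map<String, Integer> freqs, String sort) {
        // A TreeMap yields the terms in index (lexicographic) order.
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(new TreeMap<>(freqs).entrySet());
        if (!"index".equals(sort)) {
            // "count" (the default): highest frequency first. List.sort is stable,
            // so equal counts keep their index order (an assumption about ties).
            entries.sort((x, y) -> Integer.compare(y.getValue(), x.getValue()));
        }
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            terms.add(e.getKey());
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new TreeMap<>();
        freqs.put("solr", 3);
        freqs.put("lucene", 7);
        freqs.put("jira", 3);
        System.out.println(sortTerms(freqs, "count"));
        System.out.println(sortTerms(freqs, "index"));
    }
}
```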
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch Updated to reflect latest changes to TermsComponent in rev. 773447. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139-WITH_SORT_SUPPORT.patch Here is a patch that adds support for the sort parameters in SOLR-1156. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1156) Sort TermsComponent results by frequency
[ https://issues.apache.org/jira/browse/SOLR-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708223#action_12708223 ] Matt Weber commented on SOLR-1156: -- The current tests pass: [junit] Running org.apache.solr.handler.component.TermsComponentTest [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 6.944 sec I will work on some unit tests for the new sorting functionality. Sort TermsComponent results by frequency Key: SOLR-1156 URL: https://issues.apache.org/jira/browse/SOLR-1156 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Attachments: SOLR-1156.patch, SOLR-1156.patch, SOLR-1156.patch TermsComponent should be able to return results sorted by frequency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1156) Sort TermsComponent results by frequency
[ https://issues.apache.org/jira/browse/SOLR-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1156: - Attachment: SOLR-1156.patch Added some unit tests. Sort TermsComponent results by frequency Key: SOLR-1156 URL: https://issues.apache.org/jira/browse/SOLR-1156 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Attachments: SOLR-1156.patch, SOLR-1156.patch, SOLR-1156.patch, SOLR-1156.patch TermsComponent should be able to return results sorted by frequency. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1177: - Attachment: SOLR-1177.patch Here is my first attempt at a patch that is not currently working. For some reason only the prepare and process methods are being called. It seems that the shards parameter is not being honored like it is in the other distributed components because rb.shards is always null. I have looked at the other distributed components and did not notice them doing anything special with the shards parameter. I have based this code on the information from http://wiki.apache.org/solr/WritingDistributedSearchComponents and looking through the FacetComponent, DebugComponent, StatsComponent, and HighlightComponent code. Any help figuring out why the other methods are not being called is greatly appreciated. Please ignore the println statements, they are for debug only and will be removed in the finalized, working patch. Thanks! Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1177) Distributed TermsComponent
Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Fix For: 1.5 TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-945) JSON update handler
[ https://issues.apache.org/jira/browse/SOLR-945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12715198#action_12715198 ] Matt Weber commented on SOLR-945: - Any update on this for 1.4? +1 here. JSON update handler --- Key: SOLR-945 URL: https://issues.apache.org/jira/browse/SOLR-945 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley Attachments: SOLR-945-json-update.patch In addition to supporting xml and csv updating, it would be good to support json. This patch uses [noggit|http://svn.apache.org/repos/asf/labs/noggit/], a streaming json parser, to build the commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1488) autoCommit when idle
autoCommit when idle Key: SOLR-1488 URL: https://issues.apache.org/jira/browse/SOLR-1488 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Fix For: 1.4 Enable autoCommit to execute after a given amount of idle time (no documents submitted). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1488) autoCommit when idle
[ https://issues.apache.org/jira/browse/SOLR-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1488: - Attachment: SOLR-1488.patch This patch adds autoCommit after idle support. If maxTime and idleTime are both defined in solrconfig.xml, then maxTime takes precedence. autoCommit when idle Key: SOLR-1488 URL: https://issues.apache.org/jira/browse/SOLR-1488 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Fix For: 1.4 Attachments: SOLR-1488.patch Enable autoCommit to execute after a given amount of idle time (no documents submitted). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1488) autoCommit when idle
[ https://issues.apache.org/jira/browse/SOLR-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761910#action_12761910 ] Matt Weber commented on SOLR-1488: -- Forgot to mention, the new parameter used to configure this feature is called idleTime. Here is an example that will commit every 100k docs or after 10 seconds of idle time: <autoCommit> <maxDocs>100000</maxDocs> <idleTime>10000</idleTime> <!-- <maxTime>3</maxTime> --> </autoCommit> autoCommit when idle Key: SOLR-1488 URL: https://issues.apache.org/jira/browse/SOLR-1488 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Fix For: 1.4 Attachments: SOLR-1488.patch Enable autoCommit to execute after a given amount of idle time (no documents submitted). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
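The behaviour described in this comment (commit after maxDocs pending documents, or after idleTime with no new documents) can be sketched as a small state machine. This is a hedged illustration rather than the patch itself: IdleCommitSketch, addDoc, tick and pending are hypothetical names, times are passed in explicitly as milliseconds instead of being read from a clock, and maxTime handling is left out.

```java
// Illustrative model of "commit on maxDocs or after idleTime"; not Solr code.
final class IdleCommitSketch {
    private final int maxDocs;      // commit once this many docs are pending; <= 0 disables
    private final long idleTimeMs;  // commit once this long has passed since the last add; <= 0 disables
    private int pendingDocs;
    private long lastAddMs;

    IdleCommitSketch(int maxDocs, long idleTimeMs) {
        this.maxDocs = maxDocs;
        this.idleTimeMs = idleTimeMs;
    }

    // Records a document added at nowMs; returns true if the maxDocs limit triggered a commit.
    boolean addDoc(long nowMs) {
        pendingDocs++;
        lastAddMs = nowMs; // every add restarts the idle timer
        if (maxDocs > 0 && pendingDocs >= maxDocs) {
            commit();
            return true;
        }
        return false;
    }

    // Periodic check; returns true if the idle timeout triggered a commit.
    boolean tick(long nowMs) {
        if (idleTimeMs > 0 && pendingDocs > 0 && nowMs - lastAddMs >= idleTimeMs) {
            commit();
            return true;
        }
        return false;
    }

    int pending() {
        return pendingDocs;
    }

    private void commit() {
        pendingDocs = 0; // a real implementation would flush to the index here
    }

    public static void main(String[] args) {
        IdleCommitSketch tracker = new IdleCommitSketch(100000, 10000L);
        tracker.addDoc(0L);
        System.out.println(tracker.tick(5000L));   // only 5 s idle so far
        System.out.println(tracker.tick(10000L));  // 10 s idle with a pending doc
    }
}
```

Passing the time in as an argument keeps the sketch deterministic; a production version would more likely schedule the idle check on a timer.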
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch Updating patch to work with latest trunk since SOLR-1156 has been committed. Any chance of this making it into 1.4 since it is fairly trivial and the fact TermsComponent is in 1.4? SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch Updated test to use EmbeddedSolrServer and not depend on example as Yonik suggested. SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789336#action_12789336 ] Matt Weber commented on SOLR-1177: -- Thanks for the update Yonik! I will see if I can get this and SOLR-1139 using the same classes. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1177: - Attachment: SOLR-1177.patch Here is an updated patch that includes Shalin's suggestions: - replace TermData with TermsResponse.Term - updates TermsHelper to use the parsing code from TermsResponse I also changed TermsResponse.Term#frequency to a long so that we don't overflow when calculating the frequency. Then to keep back-compatibility with existing code I do the following when writing it to the NamedList: if (tc.getFrequency() >= freqmin && tc.getFrequency() <= freqmax) { fieldterms.add(tc.getTerm(), ((Number)tc.getFrequency()).intValue()); cnt++; } Is this a good approach? This new patch includes SOLR-1139. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
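For readers weighing the question in this comment, the stand-alone sketch below shows the two halves of the approach: summing per-shard counts into a long, then narrowing back to int when writing the result. It is not taken from the patch. ShardFrequencySketch, merge, narrow and clamp are hypothetical names, and clamp is an alternative added here purely for comparison. The point it tries to show is that intValue() on a long keeps only the low 32 bits, so a merged total above Integer.MAX_VALUE would wrap silently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; models merging shard term counts, not Solr's TermsComponent.
final class ShardFrequencySketch {

    // Sums per-shard counts for each term into a long so the running total cannot wrap.
    static Map<String, Long> merge(List<Map<String, Integer>> shards) {
        Map<String, Long> totals = new LinkedHashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                totals.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    // Same narrowing as calling intValue() on a boxed long: low 32 bits only, no error on overflow.
    static int narrow(long freq) {
        return Long.valueOf(freq).intValue();
    }

    // Hypothetical alternative: saturate at Integer.MAX_VALUE instead of wrapping.
    static int clamp(long freq) {
        return (int) Math.min(freq, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("solr", Integer.MAX_VALUE);
        Map<String, Integer> shard2 = new HashMap<>();
        shard2.put("solr", 1);
        List<Map<String, Integer>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);

        long total = merge(shards).get("solr");
        System.out.println(total);          // the long total is exact
        System.out.println(narrow(total));  // wrapped after narrowing
        System.out.println(clamp(total));   // saturated instead
    }
}
```

Whether wrapping matters in practice depends on whether any single term can exceed the int range across all shards; the sketch only shows what happens if it does.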
[jira] Commented: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789794#action_12789794 ] Matt Weber commented on SOLR-1177: -- The latest SOLR-1139 patch is included inside the latest patch I attached to this ticket. Should I separate them? Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1139) SolrJ TermsComponent Query and Response Support
[ https://issues.apache.org/jira/browse/SOLR-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1139: - Attachment: SOLR-1139.patch Updated patch in preparation for SOLR-1177 SolrJ TermsComponent Query and Response Support --- Key: SOLR-1139 URL: https://issues.apache.org/jira/browse/SOLR-1139 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-1139-WITH_SORT_SUPPORT.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch, SOLR-1139.patch SolrJ should support the new TermsComponent that was introduced in Solr 1.4. It should be able to: - set TermsComponent query parameters via SolrQuery - parse the TermsComponent response -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1177) Distributed TermsComponent
[ https://issues.apache.org/jira/browse/SOLR-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Weber updated SOLR-1177: - Attachment: SOLR-1177.patch New patch that DOES NOT include the code for SOLR-1139. Make sure you have SOLR-1139 applied before using this patch. Distributed TermsComponent -- Key: SOLR-1177 URL: https://issues.apache.org/jira/browse/SOLR-1177 Project: Solr Issue Type: Improvement Affects Versions: 1.4 Reporter: Matt Weber Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 1.5 Attachments: SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, SOLR-1177.patch, TermsComponent.java, TermsComponent.patch TermsComponent should be distributed -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.