[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899976#comment-16899976 ] Alexander S. commented on SOLR-6468: Just wanted to give a small update – we upgraded to Solr 8 over the weekend and search seems to be working well. [MappingCharFilterFactory|http://lucene.apache.org/core/4_8_1/analyzers-common/org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.html] also works. [~steve_rowe], are there any known downsides of replacing the StopFilterFactory with the MappingCharFilterFactory? > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1, 6.6.2, 7.1 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? 
twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
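The char-filter replacement mentioned in the comment above can be sketched roughly as follows. The analyzer chain and the mapping file name are assumptions for illustration only; the reporter's actual schema snippet was lost in the email escaping:

```xml
<!-- Illustrative sketch only: strip URL prefixes with a char filter before
     tokenization instead of removing them as stop words. -->
<fieldType name="url_words_ngram" class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- url-prefixes.txt is an assumed file name. MappingCharFilter mapping
         files use one "source" => "target" rule per line, e.g.:
           "https://" => ""
           "http://"  => ""
           "ftp://"   => ""
           "www."     => ""  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="url-prefixes.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One tradeoff to keep in mind: a char filter rewrites the raw character stream, so the mappings fire anywhere in the text, not just at token boundaries. On the other hand it leaves no position hole behind, which is exactly why phrase queries match again.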
[jira] [Commented] (SOLR-13293) org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error consuming and closing http response stream.
[ https://issues.apache.org/jira/browse/SOLR-13293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16899459#comment-16899459 ] Alexander S. commented on SOLR-13293: - I just upgraded from Solr 5 to 8 and am also seeing these errors. > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error > consuming and closing http response stream. > - > > Key: SOLR-13293 > URL: https://issues.apache.org/jira/browse/SOLR-13293 > Project: Solr > Issue Type: Bug > Components: SolrJ >Affects Versions: 8.0 >Reporter: Karl Stoney >Priority: Minor > > Hi, > Testing out branch_8x, we're randomly seeing the following errors on a simple > 3 node cluster. It doesn't appear to affect replication (the cluster remains > green). > They come in bulk (literally 1000s at a time). > There were no network issues at the time. > {code:java} > 16:53:01.492 [updateExecutor-4-thread-34-processing-x:at-uk_shard1_replica_n1 > r:core_node3 null n:solr-2.search-solr.preprod.k8.atcloud.io:80_solr c:at-uk > s:shard1] ERROR > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient - Error > consuming and closing http response stream. 
> java.nio.channels.AsynchronousCloseException: null > at > org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:316) > ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114] > at java.io.InputStream.read(InputStream.java:101) ~[?:1.8.0_191] > at > org.eclipse.jetty.client.util.InputStreamResponseListener$Input.read(InputStreamResponseListener.java:287) > ~[jetty-client-9.4.14.v20181114.jar:9.4.14.v20181114] > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.sendUpdateStream(ConcurrentUpdateHttp2SolrClient.java:283) > ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT > b14748e61fd147ea572f6545265b883fa69ed27f - root > - 2019-03-04 16:30:04] > at > org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient$Runner.run(ConcurrentUpdateHttp2SolrClient.java:176) > ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT > b14748e61fd147ea572f6545265b883fa69ed27f - root - 2019-03-04 > 16:30:04] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) > ~[metrics-core-3.2.6.jar:3.2.6] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) > ~[solr-solrj-8.1.0-SNAPSHOT.jar:8.1.0-SNAPSHOT > b14748e61fd147ea572f6545265b883fa69ed27f - root - 2019-03-04 16:30:04] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > [?:1.8.0_191] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [?:1.8.0_191] > at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191] > {code}
[jira] [Commented] (SOLR-6769) Election bug
[ https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897936#comment-16897936 ] Alexander S. commented on SOLR-6769: Hi, unfortunately I can't test with the latest versions since we are tied to Solr 5. I tuned our caches and didn't see this error any more, so let's close this for now. > Election bug > > > Key: SOLR-6769 > URL: https://issues.apache.org/jira/browse/SOLR-6769 > Project: Solr > Issue Type: Bug >Reporter: Alexander S. >Priority: Major > Attachments: Screenshot 876.png > > > Hello, I have a very simple set up: 2 shards and 2 replicas (4 nodes in > total). > What I did was just stop the shards, but while the first shard stopped immediately, > the second one took about 5 minutes to stop. You can see on the screenshot what > happened next. In short: > 1. Shard 1 stopped normally > 2. Replica 1 became a leader > 3. Shard 2 was still performing some job but wasn't accepting connections > 4. Replica 2 did not become a leader because Shard 2 was still there but > didn't work > 5. The entire cluster went down until Shard 2 stopped and Replica 2 became a > leader > Marked as critical because this shuts down the entire cluster. Please adjust > if I am wrong.
[jira] [Updated] (SOLR-12363) Duplicates with random search, cursors, and fixed seed
[ https://issues.apache.org/jira/browse/SOLR-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-12363: Description: We do have a SolrCloud cluster and just updated one of our views to use cursors with the random order. Our goal was to use an infinite scroll with the random ordering so we can shuffle results once every 24 hours. To do so we save the seed that we use in our random order to the cookies with the 24 hours expiration period, which didn't work as expected: # Results are shuffled with every request (every time we pass the initial cursor value "*" and the same random value for ordering we already used). # Results contain duplicates sometimes. Not a lot of them, but from time to time they appear. In our *schema.xml* we have: {code:java} {code} In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed from cookies. Here is the page [https://awards.wegohealth.com/nominees] Even when I try to get the "next page" URL from google chrome developer console and open it in separate tabs it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D] So it feels like the seed parameter we use is ignored or every shard understands it differently, not sure. On the screenshots, you can see the URL is the same and results are different. was: We do have a SolrCloud cluster and just updated one of our views to use cursors with the random order. Our goal was to use an infinite scroll with the random ordering so we can shuffle results once every 24 hours. To do so we save the seed that we use in our random order to the cookies with the 24 hours expiration period, which didn't work as expected: # Results are shuffled with every request (every time we pass the initial cursor value "*" and the same random value for ordering we already used). # Results contain duplicates sometimes. Not a lot of them, but from time to time they appear. 
In our *schema.xml* we have: {code:java} {code} In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed from cookies. Here is the page [https://awards.wegohealth.com/nominees] -Even when I try to get the "next page" URL from google chrome developer console and open it in separate tabs it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]- So it feels like the seed parameter we use is ignored or every shard understands it differently, not sure. On the screenshots, you can see the URL is the same and results are different. > Duplicates with random search, cursors, and fixed seed > -- > > Key: SOLR-12363 > URL: https://issues.apache.org/jira/browse/SOLR-12363 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.1 >Reporter: Alexander S. >Priority: Major > Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot > 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png > > > We do have a SolrCloud cluster and just updated one of our views to use > cursors with the random order. Our goal was to use an infinite scroll with > the random ordering so we can shuffle results once every 24 hours. > To do so we save the seed that we use in our random order to the cookies with > the 24 hours expiration period, which didn't work as expected: > # Results are shuffled with every request (every time we pass the initial > cursor value "*" and the same random value for ordering we already used). > # Results contain duplicates sometimes. Not a lot of them, but from time to > time they appear. > In our *schema.xml* we have: > {code:java} > > indexed="true"/>{code} > In our search requests, we order by *random_123 asc, id asc*, where *123* is > the seed from cookies. 
> Here is the page [https://awards.wegohealth.com/nominees] > Even when I try to get the "next page" URL from google chrome developer > console and open it in separate tabs it yields different results: > [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D] > So it feels like the seed parameter we use is ignored or every shard > understands it differently, not sure. > On the screenshots, you can see the URL is the same and results are different. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-12363) Duplicates with random search, cursors, and fixed seed
[ https://issues.apache.org/jira/browse/SOLR-12363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-12363: Description: We do have a SolrCloud cluster and just updated one of our views to use cursors with the random order. Our goal was to use an infinite scroll with the random ordering so we can shuffle results once every 24 hours. To do so we save the seed that we use in our random order to the cookies with the 24 hours expiration period, which didn't work as expected: # Results are shuffled with every request (every time we pass the initial cursor value "*" and the same random value for ordering we already used). # Results contain duplicates sometimes. Not a lot of them, but from time to time they appear. In our *schema.xml* we have: {code:java} {code} In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed from cookies. Here is the page [https://awards.wegohealth.com/nominees] -Even when I try to get the "next page" URL from google chrome developer console and open it in separate tabs it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]- So it feels like the seed parameter we use is ignored or every shard understands it differently, not sure. On the screenshots, you can see the URL is the same and results are different. was: We do have a SolrCloud cluster and just updated one of our views to use cursors with the random order. Our goal was to use an infinite scroll with the random ordering so we can shuffle results once every 24 hours. To do so we save the seed that we use in our random order to the cookies with the 24 hours expiration period, which didn't work as expected: # Results are shuffled with every request (every time we pass the initial cursor value "*" and the same random value for ordering we already used). # Results contain duplicates sometimes. Not a lot of them, but from time to time they appear. 
In our *schema.xml* we have: {code:java} {code} In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed from cookies. Here is the page [https://awards.wegohealth.com/nominees] Even when I try to get the "next page" URL from google chrome developer console and open it in separate tabs it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D] So it feels like the seed parameter we use is ignored or every shard understands it differently, not sure. On the screenshots, you can see the URL is the same and results are different. > Duplicates with random search, cursors, and fixed seed > -- > > Key: SOLR-12363 > URL: https://issues.apache.org/jira/browse/SOLR-12363 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.1 >Reporter: Alexander S. >Priority: Major > Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot > 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png > > > We do have a SolrCloud cluster and just updated one of our views to use > cursors with the random order. Our goal was to use an infinite scroll with > the random ordering so we can shuffle results once every 24 hours. > To do so we save the seed that we use in our random order to the cookies with > the 24 hours expiration period, which didn't work as expected: > # Results are shuffled with every request (every time we pass the initial > cursor value "*" and the same random value for ordering we already used). > # Results contain duplicates sometimes. Not a lot of them, but from time to > time they appear. > In our *schema.xml* we have: > {code:java} > > indexed="true"/>{code} > In our search requests, we order by *random_123 asc, id asc*, where *123* is > the seed from cookies. 
> Here is the page [https://awards.wegohealth.com/nominees] > -Even when I try to get the "next page" URL from google chrome developer > console and open it in separate tabs it yields different results: > [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D]- > So it feels like the seed parameter we use is ignored or every shard > understands it differently, not sure. > On the screenshots, you can see the URL is the same and results are different.
[jira] [Created] (SOLR-12363) Duplicates with random search, cursors, and fixed seed
Alexander S. created SOLR-12363: --- Summary: Duplicates with random search, cursors, and fixed seed Key: SOLR-12363 URL: https://issues.apache.org/jira/browse/SOLR-12363 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 5.3.1 Reporter: Alexander S. Attachments: Screen shot 2018-05-16 at 14.51.19.png, Screen shot 2018-05-16 at 14.51.23.png, Screen shot 2018-05-16 at 14.51.26.png We do have a SolrCloud cluster and just updated one of our views to use cursors with the random order. Our goal was to use an infinite scroll with the random ordering so we can shuffle results once every 24 hours. To do so we save the seed that we use in our random order to the cookies with the 24 hours expiration period, which didn't work as expected: # Results are shuffled with every request (every time we pass the initial cursor value "*" and the same random value for ordering we already used). # Results contain duplicates sometimes. Not a lot of them, but from time to time they appear. In our *schema.xml* we have: {code:java} {code} In our search requests, we order by *random_123 asc, id asc*, where *123* is the seed from cookies. Here is the page [https://awards.wegohealth.com/nominees] Even when I try to get the "next page" URL from google chrome developer console and open it in separate tabs it yields different results: [https://awards.wegohealth.com/nominees?cursor=AoJYmYbyATRBd2FyZDo6Tm9taW5lZSAxMzI0Mg%3D%3D] So it feels like the seed parameter we use is ignored or every shard understands it differently, not sure. On the screenshots, you can see the URL is the same and results are different.
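One plausible explanation for both symptoms, noted here for future readers: solr.RandomSortField does not take a seed parameter at query time. Its ordering is derived from a hash of the dynamic field name combined with the version of the index being searched, so any commit between two requests, and any version skew between replicas of the same shard, reshuffles the order even though the `random_123` name stays fixed. A sketch of the schema setup being described (the attributes are assumptions, since the report's actual schema.xml snippet was lost in the email escaping):

```xml
<!-- Sketch only: the usual RandomSortField declaration. -->
<fieldType name="random" class="solr.RandomSortField" indexed="true"/>
<dynamicField name="random_*" type="random" indexed="true"/>
<!-- Query side (illustrative): cursorMark=*&sort=random_123 asc, id asc -->
```

If that is the cause, a stable 24-hour shuffle needs a sort key that is a pure function of the document and the seed, for example a hash of the uniqueKey computed at index time.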
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363744#comment-16363744 ] Alexander S. commented on SOLR-6468: I think so; Solr and Lucene versions are different things. Solr 5.3.1 still supports Lucene version 4.3, but newer versions of Solr probably don't. I am not absolutely sure which Solr version dropped support for this, just saying that we're on Solr 5.3.1 and it is working; it didn't work in Solr 6 for sure (we tried it) and, if I am not mistaken, it didn't work in Solr 5.5 either. > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? 
twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature.
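For context on the versions being discussed: `enablePositionIncrements="false"` was deprecated in Lucene 4.4 and removed in 5.0, so it is only accepted when `luceneMatchVersion` is 4.3 or earlier, regardless of the Solr release. A sketch of the configuration this issue is about (stopwords.txt is an assumed file name):

```xml
<!-- Only parses with luceneMatchVersion <= 4.3. -->
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"
        enablePositionIncrements="false"/>
```

Without the flag, StopFilterFactory always leaves a position gap where a stop word was removed, which is what shows up as the `?` placeholder in the parsed phrase query above and why the exact-phrase match fails.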
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16363583#comment-16363583 ] Alexander S. commented on SOLR-6468: Hey, we're on 5.3.1 because of this. AFAIK this doesn't work on newer versions. > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature. 
[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6468: --- Affects Version/s: 7.1 6.6.2 > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1, 7.1, 6.6.2 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16359924#comment-16359924 ] Alexander S. commented on SOLR-6468: Wondering how we can bring attention to this problem? > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature. 
[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6468: --- Summary: Regression: StopFilterFactory doesn't work properly without deprecated enablePositionIncrements="false" (was: Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false") > Regression: StopFilterFactory doesn't work properly without deprecated > enablePositionIncrements="false" > --- > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature. 
[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6468: --- Affects Version/s: 5.3.1 > Regression: StopFilterFactory doesn't work properly without > enablePositionIncrements="false" > > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.8.1, 4.9, 5.3.1 >Reporter: Alexander S. >Priority: Major > Attachments: FieldValue.png > > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All these queries do match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these does: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation how can we upgrade Solr, there's no any > replacement or a workarround to this, so this is not just a major change but > a major disrespect to all existing Solr users who are using this feature.
[jira] [Commented] (SOLR-11939) Collection API: property.name ignored when creating collections
[ https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351708#comment-16351708 ] Alexander S. commented on SOLR-11939: - Hi Varun, I am referring to [https://lucene.apache.org/solr/guide/6_6/collections-api.html] |property._name_=_value_|string|No|Set core property _name_ to _value_. See the section [Defining core.properties|https://lucene.apache.org/solr/guide/6_6/defining-core-properties.html#defining-core-properties] for details on supported properties and values.| All shards and replicas are created on separate Solr instances, so a single name for all cores would work in this case. Well, I started working on core names mostly because the web UI (at least in 5.3.1) doesn't work with collections, so I wasn't aware that query requests would also work with collection names. The core name doesn't matter that much then, and we're fine with generic core names. It would be good to mention this in the docs somewhere. Best, Alexander S. > Collection API: property.name ignored when creating collections > --- > > Key: SOLR-11939 > URL: https://issues.apache.org/jira/browse/SOLR-11939 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.1 >Reporter: Alexander S. 
>Assignee: Varun Thacker >Priority: Major > > Trying to create a collection this way: > {code:java} > /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code} > This appears in the log: > {code:java} > OverseerCollectionProcessor.processMessage : create , { > "name":"carmen-test", > "fromApi":"true", > "replicationFactor":"1", > "collection.configName":"carmen", > "numShards":"4", > "shards":"shard1,shard2,shard3,shard4", > "stateFormat":"2", > "property.name":"carmen_test", > "router.name":"compositeId", > "operation":"create"}{code} > But the resulting core name is *carmen-test_shard1_replica1* matching > "collection name" + shard name + replica number. > How can I set a custom core name when creating a collection? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-11939) Collection API: property.name ignored when creating collections
[ https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350473#comment-16350473 ] Alexander S. edited comment on SOLR-11939 at 2/2/18 3:07 PM: - Found this discussion [http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html] It seems I don't have to worry about the core name, as Solr is moving towards collections. UPD. But this is still a discrepancy between the docs and the API. I spent an hour figuring this out, patching a Chef cookbook to add these properties, only to discover that it doesn't work as described in the docs. was (Author: aheaven): Found this discussion [http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html] It seems I don't have to worry about the core name, as Solr is moving towards collections. > Collection API: property.name ignored when creating collections > --- > > Key: SOLR-11939 > URL: https://issues.apache.org/jira/browse/SOLR-11939 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.1 >Reporter: Alexander S. >Priority: Major > > Trying to create a collection this way: > {code:java} > /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code} > This appears in the log: > {code:java} > OverseerCollectionProcessor.processMessage : create , { > "name":"carmen-test", > "fromApi":"true", > "replicationFactor":"1", > "collection.configName":"carmen", > "numShards":"4", > "shards":"shard1,shard2,shard3,shard4", > "stateFormat":"2", > "property.name":"carmen_test", > "router.name":"compositeId", > "operation":"create"}{code} > But the resulting core name is *carmen-test_shard1_replica1* matching > "collection name" + shard name + replica number. > How can I set a custom core name when creating a collection? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-11939) Collection API: property.name ignored when creating collections
[ https://issues.apache.org/jira/browse/SOLR-11939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350473#comment-16350473 ] Alexander S. commented on SOLR-11939: - Found this discussion [http://lucene.472066.n3.nabble.com/Core-property-name-ignored-when-creating-collection-using-API-td4183405.html] It seems I don't have to worry about the core name, as Solr is moving towards collections. > Collection API: property.name ignored when creating collections > --- > > Key: SOLR-11939 > URL: https://issues.apache.org/jira/browse/SOLR-11939 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 5.3.1 >Reporter: Alexander S. >Priority: Major > > Trying to create a collection this way: > {code:java} > /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code} > This appears in the log: > {code:java} > OverseerCollectionProcessor.processMessage : create , { > "name":"carmen-test", > "fromApi":"true", > "replicationFactor":"1", > "collection.configName":"carmen", > "numShards":"4", > "shards":"shard1,shard2,shard3,shard4", > "stateFormat":"2", > "property.name":"carmen_test", > "router.name":"compositeId", > "operation":"create"}{code} > But the resulting core name is *carmen-test_shard1_replica1* matching > "collection name" + shard name + replica number. > How can I set a custom core name when creating a collection? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-11939) Collection API: property.name ignored when creating collections
Alexander S. created SOLR-11939: --- Summary: Collection API: property.name ignored when creating collections Key: SOLR-11939 URL: https://issues.apache.org/jira/browse/SOLR-11939 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 5.3.1 Reporter: Alexander S. Trying to create a collection this way: {code:java} /solr/admin/collections?wt=json&action=CREATE&name=carmen-test&replicationFactor=1&numShards=4&shards=shard1,shard2,shard3,shard4&collection.configName=carmen&router.name=compositeId&property.name=carmen_test{code} This appears in the log: {code:java} OverseerCollectionProcessor.processMessage : create , { "name":"carmen-test", "fromApi":"true", "replicationFactor":"1", "collection.configName":"carmen", "numShards":"4", "shards":"shard1,shard2,shard3,shard4", "stateFormat":"2", "property.name":"carmen_test", "router.name":"compositeId", "operation":"create"}{code} But the resulting core name is *carmen-test_shard1_replica1* matching "collection name" + shard name + replica number. How can I set a custom core name when creating a collection? -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
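The create request above can be assembled programmatically from the same parameters that show up in the logged Overseer message. This sketch (hypothetical helper functions, not Solr client code) also captures the default core-naming scheme the reporter observed when `property.name` is ignored:

```python
from urllib.parse import urlencode

def create_collection_url(base="/solr/admin/collections", **params):
    """Build a Collections API CREATE request URL from keyword params."""
    query = {"action": "CREATE", "wt": "json", **params}
    return base + "?" + urlencode(query)

def default_core_name(collection, shard, replica):
    """Core name Solr generates when property.name is not honored:
    <collection>_shard<N>_replica<M>."""
    return f"{collection}_shard{shard}_replica{replica}"

# parameters taken verbatim from the logged create message;
# dotted names can't be Python keywords, hence the dict splat
url = create_collection_url(
    name="carmen-test",
    replicationFactor=1,
    numShards=4,
    shards="shard1,shard2,shard3,shard4",
    **{"collection.configName": "carmen",
       "router.name": "compositeId",
       "property.name": "carmen_test"},
)
```

Note that `property.name` travels with the request and still appears in the Overseer log; the bug is that the resulting cores are named by `default_core_name` anyway.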
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541697#comment-15541697 ] Alexander S. commented on SOLR-6468: We now can't upgrade to Solr 6 because of this. > Regression: StopFilterFactory doesn't work properly without > enablePositionIncrements="false" > > > Key: SOLR-6468 > URL: https://issues.apache.org/jira/browse/SOLR-6468 > Project: Solr > Issue Type: Bug >Affects Versions: 4.8.1, 4.9 >Reporter: Alexander S. > > Setup: > * Schema version is 1.5 > * Field config: > {code} > autoGeneratePhraseQueries="true"> > > > ignoreCase="true" /> > > > > {code} > * Stop words: > {code} > http > https > ftp > www > {code} > So very simple. In the index I have: > * twitter.com/testuser > All of these queries match: > * twitter.com/testuser > * com/testuser > * testuser > But none of these do: > * https://twitter.com/testuser > * https://www.twitter.com/testuser > * www.twitter.com/testuser > Debug output shows: > "parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")" > But we need: > "parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")" > Complete debug outputs: > * a valid search: > http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za > * an invalid search: > http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww > The complete discussion and explanation of the problem is here: > http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html > I didn't find a clear explanation of how we can upgrade Solr, and there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who rely on this feature. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
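One workaround that avoids position holes entirely is to strip the protocol/host prefixes at the character level before tokenization (the approach a mapping char filter such as Lucene's MappingCharFilterFactory takes), rather than removing them as stop words after tokenization. A rough sketch of the idea (illustrative only, not the MappingCharFilterFactory implementation; the naive substring replace shown here would also hit those strings inside longer words, which real mapping rules handle more carefully):

```python
import re

# character-level replacements applied before tokenization; "https"
# is listed before "http" so the longer string wins
MAPPINGS = [("https", ""), ("http", ""), ("ftp", ""), ("www", "")]

def char_filter(text):
    """Strip the unwanted prefixes from the raw character stream."""
    for src, dst in MAPPINGS:
        text = text.replace(src, dst)
    return text

def tokenize(text):
    """Simplified word tokenizer: split on non-alphanumerics."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
```

Because the unwanted text is gone before the tokenizer assigns positions, `https://www.twitter.com/testuser` and `twitter.com/testuser` analyze to identical token streams with identical positions, so phrase queries match again.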
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738200#comment-14738200 ] Alexander S. commented on SOLR-3274: Hi, just wanted to let you know that adding 2 new ZK servers (so I have 5 running ZK instances) improved the situation a lot. But I found one weird thing with the ZK: {code} java.net.UnknownHostException: zoo5.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zoo2.devops:3888 java.net.UnknownHostException: zoo2.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 1 at election address zoo1.devops:3888 java.net.UnknownHostException: zoo1.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at 
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,236 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 4 at election address zoo4.devops:3888 java.net.UnknownHostException: zoo4.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) {code} I just opened 2 ssh sessions to that server and was monitoring the log with tail. While ZK was logging these errors I was able to ping the zoo1/2/4/5.devops servers and to connect to ZK there with telnet, so it seems something went wrong inside ZK itself. At the same time I saw these "cannot talk to ZK" errors in Solr. Eventually I just restarted the broken ZK instance and everything was fine again. So I guess Solr tried to connect to this particular broken ZK instance (I can't say for sure, since it doesn't mention the instance it failed to connect to in its log). 
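Growing the ensemble from 3 to 5 ZooKeeper servers helps because ZooKeeper needs a strict majority of nodes alive to elect a leader and serve requests. The arithmetic is standard quorum math, not Solr-specific code, and can be sketched as:

```python
def quorum_size(ensemble):
    """Minimum number of live nodes for a working ZooKeeper ensemble
    (a strict majority)."""
    return ensemble // 2 + 1

def tolerated_failures(ensemble):
    """How many nodes can fail while the ensemble stays available."""
    return ensemble - quorum_size(ensemble)
```

A 3-node ensemble tolerates only one failure, while 5 nodes tolerate two, which is consistent with the observation above that a single flaky instance (like the one that had to be restarted) no longer takes the cluster down.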
> ZooKeeper related SolrCloud problems > > > Key: SOLR-3274 > URL: https://issues.apache.org/jira/browse/SOLR-3274 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-ALPHA > Environment: Any >Reporter: Per Steffensen > > Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 > Solr servers, running 28 slices of the same collection (collA) - all slices > have one replica (two shards all in all - leader + replica) - 56 cores all in > all (8 shards on each solr instance). But anyways... > Besides the problem reported in SOLR-3273, the system seems to run fine under > high load for several hours, but eventually errors like the ones shown below > start to occur. I might be wrong, but they all seem to indicate some kind of > unstability in the
[jira] [Comment Edited] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738200#comment-14738200 ] Alexander S. edited comment on SOLR-3274 at 9/10/15 5:43 AM: - Hi, just wanted to let you know that adding 2 new ZK servers (so I have 5 running ZK instances) improved the situation a lot. But I found one weird thing with the ZK: {code} java.net.UnknownHostException: zoo5.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 2 at election address zoo2.devops:3888 java.net.UnknownHostException: zoo2.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,235 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 1 at election address zoo1.devops:3888 java.net.UnknownHostException: zoo1.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at 
java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) 2015-09-10 01:13:21,236 - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager@382] - Cannot open channel to 4 at election address zoo4.devops:3888 java.net.UnknownHostException: zoo4.devops at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:579) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368) at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:402) at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:840) at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:762) {code} I just opened 2 ssh sessions to that server and was monitoring the log with tail. While ZK was logging these errors I was able to ping the zoo1/2/4/5.devops servers and to connect to ZK there with telnet, so it seems something went wrong inside ZK itself. At the same time I saw these "cannot talk to ZK" errors in Solr. Eventually I just restarted the broken ZK instance and everything was fine again. So I guess Solr tried to connect to this particular broken ZK instance (I can't say for sure, since it doesn't mention the instance it failed to connect to in its log). 
UPD: but still often see these errors in ZK logs: {code} 2015-09-10 01:31:28,804 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.128.202.22:35990 2015-09-10 01:31:28,847 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception EndOfStreamException: Unable to read additional data from client sessionid 0x0, likely client has closed socket at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228) at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208) at java.lang.Thread.run(Thread.java:744) 2015-09-10 01:31:28,847 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.128.202.22:35990 (no session established for client)
[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14582960#comment-14582960 ] Alexander S. edited comment on SOLR-6875 at 6/12/15 5:24 AM: - Got another error today on 4 shards set up, each has 2 replicas (8 nodes in total). On the shard 4/replica 1 I see the next error: [^replica1.png] On the shard 4/replica 2 the next: [^replica2.png] Here's the backtrace for the error on the first screenshot: {code} java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486) at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} After all this replica 1 shows: {quote} numDocs: 28 215 608 {quote} And replica 2 shows: {quote} numDocs: 28 215 609 {quote} Everything worked well for a few months until yesterday, when we started to reindex some data (about 1.7m records). Our Solr setup uses large pages and there are enough resources. Here's how we run the instances: {code} exec chpst -u solr java -Xms6G -Xmx8G -XX:+UseConcMarkSweepGC -XX:+UseLargePages -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=75 -DzkHost=zoo5.devops:2181,zoo4.devops:2181,zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf -Dbootstrap_conf=true -DnumShards=4 -jar start.jar etc/jetty.xml {code} The server has 16 CPU cores and SSD RAID 10; the load average is usually between 2 and 3. The charts also don't show anything suspicious in server load, it is very stable. So it seems something went wrong during recovery after the network error. I'm not sure how to debug this further or what those warnings in the log mean, for example the last 2 messages on the first screenshot, from DistributedUpdateProcessor and CoreAdminHandler. 
was (Author: aheaven): Get another error today on 4 shards set up, each has 2 replicas (8 nodes in total). On the shard 4/replica 1 I see the next error: [^replica1.png] On the shard 4/replica 2 the next: [^replica2.png] Here's the backtrace for the error on the first screenshot: {code} java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) at
[jira] [Updated] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6875: --- Attachment: replica2.png replica1.png Got another error today on a 4-shard setup, each shard with 2 replicas (8 nodes in total). On shard 4/replica 1 I see the following error: [^replica1.png] On shard 4/replica 2 the following: [^replica2.png] Here's the backtrace for the error on the first screenshot: {code} java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486) at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} After all this replica 1 shows: {quote} numDocs: 28 215 608 {quote} And replica 2 shows: {quote} numDocs: 28 215 609 {quote} Everything worked well for a few months until yesterday, when we started to reindex some data (about 1.7m records). Our Solr setup uses large pages and there are enough resources. Here's how we run the instances: {code} exec chpst -u solr java -Xms6G -Xmx8G -XX:+UseConcMarkSweepGC -XX:+UseLargePages -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+UseLargePages -XX:+AggressiveOpts -XX:CMSInitiatingOccupancyFraction=75 -DzkHost=zoo5.devops:2181,zoo4.devops:2181,zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf -Dbootstrap_conf=true -DnumShards=4 -jar start.jar etc/jetty.xml {code} The server has 16 CPU cores and SSD RAID 10; the load average is usually between 2 and 3. The charts also don't show anything suspicious in server load, it is very stable. So it seems something went wrong during recovery after the network error. I'm not sure how to debug this further or what those warnings in the log mean, for example the last 2 messages on the first screenshot, from DistributedUpdateProcessor and CoreAdminHandler. 
No data integrity between replicas -- Key: SOLR-6875 URL: https://issues.apache.org/jira/browse/SOLR-6875 Project: Solr Issue Type: Bug Affects Versions: 4.10.2 Environment: One replica is @ Linux solr1.devops.wegohealth.com 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Solr is running with the next options: * -Xms12G * -Xmx16G * -XX:+UseConcMarkSweepGC * -XX:+UseLargePages * -XX:+CMSParallelRemarkEnabled * -XX:+ParallelRefProcEnabled * -XX:+UseLargePages * -XX:+AggressiveOpts * -XX:CMSInitiatingOccupancyFraction=75 Reporter: Alexander S.
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: 5.1 Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Affects Versions: 4.4, 4.5, 4.5.1, 4.6, 5.1 Reporter: Alexander S. Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings of 2 and 25, search requests for these URLs will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first URL has "1" at the end, which is shorter than the allowed min gram size. In the second URL the user name is longer than the max gram size (27 characters). It would be good to have a preserve-original option that adds the original string to the index when it does not fit the allowed gram sizes, so that the "1" and "someveryandverylongusername" tokens are also added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
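The two failure modes are mechanical: a token shorter than minGramSize produces no grams at all, and a token longer than maxGramSize never yields a gram equal to itself. A small simulation of the requested preserve-original behavior (illustrative only, not the Lucene filter; later Lucene releases did eventually add a `preserveOriginal` option to the edge n-gram filter, so check the version you run):

```python
def edge_ngrams(token, min_gram=2, max_gram=25, preserve_original=False):
    """Emit leading-edge n-grams with lengths between min_gram and
    max_gram. With preserve_original, a token whose length falls
    outside that range is also kept verbatim, so exact queries for
    it can still match."""
    grams = [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]
    if preserve_original and not (min_gram <= len(token) <= max_gram):
        grams.append(token)
    return grams
```

With the issue's settings (2, 25), the trailing "1" of `facebook.com/someuser.1` yields no grams and the 27-character user name never yields itself; turning on the preserve-original behavior indexes both, which is exactly what the wish asks for.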
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Fix Version/s: 5.1 Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Affects Versions: 4.4, 4.5, 4.5.1, 4.6 Reporter: Alexander S. Fix For: 5.1 Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings of 2 and 25, search requests for these URLs will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first URL has "1" at the end, which is shorter than the allowed min gram size. In the second URL the user name is longer than the max gram size (27 characters). It would be good to have a preserve-original option that adds the original string to the index when it does not fit the allowed gram sizes, so that the "1" and "someveryandverylongusername" tokens are also added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: (was: 5.1) Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Affects Versions: 4.4, 4.5, 4.5.1, 4.6 Reporter: Alexander S. Fix For: 5.1 Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings of 2 and 25, search requests for these URLs will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first URL has "1" at the end, which is shorter than the allowed min gram size. In the second URL the user name is longer than the max gram size (27 characters). It would be good to have a preserve-original option that adds the original string to the index when it does not fit the allowed gram sizes, so that the "1" and "someveryandverylongusername" tokens are also added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-7022) ERROR UpdateHandler java.lang.InterruptedException
Alexander S. created SOLR-7022: -- Summary: ERROR UpdateHandler java.lang.InterruptedException Key: SOLR-7022 URL: https://issues.apache.org/jira/browse/SOLR-7022 Project: Solr Issue Type: Bug Environment: Solr 4.10.2, Ubuntu x86_64 Reporter: Alexander S. What I did: * Updated configs in ZooKeeper with zkcli.sh -cmd upconfig. * Opened the Solr admin interface in the web browser * Went to core admin and reloaded the cores one by one Backtrace: {code} java.lang.InterruptedException at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400) at java.util.concurrent.FutureTask.get(FutureTask.java:187) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:654) at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) {code} I already did that before and didn't see such errors, but the previous time I had increased the caches too much, so the warming time for the query results cache was around 30 seconds. This time the core reloads took much longer, and then this error appeared in the log. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14272877#comment-14272877 ] Alexander S. edited comment on SOLR-6875 at 1/11/15 11:33 AM: --

Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1: Replica 1: *14 486 089* Replica 2: *14 496 445*
Shard 2: Replica 1: 14 496 609 Replica 2: 14 496 609
Shard 3: Replica 1: 14 492 812 Replica 2: 14 492 812
Shard 4: Replica 1: 14 488 755 Replica 2: 14 488 755
{noformat}
How could this be? We didn't see anything like that before the upgrade from 4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

was (Author: aheaven):
Now we have 4 shards, each with 2 replicas (8 nodes in total), and the following picture:
{noformat}
Shard 1: Replica 1: 14 486 089 Replica 2: 14 496 445
Shard 2: Replica 1: 14 496 609 Replica 2: 14 496 609
Shard 3: Replica 1: 14 492 812 Replica 2: 14 492 812
Shard 4: Replica 1: 14 488 755 Replica 2: 14 488 755
{noformat}
How could this be? We didn't see anything like that before the upgrade from 4.8.1 to 4.10.2. We also enabled checkIntegrityAtMerge; could that be the reason?

No data integrity between replicas
--
Key: SOLR-6875
URL: https://issues.apache.org/jira/browse/SOLR-6875
Project: Solr
Issue Type: Bug
Affects Versions: 4.10.2
Environment: One replica is @ Linux solr1.devops.wegohealth.com 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Solr is running with the following options:
* -Xms12G
* -Xmx16G
* -XX:+UseConcMarkSweepGC
* -XX:+UseLargePages
* -XX:+CMSParallelRemarkEnabled
* -XX:+ParallelRefProcEnabled
* -XX:+UseLargePages
* -XX:+AggressiveOpts
* -XX:CMSInitiatingOccupancyFraction=75
Reporter: Alexander S.
Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total.
Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, and another (Solr1.1) 45 574 038 docs. Solr1 is the leader; these errors appeared in the logs:
{code}
ERROR - 2014-12-20 09:54:38.783; org.apache.solr.update.StreamingSolrServers$1; error
java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:196)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160)
at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251)
at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123)
at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486)
at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
WARN - 2014-12-20 09:54:38.787; org.apache.solr.update.processor.DistributedUpdateProcessor;
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14262969#comment-14262969 ] Alexander S. commented on SOLR-6494:

Correct, and that's exactly my case, because the time is entered by users and differs between queries. I'd love to have something like this working with the standard query parser:
{code}
fq={!cache=false cost=101}field:value
{code}
It seems that `cache=false` does actually work, but `cost` doesn't (some parsers, like the frange one, do treat and apply all queries with a `cost` higher than 100 as post filters).

Query filters applied in a wrong order
--
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. It seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
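For illustration, a request carrying the local-params filter discussed above might be assembled like this. This is a hedged Python sketch; the host, core, and field names are placeholders, and the local-params prefix simply travels inside the fq value:

```python
from urllib.parse import urlencode

# Hedged sketch: assembling a Solr select request with the local-params
# filter discussed above. Host, core, and field names are placeholders.
params = {
    "q": "*:*",
    "fq": "{!cache=false cost=101}field:value",
    "sort": "score desc",
    "start": 0,
    "rows": 20,
}
query_string = urlencode(params)
url = "http://localhost:8983/solr/collection1/select?" + query_string

# The cache/cost hints survive URL encoding as part of the fq value.
assert "cache%3Dfalse" in query_string
assert "cost%3D101" in query_string
```

Whether Solr honors the `cost` hint as a post filter depends on the query parser, which is exactly the limitation this comment describes.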
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259627#comment-14259627 ] Alexander S. commented on SOLR-6494:

As I was told already, Solr does not apply filters incrementally; instead, each filter runs through the entire data set, then Solr caches the results. For filters that contain ranges the cache is not effective, especially when we need NRT search and commits are triggered multiple times per minute. Big caches then make no sense, and large autowarming numbers cause Solr to fail. My point is that the cache is not always efficient, and for such cases Solr needs to use another strategy and apply filters incrementally (read: as post filters). So this:
{quote}
By design, fq clauses like this are calculated for the entire document set and the results cached, there is no ordering for that part. Otherwise, how could they be re-used for a different query?
{quote}
does not work in all cases. Something like this:
{code}
fq={!cache=false cost=101}field:value # to run as a post filter
{code}
would definitely solve the problem, but this is not supported. The frange parser has support for this, but it is not always suitable and fails with different errors, like "can not use FieldCache on multivalued field: type", etc. Does that look like a missing feature? To me it definitely does; could this be considered as a wish and implemented some day? How can the Solr community help with missing features?

Query filters applied in a wrong order
--
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. It seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259150#comment-14259150 ] Alexander S. commented on SOLR-6494:

Just an idea, but what if Solr detected that a filter uses date ranges like [* TO 2014-09-08T23:59:59Z] (or probably any ranges where the cache is not very efficient) and, if there are other simpler filters in the query, applied such range filters last? And probably to the already fetched results, as a post filter? And probably avoided caching for this filter? That sounds like a good optimization to me. It would avoid evicting more useful filters from the cache, increase warming speed and, most importantly, increase the search speed. [~erickerickson] [~hossman] Best, Alex

Query filters applied in a wrong order
--
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.

This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. It seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
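The ordering proposed in the comment above can be sketched in plain Python over a toy document list. This is illustrative only, not Solr internals: the selective filter scans the whole set, and the expensive range check then runs only on the survivors, like a post filter.

```python
# Toy model of the proposed ordering: the cheap, selective filter scans
# the whole set; the expensive date-range check runs only on the
# survivors (a "post filter"). The documents below are illustrative.
docs = [
    {"id": 1, "type": "Award::Nomination", "created_at": "2014-09-01"},
    {"id": 2, "type": "Award::Nomination", "created_at": "2014-09-10"},
    {"id": 3, "type": "Post", "created_at": "2014-09-01"},
]

# Selective filter first (the ~12k-of-200m case from the report above).
selective = [d for d in docs if d["type"] == "Award::Nomination"]

# Range filter as a post filter: only the 2 candidates are checked,
# not the full collection. ISO-8601 dates compare correctly as strings.
result = [d for d in selective if d["created_at"] <= "2014-09-08"]

assert [d["id"] for d in result] == [1]
```

The payoff in the real case is that the range predicate, which caches poorly, is evaluated over thousands of candidates instead of hundreds of millions.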
[jira] [Comment Edited] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259150#comment-14259150 ] Alexander S. edited comment on SOLR-6494 at 12/26/14 5:24 PM: -- Just an idea, but what if Solr detecting that the filter does use date rages like [* TO 2014-09-08T23:59:59Z] (or probably any ranges where cache is not very efficient), and if there are other simpler filters in the query, will apply such range filters at last? And probably to already fetched results as a post filter? And probably avoid caching for this filter? That sounds like a good optimization to me. This will avoid losing of more useful filters from the cache, increase warming speed and which is the most important — increase the search speed. Like in the case above, if you have 200m of docs, but only 12k with type:AwardNomination, and query has 2 filters, one with a date range, Solr definitely can detect this and do the right thing instead simply loop through all 200m documents with this cache-inefficient filter. Could this be at least considered as a wish? [~erickerickson] [~hossman] Best, Alex was (Author: aheaven): Just an idea, but what if Solr detecting that the filter does use date rages like [* TO 2014-09-08T23:59:59Z] (or probably any ranges where cache is not very efficient), and if there are other simpler filters in the query, will apply such range filters at last? And probably to already fetched results as a post filter? And probably avoid caching for this filter? That sounds like a good optimization to me. This will avoid losing of more useful filters from the cache, increase warming speed and which is the most important — increase the search speed. [~erickerickson] [~hossman] Best, Alex Query filters applied in a wrong order -- Key: SOLR-6494 URL: https://issues.apache.org/jira/browse/SOLR-6494 Project: Solr Issue Type: Bug Affects Versions: 4.8.1 Reporter: Alexander S. 
This query: {code} { fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* } {code} takes just a few milliseconds, but this one: {code} { fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* } {code} takes almost 15 seconds. I have only ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. It seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6875) No data integrity between replicas
Alexander S. created SOLR-6875: -- Summary: No data integrity between replicas Key: SOLR-6875 URL: https://issues.apache.org/jira/browse/SOLR-6875 Project: Solr Issue Type: Bug Reporter: Alexander S. Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total. Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, and another (Solr1.1) 45 574 038 docs. Solr1 is the leader, these errors appeared in the logs: {code} ERROR - 2014-12-20 09:54:38.783; org.apache.solr.update.StreamingSolrServers$1; error java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486) at 
org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:233) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) WARN - 2014-12-20 09:54:38.787; org.apache.solr.update.processor.DistributedUpdateProcessor; Error sending update java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:196) at java.net.SocketInputStream.read(SocketInputStream.java:122) at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:160) at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:84) at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:273) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140) at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57) at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:260) at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:283) at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:251) at org.apache.http.impl.conn.ManagedClientConnectionImpl.receiveResponseHeader(ManagedClientConnectionImpl.java:197) at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:271) at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:123) at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:682) at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:486) at org.apache.http.impl.client.AbstractHttpClient.doExecute(AbstractHttpClient.java:863) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82) at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
[jira] [Updated] (SOLR-6875) No data integrity between replicas
[ https://issues.apache.org/jira/browse/SOLR-6875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6875: --- Environment: One replica is @ Linux solr1.devops.wegohealth.com 3.8.0-29-generic #42~precise1-Ubuntu SMP Wed Aug 14 16:19:23 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux Another replica is @ Linux solr2.devops.wegohealth.com 3.16.0-23-generic #30-Ubuntu SMP Thu Oct 16 13:17:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux Solr is running with the following options: * -Xms12G * -Xmx16G * -XX:+UseConcMarkSweepGC * -XX:+UseLargePages * -XX:+CMSParallelRemarkEnabled * -XX:+ParallelRefProcEnabled * -XX:+AggressiveOpts * -XX:CMSInitiatingOccupancyFraction=75 Affects Version/s: 4.10.2 No data integrity between replicas -- Key: SOLR-6875 URL: https://issues.apache.org/jira/browse/SOLR-6875 Project: Solr Issue Type: Bug Affects Versions: 4.10.2 Reporter: Alexander S. Setup: SolrCloud with 2 shards, each with 2 replicas, 4 nodes in total. Indexing is stopped, one replica of a shard (Solr1) shows 45 574 039 docs, and another (Solr1.1) 45 574 038 docs. 
[jira] [Commented] (SOLR-6769) Election bug
[ https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14255146#comment-14255146 ] Alexander S. commented on SOLR-6769: Correct, an endless warming was causing this problem. So this is a bug in Solr: it waits for searchers to finish warming, which can take up to 5 minutes in some cases. The node itself goes down and stops accepting connections, but the election does not happen. Election bug Key: SOLR-6769 URL: https://issues.apache.org/jira/browse/SOLR-6769 Project: Solr Issue Type: Bug Reporter: Alexander S. Attachments: Screenshot 876.png Hello, I have a very simple setup: 2 shards and 2 replicas (4 nodes in total). What I did is just stop the shards, but while the first shard stopped immediately, the second one took about 5 minutes to stop. You can see on the screenshot what happened next. In short: 1. Shard 1 stopped normally. 2. Replica 1 became the leader. 3. Shard 2 was still performing some job but wasn't accepting connections. 4. Replica 2 did not become the leader because Shard 2 was still there but didn't work. 5. The entire cluster went down until Shard 2 stopped and Replica 2 became the leader. Marked as critical because this shuts down the entire cluster. Please adjust if I am wrong.
[jira] [Commented] (SOLR-6769) Election bug
[ https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14252440#comment-14252440 ] Alexander S. commented on SOLR-6769: This might be related: http://lucene.472066.n3.nabble.com/Endless-100-CPU-usage-on-searcherExecutor-thread-td4175088.html
[jira] [Commented] (SOLR-6769) Election bug
[ https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14239203#comment-14239203 ] Alexander S. commented on SOLR-6769: Hi, yes, my terminology about shards and replicas wasn't clear, let me explain this better. * Solr: 4.8.1 * Java: java version 1.7.0_51 Java(TM) SE Runtime Environment (build 1.7.0_51-b13) Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode) * We have 5 servers, 2 of which are big (16 CPU cores, 48G of RAM each) and 3 others are small (1 CPU and 1G of RAM). All servers have rapid SSD RAID 10. Each server runs a ZK instance, so we have 5 ZK instances in total. The big servers also run Solr: the first one runs 2 instances and the second one runs 2 replicas (so each shard has 2 replicas, the simplest SolrCloud setup from the wiki). So the cluster looks like this: {noformat} * Small 1G node: ZK * Small 1G node: ZK * Small 1G node: ZK * Big 16G node: ZK, Solr1, Solr2 * Big 16G node: ZK, Solr1.1, Solr2.1 {noformat} "Stopped manually" means I tried to manually stop Solr1 and Solr2, which were the leaders, by sending a TERM signal (we have service files, so I ran service stop and expected a graceful shutdown). This worked for Solr1: it went down normally and Solr1.1 became the leader instantly. Then I tried to do the same for Solr2, but once I sent the TERM signal it became inoperable yet didn't exit completely (orange on the screenshot); the process kept running for ≈ 5-10 minutes and the election didn't happen. As a result I got "no node hosting shard" errors, but was expecting Solr2.1 to become the leader instantly, as happened with Solr1.1. As I understand this, Solr2 didn't shut down instantly because there could be some background jobs, e.g. 
index merging, an in-process commit, etc., *but then it should not stop accepting connections and should not change its status to down* until all background jobs are finished and it is really ready to go down and pass leadership to Solr2.1. It seems like a bug in Solr, because all services were working normally, all ZK instances were up and operable, and Solr itself wasn't under a heavy load. Otherwise, could you please point me to any information about how to gracefully shut down instances? It would be good to have a button in the web UI to force a replica to become the leader with one click. Then I would be able to force Solr1.1 and Solr2.1 to become the leaders, wait until this happens, and safely reboot the Solr1 and Solr2 instances. Best, Alexander
[jira] [Created] (SOLR-6769) Election bug
Alexander S. created SOLR-6769: -- Summary: Election bug Key: SOLR-6769 URL: https://issues.apache.org/jira/browse/SOLR-6769 Project: Solr Issue Type: Bug Reporter: Alexander S. Priority: Critical
[jira] [Updated] (SOLR-6769) Election bug
[ https://issues.apache.org/jira/browse/SOLR-6769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6769: --- Attachment: Screenshot 876.png [^Screenshot 876.png]
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14131495#comment-14131495 ] Alexander S. commented on SOLR-6494: So I've added a new field nominated_at_d to all docs with type Award::Nomination. Now this query: {code} { fq: [ type:Award::Nomination, nominated_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* } {code} doesn't take longer than a few milliseconds. The new nominated_at_d is the same kind of field as created_at_d; the only difference is that only ≈ 12k documents have the nominated_at_d field while ≈ 100m have created_at_d. So again, I am saying that the current way Solr applies filters is not optimal; sometimes we need to skip the cache and apply filters incrementally, so that each filter doesn't have to go through the entire collection. We could filter this way: {code} 200m docs → filter (type:Award::Nomination) → 12k docs → filter (created_at_d:[* TO 2014-09-08T23:59:59Z]) → 500 docs {code} I don't think the *entire* Solr user community can do anything with this, but a few Solr developers could. Do I have to be a Solr expert to report bugs and missing features?
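[Editor's note] Solr does already expose local params that approximate the incremental filtering sketched in the comment above: a filter query can opt out of the filter cache with cache=false and be given a cost, and non-cached filters are ordered by ascending cost so cheaper, more selective filters run first (for query types that implement post-filtering, a cost of 100 or more pushes the filter to run only against documents that survived the other clauses). A sketch using the field names from this issue; exact behavior depends on the Solr version:
{code}
fq=type:Award::Nomination
fq={!cache=false cost=150}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
With these params the selective type filter is evaluated first and the expensive date range is consulted last, which is essentially the 200m → 12k → 500 pipeline the comment proposes.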
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14129956#comment-14129956 ] Alexander S. commented on SOLR-6494: Added the schema and debug output here: http://lucene.472066.n3.nabble.com/Help-with-a-slow-filter-query-td4158159.html
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements=false
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130090#comment-14130090 ] Alexander S. commented on SOLR-6468:
Just tried to add matchVersion but got this error:
{code}
null:org.apache.solr.common.SolrException: Unable to create core: crm-prod
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:568)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.solr.common.SolrException: Could not load core configuration for core crm-prod
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:66)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:554)
    ... 8 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "words_ngram": Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'. Schema file is /etc/solr/core2/schema.xml
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:616)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:89)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 9 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType "words_ngram": Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
    ... 14 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:400)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 15 more
Caused by: org.apache.solr.common.SolrException: Error instantiating class: 'org.apache.lucene.analysis.core.StopFilterFactory'
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:606)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:382)
    at org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:376)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 19 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:603)
    ... 22 more
Caused by: java.lang.IllegalArgumentException: Unknown parameters: {matchVersion=4.3}
    at org.apache.lucene.analysis.core.StopFilterFactory.init(StopFilterFactory.java:91)
    ... 27 more
{code}
Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Key: SOLR-6468
URL: https://issues.apache.org/jira/browse/SOLR-6468
Project: Solr
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130209#comment-14130209 ] Alexander S. commented on SOLR-6468:
Thanks, it does work with luceneMatchVersion=4.3, but isn't that deprecated? Any chance to bring back enablePositionIncrements?
Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Key: SOLR-6468
URL: https://issues.apache.org/jira/browse/SOLR-6468
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
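As an illustration of the position-gap behavior this issue is about, here is a small Python sketch (not Solr code; the tokenizer regex and stop list mirror the reported schema, and the function name is made up) showing why a position-aware stop filter breaks the phrase query, while the removed enablePositionIncrements=false behavior did not:

```python
import re

STOPWORDS = {"http", "https", "ftp", "www"}

def analyze(text, stopwords=STOPWORDS, keep_gaps=True):
    """Tokenize on non-word runs and drop stop words.

    keep_gaps=True mimics the current StopFilterFactory: a removed token
    still advances the position counter, leaving a hole in the phrase.
    keep_gaps=False mimics the old enablePositionIncrements=false.
    """
    tokens, pos = [], -1
    for tok in re.split(r"[^\w]+", text.lower()):
        if not tok:
            continue
        pos += 1
        if tok in stopwords:
            if not keep_gaps:
                pos -= 1  # pretend the stop word was never there
            continue
        tokens.append((tok, pos))
    return tokens

# Indexed value has no stop words, so terms sit at positions 0, 1, 2:
print(analyze("twitter.com/testuser"))
# Querying with scheme + www keeps a gap: terms start at position 2, so an
# exact phrase match against the indexed positions fails:
print(analyze("https://www.twitter.com/testuser"))
# The old behavior collapsed the gap and the positions line up again:
print(analyze("https://www.twitter.com/testuser", keep_gaps=False))
```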
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14128322#comment-14128322 ] Alexander S. commented on SOLR-6494:
Unfortunately that doesn't solve the problem completely; these queries take ≈7 seconds instead of 15:
{code}
{!cache=false}type:Award::Nomination
{!cache=false cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
Which is still not good, since I have only 11 974 docs with type:Award::Nomination and 139 716 883 with created_at_d:[* TO 2014-09-08T23:59:59Z]. If the cost parameter tells Solr to apply the cheapest filters first, why does the query still take so long? It seems that even though it doesn't run them in parallel, the filters still don't know about each other and each goes through all docs. My point is that it would be much faster if it could run the filters one by one, and if each subsequent filter worked not with the entire data set but with the results returned from the previous filter. Also tried cost=100 to apply a filter as a post filter, but nothing changes, same 7 seconds. The filter cache doesn't help here. So this:
bq. By design, fq clauses like this are calculated for the entire document set and the results cached, there is no ordering for that part.
doesn't sound right to me. Sometimes we don't need to reuse filters (and sometimes even can't, e.g. the cost option requires cache=false). In the provided use case the way Solr applies filters is more harmful than useful. I'd even say more than 600 times harmful. A query that wouldn't take more than a second in MySQL takes 15 seconds in a search engine that runs on rapid SSD RAID 10, has a few shards and replicas, uses more than 160G of memory in total and has ≈40 CPU cores. Thus this sounds like a missing feature (at least). Please share your thoughts on this.
Query filters applied in a wrong order
--
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.
This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. And it seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.
[jira] [Created] (SOLR-6494) Query filters applied in a wrong order
Alexander S. created SOLR-6494:
--
Summary: Query filters applied in a wrong order
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.
This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. And it seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.
[jira] [Commented] (SOLR-6494) Query filters applied in a wrong order
[ https://issues.apache.org/jira/browse/SOLR-6494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14127304#comment-14127304 ] Alexander S. commented on SOLR-6494:
Hi, thank you for the explanation, but I think sometimes (like in this case) it would be much more efficient to run the filters one by one. It seems that the cost parameter should do what I need, e.g.:
{code}
{!cost=1}type:Award::Nomination
{!cost=10}created_at_d:[* TO 2014-09-08T23:59:59Z]
{code}
Query filters applied in a wrong order
--
Key: SOLR-6494
URL: https://issues.apache.org/jira/browse/SOLR-6494
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1
Reporter: Alexander S.
This query:
{code}
{ fq: [type:Award::Nomination], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes just a few milliseconds, but this one:
{code}
{ fq: [ type:Award::Nomination, created_at_d:[* TO 2014-09-08T23:59:59Z] ], sort: score desc, start: 0, rows: 20, q: *:* }
{code}
takes almost 15 seconds. I have just ≈12k documents with type Award::Nomination, but around half a billion with the created_at_d field set. And it seems Solr applies the created_at_d filter first, going through all documents where this field is set, which is not very smart. I think if it can't do anything better than applying filters in alphabetical order, it should apply them in the order they were received.
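For what it's worth, the ordering being asked for can be sketched in plain Python (hypothetical documents and costs, not Solr internals): predicates run in ascending cost, and each later predicate only sees the survivors of the previous one. As far as I know, in Solr itself the cost order only applies to non-cached filters, and cost >= 100 only turns a filter into a true post filter when the query type implements the PostFilter interface (e.g. {!frange}), which plain term and range queries do not.

```python
# Hypothetical corpus: 100 nominations plus 100,000 other docs.
docs = ([{"type": "Award::Nomination", "day": d} for d in range(100)]
        + [{"type": "Other", "day": d} for d in range(100000)])

# (cost, predicate) pairs, like fq's with {!cache=false cost=N}.
filters = [
    (100, lambda doc: doc["day"] <= 50),                    # expensive range check
    (1,   lambda doc: doc["type"] == "Award::Nomination"),  # cheap and selective
]

# Apply in ascending cost; each predicate sees only the prior survivors,
# so the expensive check runs over 100 docs instead of 100,100.
survivors = docs
for _cost, predicate in sorted(filters, key=lambda f: f[0]):
    survivors = [doc for doc in survivors if predicate(doc)]

print(len(survivors))  # 51
```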
[jira] [Created] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Alexander S. created SOLR-6468:
--
Summary: Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Key: SOLR-6468
URL: https://issues.apache.org/jira/browse/SOLR-6468
Project: Solr
Issue Type: Bug
Affects Versions: 4.9, 4.8.1
Reporter: Alexander S.
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com zer0sleep\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com zer0sleep\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.
[jira] [Updated] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-6468:
---
Description:
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.

was:
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com zer0sleep\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com zer0sleep\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.

Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Key: SOLR-6468
URL: https://issues.apache.org/jira/browse/SOLR-6468
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.
[jira] [Commented] (SOLR-6468) Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
[ https://issues.apache.org/jira/browse/SOLR-6468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118078#comment-14118078 ] Alexander S. commented on SOLR-6468:
Correct, but isn't this behavior deprecated? I mean matchVersion=4.3? I was told this could get removed from 5.0 as well. If I understand the problem correctly, enablePositionIncrements=false could generate wrong tokens for those who do not know how to use this option correctly? It seems it requires a custom tokenizer, and solr.PatternTokenizerFactory in my example should work properly. So instead of removing the option, the problem with wrong tokens could be explained in the readme and the option could be kept for those who really need it. That makes more sense to me than simply removing it. Anyway, is there any chance the option could be restored? My use case should clearly show how useful it might be. And I was trying to google the problem; there are a lot of complaints about this, but no solutions.
Regression: StopFilterFactory doesn't work properly without enablePositionIncrements="false"
Key: SOLR-6468
URL: https://issues.apache.org/jira/browse/SOLR-6468
Project: Solr
Issue Type: Bug
Affects Versions: 4.8.1, 4.9
Reporter: Alexander S.
Setup:
* Schema version is 1.5
* Field config:
{code}
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
{code}
* Stop words:
{code}
http
https
ftp
www
{code}
So very simple. In the index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But none of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
Debug output shows:
"parsedquery_toString": "+(url_words_ngram:\"? twitter com testuser\")"
But we need:
"parsedquery_toString": "+(url_words_ngram:\"twitter com testuser\")"
Complete debug outputs:
* a valid search: http://pastie.org/pastes/9500661/text?key=rgqj5ivlgsbk1jxsudx9za
* an invalid search: http://pastie.org/pastes/9500662/text?key=b4zlh2oaxtikd8jvo5xaww
The complete discussion and explanation of the problem is here: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-td4153839.html
I didn't find a clear explanation of how we can upgrade Solr; there's no replacement or workaround for this, so this is not just a major change but a major disrespect to all existing Solr users who are using this feature.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092884#comment-14092884 ] Alexander S. commented on SOLR-3274:
Hi, thanks for the response.
bq. Well you never know
I've checked the nodes' status; that 3rd node was online all the time and there was no load on it.
bq. In a 3-node ZK-cluster you need at least 2 healthy ZK-nodes connected with each other for the cluster to be operational.
That should be the problem, since the 2 other ZK instances might (theoretically) be unavailable because of heavy load (they share the same nodes with the Solr instances). Both nodes have 16 CPU cores, 48G of memory and RAID 10 (SSD); I thought it would be hard to get performance issues there. Anyway, adding a separate node with a 4th zookeeper instance might help, right?
ZooKeeper related SolrCloud problems
Key: SOLR-3274
URL: https://issues.apache.org/jira/browse/SOLR-3274
Project: Solr
Issue Type: Bug
Components: SolrCloud
Affects Versions: 4.0-ALPHA
Environment: Any
Reporter: Per Steffensen
Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 Solr servers, running 28 slices of the same collection (collA) - all slices have one replica (two shards all in all - leader + replica) - 56 cores all in all (8 shards on each solr instance). But anyways... Besides the problem reported in SOLR-3273, the system seems to run fine under high load for several hours, but eventually errors like the ones shown below start to occur. I might be wrong, but they all seem to indicate some kind of instability in the collaboration between Solr and ZooKeeper. I have to say that I haven't been there to check ZooKeeper at the moment those exceptions occur, but basically I don't believe the exceptions occur because ZooKeeper is not running stably - at least when I go and check ZooKeeper through other channels (e.g. my eclipse ZK plugin) it is always accepting my connection and generally seems to be doing fine.
Exception 1) Often the first error we see in solr.log is something like this
{code}
Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
I believe this error basically occurs because SolrZkClient.isConnected reports false, which means that its internal keeper.getState does not return ZooKeeper.States.CONNECTED.
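The 2-of-3 rule quoted in this thread is plain majority-quorum arithmetic; a quick sketch (not Solr or ZooKeeper code) also shows why a 4th ZooKeeper node would not add fault tolerance — an ensemble of 4 needs 3 live servers, so it still tolerates only 1 failure, and going to 5 is what buys a second:

```python
def quorum(n):
    """Live servers a ZooKeeper ensemble of n needs to stay operational."""
    return n // 2 + 1

def tolerated_failures(n):
    """Servers that may fail while the ensemble keeps serving requests."""
    return n - quorum(n)

for n in (3, 4, 5):
    print(f"{n} servers: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```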
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090519#comment-14090519 ] Alexander S. commented on SOLR-3274:
Suffering from the same problem; it happens during high load on the nodes. Our setup is pretty simple, 4 nodes: 2 shards, 2 replicas and 3 zookeeper instances. Everything is running on 3 physical nodes:
* 1st node — 1 zookeeper instance
* 2nd node — 2 shards and 1 zookeeper
* 3rd node — 2 replicas and 1 zookeeper
And running solr instances this way:
{code}
java -Xms2G -Xmx16G -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -DzkHost=zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf -Dbootstrap_conf=true -DnumShards=2 -jar start.jar etc/jetty.xml
{code}
And once load increases we get:
{code}
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1306)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:981)
    at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121)
    at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:349)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:744)
{code}
It's simply impossible for all 3 zookeeper instances to go offline simultaneously. I understand that the 2nd and 3rd nodes could be overloaded because of Solr, but the 1st node runs just a single zookeeper instance and the load average on that node is close to zero. Since there's always at least 1 stable ZK node, this seems like a communication/reliability bug in Solr.
ZooKeeper related SolrCloud problems
[jira] [Comment Edited] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14090519#comment-14090519 ] Alexander S. edited comment on SOLR-3274 at 8/8/14 9:28 AM: Suffering from the same problem, happens during high load on the nodes. Our setup is pretty simple, 4 solr instances: 2 shards, 2 replicas and 3 zookeeper instances. Everything is running on 3 physical nodes: * 1st node — 1 zookeeper instance * 2nd node — 2 solr shards and 1 zookeeper * 3rd node — 2 solr replicas and 1 zookeeper We're running solr instances this way: java -Xms2G -Xmx16G -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=80 -DzkHost=zoo1.devops:2181,zoo2.devops:2181,zoo3.devops:2181 -Dcollection.configName=Carmen -Dbootstrap_confdir=./solr/conf -Dbootstrap_conf=true -DnumShards=2 -jar start.jar etc/jetty.xml And once loading increases we get: {code} org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled. at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1306) at org.apache.solr.update.processor.DistributedUpdateProcessor.processDelete(DistributedUpdateProcessor.java:981) at org.apache.solr.update.processor.LogUpdateProcessor.processDelete(LogUpdateProcessorFactory.java:121) at org.apache.solr.handler.loader.XMLLoader.processDelete(XMLLoader.java:349) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:278) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:744) {code} It's simply impossible for all 3 ZooKeeper instances to go offline simultaneously. I understand that the 2nd and 3rd nodes could be overloaded because of Solr, but the 1st node runs just a single ZooKeeper instance and the load average on that node is close to zero. Since there's always at least 1 stable ZK node this seems like a
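Since the comment above argues that at least one ZooKeeper node must always be reachable, a quick way to check that claim is to probe each ZK node directly. A minimal sketch in Python, assuming the zoo1/zoo2/zoo3 hostnames from the startup command above; it uses ZooKeeper's standard four-letter-word `ruok` command, to which a healthy node replies `imok`:

```python
import socket

def zk_ruok(host, port=2181, timeout=2.0):
    # Send ZooKeeper's four-letter-word "ruok" command over a raw socket;
    # a healthy node answers b"imok" and closes the connection.
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(b"ruok")
        return s.recv(16)

# Hostnames taken from the -DzkHost value in the startup command above.
for host in ("zoo1.devops", "zoo2.devops", "zoo3.devops"):
    try:
        print(host, zk_ruok(host))
    except OSError as exc:
        print(host, "unreachable:", exc)
```

If all three nodes answer `imok` while Solr still reports "Cannot talk to ZooKeeper", the problem is more likely on the Solr side, e.g. long GC pauses (a 16G CMS heap as in the command above can pause longer than the ZK session timeout), which makes Solr's ZK session expire even though ZooKeeper itself is healthy.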
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077928#comment-14077928 ] Alexander S. commented on SOLR-4787: It seems join doesn't work as expected, please have a look: http://lucene.472066.n3.nabble.com/Search-results-inconsistency-when-using-joins-td4149810.html Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.9, 5.0 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787-with-testcase-fix.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join.
The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both Lucene query and PostFilter implementations. A *cost* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the Lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join.
fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command
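The hjoin mechanics described above can be sketched outside Solr in a few lines of Python: run the fromIndex query, collect the from-field keys into an in-memory hash set, then keep only main-query documents whose to-field value appears in that set. The documents and field values below are invented for illustration:

```python
# Illustrative sketch of the hjoin idea: filter one "core" by the results
# of a search in another core, connecting int join keys via a hash set.

# Hypothetical fromIndex (collection2) documents.
collection2 = [
    {"id_i": 1, "user": "customer1"},
    {"id_i": 2, "user": "customer1"},
    {"id_i": 3, "user": "other"},
]

# Hypothetical main-core documents.
main_core = [
    {"id_i": 1, "group": 5},
    {"id_i": 2, "group": 7},
    {"id_i": 4, "group": 5},
]

def hjoin(from_docs, from_field, to_docs, to_field, from_pred):
    # Step 1: search the fromIndex and build the in-memory key set.
    keys = {d[from_field] for d in from_docs if from_pred(d)}
    # Step 2: keep only main-query docs whose to-field is in the key set.
    return [d for d in to_docs if d[to_field] in keys]

joined = hjoin(collection2, "id_i", main_core, "id_i",
               lambda d: d["user"] == "customer1")
```

This also shows where the extra memory goes: the whole key set from the fromIndex side must fit in RAM, which is the trade-off the description above makes for sub-second joins.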
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012449#comment-14012449 ] Alexander S. commented on SOLR-5463: I have another idea about the cursors implementation. This is just an idea; I am not sure whether it's possible to do. Is it possible to use cursors together with the start and rows parameters? That would allow using pagination and drawing links for prev, next, 1, 2, 3, n+1 pages, as we can do now. So instead of using cursorMark we'd use cursorName, which could be static. The request start:0, rows:10, cursorName:* would return the first page of results and a static cursor name, which could then be used for all other pages (i.e. start:10, rows:10, cursorName:#{received_cursor_name}). Does that make sense? Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.7, 5.0 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP based API similar to how IndexSearcher.searchAfter works at the Lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page.
This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the Twitter API and its since_id param), except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending. SOLR-1726 laid some initial ground work here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel} -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
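The Basic Usage loop above can be simulated without a Solr server. In this sketch the cursor mark is literally the sort values of the last document seen (score descending, then uniqueKey ascending as the tie breaker), mirroring how IndexSearcher.searchAfter works; the in-memory index and field names are invented:

```python
# In-memory "index"; sort = score desc, id asc (uniqueKey as tie breaker).
docs = [{"id": i, "score": s} for i, s in
        [(1, 5.0), (2, 5.0), (3, 4.0), (4, 3.0), (5, 3.0)]]
ORDER = sorted(docs, key=lambda d: (-d["score"], d["id"]))

def search_after(cursor, rows):
    # cursor None plays the role of "*": start from the beginning mark.
    start = 0
    if cursor is not None:
        # Resume just past the last (score, id) pair the client saw.
        start = next(i for i, d in enumerate(ORDER)
                     if (-d["score"], d["id"]) == cursor) + 1
    page = ORDER[start:start + rows]
    # An unchanged cursor on an empty page signals the end of results.
    next_cursor = (-page[-1]["score"], page[-1]["id"]) if page else cursor
    return page, next_cursor

# Client loop from the Basic Usage panel: repeat until the cursor
# mark stops changing.
cursor, collected = None, []
while True:
    page, nxt = search_after(cursor, 2)
    collected += [d["id"] for d in page]
    if nxt == cursor:
        break
    cursor = nxt
```

Because the cursor encodes a position in the sort order rather than an offset, documents added or deleted between requests cannot shift pages the way start/rows paging does.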
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010881#comment-14010881 ] Alexander S. commented on SOLR-5463: Inability to use this without sorting by a unique key (e.g. id) makes this feature useless. The same could be achieved previously by sorting by id and searching for docs where id is greater/less than the last received. See how cursors work in MongoDB, that's the right direction.
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010883#comment-14010883 ] Alexander S. commented on SOLR-5463: http://docs.mongodb.org/manual/core/cursors/
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010888#comment-14010888 ] Alexander S. commented on SOLR-5463: Sorry for spamming, but I can't edit my previous message. I just found that MongoDB cursors also aren't isolated and can return duplicates; I thought they were. But sorting docs by id is not acceptable in 99% of use cases, especially in Solr, where it is more expected to get results sorted by relevance.
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011226#comment-14011226 ] Alexander S. commented on SOLR-5463: Oh, that's awesome, thanks for the tip.
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012084#comment-14012084 ] Alexander S. commented on SOLR-5463: If, as David mentioned, Solr will add it only if it is not there, this should preserve the ability for users to manually specify another key and order when that is required (a rare case, it seems).
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13967927#comment-13967927 ] Alexander S. commented on SOLR-5871: I already asked at solr-u...@lucene.apache.org but it seems the only way currently is to read the debug explanation. Unfortunately I am not a Java developer and thus unable to create a patch, but Solr's JIRA has a wish type so I posted my wish here. Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Assignee: Erick Erickson Hello, I need the ability to tell users what content matched their query, this way: | Name | Twitter Profile | Topics | Site Title | Site Description | Site content | | John Doe | Yes | No | Yes | No | Yes | | Jane Doe | No | Yes | No | No | Yes | All these columns are indexed text fields and I need to know which content matched the query; it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
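Until a structured response exists, one workaround implied by the comment above is to run the query with debugQuery=true and scrape per-field score contributions out of the explain text. A rough sketch; the explain fragment and the regex are simplified assumptions (real explain output varies by query type and Solr version):

```python
import re

# A simplified fragment of what debugQuery explain text can look like.
explain = """
0.73 = sum of:
  0.41 = weight(site_title:john in 12) [...], result of: ...
  0.32 = weight(site_content:john in 12) [...], result of: ...
"""

# Pull (score, field) pairs out of lines shaped like
# "<score> = weight(<field>:<term> ...".
pattern = re.compile(r"([0-9.]+) = weight\((\w+):")

matched_fields = {field: float(score)
                  for score, field in pattern.findall(explain)}
```

A real parser would need to recursively descend the explanation tree and handle nested boolean clauses, which is exactly the "lots of nesting levels" complaint in the issue description.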
[jira] [Commented] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961730#comment-13961730 ] Alexander S. commented on SOLR-5871: Any luck this could be reviewed by someone?
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961733#comment-13961733 ] Alexander S. edited comment on SOLR-4787 at 4/7/14 9:10 AM: @Kranti Parisa, hi, any luck with this? was (Author: aheaven): @Kranti Parisa, hi, any lick with this?

Join Contrib

Key: SOLR-4787
URL: https://issues.apache.org/jira/browse/SOLR-4787
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
Fix For: 4.8
Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch

This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above.

*HashSetJoinQParserPlugin aka hjoin*

The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the hjoin is designed to work with int and long join keys only, so in order to use hjoin, int or long join keys must be included in both the to and from cores. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys, so the hjoin will need more memory than the JoinQParserPlugin to perform the join.

The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features:

1) Both Lucene query and PostFilter implementations. A *cost* above 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down.

2) With the Lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading; for example, *threads=6* will use 6 threads to build the filter. This will set up a fixed thread pool with six threads to handle all hjoin requests. Once the thread pool is created, the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter.

3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex, then you can avoid hashset resizing, which improves performance.

4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins.

5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation, because PostFilters are not cacheable in the filterCache.

The syntax of the hjoin is similar to the JoinQParserPlugin, except that the plugin is referenced by the string hjoin rather than join:

fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5

The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results.

The solrconfig.xml in the main query core must contain the reference to the hjoin:

<queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>

And the join contrib lib jars must be registered in solrconfig.xml:

<lib dir="../../../contrib/joins/lib" regex=".*\.jar" />

After issuing the ant dist command from inside the solr directory, the joins contrib jar will appear in the solr/dist directory. Place the solr-joins-4.*-.jar in the WEB-INF/lib directory.
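The hash-set join described above can be sketched conceptually: collect the from-field keys of every document matching the fromIndex query, then keep only main-query documents whose to field appears in that key set. Below is a minimal Python sketch of those semantics only, not Solr's actual implementation; the document dicts and field values are invented for illustration.

```python
# Conceptual sketch of the hjoin semantics: filter main-query results by a
# set of int/long join keys gathered from a search in another core.
# NOT Solr's implementation; documents and values are made up.

def hash_set_join(from_docs, from_field, to_docs, to_field):
    """Keep to_docs whose to_field value appears among from_docs' from_field values."""
    # The in-memory structure that "quickly connects the join keys".
    keys = {doc[from_field] for doc in from_docs if from_field in doc}
    return [doc for doc in to_docs if doc.get(to_field) in keys]

# fromIndex (collection2) results for the query user:customer1
collection2 = [
    {"id_i": 5, "user": "customer1"},
    {"id_i": 9, "user": "customer1"},
]
# Main-query results; only those whose id_i is in the from-key set survive.
main_results = [
    {"id_i": 5, "title": "doc a"},
    {"id_i": 7, "title": "doc b"},
    {"id_i": 9, "title": "doc c"},
]
joined = hash_set_join(collection2, "id_i", main_results, "id_i")
print([d["title"] for d in joined])  # -> ['doc a', 'doc c']
```

This also illustrates the *size* local parameter's role: pre-sizing the key set to the fromIndex result count would avoid rehashing as keys are added.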
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961733#comment-13961733 ] Alexander S. commented on SOLR-4787: @Kranti Parisa, hi, any luck with this?
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13943153#comment-13943153 ] Alexander S. commented on SOLR-4787: Kranti Parisa, did you try to apply this patch to 4.7.0? I downloaded it here: http://www.apache.org/dyn/closer.cgi/lucene/solr/4.7.0 and then did the next steps:
* ant compile
* ant ivy-bootstrap
* ant dist
And then created a package for my Linux distribution, but no luck; Solr fails to initialize with <queryParser name="hjoin" class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13940740#comment-13940740 ] Alexander S. commented on SOLR-4787: Any query fails; it seems I am doing something wrong (perhaps the patch was applied incorrectly). I see this error when trying to access the web interface: {quote} SolrCore Initialization Failures crm-dev: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin' {quote}
[jira] [Created] (SOLR-5871) Ability to see the list of fields that matched the query with scores
Alexander S. created SOLR-5871: -- Summary: Ability to see the list of fields that matched the query with scores Key: SOLR-5871 URL: https://issues.apache.org/jira/browse/SOLR-5871 Project: Solr Issue Type: Wish Reporter: Alexander S. Hello, I need the ability to show users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields; I need to know what content matched the query, and it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand. -- This message was sent by Atlassian JIRA (v6.2#6252) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
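Absent such a feature in Solr, the Yes/No table in this wish can be approximated client-side: given the text of each indexed field for a returned document, check which fields contain any of the query terms. A toy Python sketch of that idea follows; the field names and contents are invented, and this naive substring check ignores analysis (tokenization, stemming) that Solr would apply.

```python
# Toy approximation of "which fields matched the query": for each field's
# stored text, report whether any query term occurs in it. Field names and
# values are hypothetical; real matching would go through Solr's analyzers.

def per_field_matches(doc_fields, query_terms):
    """Map each field name to True/False depending on whether any term occurs in it."""
    return {
        field: any(term.lower() in text.lower() for term in query_terms)
        for field, text in doc_fields.items()
    }

john = {
    "name": "John Doe",
    "twitter_profile": "coffee and search engines",
    "site_title": "All about coffee",
}
print(per_field_matches(john, ["coffee"]))
# -> {'name': False, 'twitter_profile': True, 'site_title': True}
```

Per-field scores are harder to approximate this way; that part of the wish really does need server-side support such as the debug explain output mentioned above.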
[jira] [Updated] (SOLR-5871) Ability to see the list of fields that matched the query with scores
[ https://issues.apache.org/jira/browse/SOLR-5871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5871: --- Description: Hello, I need the ability to tell users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields; I need to know what content matched the query, and it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand.

was: Hello, I need the ability to show users what content matched their query, this way:
| Name | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes | No | Yes | No | Yes |
| Jane Doe | No | Yes | No | No | Yes |
All these columns are indexed text fields; I need to know what content matched the query, and it would also be cool to be able to show the score per field. As far as I know, right now there's no way to return this information when running a query request. Debug output is suitable for visual review but has lots of nesting levels and is hard to understand.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937732#comment-13937732 ] Alexander S. commented on SOLR-4787: Thank you, Kranti Parisa. I am far from Java development; how can I apply this patch and build Solr for Linux? I tried to patch (it creates a new folder, joins, in solr/contrib), installed ivy, and launched ant compile, but got this error: {quote}
common.compile-core:
[mkdir] Created dir: /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] Compiling 3 source files to /home/heaven/Desktop/solr-4.7.0/solr/build/contrib/solr-joins/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
[javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:883: error: reached end of file while parsing
[javac] return this.delegate.acceptsDocsOutOfOrder();
[javac] ^
[javac] /home/heaven/Desktop/solr-4.7.0/solr/contrib/joins/src/java/org/apache/solr/joins/HashSetJoinQParserPlugin.java:884: error: reached end of file while parsing
[javac] 2 errors
[javac] 1 warning
BUILD FAILED
/home/heaven/Desktop/solr-4.7.0/build.xml:106: The following error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:458: The following error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/solr/common-build.xml:449: The following error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:471: The following error occurred while executing this line:
/home/heaven/Desktop/solr-4.7.0/lucene/common-build.xml:1736: Compile failed; see the compiler error output for details.
Total time: 8 minutes 55 seconds {quote}
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937747#comment-13937747 ] Alexander S. commented on SOLR-4787: Nvm, there were 3 missing } at the end of HashSetJoinQParserPlugin.java. The build was successful, testing now.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937845#comment-13937845 ] Alexander S. commented on SOLR-4787: Kranti, do I need to update anything in my Solr config/schema? I've just tried the patched version and it still ignores the fq parameter. I was using Solr 4.7.0. Thanks, Alex
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937887#comment-13937887 ] Alexander S. commented on SOLR-4787:

Hi, I am using the simple join, this way: {!join from=profile_ids_im to=id_i fq=$joinFilter1 v=$joinQuery1}.

Join Contrib
Key: SOLR-4787
URL: https://issues.apache.org/jira/browse/SOLR-4787
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 4.2.1
Reporter: Joel Bernstein
Priority: Minor
Fix For: 4.8
Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-nestedJoins.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch

This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above.

*HashSetJoinQParserPlugin aka hjoin*

The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the hjoin is designed to work with int and long join keys only, so in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys, so the hjoin will need more memory than the JoinQParserPlugin to perform the join.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937921#comment-13937921 ] Alexander S. commented on SOLR-4787:

Ok, thx, I'll try with hjoin. And yes, I am trying to do it on the same core.
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13938117#comment-13938117 ] Alexander S. commented on SOLR-4787:

Getting this error:
{code}
RSolr::Error::Http - 500 Internal Server Error
Error: {msg=SolrCore 'crm-dev' is not available due to init failure: Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin',trace=org.apache.solr.common.SolrException: SolrCore 'crm-dev' is not available due to init failure: Error loading class 'org.apache.solr.search.joins.HashSetJoinQParserPlugin'
  at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:827)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:309)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
{code}
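One detail worth noting about this stack trace: it fails loading the class org.apache.solr.search.joins.HashSetJoinQParserPlugin, while the registration instructions earlier in this issue use the package org.apache.solr.joins. The class attribute has to match the package actually inside the contrib jar. A minimal solrconfig.xml sketch of the registration as described in this issue (paths and package name are taken from the issue text, not verified against any particular build):

```xml
<!-- load the join contrib jars; dir is relative to the core's instanceDir -->
<lib dir="../../../contrib/joins/lib" regex=".*\.jar" />

<!-- register the parser under the name used in {!hjoin ...} queries;
     the class must match the package inside the contrib jar -->
<queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
```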
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920798#comment-13920798 ] Alexander S. commented on SOLR-4787:

Hi Joel, thanks. It seems I need to perform a nested join inside a single collection, but I need fq inside the join as shown here: https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of document. There are 3 types of documents: Profile, Site, and SiteSource. When searching for Profiles I have to look in SiteSource content, so I need something like this:

{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = type:Site
joinFilter2 = type:SiteSource
{code}

Right now this works only partially: fq inside {!join} is ignored. When can we expect this patch to be merged? Also, will it work in the way I've explained, or do I understand it wrong?

Thank you, Alex
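The Profile ← Site ← SiteSource chain described in the comment above can be sketched in plain Python to show how the nested fq filters and join keys compose. The field and type names come from the comment; the documents and the join helper are invented in-memory stand-ins, not Solr code.

```python
# Hypothetical in-memory model of the nested join from the comment:
#   SiteSources matching the keywords -> their Sites -> the owning Profiles.
# Each level applies its nested fq (the type: filter) before joining keys.

docs = [
    {"id_i": 1,   "type": "Profile"},
    {"id_i": 2,   "type": "Profile"},
    {"id_i": 10,  "type": "Site",       "owner_id_im": [1]},
    {"id_i": 100, "type": "SiteSource", "site_id_i": 10, "content": "my_keywords"},
]

def join(from_docs, from_field, candidates):
    """Keep candidates whose id_i appears in from_field of from_docs."""
    keys = set()
    for d in from_docs:
        v = d.get(from_field, [])
        keys.update(v if isinstance(v, list) else [v])
    return [c for c in candidates if c["id_i"] in keys]

# joinQuery2 restricted by joinFilter2: keyword search over type:SiteSource
sources = [d for d in docs
           if d["type"] == "SiteSource" and "my_keywords" in d.get("content", "")]

# joinQuery1 restricted by joinFilter1: SiteSource.site_id_i -> id_i, type:Site
sites = [d for d in join(sources, "site_id_i", docs) if d["type"] == "Site"]

# outer join: Site.owner_id_im -> id_i yields the matching Profiles
profiles = join(sites, "owner_id_im", docs)
```

Note that owner_id_im is treated as multi-valued, matching the _im suffix in the comment; that is exactly the multi-value key case addressed by the SOLR-4797-hjoin-multivaluekeys patches attached to this issue.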
[jira] [Comment Edited] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920798#comment-13920798 ] Alexander S. edited comment on SOLR-4787 at 3/5/14 12:36 PM:

Hi Joel, thanks. It seems I need to perform a nested join inside a single collection, but I need fq inside the join as shown here: https://issues.apache.org/jira/browse/SOLR-4787?focusedCommentId=13750854&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13750854

I have a single collection with a field type which determines the kind of document. There are 3 types of documents: Profile, Site, and SiteSource. When searching for Profiles I have to look in SiteSource content, so I need something like this:

{code}
q = {!join from=owner_id_im to=id_i fq=$joinFilter1 v=$joinQuery1} # Profile → Site join
joinQuery1 = {!join from=site_id_i to=id_i fq=$joinFilter2 v=$joinQuery2} # Site → SiteSource join
joinQuery2 = {!edismax}my_keywords
joinFilter1 = type:Site
joinFilter2 = type:SiteSource
{code}

Right now this works only partially: fq inside \{!join\} is ignored. When can we expect this patch to be merged? Also, will it work in the way I've explained, or do I understand it wrong?

Thank you, Alex
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920805#comment-13920805 ] Alexander S. commented on SOLR-4787:

Hi, 4.4 and 4.7
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915768#comment-13915768 ] Alexander S. commented on SOLR-4787:

Just tried 4.7.0 and it does not work either.
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will setup a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then the you can avoid hashset resizing which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather then join. 
fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory, the joins contrib jar will appear in the solr/dist directory. Place the solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr web application. This will
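Putting the registration steps above together, a minimal solrconfig.xml sketch for the main query core (paths and names are illustrative, taken from the description, and may need adjusting to your layout):

```xml
<!-- Sketch only: the lib path is relative to the core's instance dir and
     depends on where the joins contrib was built. -->
<config>
  <lib dir="../../../contrib/joins/lib" regex=".*\.jar" />
  <queryParser name="hjoin"
               class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
</config>
```

A nested join (feature 4) would then chain a second hjoin through the local fq parameter, e.g. fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i fq=$jq\}user:customer1&jq=\{!hjoin fromIndex=collection3 from=id_i to=id_i\}group:5 (collection and field names here are hypothetical).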
[jira] [Commented] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13912786#comment-13912786 ] Alexander S. commented on SOLR-4787: Which release has support for {!join} with the fq parameter? I was trying with 4.5.1 but fq does not seem to have any effect. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API, this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin, but the implementation differs in a couple of important ways. The first is that the hjoin is designed to work with int and long join keys only, so in order to use hjoin, int or long join keys must be included in both the to and from cores. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys, so the hjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time.
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both Lucene query and PostFilter implementations. A *cost* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the Lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading; for example, *threads=6* will use 6 threads to build the filter. This sets up a fixed thread pool with six threads to handle all hjoin requests. Once the thread pool is created, the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex, then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the Lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation, because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin, except that the plugin is referenced by the string hjoin rather than join.
fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq\}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The Lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query where the to field is present in the from list will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin: <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml: <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory, the joins contrib jar will appear in the solr/dist directory. Place the
[jira] [Commented] (LUCENE-4963) Deprecate broken TokenFilter constructors
[ https://issues.apache.org/jira/browse/LUCENE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841236#comment-13841236 ] Alexander S. commented on LUCENE-4963: -- Hi, how are we now supposed to fix this? http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements Deprecate broken TokenFilter constructors - Key: LUCENE-4963 URL: https://issues.apache.org/jira/browse/LUCENE-4963 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.4 Attachments: LUCENE-4963.patch We have some TokenFilters which are only broken with specific options. This includes: * TrimFilter when updateOffsets=true * StopFilter, JapanesePartOfSpeechStopFilter, KeepWordFilter, LengthFilter, TypeTokenFilter when enablePositionIncrements=false I think we should deprecate these behaviors in 4.4 and remove them in trunk. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4963) Deprecate broken TokenFilter constructors
[ https://issues.apache.org/jira/browse/LUCENE-4963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13841239#comment-13841239 ] Alexander S. commented on LUCENE-4963: -- I index twitter.com/testuser, then search for http://www.twitter.com/testuser. These are in the stopwords list: http https www. No results. Deprecate broken TokenFilter constructors - Key: LUCENE-4963 URL: https://issues.apache.org/jira/browse/LUCENE-4963 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Fix For: 4.4 Attachments: LUCENE-4963.patch We have some TokenFilters which are only broken with specific options. This includes: * TrimFilter when updateOffsets=true * StopFilter, JapanesePartOfSpeechStopFilter, KeepWordFilter, LengthFilter, TypeTokenFilter when enablePositionIncrements=false I think we should deprecate these behaviors in 4.4 and remove them in trunk. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
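One workaround for the removed enablePositionIncrements=false behavior is to strip the URL prefixes before tokenization instead of stopping them out, so no position "holes" are left behind. A sketch using MappingCharFilterFactory; the field type name, the mapping file name, and its contents are assumptions for illustration, not part of this issue:

```xml
<!-- Sketch: remove protocol/host prefixes at the char-filter stage, before
     tokenization, so "http://www.twitter.com/testuser" analyzes the same
     as "twitter.com/testuser". -->
<fieldType name="url_words_ngram" class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer>
    <!-- mapping-urls.txt (hypothetical) would contain lines such as:
         "http\://" => ""
         "https\://" => ""
         "www." => ""        -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-urls.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```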
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: 4.4 Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Affects Versions: 4.4 Reporter: Alexander S. Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Affects Version/s: 4.6 4.5 4.5.1 Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Affects Versions: 4.4, 4.5, 4.5.1, 4.6 Reporter: Alexander S. Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
Alexander S. created SOLR-5332: -- Summary: Add preserve original setting to the EdgeNGramFilterFactory Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Reporter: Alexander S. Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5332) Add preserve original setting to the EdgeNGramFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-5332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander S. updated SOLR-5332: --- Description: Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex was: Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index.
Best, Alex Add preserve original setting to the EdgeNGramFilterFactory - Key: SOLR-5332 URL: https://issues.apache.org/jira/browse/SOLR-5332 Project: Solr Issue Type: Wish Reporter: Alexander S. Hi, as described here: http://lucene.472066.n3.nabble.com/Help-to-figure-out-why-query-does-not-match-td4086967.html the problem is that if you have these 2 strings to index: 1. facebook.com/someuser.1 2. facebook.com/someveryandverylongusername and the edge ngram filter factory with min and max gram size settings 2 and 25, search requests for these urls will fail. But search requests for: 1. facebook.com/someuser 2. facebook.com/someveryandverylonguserna will work properly. That's because the first url has 1 at the end, which is lower than the allowed min gram size. In the second url the user name is longer than the max gram size (27 characters). It would be good to have a preserve original option that adds the original string to the index if it does not fit the allowed gram size, so that the 1 and someveryandverylongusername tokens will also be added to the index. Best, Alex -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
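For reference, later Lucene/Solr releases added a preserveOriginal option to the n-gram filter factories, which matches the wish described here. A sketch of how the field could then be configured; whether the attribute is available depends on the Solr version in use, so treat this as an assumption to verify against your release:

```xml
<fieldType name="url_edge_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- preserveOriginal="true" also emits the untouched token, so "1"
         (below minGramSize) and a 27-character username (above
         maxGramSize) remain searchable alongside the grams. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"
            preserveOriginal="true"/>
  </analyzer>
</fieldType>
```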
[jira] [Commented] (SOLR-874) Dismax parser exceptions on trailing OPERATOR
Alexander S. commented on SOLR-874 Dismax parser exceptions on trailing OPERATOR Hi, sorry for asking this here, but is the next error related to this issue? Aug 26, 2012 8:22:33 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"admission";"adolescent";"adrenal gland disorders";"adrenocortical carcinoma";"adrenoleukodystrophy see leukodystrophies";"advocacy";"afd";"affordability";"african american health";"africaso";"aga";"aganglionic megacolon";"aggressive mastocytosis";"aging";"agranulocytic angina";"agu";"agyria";"ahc";"ahd";"ahds";"ahus";"aicardi syndrome";"aids";"aids and infections";"aids and pregnancy";"': Lexical error at line 1, column 391. Encountered: EOF after : "" at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105) ... 21 more Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 391. Encountered: EOF after : "" at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229) at org.apache.lucene.queryParser.QueryParser.jj_scan_token(QueryParser.java:1733) at org.apache.lucene.queryParser.QueryParser.jj_3R_2(QueryParser.java:1616) at org.apache.lucene.queryParser.QueryParser.jj_3_1(QueryParser.java:1623) at org.apache.lucene.queryParser.QueryParser.jj_2_1(QueryParser.java:1609) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1288) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) ... 
24 more And this one also looks very similar http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E Best, Alex This message is automatically generated by JIRA. If you think it was
[jira] [Comment Edited] (SOLR-874) Dismax parser exceptions on trailing OPERATOR
Alexander S. edited a comment on SOLR-874 Dismax parser exceptions on trailing OPERATOR Hi, sorry for asking this here, but is the next error related to this issue? Aug 26, 2012 8:36:24 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"hgps" "hhho" "hhrh" ...truncated... "kidney stones" "kidney transplant" "kidney trafq=type:Tweet': Lexical error at line 1, column 6783. Encountered: EOF after : "\"kidney trafq=type:Tweet" at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105) ... 21 more Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 6783. Encountered: EOF after : "\"kidney trafq=type:Tweet" at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229) at org.apache.lucene.queryParser.QueryParser.jj_ntk(QueryParser.java:1772) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1555) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) ... 24 more And this one also looks very similar http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E Best, Alex This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-874) Dismax parser exceptions on trailing OPERATOR
Alexander S. edited a comment on SOLR-874 Dismax parser exceptions on trailing OPERATOR Hi, sorry for asking this here, but is the next error related to this issue? Aug 26, 2012 8:36:24 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:679) Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"hgps" "hhho" "hhrh" ...truncated... "kidney stones" "kidney transplant" "kidney trafq=type:Tweet': Lexical error at line 1, column 6783. Encountered: EOF after : "\"kidney trafq=type:Tweet" at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79) at org.apache.solr.search.QParser.getQuery(QParser.java:143) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105) ... 21 more Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 6783. Encountered: EOF after : "\"kidney trafq=type:Tweet" at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229) at org.apache.lucene.queryParser.QueryParser.jj_ntk(QueryParser.java:1772) at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1555) at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317) at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274) at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206) ... 24 more "kidney trafq=" should be "kidney transplantation" fq='type:Tweet', so it looks like the query string was truncated. And this one also looks very similar http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E Best, Alex This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
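The fused value "kidney trafq=type:Tweet" above is what hand-concatenated, unescaped request parameters tend to produce: the fq parameter merged into a truncated q value. A minimal sketch (with illustrative terms, not the original query) of building the parameter string safely with Python's standard urlencode:

```python
from urllib.parse import urlencode

# Build Solr parameters as a dict and let urlencode escape them.
# Concatenating 'q=' + terms + 'fq=type:Tweet' by hand without the
# separating '&' (or with an over-long, truncated value) fuses the
# parameters, which is consistent with the lexical error above.
params = {
    "q": '"kidney stones" "kidney transplantation"',  # illustrative terms
    "fq": "type:Tweet",
    "wt": "json",
}
query_string = urlencode(params)
print(query_string)
```

Each value is percent-encoded individually, so embedded quotes, colons, and ampersands can never be mistaken for parameter separators.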
[jira] Created: (SOLR-1862) CLONE -java.io.IOException: read past EOF
CLONE -java.io.IOException: read past EOF - Key: SOLR-1862 URL: https://issues.apache.org/jira/browse/SOLR-1862 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Alexander S Assignee: Yonik Seeley Priority: Critical Fix For: 1.5 A query with relevancy scores of all zeros produces an invalid doclist that includes sentinel values 2147483647 and causes Solr to request that invalid docid from Lucene which results in a java.io.IOException: read past EOF http://search.lucidimagination.com/search/document/2d5359c0e0d103be/java_io_ioexception_read_past_eof_after_solr_1_4_0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.