[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035984#comment-17035984 ] Jan Høydahl commented on LUCENE-8987: - Fixed broken links to core system requirements (missing core/ in URL path) > Move Lucene web site from svn to git > > > Key: LUCENE-8987 > URL: https://issues.apache.org/jira/browse/LUCENE-8987 > Project: Lucene - Core > Issue Type: Task > Components: general/website >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Attachments: lucene-site-repo.png > > > INFRA just enabled [a new way of configuring website > build|https://s.apache.org/asfyaml] from a git branch, [see dev list > email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E]. > It allows for automatic builds of both the staging and production sites, much > like the old CMS. We can choose to auto-publish the html content of an > {{output/}} folder, or to have a bot build the site using > [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder. > The goal of this issue is to explore how this can be done for > [http://lucene.apache.org|http://lucene.apache.org/] by creating a new > git repo {{lucene-site}}, copying over the site from svn, seeing if it can be > "Pelicanized" easily and then testing staging. Benefits are that more people > will be able to edit the web site and we can take PRs from the public (with > GitHub preview of pages). > Non-goals: > * Create a new web site or a new graphic design > * Change from Markdown to Asciidoc -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14259) backport SOLR-14013 to Solr 7.7
[ https://issues.apache.org/jira/browse/SOLR-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035864#comment-17035864 ] Noble Paul commented on SOLR-14259: --- True. We will not have a 7.8 release. > backport SOLR-14013 to Solr 7.7 > --- > > Key: SOLR-14259 > URL: https://issues.apache.org/jira/browse/SOLR-14259 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Fix For: 7.7.3 > > Time Spent: 10m > Remaining Estimate: 0h >
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035813#comment-17035813 ] Jan Høydahl commented on LUCENE-8987: - Ok, I tried to disable the plugin `md_inline_extension` ([https://github.com/apache/lucene-site/commit/26bf54c2e14c6d134cebe3faa74d965eff31683d]) and now the site builds. I diffed the output folder with and without the extension and found no difference, so I don't think we rely on it for anything. [~adamwalz] do you know why it is there?
[jira] [Created] (SOLR-14260) Make SchemaRegistryProvider pluggable in HttpClientUtil
Andy Throgmorton created SOLR-14260: --- Summary: Make SchemaRegistryProvider pluggable in HttpClientUtil Key: SOLR-14260 URL: https://issues.apache.org/jira/browse/SOLR-14260 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Reporter: Andy Throgmorton HttpClientUtil.java defines and uses an abstract SchemaRegistryProvider for mapping a protocol to an Apache ConnectionSocketFactory. There is only one implementation of this abstract class (outside of test cases), and it is currently not overridable at runtime. This PR adds the ability to override the registry provider at runtime, using the class name supplied via the "solr.schema.registry.provider" system property, similar to how this class allows choosing the HttpClientBuilderFactory at runtime. We've implemented a custom mTLS solution in Solr (which uses a custom SSL context); this change helps us configure Solr in a more modular way, since we've implemented a custom SchemaRegistryProvider that configures Apache clients to use our SSL context.
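The "override a provider via a system property, fall back to a default" pattern described above can be sketched as follows. This is a hypothetical stand-alone illustration, not Solr's actual API: the class and property names (ProviderFactory, Provider, "hypothetical.provider.property") are invented for the example.

```java
// Hypothetical sketch of the pluggable-provider pattern from the issue
// description; names are illustrative, not Solr's real classes.
public class ProviderFactory {

    /** Abstract provider, standing in for SchemaRegistryProvider. */
    public static abstract class Provider {
        public abstract String describe();
    }

    /** Default implementation used when no override is configured. */
    public static class DefaultProvider extends Provider {
        @Override public String describe() { return "default"; }
    }

    /**
     * Instantiate the class named by the given system property via
     * reflection, or fall back to the default implementation -- the same
     * idea the issue describes for choosing a SchemaRegistryProvider
     * (and that HttpClientUtil uses for HttpClientBuilderFactory).
     */
    public static Provider newProvider(String propertyName) {
        String className = System.getProperty(propertyName);
        if (className == null) {
            return new DefaultProvider();
        }
        try {
            return (Provider) Class.forName(className)
                .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load provider " + className, e);
        }
    }
}
```

A custom implementation (e.g. one installing an mTLS-aware socket factory) would then be selected purely by setting the property at startup, with no code change in the factory.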
[jira] [Commented] (LUCENE-9034) Officially publish the new site
[ https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035776#comment-17035776 ] Uwe Schindler commented on LUCENE-9034: --- Yes, let's release the site now. I have some time tomorrow. I can contact infra on Slack to manage the switch. After that I will take care of cleaning up svn with them, to get rid of the (then) outdated clone. I will possibly also shuffle the docs folders to their final location and test everything in production. > Officially publish the new site > --- > > Key: LUCENE-9034 > URL: https://issues.apache.org/jira/browse/LUCENE-9034 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/website >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > Publishing the web site means creating a publish branch and adding the right > magic instructions to {{.asf.yml}} etc. This will then publish the new site > and disable the old CMS. > Before we do that we should > # Make sure all docs and release tools are updated for new site publishing > instructions > # Create a PR with the latest changes in the old CMS site since the export. This > will be the changes done during the 8.3.0 release and possibly some news entries > related to security issues etc. > After publishing we should ask INFRA to make the old site svn read-only (and > perhaps do a commit that replaces the svn content with a README.txt), so it is > obvious for everyone that we have migrated.
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035773#comment-17035773 ] Jan Høydahl commented on LUCENE-8987: - I pushed a change to the site but buildbot failed to build it, see [https://ci2.apache.org/#/builders/3/builds/366/steps/2/logs/stdio] I don't know why this suddenly happens now and not before. I flagged it on the INFRA Slack; hopefully they look into it. The really bad thing is that they publish the site even if the Pelican build failed, leaving a non-working staging website :(
[jira] [Created] (LUCENE-9223) Add Apache license headers
Jan Høydahl created LUCENE-9223: --- Summary: Add Apache license headers Key: LUCENE-9223 URL: https://issues.apache.org/jira/browse/LUCENE-9223 Project: Lucene - Core Issue Type: Sub-task Reporter: Jan Høydahl All source files should probably have the license header. Currently some have it and others don't.
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035762#comment-17035762 ] Jan Høydahl commented on LUCENE-8987: - [~danmuzi] I just pushed a fix for the core/features.html page that you reported above - it was missing. I think we have addressed all your comments now. Really grateful for your review - let us know if you find other bugs in the new site before we push it to production.
[jira] [Commented] (LUCENE-9034) Officially publish the new site
[ https://issues.apache.org/jira/browse/LUCENE-9034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035738#comment-17035738 ] Jan Høydahl commented on LUCENE-9034: - [~uschindler] and others watching: I can see no new or open issues regarding things that need fixing on the staged version of the site. Shall we proceed with the prod release and then tackle issues as they pop up? Better to do this now than let another release go by before the switch.
[jira] [Resolved] (SOLR-14257) Keyword's not indexed or searchable
[ https://issues.apache.org/jira/browse/SOLR-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14257. --- Resolution: Not A Problem What is your analysis chain? Have you looked at the admin/analysis page to see what's thrown away? Most likely the tokenizer you've specified (or are using by default) is throwing this away, and you need to use a different tokenizer for the field. That said, please raise questions like this on the user's list; we try to reserve JIRAs for known bugs/enhancements rather than usage questions. See: http://lucene.apache.org/solr/community.html#mailing-lists-irc A _lot_ more people will see your question on that list and may be able to help more quickly. You might want to review: https://wiki.apache.org/solr/UsingMailingLists If it's determined that this really is a code issue or enhancement to Solr and not a configuration/usage problem, we can raise a new JIRA or reopen this one. > Keyword's not indexed or searchable > --- > > Key: SOLR-14257 > URL: https://issues.apache.org/jira/browse/SOLR-14257 > Project: Solr > Issue Type: Bug > Security Level: Public (Default Security Level. Issues are Public) > Components: Schema and Analysis >Affects Versions: 7.6 >Reporter: Shae Bottum >Priority: Major > > During indexing, if the value of your column is the literal char *, > Solr's tokenizer will pass over this value and not tokenize it. This value > then is not indexed and therefore not searchable. Need to make this keyword > searchable. I understand that to search it, you would need to add quotes around > the value * to ensure the asterisk is not treated as a wildcard and return > all. The use case is searching for the actual value of an asterisk. > > tokenizer works for "jo*n" or "j*n" > tokenizer does Not work for "**"
[jira] [Resolved] (LUCENE-9048) Tutorial and docs section missing from the new website
[ https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved LUCENE-9048. - Resolution: Fixed > Tutorial and docs section missing from the new website > -- > > Key: LUCENE-9048 > URL: https://issues.apache.org/jira/browse/LUCENE-9048 > Project: Lucene - Core > Issue Type: Bug > Components: general/website >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > See [https://lucene.staged.apache.org/solr/resources.html#tutorials] > The Tutorials and Documentation subsections are missing from this page
[jira] [Commented] (LUCENE-9048) Tutorial and docs section missing from the new website
[ https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035735#comment-17035735 ] Jan Høydahl commented on LUCENE-9048: - This is fixed already
[jira] [Assigned] (LUCENE-9048) Tutorial and docs section missing from the new website
[ https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned LUCENE-9048: --- Assignee: Jan Høydahl
[jira] [Commented] (SOLR-14259) backport SOLR-14013 to Solr 7.7
[ https://issues.apache.org/jira/browse/SOLR-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035711#comment-17035711 ] Jan Høydahl commented on SOLR-14259: Since there will never be a 7.8 release, that should not be necessary. That branch should probably be made read-only?
[jira] [Commented] (SOLR-14259) backport SOLR-14013 to Solr 7.7
[ https://issues.apache.org/jira/browse/SOLR-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035696#comment-17035696 ] Houston Putman commented on SOLR-14259: --- Just to confirm, this should be merged to 7_x as well?
[jira] [Commented] (SOLR-14254) Index backcompat break between 8.3.1 and 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035630#comment-17035630 ] Adrien Grand commented on SOLR-14254: - I started a discussion on LUCENE-9222. bq. If I understand you then wouldn't this mean introducing backwards incompatibilities that don't actually exist? Yes this is correct. This might actually be a feature, as it makes all Lucene versions look the same, instead of some versions being compatible with the previous one and others not. And it also avoids silent corruptions from sneaking in, ie. when a change is made that would cause API calls to return wrong results without triggering a CorruptIndexException? > Index backcompat break between 8.3.1 and 8.4.1 > -- > > Key: SOLR-14254 > URL: https://issues.apache.org/jira/browse/SOLR-14254 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > I believe I found a backcompat break between 8.4.1 and 8.3.1. > I encountered this when a Solr 8.3.1 cluster was upgraded to 8.4.1. On 8.4. > nodes, several collections had cores fail to come up with > {{CorruptIndexException}}: > {code} > 2020-02-10 20:58:26.136 ERROR > (coreContainerWorkExecutor-2-thread-1-processing-n:192.168.1.194:8983_solr) [ > ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup > => org.apache.sol > r.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > org.apache.solr.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > ~[?:?] > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) > ~[?:?] 
> at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > ~[metrics-core-4.0.5.jar:4.0.5] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.(SolrCore.java:1072) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2182) > ~[?:?] > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2302) > ~[?:?] > at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1132) > ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:1013) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: > actual codec=Lucene50PostingsWriterDoc vs expected > codec=Lucene84PostingsWriterDoc > (resource=MMapIndexInput(path="/Users/jasongerlowski/run/solrdata/data/testbackcompat_shard1_replica_n1/data/index/_0_FST50_0.doc")) > at > org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:208) > ~[?:?] > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:198) > ~[?:?] 
> at > org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) ~[?:?] > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.(Lucene84PostingsReader.java:82) > ~[?:?] > at > org.apache.lucene.codecs.memory.FSTPostingsFormat.fieldsProducer(FSTPostingsFormat.java:66) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:315) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:395) > ~[?:?] > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:114) > ~[?:?] > at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:84) ~[?:?] > at >
[jira] [Created] (LUCENE-9222) Detect upgrades with non-default formats
Adrien Grand created LUCENE-9222: Summary: Detect upgrades with non-default formats Key: LUCENE-9222 URL: https://issues.apache.org/jira/browse/LUCENE-9222 Project: Lucene - Core Issue Type: Wish Reporter: Adrien Grand Lucene doesn't give any backward-compatibility guarantees with non-default formats, but doesn't try to detect such misuse either, and a couple of users have fallen into this trap over the years, see e.g. SOLR-14254. What about dynamically creating the version number of the index format based on the current Lucene version, so that Lucene would fail with an IndexFormatTooOldException with non-default formats instead of a confusing CorruptIndexException? The change would consist of doing something like this for all our non-default index formats:
{code}
diff --git a/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java b/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java
index fcc0d00a593..18b35760aec 100644
--- a/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java
+++ b/lucene/codecs/src/java/org/apache/lucene/codecs/memory/FSTTermsWriter.java
@@ -41,6 +41,7 @@ import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.FixedBitSet;
 import org.apache.lucene.util.IOUtils;
 import org.apache.lucene.util.IntsRefBuilder;
+import org.apache.lucene.util.Version;
 import org.apache.lucene.util.fst.FSTCompiler;
 import org.apache.lucene.util.fst.FST;
 import org.apache.lucene.util.fst.Util;
@@ -123,7 +124,7 @@ import org.apache.lucene.util.fst.Util;
 public class FSTTermsWriter extends FieldsConsumer {
   static final String TERMS_EXTENSION = "tfp";
   static final String TERMS_CODEC_NAME = "FSTTerms";
-  public static final int TERMS_VERSION_START = 2;
+  public static final int TERMS_VERSION_START = (Version.LATEST.major << 16) | (Version.LATEST.minor << 8) | Version.LATEST.bugfix;
   public static final int TERMS_VERSION_CURRENT = TERMS_VERSION_START;
   final PostingsWriterBase postingsWriter;
{code}
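The bit-packing in the proposed TERMS_VERSION_START expression can be illustrated in isolation. This is a stand-alone sketch of the encoding only; the class name is invented and Lucene's Version fields are replaced with plain ints:

```java
// Illustrative sketch of the packed-version idea in the diff above:
// 16 bits for major, 8 for minor, 8 for bugfix. An index written by an
// older release then carries a strictly smaller version int, so a reader
// could reject it as too old instead of failing with a codec mismatch.
public class PackedFormatVersion {

    /** Pack major.minor.bugfix into one int, as in the proposed diff. */
    public static int pack(int major, int minor, int bugfix) {
        return (major << 16) | (minor << 8) | bugfix;
    }

    // Inverse accessors, useful for producing a readable error message.
    public static int major(int packed)  { return packed >>> 16; }
    public static int minor(int packed)  { return (packed >>> 8) & 0xFF; }
    public static int bugfix(int packed) { return packed & 0xFF; }
}
```

For example, version 8.4.1 packs to a larger int than 8.3.1, so comparing the two ints orders releases correctly as long as minor and bugfix stay below 256.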
[jira] [Commented] (LUCENE-9211) Adding compression to BinaryDocValues storage
[ https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035624#comment-17035624 ] David Smiley commented on LUCENE-9211: -- Thanks so much for running the benchmarks [~mharwood]! When you say you modified "this line", the link did not work. If you merely changed the default spatial.alg to use composite then it's only indexing point data, which is not realistic for this spatial strategy. Instead, LUCENE-5579 has a spatial.alg file that converts those points to random circles, and it'll be more interesting. I just did a diff of that spatial.alg against the default one and they are pretty similar overall. > Adding compression to BinaryDocValues storage > - > > Key: LUCENE-9211 > URL: https://issues.apache.org/jira/browse/LUCENE-9211 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Labels: pull-request-available > > While SortedSetDocValues can be used today to store identical values in a > compact form, this is not effective for data with many unique values. > The proposal is that BinaryDocValues should be stored in LZ4-compressed > blocks, which can dramatically reduce disk storage costs in many cases. Blocks > of a number of documents are stored as a single compressed > blob, along with metadata that records the offsets where the original document > values can be found in the uncompressed content. > There's a trade-off here between efficient compression (more docs-per-block = > better compression) and fast retrieval times (fewer docs-per-block = faster > read access for single values). A fixed block size of 32 docs seems like it > would be a reasonable compromise for most scenarios. > A PR is up for review here [https://github.com/apache/lucene-solr/pull/1234]
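The block layout the proposal describes (one compressed blob per group of documents, plus per-document offsets into the uncompressed content) can be sketched as a toy model. This is an illustration only, not the actual codec: java.util.zip's Deflater/Inflater stand in for LZ4 (which is not in the JDK), and the class name is invented.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Toy model of one compressed block of binary doc values. */
public class DocValuesBlock {
    // The trade-off discussed in the issue: more docs per block compresses
    // better, fewer docs per block makes single-value reads faster.
    static final int DOCS_PER_BLOCK = 32;

    final int[] offsets;       // start of each doc's value in the uncompressed blob
    final byte[] compressed;   // all values of the block, compressed together
    final int uncompressedLength;

    DocValuesBlock(List<byte[]> values) {
        offsets = new int[values.size()];
        ByteArrayOutputStream blob = new ByteArrayOutputStream();
        for (int i = 0; i < values.size(); i++) {
            offsets[i] = blob.size();               // metadata: per-doc offset
            blob.writeBytes(values.get(i));
        }
        byte[] raw = blob.toByteArray();
        uncompressedLength = raw.length;
        Deflater deflater = new Deflater();         // stand-in for LZ4
        deflater.setInput(raw);
        deflater.finish();
        byte[] buf = new byte[raw.length + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        compressed = Arrays.copyOf(buf, n);
    }

    /** Reading one value decompresses the whole block, then slices by offset. */
    byte[] value(int index) {
        try {
            Inflater inflater = new Inflater();
            inflater.setInput(compressed);
            byte[] raw = new byte[uncompressedLength];
            inflater.inflate(raw);
            inflater.end();
            int end = (index + 1 < offsets.length) ? offsets[index + 1] : raw.length;
            return Arrays.copyOfRange(raw, offsets[index], end);
        } catch (DataFormatException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The value(int) path makes the retrieval cost concrete: every single-value read pays for decompressing the full block, which is why the block size of 32 is a compromise rather than a free choice.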
[jira] [Commented] (SOLR-14254) Index backcompat break between 8.3.1 and 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035619#comment-17035619 ] David Smiley commented on SOLR-14254: - > rejecting upgrading indices that have non-default formats in Solr How might that work? My limited understanding is that "upgrading indices" happen transparently via merging, typically due to adding data. But for the special postings format case, Lucene can't even properly read the data any more. > changing non-default formats to use a version number that is computed using > the current Lucene version. If I understand you then wouldn't this mean introducing backwards incompatibilities that don't actually exist? Maybe I don't get the idea. Even if some format names haven't changes despite actual format changes, maybe for the non-default formats this is what we should do; it's the simplest course of action that would be helpful IMO. > Index backcompat break between 8.3.1 and 8.4.1 > -- > > Key: SOLR-14254 > URL: https://issues.apache.org/jira/browse/SOLR-14254 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > I believe I found a backcompat break between 8.4.1 and 8.3.1. > I encountered this when a Solr 8.3.1 cluster was upgraded to 8.4.1. On 8.4. 
> nodes, several collections had cores fail to come up with > {{CorruptIndexException}}: > {code} > 2020-02-10 20:58:26.136 ERROR > (coreContainerWorkExecutor-2-thread-1-processing-n:192.168.1.194:8983_solr) [ > ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup > => org.apache.sol > r.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > org.apache.solr.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > ~[?:?] > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) > ~[?:?] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > ~[metrics-core-4.0.5.jar:4.0.5] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.(SolrCore.java:1072) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2182) > ~[?:?] > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2302) > ~[?:?] > at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1132) > ~[?:?] 
> at org.apache.solr.core.SolrCore.(SolrCore.java:1013) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: > actual codec=Lucene50PostingsWriterDoc vs expected > codec=Lucene84PostingsWriterDoc > (resource=MMapIndexInput(path="/Users/jasongerlowski/run/solrdata/data/testbackcompat_shard1_replica_n1/data/index/_0_FST50_0.doc")) > at > org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:208) > ~[?:?] > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:198) > ~[?:?] > at > org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) ~[?:?] > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.(Lucene84PostingsReader.java:82) > ~[?:?] > at > org.apache.lucene.codecs.memory.FSTPostingsFormat.fieldsProducer(FSTPostingsFormat.java:66) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:315) > ~[?:?] > at >
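The {{CorruptIndexException}} in the trace above comes from Lucene writing the codec name into each index file's header and verifying it when the file is opened. A minimal, self-contained sketch of that kind of check (class and string names are illustrative only, not Lucene's real file format or `CodecUtil` API):

```java
// Illustrative sketch of a header-name check in the spirit of
// CodecUtil.checkHeader: the codec name written at index time must
// equal the codec name the reader expects, or the open fails.
public class CodecHeaderCheck {

    /** Throws if the codec name in the file header differs from the expected one. */
    static void checkHeader(String actualCodec, String expectedCodec) {
        if (!actualCodec.equals(expectedCodec)) {
            throw new RuntimeException("codec mismatch: actual codec="
                + actualCodec + " vs expected codec=" + expectedCodec);
        }
    }

    public static void main(String[] args) {
        // File written and read by the same version: names agree, check passes.
        checkHeader("Lucene50PostingsWriterDoc", "Lucene50PostingsWriterDoc");
        // File written by 8.3, opened by an 8.4 reader: names differ, check throws.
        try {
            checkHeader("Lucene50PostingsWriterDoc", "Lucene84PostingsWriterDoc");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the report above, the 8.3-written FST50 `.doc` file carries the name `Lucene50PostingsWriterDoc`, while the 8.4 reader expects `Lucene84PostingsWriterDoc`, so the check fails even though the format's user-visible name ("FST50") never changed.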
[jira] [Resolved] (SOLR-14247) IndexSizeTriggerMixedBoundsTest does a lot of sleeping
[ https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter resolved SOLR-14247. --- Resolution: Fixed > IndexSizeTriggerMixedBoundsTest does a lot of sleeping > -- > > Key: SOLR-14247 > URL: https://issues.apache.org/jira/browse/SOLR-14247 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > When I run tests locally, the slowest reported test is always > IndexSizeTriggerMixedBoundsTest coming in at around 2 minutes. > I took a look at the code and discovered that at least 80s of that is all > sleeps! > There might need to be more synchronization and ordering added back in, but > when I removed all of the sleeps the test still passed locally for me, so I'm > not too sure what the point was or why we were slowing the system down so > much. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14247) IndexSizeTriggerMixedBoundsTest does a lot of sleeping
[ https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035615#comment-17035615 ] ASF subversion and git services commented on SOLR-14247: Commit f1fc3e7ba204d7211e9920639fb525d100614886 in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f1fc3e7 ] SOLR-14247: Revert SolrTestCase Logger removal > IndexSizeTriggerMixedBoundsTest does a lot of sleeping > -- > > Key: SOLR-14247 > URL: https://issues.apache.org/jira/browse/SOLR-14247 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > When I run tests locally, the slowest reported test is always > IndexSizeTriggerMixedBoundsTest coming in at around 2 minutes. > I took a look at the code and discovered that at least 80s of that is all > sleeps! > There might need to be more synchronization and ordering added back in, but > when I removed all of the sleeps the test still passed locally for me, so I'm > not too sure what the point was or why we were slowing the system down so > much. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-14245) Validate Replica / ReplicaInfo on creation
[ https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter resolved SOLR-14245. --- Resolution: Fixed > Validate Replica / ReplicaInfo on creation > -- > > Key: SOLR-14245 > URL: https://issues.apache.org/jira/browse/SOLR-14245 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 8.5 > > > Replica / ReplicaInfo should be immutable and their fields should be > validated on creation. > Some users reported that very rarely during a failed collection CREATE or > DELETE, or when the Overseer task queue becomes corrupted, Solr may write to > ZK incomplete replica infos (eg. node_name = null). > This problem is difficult to reproduce but we should add safeguards anyway to > prevent writing such corrupted replica info to ZK. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
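The safeguard described above — rejecting incomplete replica infos at construction time instead of letting them reach ZK — can be sketched as follows. The field names mirror the ones mentioned in the issue (`node_name`); the class itself is a hypothetical stand-in, not Solr's actual `Replica`/`ReplicaInfo`:

```java
import java.util.Objects;

// Hypothetical stand-in for an immutable replica description that
// validates its required fields on creation, so a null node_name can
// never be serialized to ZooKeeper later.
public final class ReplicaInfoSketch {
    private final String coreName;
    private final String nodeName;

    public ReplicaInfoSketch(String coreName, String nodeName) {
        this.coreName = Objects.requireNonNull(coreName, "core name must not be null");
        this.nodeName = Objects.requireNonNull(nodeName, "node_name must not be null");
    }

    public String coreName() { return coreName; }
    public String nodeName() { return nodeName; }
}
```

A failed collection CREATE that would previously have enqueued a replica with `node_name = null` now fails fast with an exception naming the missing field.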
[jira] [Updated] (SOLR-14259) backport SOLR-14013 to Solr 7.7
[ https://issues.apache.org/jira/browse/SOLR-14259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-14259: -- Fix Version/s: 7.7.3 > backport SOLR-14013 to Solr 7.7 > --- > > Key: SOLR-14259 > URL: https://issues.apache.org/jira/browse/SOLR-14259 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Noble Paul >Assignee: Noble Paul >Priority: Major > Fix For: 7.7.3 > > Time Spent: 10m > Remaining Estimate: 0h >
[GitHub] [lucene-solr] noblepaul opened a new pull request #1254: SOLR-14259: trying to port to Solr 7.7
noblepaul opened a new pull request #1254: SOLR-14259: trying to port to Solr 7.7 URL: https://github.com/apache/lucene-solr/pull/1254 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Reopened] (SOLR-14247) IndexSizeTriggerMixedBoundsTest does a lot of sleeping
[ https://issues.apache.org/jira/browse/SOLR-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris M. Hostetter reopened SOLR-14247: --- Why did this issue modify SolrTestCase.java ? ? ? For reasons i don't understand, this issue removed the Logger from SolrTestCase – which (again for reasons i don't understand) seems to be causing suite level thread leaks of Log4j AsyncLogger threads from any test that does not define its own loggers – ie: something about how we are using async logging means that any SolrCloudTestCase that doesn't initialize a logger anywhere will leak a logger thread – and evidently the SolrCloudTestCase Logger was ensuring this didn't happen until it was removed by this jira... As an example, starting with 71b869381ef0090a6e96eccbc9924ebdb4f57306 the trivial {{NamedListTest}} fails for me 100% of the time with leaked threads (regardless of seed) ... {noformat} [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=NamedListTest -Dtests.seed=F67D0AB0258C4521 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=yue-Hant -Dtests.timezone=Antarctica/South_Pole -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.00s | NamedListTest (suite) <<< [junit4]> Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.common.util.NamedListTest: [junit4]>1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest] [junit4]> at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method) [junit4]> at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234) [junit4]> at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123) [junit4]> at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38) [junit4]> at 
app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56) [junit4]> at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159) [junit4]> at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) [junit4]> at java.base@11.0.4/java.lang.Thread.run(Thread.java:834) [junit4]>at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0)Throwable #2: com.carrotsearch.randomizedtesting.ThreadLeakError: There are still zombie threads that couldn't be terminated: [junit4]>1) Thread[id=16, name=Log4j2-TF-1-AsyncLoggerConfig-1, state=TIMED_WAITING, group=TGRP-NamedListTest] [junit4]> at java.base@11.0.4/jdk.internal.misc.Unsafe.park(Native Method) [junit4]> at java.base@11.0.4/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:234) [junit4]> at java.base@11.0.4/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2123) [junit4]> at app//com.lmax.disruptor.TimeoutBlockingWaitStrategy.waitFor(TimeoutBlockingWaitStrategy.java:38) [junit4]> at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56) [junit4]> at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159) [junit4]> at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) [junit4]> at java.base@11.0.4/java.lang.Thread.run(Thread.java:834) [junit4]>at __randomizedtesting.SeedInfo.seed([F67D0AB0258C4521]:0) [junit4] Completed [1/1 (1!)] in 23.32s, 6 tests, 2 errors <<< FAILURES! {noformat} These failures do not happen w/ b21312f411bdfb069114846f31f45dcc6ec6ecb8 (the prior commit on the master branch) checked out. > IndexSizeTriggerMixedBoundsTest does a lot of sleeping > -- > > Key: SOLR-14247 > URL: https://issues.apache.org/jira/browse/SOLR-14247 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) > Components: Tests >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Minor > Fix For: master (9.0) > > Time Spent: 20m > Remaining Estimate: 0h > > When I run tests locally, the slowest reported test is always > IndexSizeTriggerMixedBoundsTest coming in at around 2 minutes. > I took a look at the code and discovered that at least 80s of that is all > sleeps! > There might need to be more synchronization and ordering added back in, but > when I removed all of the sleeps the test still
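The leak mechanism described in the reopen comment — a logging backend that lazily starts its worker thread on the first log call, so the thread appears inside the test's scope unless a base-class logger forces initialization earlier — can be illustrated generically. This is not log4j's API, just a sketch of the lazy-start pattern:

```java
// Generic sketch (not log4j itself) of why removing an eagerly-created
// static Logger can surface a "leaked" thread: the backend starts its
// worker on the first log call, and if that first call happens inside a
// test, per-suite thread accounting sees a thread it didn't start.
public class LazyLoggerSketch {
    static Thread worker; // started lazily, like an async logger's drainer

    static void log(String msg) {
        if (worker == null) {
            worker = new Thread(() -> {
                // stand-in for draining a log event queue forever
                try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) {}
            }, "AsyncLoggerSketch-1");
            worker.setDaemon(true);
            worker.start();
        }
        // ... enqueue msg for the worker ...
    }

    public static void main(String[] args) {
        // Nothing is running until the first call; a static Logger in a
        // base class would have triggered this at class-load time instead.
        log("first call starts the background thread");
        System.out.println("worker started lazily: " + worker.getName());
    }
}
```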
[jira] [Commented] (SOLR-14013) javabin performance regressions
[ https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035589#comment-17035589 ] Noble Paul commented on SOLR-14013: --- I've opened SOLR-14259 > javabin performance regressions > --- > > Key: SOLR-14013 > URL: https://issues.apache.org/jira/browse/SOLR-14013 > Project: Solr > Issue Type: Bug >Affects Versions: 7.7 >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Blocker > Fix For: 8.4 > > Attachments: SOLR-14013.patch, SOLR-14013.patch, TestQuerySpeed.java, > test.json > > > As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became > orders of magnitude slower in certain cases since v7.7. The cases identified > so far include large numbers of values in a field. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14259) backport SOLR-14013 to Solr 7.7
Noble Paul created SOLR-14259: - Summary: backport SOLR-14013 to Solr 7.7 Key: SOLR-14259 URL: https://issues.apache.org/jira/browse/SOLR-14259 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Noble Paul Assignee: Noble Paul
[jira] [Commented] (SOLR-14245) Validate Replica / ReplicaInfo on creation
[ https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035581#comment-17035581 ] ASF subversion and git services commented on SOLR-14245: Commit 3dd484ba29db04e4b5d4181e4a042dcc448b34be in lucene-solr's branch refs/heads/branch_8x from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3dd484b ] SOLR-14245: Fix ReplicaListTransformerTest Previous changes to this issue 'fixed' the way the test was creating mock Replica instances, to ensure all properties were specified -- but these changes tickled a bug in the existing test scaffolding that caused it's "expecations" to be based on a regex check against only the base "url" even though the test logic itself looked at the entire "core url" The result is that there were reproducible failures if/when the randomly generated regex matched ".*1.*" because the existing test logic did not expect that to match the url or a Replica with a core name of "core1" because it only considered the base url (cherry picked from commit 49e20dbee4b7e74448928a48bfbb50da1018400f) > Validate Replica / ReplicaInfo on creation > -- > > Key: SOLR-14245 > URL: https://issues.apache.org/jira/browse/SOLR-14245 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 8.5 > > > Replica / ReplicaInfo should be immutable and their fields should be > validated on creation. > Some users reported that very rarely during a failed collection CREATE or > DELETE, or when the Overseer task queue becomes corrupted, Solr may write to > ZK incomplete replica infos (eg. node_name = null). > This problem is difficult to reproduce but we should add safeguards anyway to > prevent writing such corrupted replica info to ZK. 
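The test-scaffolding mismatch in the commit message above is easy to reproduce in isolation: a regex like `.*1.*` can disagree depending on whether it is applied to the base URL or the full core URL. The host and port below are invented for illustration:

```java
import java.util.regex.Pattern;

// Reproduces the disagreement described in the commit message: the
// test's expectation matched the regex against the base "url" while the
// code under test matched the full "core url", so ".*1.*" flagged a
// replica whose only '1' was in the core name. Hostname/port are made up.
public class UrlRegexSketch {
    static final Pattern P = Pattern.compile(".*1.*");

    static boolean matchesBase(String baseUrl) {
        return P.matcher(baseUrl).matches();
    }

    static boolean matchesCore(String baseUrl, String coreName) {
        return P.matcher(baseUrl + "/" + coreName).matches();
    }

    public static void main(String[] args) {
        String baseUrl = "http://host-a:8983/solr"; // no '1' anywhere
        // Expectation built from the base url says "no match"...
        System.out.println(matchesBase(baseUrl));          // false
        // ...but the logic under test, using the core url, says "match".
        System.out.println(matchesCore(baseUrl, "core1")); // true
    }
}
```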
[jira] [Commented] (SOLR-14254) Index backcompat break between 8.3.1 and 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035577#comment-17035577 ] Adrien Grand commented on SOLR-14254: - I don't have any objections to changing the name but I don't think FST50 is inappropriate. We don't always rename index formats when we change them, for instance we kept the name Lucene60PointsFormat when we added selective indexing. Maybe we could make the situation less trappy by rejecting upgrading indices that have non-default formats in Solr, or changing non-default formats to use a version number that is computed using the current Lucene version. > Index backcompat break between 8.3.1 and 8.4.1 > -- > > Key: SOLR-14254 > URL: https://issues.apache.org/jira/browse/SOLR-14254 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > I believe I found a backcompat break between 8.4.1 and 8.3.1. > I encountered this when a Solr 8.3.1 cluster was upgraded to 8.4.1. On 8.4. > nodes, several collections had cores fail to come up with > {{CorruptIndexException}}: > {code} > 2020-02-10 20:58:26.136 ERROR > (coreContainerWorkExecutor-2-thread-1-processing-n:192.168.1.194:8983_solr) [ > ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup > => org.apache.sol > r.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > org.apache.solr.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > ~[?:?] > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) > ~[?:?] 
> at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > ~[metrics-core-4.0.5.jar:4.0.5] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.(SolrCore.java:1072) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2182) > ~[?:?] > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2302) > ~[?:?] > at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1132) > ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:1013) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: > actual codec=Lucene50PostingsWriterDoc vs expected > codec=Lucene84PostingsWriterDoc > (resource=MMapIndexInput(path="/Users/jasongerlowski/run/solrdata/data/testbackcompat_shard1_replica_n1/data/index/_0_FST50_0.doc")) > at > org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:208) > ~[?:?] > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:198) > ~[?:?] 
> at > org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) ~[?:?] > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.(Lucene84PostingsReader.java:82) > ~[?:?] > at > org.apache.lucene.codecs.memory.FSTPostingsFormat.fieldsProducer(FSTPostingsFormat.java:66) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:315) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:395) > ~[?:?] > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:114) > ~[?:?] > at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:84) ~[?:?] > at >
[GitHub] [lucene-solr] dsmiley commented on issue #357: [SOLR-12238] Synonym Queries boost
dsmiley commented on issue #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357#issuecomment-585341721 Personally I'm fine with backporting this although QueryBuilder is not labelled as experimental so I wonder if we are "allowed" to change the API like we did in a minor release? It's debatable. Maybe it should be labelled experimental now. Please merge @romseygeek as I'll be on a vacation shortly. Otherwise I could get to it the last week of this month.
[jira] [Commented] (SOLR-14245) Validate Replica / ReplicaInfo on creation
[ https://issues.apache.org/jira/browse/SOLR-14245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035568#comment-17035568 ] ASF subversion and git services commented on SOLR-14245: Commit 49e20dbee4b7e74448928a48bfbb50da1018400f in lucene-solr's branch refs/heads/master from Chris M. Hostetter [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=49e20db ] SOLR-14245: Fix ReplicaListTransformerTest Previous changes to this issue 'fixed' the way the test was creating mock Replica instances, to ensure all properties were specified -- but these changes tickled a bug in the existing test scaffolding that caused it's "expecations" to be based on a regex check against only the base "url" even though the test logic itself looked at the entire "core url" The result is that there were reproducible failures if/when the randomly generated regex matched ".*1.*" because the existing test logic did not expect that to match the url or a Replica with a core name of "core1" because it only considered the base url > Validate Replica / ReplicaInfo on creation > -- > > Key: SOLR-14245 > URL: https://issues.apache.org/jira/browse/SOLR-14245 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Minor > Fix For: 8.5 > > > Replica / ReplicaInfo should be immutable and their fields should be > validated on creation. > Some users reported that very rarely during a failed collection CREATE or > DELETE, or when the Overseer task queue becomes corrupted, Solr may write to > ZK incomplete replica infos (eg. node_name = null). > This problem is difficult to reproduce but we should add safeguards anyway to > prevent writing such corrupted replica info to ZK. 
[GitHub] [lucene-solr] nknize opened a new pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d
nknize opened a new pull request #1253: LUCENE-9150: Restore support for dynamic PlanetModel in spatial3d URL: https://github.com/apache/lucene-solr/pull/1253 This PR adds dynamic geographic datum support to Geo3D to make lucene a viable option for indexing/searching in different spatial reference systems (e.g., more accurately computing query shape relations to BKD's internal nodes using datum consistent with the spatial projection).
[jira] [Commented] (SOLR-14254) Index backcompat break between 8.3.1 and 8.4.1
[ https://issues.apache.org/jira/browse/SOLR-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035554#comment-17035554 ] David Smiley commented on SOLR-14254: - Surely "FST50" is not appropriate anymore; no? If it changed to FST84 or whatever then the user would get a message about the postingsFormat FST50 not being found or something like that. That's helpful! > Index backcompat break between 8.3.1 and 8.4.1 > -- > > Key: SOLR-14254 > URL: https://issues.apache.org/jira/browse/SOLR-14254 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jason Gerlowski >Priority: Major > > I believe I found a backcompat break between 8.4.1 and 8.3.1. > I encountered this when a Solr 8.3.1 cluster was upgraded to 8.4.1. On 8.4. > nodes, several collections had cores fail to come up with > {{CorruptIndexException}}: > {code} > 2020-02-10 20:58:26.136 ERROR > (coreContainerWorkExecutor-2-thread-1-processing-n:192.168.1.194:8983_solr) [ > ] o.a.s.c.CoreContainer Error waiting for SolrCore to be loaded on startup > => org.apache.sol > r.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > org.apache.solr.common.SolrException: Unable to create core > [testbackcompat_shard1_replica_n1] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1313) > ~[?:?] > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:788) > ~[?:?] > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > ~[metrics-core-4.0.5.jar:4.0.5] > at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > ~[?:?] 
> at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > ~[?:?] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > ~[?:?] > at java.lang.Thread.run(Thread.java:834) [?:?] > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.(SolrCore.java:1072) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.solr.common.SolrException: Error opening new searcher > at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:2182) > ~[?:?] > at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:2302) > ~[?:?] > at org.apache.solr.core.SolrCore.initSearcher(SolrCore.java:1132) > ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:1013) ~[?:?] > at org.apache.solr.core.SolrCore.(SolrCore.java:901) ~[?:?] > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1292) > ~[?:?] > ... 7 more > Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: > actual codec=Lucene50PostingsWriterDoc vs expected > codec=Lucene84PostingsWriterDoc > (resource=MMapIndexInput(path="/Users/jasongerlowski/run/solrdata/data/testbackcompat_shard1_replica_n1/data/index/_0_FST50_0.doc")) > at > org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:208) > ~[?:?] > at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:198) > ~[?:?] > at > org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255) ~[?:?] > at > org.apache.lucene.codecs.lucene84.Lucene84PostingsReader.(Lucene84PostingsReader.java:82) > ~[?:?] > at > org.apache.lucene.codecs.memory.FSTPostingsFormat.fieldsProducer(FSTPostingsFormat.java:66) > ~[?:?] > at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.(PerFieldPostingsFormat.java:315) > ~[?:?] 
> at > org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:395) > ~[?:?] > at > org.apache.lucene.index.SegmentCoreReaders.(SegmentCoreReaders.java:114) > ~[?:?] > at > org.apache.lucene.index.SegmentReader.(SegmentReader.java:84) ~[?:?] > at > org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:177) > ~[?:?] > at > org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:219) > ~[?:?] > at > org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:109) >
[jira] [Updated] (LUCENE-9221) Lucene Logo Contest
[ https://issues.apache.org/jira/browse/LUCENE-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated LUCENE-9221: --- Description: The Lucene logo has served the project well for almost 20 years. However, it does sometimes show its age and misses modern nice-to-haves like invertable or grayscale variants. The PMC would like to have a contest to replace the current logo. This issue will serve as the submission mechanism for that contest. When the submission deadline closes, a community poll will be used to guide the PMC in the decision of which logo to choose. Keeping the current logo will be a possible outcome of this decision, if a majority likes the current logo more than any other proposal. The logo should adhere to the guidelines set forth by Apache for project logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically that the full project name, "Apache Lucene", must appear in the logo (although the word "Apache" may be in a smaller font than "Lucene"). The contest will last approximately one month. The submission deadline is *Monday, March 16, 2020*. Submissions should be attached in a single zip or tar archive, with the filename of the form \{{[user]-[proposal number].[extension]}}. was: The Lucene logo has served the project well for almost 20 years. However, it does sometimes show its age and misses modern nice-to-haves like invertable or grayscale variants. The PMC would like to have a contest to replace the current logo. This issue will serve as the submission mechanism for that contest. When the submission deadline closes, a community poll will be used to guide the PMC in the decision of which logo to choose. Keeping the current logo will be a possible outcome of this decision, if a majority likes the current logo more than any other proposal. 
The logo should adhere to the guidelines set forth by Apache for project logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically that the full project name, "Apache Lucene", must appear in the logo (although the word "Apache" may be in a smaller font than "Lucene"). The contest will last approximately one month. The submission deadline is *Monday, March 16, 2020*. Submissions should be attached in a single zip or tar archive, with the filename of the form {{{user}-\{proposal number}.\{extension}.}} > Lucene Logo Contest > --- > > Key: LUCENE-9221 > URL: https://issues.apache.org/jira/browse/LUCENE-9221 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ryan Ernst >Priority: Trivial > > The Lucene logo has served the project well for almost 20 years. However, it > does sometimes show its age and misses modern nice-to-haves like invertable > or grayscale variants. > > The PMC would like to have a contest to replace the current logo. This issue > will serve as the submission mechanism for that contest. When the submission > deadline closes, a community poll will be used to guide the PMC in the > decision of which logo to choose. Keeping the current logo will be a possible > outcome of this decision, if a majority likes the current logo more than any > other proposal. > > The logo should adhere to the guidelines set forth by Apache for project > logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically > that the full project name, "Apache Lucene", must appear in the logo > (although the word "Apache" may be in a smaller font than "Lucene"). > > The contest will last approximately one month. The submission deadline is > *Monday, March 16, 2020*. Submissions should be attached in a single zip or > tar archive, with the filename of the form \{{[user]-[proposal > number].[extension]}}. 
[jira] [Updated] (LUCENE-9221) Lucene Logo Contest
[ https://issues.apache.org/jira/browse/LUCENE-9221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Ernst updated LUCENE-9221: --- Description: The Lucene logo has served the project well for almost 20 years. However, it does sometimes show its age and misses modern nice-to-haves like invertible or grayscale variants. The PMC would like to have a contest to replace the current logo. This issue will serve as the submission mechanism for that contest. When the submission deadline closes, a community poll will be used to guide the PMC in the decision of which logo to choose. Keeping the current logo will be a possible outcome of this decision, if a majority likes the current logo more than any other proposal. The logo should adhere to the guidelines set forth by Apache for project logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically that the full project name, "Apache Lucene", must appear in the logo (although the word "Apache" may be in a smaller font than "Lucene"). The contest will last approximately one month. The submission deadline is *Monday, March 16, 2020*. Submissions should be attached in a single zip or tar archive, with the filename of the form {{[user]-[proposal number].[extension]}}. was: The Lucene logo has served the project well for almost 20 years. However, it does sometimes show its age and misses modern nice-to-haves like invertible or grayscale variants. The PMC would like to have a contest to replace the current logo. This issue will serve as the submission mechanism for that contest. When the submission deadline closes, a community poll will be used to guide the PMC in the decision of which logo to choose. Keeping the current logo will be a possible outcome of this decision, if a majority likes the current logo more than any other proposal.
The logo should adhere to the guidelines set forth by Apache for project logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically that the full project name, "Apache Lucene", must appear in the logo (although the word "Apache" may be in a smaller font than "Lucene"). The contest will last approximately one month. The submission deadline is *Monday, March 16, 2020*. Submissions should be attached in a single zip or tar archive, with the filename of the form *{user}-{proposal number}.{extension}*. > Lucene Logo Contest > --- > > Key: LUCENE-9221 > URL: https://issues.apache.org/jira/browse/LUCENE-9221 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ryan Ernst >Priority: Trivial > > The Lucene logo has served the project well for almost 20 years. However, it > does sometimes show its age and misses modern nice-to-haves like invertible > or grayscale variants. > > The PMC would like to have a contest to replace the current logo. This issue > will serve as the submission mechanism for that contest. When the submission > deadline closes, a community poll will be used to guide the PMC in the > decision of which logo to choose. Keeping the current logo will be a possible > outcome of this decision, if a majority likes the current logo more than any > other proposal. > > The logo should adhere to the guidelines set forth by Apache for project > logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically > that the full project name, "Apache Lucene", must appear in the logo > (although the word "Apache" may be in a smaller font than "Lucene"). > > The contest will last approximately one month. The submission deadline is > *Monday, March 16, 2020*.
Submissions should be attached in a single zip or > tar archive, with the filename of the form {{[user]-[proposal > number].[extension]}}.
[jira] [Created] (LUCENE-9221) Lucene Logo Contest
Ryan Ernst created LUCENE-9221: -- Summary: Lucene Logo Contest Key: LUCENE-9221 URL: https://issues.apache.org/jira/browse/LUCENE-9221 Project: Lucene - Core Issue Type: Improvement Reporter: Ryan Ernst The Lucene logo has served the project well for almost 20 years. However, it does sometimes show its age and misses modern nice-to-haves like invertible or grayscale variants. The PMC would like to have a contest to replace the current logo. This issue will serve as the submission mechanism for that contest. When the submission deadline closes, a community poll will be used to guide the PMC in the decision of which logo to choose. Keeping the current logo will be a possible outcome of this decision, if a majority likes the current logo more than any other proposal. The logo should adhere to the guidelines set forth by Apache for project logos ([https://www.apache.org/foundation/marks/pmcs#graphics]), specifically that the full project name, "Apache Lucene", must appear in the logo (although the word "Apache" may be in a smaller font than "Lucene"). The contest will last approximately one month. The submission deadline is *Monday, March 16, 2020*. Submissions should be attached in a single zip or tar archive, with the filename of the form *{user}-{proposal number}.{extension}*.
[jira] [Commented] (SOLR-14013) javabin performance regressions
[ https://issues.apache.org/jira/browse/SOLR-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035499#comment-17035499 ] Houston Putman commented on SOLR-14013: --- [~noble.paul], at the very least I think we should backport this to 7_7. If we want to leave the latest release of 7 in a state with a significant regression/bug in it, then we are basically asking people to either: * Know that 7.6 is the last stable release of solr for people wanting to use multiValued fields in a sharded collection * Upgrade to Solr 8.4 In my opinion, neither of those are good options. Because users are always going to go with the most up to date version of Solr that works for their index, and upgrading to new major versions is a very tough process for a lot of people. This isn't a bug that existed throughout the entirety of Solr 7, it was introduced in the last minor release. A lot of people are very comfortable with Solr 7, and trust it. People also trust that the last minor/patch version of something is going to be the most stable version. We should make sure that the latest release of our second to last major version (7) is stable and maintains that trust that users have in it and Solr in general. It is very little work to backport this, and also probably not a whole lot of work to do another patch or minor release (7.8 or 7.7.3). And with that work we will be providing a significantly better user experience for our community. > javabin performance regressions > --- > > Key: SOLR-14013 > URL: https://issues.apache.org/jira/browse/SOLR-14013 > Project: Solr > Issue Type: Bug >Affects Versions: 7.7 >Reporter: Yonik Seeley >Assignee: Yonik Seeley >Priority: Blocker > Fix For: 8.4 > > Attachments: SOLR-14013.patch, SOLR-14013.patch, TestQuerySpeed.java, > test.json > > > As noted by [~rrockenbaugh] in SOLR-13963, javabin also recently became > orders of magnitude slower in certain cases since v7.7. 
The cases identified > so far include large numbers of values in a field.
[jira] [Updated] (SOLR-14257) Keyword's not indexed or searchable
[ https://issues.apache.org/jira/browse/SOLR-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shae Bottum updated SOLR-14257: --- Description: During indexing, if the value of your column is the literal char *, solr's tokenizer will pass over this value and Not tokenize it. This value then is not indexed and therefore not searchable. Need to make this keyword searchable. I understand to search it, you would need to add quotes around the value * to ensure the asterisk is not treated as a wildcard and return all. The use case is searching for the actual value of an asterisk. tokenizer works for "jo*n" or "j*n" tokenizer does Not work for "**" was: During indexing, if the value of your column is the literal char *, solr's tokenizer will pass over this value and Not tokenize it. This value then is not indexed and therefore not searchable. Need to make this keyword searchable. I understand to search it, you would need to add quotes around the value * to ensure the asterisk is not treated as a wildcard and return all. The use case is searching for the actual value of an asterisk. tokenizer works for "jo*n" or "j*n" tokenizer does Not work for "*" or "**" or "***" etc etc. > Keyword's not indexed or searchable > --- > > Key: SOLR-14257 > URL: https://issues.apache.org/jira/browse/SOLR-14257 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Schema and Analysis >Affects Versions: 7.6 >Reporter: Shae Bottum >Priority: Major > > During indexing, if the value of your column is the literal char *, > solr's tokenizer will pass over this value and Not tokenize it. This value > then is not indexed and therefore not searchable. Need to make this keyword > searchable. I understand to search it, you would need to add quotes around > the value * to ensure the asterisk is not treated as a wildcard and return > all. The use case is searching for the actual value of an asterisk.
> > tokenizer works for "jo*n" or "j*n" > tokenizer does Not work for "**"
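For anyone hitting this from client code: the classic Lucene/Solr query parser treats a bare {{*}} as a wildcard unless it is backslash-escaped, so one workaround on the query side is to escape the raw user term before building the query string. A minimal sketch of such an escape helper (the special-character set below follows the classic query parser syntax; whether the indexed field actually keeps a lone asterisk as a token depends on the analyzer, which is the root problem reported in this issue):

```python
# Sketch: backslash-escape Lucene/Solr query-syntax special characters so a
# literal asterisk is searched for rather than treated as a wildcard.
# Escaping each character individually also covers the two-character
# operators && and || in practice.
LUCENE_SPECIALS = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_term(term: str) -> str:
    """Return the raw term with every query-syntax character escaped."""
    return ''.join('\\' + ch if ch in LUCENE_SPECIALS else ch for ch in term)
```

With this, a query for a literal asterisk would be sent as {{\*}} (i.e. `escape_query_term("*")` yields a backslash followed by the asterisk); the indexing side still needs a field type whose tokenizer preserves the token.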
[jira] [Updated] (SOLR-14257) Keyword's not indexed or searchable
[ https://issues.apache.org/jira/browse/SOLR-14257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shae Bottum updated SOLR-14257: --- Description: During indexing, if the value of your column is the literal char *, solr's tokenizer will pass over this value and Not tokenize it. This value then is not indexed and therefore not searchable. Need to make this keyword searchable. I understand to search it, you would need to add quotes around the value * to ensure the asterisk is not treated as a wildcard and return all. The use case is searching for the actual value of an asterisk. tokenizer works for "jo*n" or "j*n" tokenizer does Not work for "*" or "**" or "***" etc etc. was: During indexing, if the value of your column is the literal char *, solr's tokenizer will pass over this value and Not tokenize it. This value then is not indexed and therefore not searchable. Need to make this keyword searchable. I understand to search it, you would need to add quotes around the value * to ensure the asterisk is not treated as a wildcard and return all. The use case is searching for the actual value of an asterisk. tokenizer works for "jo*n" or "j*n" tokenizer does Not work for "*" or "**" or "***" etc etc. > Keyword's not indexed or searchable > --- > > Key: SOLR-14257 > URL: https://issues.apache.org/jira/browse/SOLR-14257 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: Schema and Analysis >Affects Versions: 7.6 >Reporter: Shae Bottum >Priority: Major > > During indexing, if the value of your column is the literal char *, > solr's tokenizer will pass over this value and Not tokenize it. This value > then is not indexed and therefore not searchable. Need to make this keyword > searchable. I understand to search it, you would need to add quotes around > the value * to ensure the asterisk is not treated as a wildcard and return > all.
The use case is searching for the actual value of an asterisk. > > tokenizer works for "jo*n" or "j*n" > tokenizer does Not work for "*" or "**" or "***" etc etc.
[jira] [Created] (SOLR-14257) Keyword's not indexed or searchable
Shae Bottum created SOLR-14257: -- Summary: Keyword's not indexed or searchable Key: SOLR-14257 URL: https://issues.apache.org/jira/browse/SOLR-14257 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Schema and Analysis Affects Versions: 7.6 Reporter: Shae Bottum During indexing, if the value of your column is the literal char *, solr's tokenizer will pass over this value and Not tokenize it. This value then is not indexed and therefore not searchable. Need to make this keyword searchable. I understand to search it, you would need to add quotes around the value * to ensure the asterisk is not treated as a wildcard and return all. The use case is searching for the actual value of an asterisk. tokenizer works for "jo*n" or "j*n" tokenizer does Not work for "*" or "**" or "***" etc etc.
[jira] [Updated] (SOLR-14256) Remove HashDocSet
[ https://issues.apache.org/jira/browse/SOLR-14256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-14256: Issue Type: Task (was: Bug) > Remove HashDocSet > - > > Key: SOLR-14256 > URL: https://issues.apache.org/jira/browse/SOLR-14256 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: David Smiley >Priority: Major > > This particular DocSet is only used in places where we need to convert > SortedIntDocSet in particular to a DocSet that is fast for random access. > Once such a conversion happens, it's only used to test some docs for presence > and it could be another interface. DocSet has kind of a large-ish API > surface area to implement. Since we only need to test docs, we could use > Bits interface (having only 2 methods) backed by an off-the-shelf primitive > long hash set on our classpath. Perhaps a new method on DocSet: getBits() or > DocSetUtil.getBits(DocSet). > In addition to removing complexity unto itself, this improvement is required > by SOLR-14185 because it wants to be able to produce a DocIdSetIterator slice > directly from the DocSet but HashDocSet can't do that without sorting first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
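The two-method replacement described in the issue (a Bits-style view over a DocSet, backed by a hash set, instead of the full DocSet API) can be sketched quickly. This is an illustrative Python sketch, not the actual Solr/Lucene API; the class and method names below mirror the idea only, and a plain `set` stands in for the "off-the-shelf primitive long hash set" mentioned in the description:

```python
class Bits:
    """Minimal two-method interface in the spirit of Lucene's Bits."""
    def get(self, index: int) -> bool:
        raise NotImplementedError
    def length(self) -> int:
        raise NotImplementedError

class DocSetBits(Bits):
    """Random-access membership view over a doc-id collection.

    A plain set stands in for a primitive long hash set; building it once
    replaces the SortedIntDocSet -> HashDocSet conversion described above,
    since afterwards the set is only ever tested for presence."""
    def __init__(self, doc_ids, max_doc):
        self._docs = set(doc_ids)      # one-time conversion for fast lookups
        self._max_doc = max_doc        # Bits.length() reports maxDoc
    def get(self, index):
        return index in self._docs
    def length(self):
        return self._max_doc
```

The point of the design is that presence-testing callers depend only on the two-method interface, so the heavyweight HashDocSet class can be removed.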
[jira] [Created] (SOLR-14256) Remove HashDocSet
David Smiley created SOLR-14256: --- Summary: Remove HashDocSet Key: SOLR-14256 URL: https://issues.apache.org/jira/browse/SOLR-14256 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: search Reporter: David Smiley This particular DocSet is only used in places where we need to convert SortedIntDocSet in particular to a DocSet that is fast for random access. Once such a conversion happens, it's only used to test some docs for presence and it could be another interface. DocSet has kind of a large-ish API surface area to implement. Since we only need to test docs, we could use Bits interface (having only 2 methods) backed by an off-the-shelf primitive long hash set on our classpath. Perhaps a new method on DocSet: getBits() or DocSetUtil.getBits(DocSet). In addition to removing complexity unto itself, this improvement is required by SOLR-14185 because it wants to be able to produce a DocIdSetIterator slice directly from the DocSet but HashDocSet can't do that without sorting first. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14216) Exclude HealthCheck from authentication
[ https://issues.apache.org/jira/browse/SOLR-14216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-14216: -- Assignee: Jan Høydahl > Exclude HealthCheck from authentication > --- > > Key: SOLR-14216 > URL: https://issues.apache.org/jira/browse/SOLR-14216 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: Authentication >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > > The {{HealthCheckHandler}} on {{/api/node/health}} and > {{/solr/admin/info/health}} should by default not be subject to > authentication, but be open for all. This allows for load balancers and > various monitoring to probe Solr's health without having to support the auth > scheme in place. I can't see any reason we need auth on the health endpoint. > It is possible to achieve the same by setting blockUnknown=false and > configuring three RBAC permissions: One for v1 endpoint, one for v2 endpoint > and one "all" catch all at the end of the chain. But this is cumbersome so > better have this ootb. > An alternative solution is to create a separate HttpServer for health check, > listening on a different port, just like embedded ZK and JMX. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Assigned] (SOLR-14250) Solr tries to read request body after error response is sent
[ https://issues.apache.org/jira/browse/SOLR-14250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl reassigned SOLR-14250: -- Assignee: Jan Høydahl > Solr tries to read request body after error response is sent > > > Key: SOLR-14250 > URL: https://issues.apache.org/jira/browse/SOLR-14250 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > If a client sends a {{HTTP POST}} request with header {{Expect: > 100-continue}} the normal flow is for Solr (Jetty) to first respond with a > {{HTTP 100 continue}} response, then the client will send the body which will > be processed and then a final response is sent by Solr. > However, if such a request leads to an error (e.g. 404 or 401), then Solr > will skip the 100 response and instead send the error response directly. The > very last action of {{SolrDispatchFilter#doFilter}} is to call > {{consumeInputFully()}}. 
However, this should not be done in case an error > response has already been sent, else you'll provoke an exception in Jetty's > HTTP lib: > {noformat} > 2020-02-07 23:13:26.459 INFO (qtp403547747-24) [ ] > o.a.s.s.SolrDispatchFilter Could not consume full client request => > java.io.IOException: Committed before 100 Continues > at > org.eclipse.jetty.http2.server.HttpChannelOverHTTP2.continue100(HttpChannelOverHTTP2.java:362) > java.io.IOException: Committed before 100 Continues > at > org.eclipse.jetty.http2.server.HttpChannelOverHTTP2.continue100(HttpChannelOverHTTP2.java:362) > ~[http2-server-9.4.19.v20190610.jar:9.4.19.v20190610] > at org.eclipse.jetty.server.Request.getInputStream(Request.java:872) > ~[jetty-server-9.4.19.v20190610.jar:9.4.19.v20190610] > at > javax.servlet.ServletRequestWrapper.getInputStream(ServletRequestWrapper.java:185) > ~[javax.servlet-api-3.1.0.jar:3.1.0] > at > org.apache.solr.servlet.SolrDispatchFilter$1.getInputStream(SolrDispatchFilter.java:612) > ~[solr-core-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan > - 2020-01-10 13:40:28] > at > org.apache.solr.servlet.SolrDispatchFilter.consumeInputFully(SolrDispatchFilter.java:454) > ~[solr-core-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan > - 2020-01-10 13:40:28] > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:445) > ~[solr-core-8.4.1.jar:8.4.1 832bf13dd9187095831caf69783179d41059d013 - ishan > - 2020-01-10 13:40:28] > {noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
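The fix being discussed amounts to guarding the final drain of the request body. A hypothetical sketch of that control flow in Python (names are invented for illustration; the real change would live in `SolrDispatchFilter.consumeInputFully`, and the guard condition stands in for "an error response was already committed before the 100 Continue handshake completed"):

```python
def consume_input_fully(request_body, response_already_committed: bool) -> int:
    """Drain the remaining request body so the connection can be reused,
    but skip draining entirely when an error response was committed before
    the '100 Continue' was sent -- reading at that point raises
    'Committed before 100 Continues', as in the Jetty trace above.
    Returns the number of bytes consumed."""
    if response_already_committed:
        return 0  # nothing consumed; avoids triggering the IOException
    consumed = 0
    while True:
        chunk = request_body.read(8192)  # drain in fixed-size chunks
        if not chunk:
            return consumed
        consumed += len(chunk)
```

The guard is the whole fix: draining remains correct for the normal flow, and is simply skipped once the error response has gone out.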
[jira] [Commented] (LUCENE-9220) Upgrade Snowball version to 2.0
[ https://issues.apache.org/jira/browse/LUCENE-9220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035394#comment-17035394 ] Erick Erickson commented on LUCENE-9220: It'd be interesting to see if the tests pass. On a quick look, it would be fairly tedious. All of the class names have been changed to start with a lowercase letter, so all of the references in the code would need to be changed, and there have been some interface changes that would need to be hunted down one by one. I don't know how much work that would be. It's probably a good idea to upgrade, but not something I'll have time for any time soon. > Upgrade Snowball version to 2.0 > --- > > Key: LUCENE-9220 > URL: https://issues.apache.org/jira/browse/LUCENE-9220 > Project: Lucene - Core > Issue Type: Wish >Reporter: Nguyen Minh Gia Huy >Priority: Major > > When working with Snowball-based stemmers, I realized that Lucene is > currently [using a pre-compiled version of > Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html], > which seems to be from 12 years ago: > https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b > Snowball has just released v2.0 in 10/2019 with many improvements, new > supported languages (Arabic, Indonesian…) and new features (stringdef > notation for Unicode codepoints…). Details of the changes could be found > here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think > these changes of Snowball could give a promising positive impact on Lucene. > I wonder when Lucene should upgrade Snowball to the latest version (v2.0).
[jira] [Commented] (LUCENE-9211) Adding compression to BinaryDocValues storage
[ https://issues.apache.org/jira/browse/LUCENE-9211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035224#comment-17035224 ] juan camilo rodriguez duran commented on LUCENE-9211: - [~mharwood] the main idea of my PR is just to make the code cleaner and more extensible; it is not supposed to introduce any regression nor improvement over the current format. (spoiler alert: I'm working on the extension to improve sorted and sorted set doc values for the lookup using BytesRef) > Adding compression to BinaryDocValues storage > - > > Key: LUCENE-9211 > URL: https://issues.apache.org/jira/browse/LUCENE-9211 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Mark Harwood >Assignee: Mark Harwood >Priority: Minor > Labels: pull-request-available > > While SortedSetDocValues can be used today to store identical values in a > compact form this is not effective for data with many unique values. > The proposal is that BinaryDocValues should be stored in LZ4 compressed > blocks which can dramatically reduce disk storage costs in many cases. The > proposal is blocks of a number of documents are stored as a single compressed > blob along with metadata that records offsets where the original document > values can be found in the uncompressed content. > There's a trade-off here between efficient compression (more docs-per-block = > better compression) and fast retrieval times (fewer docs-per-block = faster > read access for single values). A fixed block size of 32 docs seems like it > would be a reasonable compromise for most scenarios. > A PR is up for review here [https://github.com/apache/lucene-solr/pull/1234]
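The block layout described in the issue (a compressed blob per group of 32 docs, plus per-doc offsets into the uncompressed content) can be illustrated with a small sketch. This is not the PR's code: Python's stdlib has no LZ4, so `zlib` stands in for it, and all names are invented for illustration; only the block/offset structure follows the proposal:

```python
import zlib

BLOCK_SIZE = 32  # docs per block, the compromise size proposed in the issue

def compress_blocks(values):
    """Pack per-document byte values into compressed blocks.
    Returns (blocks, index): blocks[b] is a compressed blob of up to
    BLOCK_SIZE concatenated values, index[b] holds the start offset of
    each value inside the uncompressed blob plus an end sentinel.
    zlib stands in here for the LZ4 compression named in the proposal."""
    blocks, index = [], []
    for start in range(0, len(values), BLOCK_SIZE):
        chunk = values[start:start + BLOCK_SIZE]
        offsets, buf = [], b""
        for v in chunk:
            offsets.append(len(buf))
            buf += v
        offsets.append(len(buf))  # end sentinel
        blocks.append(zlib.compress(buf))
        index.append(offsets)
    return blocks, index

def read_value(blocks, index, doc_id):
    """Random access: decompress one block, slice out one doc's bytes.
    This is the retrieval cost the docs-per-block trade-off is about."""
    block_no, slot = divmod(doc_id, BLOCK_SIZE)
    raw = zlib.decompress(blocks[block_no])
    offs = index[block_no]
    return raw[offs[slot]:offs[slot + 1]]
```

A single-value read decompresses exactly one block, which is why fewer docs per block means faster point access while more docs per block compresses better.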
[jira] [Commented] (SOLR-6853) solr.ManagedSynonymFilterFactory/ManagedStopwordFilterFactory: URLEncoding - Not able to delete Synonyms/Stopwords with special characters
[ https://issues.apache.org/jira/browse/SOLR-6853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035219#comment-17035219 ] Markus Kalkbrenner commented on SOLR-6853: -- This issue has been reported for the solarium Solr PHP client, too: https://github.com/solariumphp/solarium/pull/742 > solr.ManagedSynonymFilterFactory/ManagedStopwordFilterFactory: URLEncoding - > Not able to delete Synonyms/Stopwords with special characters > -- > > Key: SOLR-6853 > URL: https://issues.apache.org/jira/browse/SOLR-6853 > Project: Solr > Issue Type: Bug > Components: Schema and Analysis >Affects Versions: 4.10.2 > Environment: Solr 4.10.2 running @ Win7 >Reporter: Tomasz Sulkowski >Priority: Major > Labels: ManagedStopwordFilterFactory, > ManagedSynonymFilterFactory, REST, SOLR > Attachments: SOLR-6853.patch > > > Hi Guys, > We're using the SOLR Rest API in order to manage synonyms and stopwords with > solr.Managed*FilterFactory. > _The same applies to stopwords. I am going to explain the synonym > case only from this point on._ > Let us consider the following _schema_analysis_synonyms_en.json managedMap: { > "xxx#xxx":["xxx#xxx"], > "xxx%xxx":["xxx%xxx"], > "xxx/xxx":["xxx/xxx"], > "xxx:xxx":["xxx:xxx"], > "xxx;xxx":["xxx;xxx"], > "xx ":["xx "] > } > I can add such synonym-to-keyword relations using the REST API. 
The problem is > that I cannot remove/list them as > http://localhost:8983/solr/collection1/schema/analysis/synonyms/en/ > where is one of the map's key throws 404, or 500 (in case of > xxx%25xxx): > java.lang.NullPointerException at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:367) > at > org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) > at > org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) > at > org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) > at > org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) > at org.eclipse.jetty.server.Server.handle(Server.java:368) at > org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) > at > org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) > at > org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) > at > 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) > at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at > org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at > org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) > at > org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) > at java.lang.Thread.run(Unknown Source) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
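On the client side, the workaround until the managed-resources endpoint handles these keys is to percent-encode the term before putting it in the URL path. A small sketch using the stdlib (`quote` with `safe=''` so that `/`, `#`, `%`, `:` and `;` are all encoded; the base URL below matches the one in the report, but whether the server decodes the segment correctly is exactly what this issue is about):

```python
from urllib.parse import quote

def managed_synonym_url(base: str, term: str) -> str:
    """Build the managed-synonyms resource URL for one term.
    safe='' forces every reserved character in the key to be
    percent-encoded, so keys like 'xxx#xxx' or 'xxx/xxx' survive
    URL routing instead of being cut off at '#' or treated as an
    extra path segment."""
    return base.rstrip("/") + "/" + quote(term, safe="")

url = managed_synonym_url(
    "http://localhost:8983/solr/collection1/schema/analysis/synonyms/en",
    "xxx#xxx")  # -> ...synonyms/en/xxx%23xxx
```

Note that `%` itself becomes `%25`, which is the `xxx%25xxx` case that triggered the NullPointerException above, so the server-side decoding has to be fixed regardless of how carefully the client encodes.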
[jira] [Commented] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17035213#comment-17035213 ] Xin-Chun Zhang commented on LUCENE-9136: The index format of IVFFlat is organized as follows, !1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png! In general, the number of centroids lies within the interval [4 * sqrt(N), 16 * sqrt(N)], where N is the data set size. We use (4 * sqrt(N)) as the actual number of centroids to balance between accuracy and computational load, denoted by c. The full data set is used for training if its size is no larger than 200,000. Otherwise (128 * c) points are selected after shuffling for training in order to accelerate training. Experiments have been conducted on a large data set (sift1M, [http://corpus-texmex.irisa.fr/]) to verify the implementation of IVFFlat. The base data set (sift_base.fvecs) contains 1,000,000 vectors with 128 dimensions. And 10,000 queries (sift_query.fvecs) are used for recall testing. The recall ratio follows Recall = (recalled vectors in groundTruth) / (number of queries * TopK), where number of queries = 10,000 and TopK = 100. The results are as follows (single thread and single segment),
||nprobe||avg. search time (ms)||recall (%)||
|8|16.3827|44.24|
|16|16.5834|58.04|
|32|19.2031|71.55|
|64|24.7065|83.30|
|128|34.9165|92.03|
|256|60.5844|97.18|
The test code can be found in [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/KnnIvfAndGraphPerformTester.java] > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Xin-Chun Zhang >Priority: Major > Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png > > > Representation learning (RL) has been an established discipline in the > machine learning space for decades but it draws tremendous attention lately > with the emergence of deep learning. The central problem of RL is to > determine an optimal representation of the input data. By embedding the data > into a high dimensional vector, the vector retrieval (VR) method is then > applied to search the relevant items. > With the rapid development of RL over the past few years, the technique has > been used extensively in industry from online advertising to computer vision > and speech recognition. There exist many open source implementations of VR > algorithms, such as Facebook's FAISS and Microsoft's SPTAG, providing various > choices for potential users. However, the aforementioned implementations are > all written in C++, with no plan for supporting a Java interface, making it hard > for them to be integrated in Java projects by those who are not familiar with C/C++ > [[https://github.com/facebookresearch/faiss/issues/105]]. 
> The algorithms for vector retrieval can be roughly classified into four > categories, > # Tree-based algorithms, such as KD-tree; > # Hashing methods, such as LSH (Locality-Sensitive Hashing); > # Product quantization based algorithms, such as IVFFlat; > # Graph-based algorithms, such as HNSW, SSG, NSG; > where IVFFlat and HNSW are the most popular ones among all the VR algorithms. > IVFFlat is better for high-precision applications such as face recognition, > while HNSW performs better in general scenarios including recommendation and > personalized advertisement. *The recall ratio of IVFFlat could be gradually > increased by adjusting the query parameter (nprobe), while it's hard for HNSW > to improve its accuracy*. In theory, IVFFlat could achieve a 100% recall ratio. > Recently, the implementation of HNSW (Hierarchical Navigable Small World, > LUCENE-9004) for Lucene has made great progress. The issue draws the attention > of those who are interested in Lucene or hope to use HNSW with Solr/Lucene. > As an alternative for solving ANN similarity search problems, IVFFlat is also > very popular with many users and supporters. Compared with HNSW, IVFFlat has > a smaller index size but requires k-means clustering, while HNSW is faster in > query (no training required) but requires extra storage for saving graphs > [indexing 1M > vectors|https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors]. > Another advantage is that IVFFlat can be faster and more accurate when > GPU parallel computing is enabled (currently not supported in Java). Both algorithms > have their merits and demerits. Since HNSW is now under development, it may > be better to provide both implementations
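The coarse-quantizer-plus-nprobe scheme the comment describes (assign vectors to their nearest centroid; at query time probe only the nprobe closest inverted lists, then do an exact flat search inside them) can be sketched in a few lines. A toy pure-Python sketch with brute-force L2 distances; the k-means training of the centroids described above is assumed to have happened already, and all names here are invented for illustration, not the proposed Lucene API:

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_ivf(vectors, centroids):
    """Assign each vector to its nearest centroid (the coarse quantizer).
    Returns one inverted list of vector ids per centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        c = min(range(len(centroids)), key=lambda j: l2(v, centroids[j]))
        lists[c].append(i)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, topk):
    """Probe the nprobe nearest inverted lists, then exact ('flat')
    search over just the candidates they contain. Raising nprobe trades
    search time for recall, which is the trade-off in the table above."""
    order = sorted(range(len(centroids)), key=lambda j: l2(query, centroids[j]))
    candidates = [i for j in order[:nprobe] for i in lists[j]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:topk]
```

With nprobe equal to the number of centroids every list is scanned and the search degenerates to exact flat search, which is why recall can in theory reach 100%.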
[jira] [Created] (LUCENE-9220) Upgrade Snowball version to 2.0
Nguyen Minh Gia Huy created LUCENE-9220: --- Summary: Upgrade Snowball version to 2.0 Key: LUCENE-9220 URL: https://issues.apache.org/jira/browse/LUCENE-9220 Project: Lucene - Core Issue Type: Wish Reporter: Nguyen Minh Gia Huy When working with Snowball-based stemmers, I realized that Lucene is currently [using a pre-compiled version of Snowball|https://lucene.apache.org/core/8_4_1/analyzers-common/org/apache/lucene/analysis/snowball/package-summary.html], which seems to be from 12 years ago: https://github.com/snowballstem/snowball/tree/e103b5c257383ee94a96e7fc58cab3c567bf079b Snowball released v2.0 in October 2019 with many improvements, newly supported languages (Arabic, Indonesian…) and new features (stringdef notation for Unicode codepoints…). Details of the changes can be found here: https://github.com/snowballstem/snowball/blob/master/NEWS. I think these changes could have a positive impact on Lucene. I wonder when Lucene should upgrade Snowball to the latest version (v2.0). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
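For context on what those pre-compiled classes do: a Snowball-generated stemmer is essentially a compiled set of ordered suffix-rewriting rules. The toy sketch below illustrates that idea in plain Java; it is not Snowball's actual algorithm and not the Lucene API, just a hypothetical miniature of the rule-based approach.

```java
// Toy suffix-stripping stemmer illustrating, at a tiny scale, what generated
// Snowball code does: ordered rewrite rules applied to word endings.
public class ToyStemmer {
    public static String stem(String word) {
        String w = word.toLowerCase();
        boolean stripped = false;
        // rule 1: drop a trailing "ing" on sufficiently long words
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3); stripped = true;
        // rule 2: drop a plural "s" (but not "ss")
        } else if (w.endsWith("s") && !w.endsWith("ss") && w.length() > 3) {
            w = w.substring(0, w.length() - 1); stripped = true;
        }
        // cleanup: collapse a doubled final consonant left by suffix removal
        int n = w.length();
        if (stripped && n >= 2 && w.charAt(n - 1) == w.charAt(n - 2)
                && "aeiou".indexOf(w.charAt(n - 1)) < 0) {
            w = w.substring(0, n - 1);
        }
        return w;
    }
}
```

A real Snowball stemmer has far more rules per language, which is exactly why regenerating from the v2.0 compiler rather than hand-editing the 12-year-old output matters.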
[jira] [Updated] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xin-Chun Zhang updated LUCENE-9136: --- Attachment: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Xin-Chun Zhang > Priority: Major > Attachments: 1581409981369-9dea4099-4e41-4431-8f45-a3bb8cac46c0.png
[jira] [Comment Edited] (LUCENE-9136) Introduce IVFFlat to Lucene for ANN similarity search
[ https://issues.apache.org/jira/browse/LUCENE-9136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17019507#comment-17019507 ] Xin-Chun Zhang edited comment on LUCENE-9136 at 2/12/20 9:33 AM: - I worked on this issue for about three to four days, and it now works fine for searching. My personal dev branch is available on GitHub: [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]. The index format (only one meta file with suffix .ifi) of IVFFlat is shown in the class Lucene90IvfFlatIndexFormat. In my implementation, the clustering process is optimized when the number of vectors is sufficiently large (e.g. > 200,000 per segment): a subset selected after shuffling is used for training, thereby saving time and memory. The insertion performance of IVFFlat is better because insertion requires no extra work, while HNSW needs to maintain its graph. However, IVFFlat consumes more time in flushing because of the k-means clustering. My test cases show that the query performance of IVFFlat is better than HNSW, even though HNSW uses a cache for graphs while IVFFlat has no cache, and its recall is pretty high (avg time < 10 ms and recall > 96% over a set of 5 random vectors with 100 dimensions). My test class for IVFFlat is under the directory [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/|https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/TestKnnIvfFlat.java]. Performance comparison between IVFFlat and HNSW is in the class TestKnnGraphAndIvfFlat. The work is still in its early stage. There must be some bugs that need to be fixed, and I would like to hear more comments. Everyone is welcome to participate in this issue. was (Author: irvingzhang): I worked on this issue for about three to four days, and it now works fine for searching.
My personal dev branch is available on GitHub: [https://github.com/irvingzhang/lucene-solr/tree/jira/LUCENE-9136]. The index format (only one meta file with suffix .ifi) of IVFFlat is shown in the class Lucene90IvfFlatIndexFormat. In my implementation, the clustering process is optimized when the number of vectors is sufficiently large (e.g. > 10,000,000 per segment): a subset selected after shuffling is used for training, thereby saving time and memory. The insertion performance of IVFFlat is better because insertion requires no extra work, while HNSW needs to maintain its graph. However, IVFFlat consumes more time in flushing because of the k-means clustering. My test cases show that the query performance of IVFFlat is better than HNSW, even though HNSW uses a cache for graphs while IVFFlat has no cache, and its recall is pretty high (avg time < 10 ms and recall > 96% over a set of 5 random vectors with 100 dimensions). My test class for IVFFlat is under the directory [https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/|https://github.com/irvingzhang/lucene-solr/blob/jira/LUCENE-9136/lucene/core/src/test/org/apache/lucene/util/ivfflat/TestKnnIvfFlat.java]. Performance comparison between IVFFlat and HNSW is in the class TestKnnGraphAndIvfFlat. The work is still in its early stage. There must be some bugs that need to be fixed, and I would like to hear more comments. Everyone is welcome to participate in this issue. > Introduce IVFFlat to Lucene for ANN similarity search > - > > Key: LUCENE-9136 > URL: https://issues.apache.org/jira/browse/LUCENE-9136 > Project: Lucene - Core > Issue Type: New Feature > Reporter: Xin-Chun Zhang > Priority: Major
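The training-subset optimization described in the comment above (shuffle, then run k-means on a sample rather than on every vector in the segment) can be sketched in plain Java. Names, sizes, and iteration counts below are illustrative assumptions, not the code from the linked branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch: when a segment holds many vectors, train k-means centroids on a
// shuffled sample to bound clustering cost; assignment later covers all vectors.
public class KMeansSampleSketch {
    public static float[][] trainCentroids(List<float[]> vectors, int k, int sampleSize, long seed) {
        List<float[]> sample = new ArrayList<>(vectors);
        Collections.shuffle(sample, new Random(seed));
        if (sample.size() > sampleSize) sample = sample.subList(0, sampleSize);

        // initialize centroids from the first k sampled points
        int dim = sample.get(0).length;
        float[][] centroids = new float[k][];
        for (int i = 0; i < k; i++) centroids[i] = sample.get(i).clone();

        for (int iter = 0; iter < 10; iter++) {          // a few Lloyd iterations
            double[][] sum = new double[k][dim];
            int[] count = new int[k];
            for (float[] v : sample) {
                int c = nearest(centroids, v);
                count[c]++;
                for (int d = 0; d < dim; d++) sum[c][d] += v[d];
            }
            for (int c = 0; c < k; c++)
                if (count[c] > 0)
                    for (int d = 0; d < dim; d++) centroids[c][d] = (float) (sum[c][d] / count[c]);
        }
        return centroids;
    }

    static int nearest(float[][] centroids, float[] v) {
        int best = 0; double bd = Double.MAX_VALUE;
        for (int i = 0; i < centroids.length; i++) {
            double s = 0;
            for (int d = 0; d < v.length; d++) { double x = v[d] - centroids[i][d]; s += x * x; }
            if (s < bd) { bd = s; best = i; }
        }
        return best;
    }
}
```

The sample bounds the cost of each Lloyd iteration by sampleSize rather than by the segment size, which is the trade-off the comment describes: slower flush than HNSW insert-time maintenance, but much cheaper than clustering every vector.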
[jira] [Created] (LUCENE-9219) Port ECJ-based linter to gradle
Dawid Weiss created LUCENE-9219: --- Summary: Port ECJ-based linter to gradle Key: LUCENE-9219 URL: https://issues.apache.org/jira/browse/LUCENE-9219 Project: Lucene - Core Issue Type: Sub-task Reporter: Dawid Weiss
[GitHub] [lucene-solr] dweiss commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
dweiss commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-585103020 Ok, sure thing. I'll create a sub-task on this issue and maybe try to push the ecj linter forward so that it is there as an example to copy from. Many things in gradle are not so obvious (although they are fairly clear once you soak in the basic concepts). If you have doubts or questions about how the code in the patch works please don't hesitate to ask. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta closed pull request #1242: LUCENE-9201: Port documentation-lint task to Gradle build
mocobeta closed pull request #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242
[GitHub] [lucene-solr] mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build
mocobeta commented on issue #1242: LUCENE-9201: Port documentation-lint task to Gradle build URL: https://github.com/apache/lucene-solr/pull/1242#issuecomment-585101615 @dweiss thank you for your comments and the patch in Jira. Let me close this PR for now and open new ones (in a week or so) to narrow down the scope:
- source code linting by ECJ (unused imports check)
- documentation linting by the python checkers; this may be further split into
  - "missing javadoc check" (defined at each sub-project) and
  - "broken links check" (defined at the root project to check inter-project links)
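The split proposed in the comment above could be wired roughly like this in a Gradle build script. Task names, script paths, and arguments here are hypothetical illustrations, not the actual patch; Lucene's dev-tools do ship python doc checkers, but how they are invoked from Gradle is an assumption.

```groovy
// Hypothetical sketch of the proposed task split (all names illustrative).
// Per-project: missing-javadoc check defined in each sub-project.
subprojects {
    task checkMissingDocs(type: Exec) {
        description = 'Checks generated javadocs of this sub-project for missing entries.'
        commandLine 'python3', "${rootDir}/dev-tools/scripts/checkJavaDocs.py", "${buildDir}/docs"
    }
}

// Root project only: broken-links check, since it must see inter-project links.
task checkBrokenLinks(type: Exec) {
    description = 'Checks links across all sub-project javadocs.'
    commandLine 'python3', "${rootDir}/dev-tools/scripts/checkJavadocLinks.py", "${buildDir}/docs"
}
```

Defining the link check only at the root matches the rationale in the comment: links cross sub-project boundaries, so no single sub-project can validate them alone.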
[GitHub] [lucene-solr] iverase commented on issue #1249: LUCENE-9217: Add validation to XYGeometries
iverase commented on issue #1249: LUCENE-9217: Add validation to XYGeometries URL: https://github.com/apache/lucene-solr/pull/1249#issuecomment-585091231 I have opened #1252 that should supersede this one.
[GitHub] [lucene-solr] iverase opened a new pull request #1252: LUCENE-9218: XYGeoemtries should expose values as floats
iverase opened a new pull request #1252: LUCENE-9218: XYGeoemtries should expose values as floats URL: https://github.com/apache/lucene-solr/pull/1252 Boxing the values to doubles happens when creating Component2D objects.
[jira] [Created] (LUCENE-9218) XYGeometries should use floats instead of doubles
Ignacio Vera created LUCENE-9218: Summary: XYGeometries should use floats instead of doubles Key: LUCENE-9218 URL: https://issues.apache.org/jira/browse/LUCENE-9218 Project: Lucene - Core Issue Type: Improvement Reporter: Ignacio Vera XYGeometries (XYPolygon, XYLine, XYRectangle & XYPoint) are a bit counter-intuitive: most of them are initialised using floats, yet their values are returned as doubles. In addition, XYRectangle seems to work on doubles throughout. This issue proposes to harmonise those classes to work only on floats. As these classes were just moved to core and have not been released yet, it should be OK to change their interfaces.
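The proposed harmonisation is safe because widening a float to a double is always exact, so a geometry class can store and expose floats while double-based Component2D math stays unchanged. A minimal plain-Java sketch (a hypothetical class, not the actual Lucene XYRectangle):

```java
// Sketch of a float-backed rectangle in the spirit of the proposal: the class
// stores and exposes floats, and only widens to double inside query math.
public class FloatRect {
    private final float minX, maxX, minY, maxY;

    public FloatRect(float minX, float maxX, float minY, float maxY) {
        this.minX = minX; this.maxX = maxX; this.minY = minY; this.maxY = maxY;
    }

    // accessors return float, matching what the constructor took
    public float minX() { return minX; }
    public float maxX() { return maxX; }
    public float minY() { return minY; }
    public float maxY() { return maxY; }

    // internal math may still widen: the float -> double conversion is exact,
    // so no precision is lost relative to a double-backed representation
    public boolean contains(double x, double y) {
        return x >= (double) minX && x <= (double) maxX
            && y >= (double) minY && y <= (double) maxY;
    }
}
```

Because the widening is exact, a caller that reads back minX() as a float and feeds it into double-based math hits the boundary precisely, which is the behavior a float-only public interface needs to guarantee.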