Re: hybrid document routing
Sounds like complex ACLs based on group memberships that use graph queries ? that would require local ACL's... On Mon, Aug 10, 2020 at 5:56 PM Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: > This seems like an XY problem. Would it be possible to describe the > original problem that led you to this solution (in the prototype)? Also, do > you think folks at solr-users@ list would have more ideas related to this > usecase and cross posting there would help? > > On Tue, 11 Aug, 2020, 1:43 am David Smiley, wrote: > >> Are you sure you need the docs in the same shard when maybe you could >> assume a core exists on each node and then do a query-time join? >> >> ~ David Smiley >> Apache Lucene/Solr Search Developer >> http://www.linkedin.com/in/davidwsmiley >> >> >> On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein >> wrote: >> >>> I have a situation where I'd like to have the standard compositeId >>> router in place for a collection. But, I'd like certain documents (ACL >>> documents) to be duplicated on each shard in the collection. To achieve the >>> level of access control performance and scalability I'm looking for I need >>> the ACL records to be in the same core as the main documents. >>> >>> I put together a prototype where the compositeId router accepted >>> implicit routing parameters and it worked in my testing. Before I open a >>> ticket suggesting this approach I wonder what other people thought the best >>> approach would be to accomplish this goal. >>> >>> >>> -- http://www.needhamsoftware.com (work) http://www.the111shift.com (play)
Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)
See comments on the JIRA. Short form: let’s not do this. > On Aug 10, 2020, at 7:47 PM, Tomoko Uchida > wrote: > > Thanks David, for the information. I agree with Luke - a GUI app which needs > Window system - is not inherently suited to a Server application. > > > if Docker could run GUI apps > This reminds me an elasticsearch user once notified us he/she worked on > Dockernized Luke. I refused to merge it at that time (the integration had > just been ongoing then), but we could revisit it. > https://github.com/DmitryKey/luke/issues/162 > > There may be a few options to materialize the goal... the most natural > direction is, I think, to improve LukeRequestHandler ;) > Or, CUI application might be more suitable for some situations > (https://github.com/javasoze/clue) ? > Until we find somewhat sensible ways, I am totally fine with the current way > of doing, just download Lucene package and use it. > > Tomoko > > > 2020年8月11日(火) 5:37 David Smiley : > There's a decent tutorial here: https://sematext.com/blog/solr-plugins-system/ > But it's unclear if a standalone tool like Luke is really sensible as a Solr > "plug-in" because it does not "plug-in" to Solr; it does not live within Solr > in any way. > > It'd be interesting if Docker could run GUI apps or if somehow Luke could run > as an Applet or something. Or maybe "java web start" but I thought that > technology might be dead. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida > wrote: > I don't know anything about Solr packages, is there any guide for plugin > developers / maintainers? Also, maybe an official host server for the plugin > is needed? > In general Luke is just an ordinary JAR, all you need is downloading the > correct version of it and setting the right classpaths. 
> If there is proper documentation and others think it's somewhat beneficial > that solr has the "Luke plugin", I'd be happy to add it my todo list (or it'd > be perfectly fit for "newdevs", I think). > > Tomoko > > > 2020年8月10日(月) 4:39 Erick Erickson : > Tomoko: > > Indeed, this is what is behind my question about whether it should be a > package for Solr rather than something in the standard distro. The more I > think about this, it’s hard to justify it being part of the standard distro > rather than a package given that some people find it _very_ useful, but I’d > bet that most Solr users don’t even know it exists... > > Which means I’ll have to actually _understand_ the package infrastructure… > Something about an old dog and new tricks. Siiigggh… > > Best, > Erick > > > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida > > wrote: > > > > LUCENE-9448 > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)
Thanks David, for the information. I agree with Luke - a GUI app which needs Window system - is not inherently suited to a Server application. > if Docker could run GUI apps This reminds me an elasticsearch user once notified us he/she worked on Dockernized Luke. I refused to merge it at that time (the integration had just been ongoing then), but we could revisit it. https://github.com/DmitryKey/luke/issues/162 There may be a few options to materialize the goal... the most natural direction is, I think, to improve LukeRequestHandler ;) Or, CUI application might be more suitable for some situations ( https://github.com/javasoze/clue) ? Until we find somewhat sensible ways, I am totally fine with the current way of doing, just download Lucene package and use it. Tomoko 2020年8月11日(火) 5:37 David Smiley : > There's a decent tutorial here: > https://sematext.com/blog/solr-plugins-system/ > But it's unclear if a standalone tool like Luke is really sensible as a > Solr "plug-in" because it does not "plug-in" to Solr; it does not live > within Solr in any way. > > It'd be interesting if Docker could run GUI apps or if somehow Luke could > run as an Applet or something. Or maybe "java web start" but I thought > that technology might be dead. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida > wrote: > >> I don't know anything about Solr packages, is there any guide for plugin >> developers / maintainers? Also, maybe an official host server for the >> plugin is needed? >> In general Luke is just an ordinary JAR, all you need is downloading the >> correct version of it and setting the right classpaths. >> If there is proper documentation and others think it's somewhat >> beneficial that solr has the "Luke plugin", I'd be happy to add it my todo >> list (or it'd be perfectly fit for "newdevs", I think). 
>> >> Tomoko >> >> >> 2020年8月10日(月) 4:39 Erick Erickson : >> >>> Tomoko: >>> >>> Indeed, this is what is behind my question about whether it should be a >>> package for Solr rather than something in the standard distro. The more I >>> think about this, it’s hard to justify it being part of the standard distro >>> rather than a package given that some people find it _very_ useful, but I’d >>> bet that most Solr users don’t even know it exists... >>> >>> Which means I’ll have to actually _understand_ the package >>> infrastructure… Something about an old dog and new tricks. Siiigggh… >>> >>> Best, >>> Erick >>> >>> > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida < >>> tomoko.uchida.1...@gmail.com> wrote: >>> > >>> > LUCENE-9448 >>> >>> >>> - >>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: dev-h...@lucene.apache.org >>> >>>
Performance in Solr 9 / Java 11
In my IDE I have a few profiling tools that I bounce between; I started using them in my work at Lucidworks and continue to use them in my current work today. I suspect there may be some performance improvements in Java 11 that we can exploit further. I'm curious whether there has been any investigation (possibly by Mark Miller or @u...@thetaphi.de) into performance improvements specific to the newer version of Java in master. There are some obvious ones that we get for free, like a better GC, but I'm curious about prior work in this area before publishing anything that might be redundant or irrelevant. Best, -- Marcus Eagan
Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again
oh,thanks! that saves everybody some time. I have commented in there, pleading to be allowed to do something - if that proposal sounds even little bit reasonable, please consider amplifying the signal On Mon, Aug 10, 2020 at 4:22 PM David Smiley wrote: > > There already is one: https://issues.apache.org/jira/browse/LUCENE-8776 > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Mon, Aug 10, 2020 at 1:30 PM Roman Chyla wrote: >> >> I'll have to somehow find a solution for this situation, giving up >> offsets seems like too big a price to pay, I see that overriding >> DefaultIndexingChain is not exactly easy -- the only thing I can think >> of is to just trick the classloader into giving it a different version >> of the chain (praying this can be done without compromising security, >> I have not followed JDK evolutions for some time...) - aside from >> forking lucene and editing that; which I decidedly don't want to do >> (monkey-patching it, ok, i can live with that... :-)) >> >> It *seems* to me that the original reason for negative offset checks >> stemmed from the fact that vint could have been written (and possibly >> vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738 >> >> but the underlying issue and some of the patches seem to have been >> addressing those problems; but a much shorter version of the patch was >> committed -- despite the perf results not being indicative (i.e. it >> could have been good with the longer patch) -- but to really >> understand it, one would have to spend more than 10mins reading the >> comments >> >> Further to the point, I think negative offsets can be produced only on >> the very first token, unless there is a bug in a filter (there was/is >> a separate check for that in 6x and perhaps it is still there in 7x). >> That would be much less restrictive than the current condition which >> disallows all backward offsets. 
We never ran into an index corruption >> in lucene 4-6x, so I really wonder if the "forbid all backwards >> offsets" approach might be too restrictive. >> >> Looks like I should create an issue... >> >> On Thu, Aug 6, 2020 at 11:28 AM Gus Heck wrote: >> > >> > I've had a nearly identical experience to what Dave describes, I also >> > chafe under this restriction. >> > >> > On Thu, Aug 6, 2020 at 11:07 AM David Smiley wrote: >> >> >> >> I sympathize with your pain, Roman. >> >> >> >> It appears we can't really do index-time multi-word synonyms because of >> >> the offset ordering rule. But it's not just synonyms, it's other forms >> >> of multi-token expansion. Where I work, I've seen an interesting >> >> approach to mixed language text analysis in which a sophisticated >> >> Tokenizer effectively re-tokenizes an input multiple ways by producing a >> >> token stream that is a concatenation of different interpretations of the >> >> input. On a Lucene upgrade, we had to "coarsen" the offsets to the point >> >> of having highlights that point to a whole sentence instead of the words >> >> in that sentence :-(. I need to do something to fix this; I'm trying >> >> hard to resist modifying our Lucene fork for this constraint. Maybe >> >> instead of concatenating, it might be interleaved / overlapped but the >> >> interpretations aren't necessarily aligned to make this possible without >> >> risking breaking position-sensitive queries. >> >> >> >> So... I'm not a fan of this constraint on offsets. >> >> >> >> ~ David Smiley >> >> Apache Lucene/Solr Search Developer >> >> http://www.linkedin.com/in/davidwsmiley >> >> >> >> >> >> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla wrote: >> >>> >> >>> Hi Mike, >> >>> >> >>> Yes, they are not zero offsets - I was instinctively avoiding >> >>> "negative offsets"; but they are indeed backward offsets. 
>> >>> >> >>> Here is the token stream as produced by the analyzer chain indexing >> >>> "THE HUBBLE constant: a summary of the hubble space telescope program" >> >>> >> >>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10 >> >>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10 >> >>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20 >> >>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30 >> >>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44 >> >>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38 >> >>> offsetEnd=60 >> >>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60 >> >>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60 >> >>> term=space pos=1 type=word offsetStart=45 offsetEnd=50 >> >>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60 >> >>> term=program pos=1 type=word offsetStart=61 offsetEnd=68 >> >>> >> >>> Sometimes, we'll even have a situation when synonyms overlap: for >> >>> example "anti de sitter space time" >> >>> >> >>> "anti de sitter space time" -> "antidesitter space" (one token >> >>> spanning offsets 0-26; it gets emitted
Re: hybrid document routing
This seems like an XY problem. Would it be possible to describe the original problem that led you to this solution (in the prototype)? Also, do you think folks at solr-users@ list would have more ideas related to this usecase and cross posting there would help? On Tue, 11 Aug, 2020, 1:43 am David Smiley, wrote: > Are you sure you need the docs in the same shard when maybe you could > assume a core exists on each node and then do a query-time join? > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley > > > On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein wrote: > >> I have a situation where I'd like to have the standard compositeId router >> in place for a collection. But, I'd like certain documents (ACL documents) >> to be duplicated on each shard in the collection. To achieve the level of >> access control performance and scalability I'm looking for I need the ACL >> records to be in the same core as the main documents. >> >> I put together a prototype where the compositeId router accepted implicit >> routing parameters and it worked in my testing. Before I open a ticket >> suggesting this approach I wonder what other people thought the best >> approach would be to accomplish this goal. >> >> >>
Re: SOLR-13412 (Make the Lucene Luke module available from a Solr distribution)
There's a decent tutorial here: https://sematext.com/blog/solr-plugins-system/ But it's unclear if a standalone tool like Luke is really sensible as a Solr "plug-in" because it does not "plug-in" to Solr; it does not live within Solr in any way. It'd be interesting if Docker could run GUI apps or if somehow Luke could run as an Applet or something. Or maybe "java web start" but I thought that technology might be dead. ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Sun, Aug 9, 2020 at 8:57 PM Tomoko Uchida wrote: > I don't know anything about Solr packages, is there any guide for plugin > developers / maintainers? Also, maybe an official host server for the > plugin is needed? > In general Luke is just an ordinary JAR, all you need is downloading the > correct version of it and setting the right classpaths. > If there is proper documentation and others think it's somewhat beneficial > that solr has the "Luke plugin", I'd be happy to add it my todo list (or > it'd be perfectly fit for "newdevs", I think). > > Tomoko > > > 2020年8月10日(月) 4:39 Erick Erickson : > >> Tomoko: >> >> Indeed, this is what is behind my question about whether it should be a >> package for Solr rather than something in the standard distro. The more I >> think about this, it’s hard to justify it being part of the standard distro >> rather than a package given that some people find it _very_ useful, but I’d >> bet that most Solr users don’t even know it exists... >> >> Which means I’ll have to actually _understand_ the package >> infrastructure… Something about an old dog and new tricks. Siiigggh… >> >> Best, >> Erick >> >> > On Aug 9, 2020, at 2:32 PM, Tomoko Uchida >> wrote: >> > >> > LUCENE-9448 >> >> >> - >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> >>
Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again
There already is one: https://issues.apache.org/jira/browse/LUCENE-8776 ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Aug 10, 2020 at 1:30 PM Roman Chyla wrote: > I'll have to somehow find a solution for this situation, giving up > offsets seems like too big a price to pay, I see that overriding > DefaultIndexingChain is not exactly easy -- the only thing I can think > of is to just trick the classloader into giving it a different version > of the chain (praying this can be done without compromising security, > I have not followed JDK evolutions for some time...) - aside from > forking lucene and editing that; which I decidedly don't want to do > (monkey-patching it, ok, i can live with that... :-)) > > It *seems* to me that the original reason for negative offset checks > stemmed from the fact that vint could have been written (and possibly > vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738 > > but the underlying issue and some of the patches seem to have been > addressing those problems; but a much shorter version of the patch was > committed -- despite the perf results not being indicative (i.e. it > could have been good with the longer patch) -- but to really > understand it, one would have to spend more than 10mins reading the > comments > > Further to the point, I think negative offsets can be produced only on > the very first token, unless there is a bug in a filter (there was/is > a separate check for that in 6x and perhaps it is still there in 7x). > That would be much less restrictive than the current condition which > disallows all backward offsets. We never ran into an index corruption > in lucene 4-6x, so I really wonder if the "forbid all backwards > offsets" approach might be too restrictive. > > Looks like I should create an issue... 
> > On Thu, Aug 6, 2020 at 11:28 AM Gus Heck wrote: > > > > I've had a nearly identical experience to what Dave describes, I also > chafe under this restriction. > > > > On Thu, Aug 6, 2020 at 11:07 AM David Smiley wrote: > >> > >> I sympathize with your pain, Roman. > >> > >> It appears we can't really do index-time multi-word synonyms because of > the offset ordering rule. But it's not just synonyms, it's other forms of > multi-token expansion. Where I work, I've seen an interesting approach to > mixed language text analysis in which a sophisticated Tokenizer effectively > re-tokenizes an input multiple ways by producing a token stream that is a > concatenation of different interpretations of the input. On a Lucene > upgrade, we had to "coarsen" the offsets to the point of having highlights > that point to a whole sentence instead of the words in that sentence :-(. > I need to do something to fix this; I'm trying hard to resist modifying our > Lucene fork for this constraint. Maybe instead of concatenating, it might > be interleaved / overlapped but the interpretations aren't necessarily > aligned to make this possible without risking breaking position-sensitive > queries. > >> > >> So... I'm not a fan of this constraint on offsets. > >> > >> ~ David Smiley > >> Apache Lucene/Solr Search Developer > >> http://www.linkedin.com/in/davidwsmiley > >> > >> > >> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla > wrote: > >>> > >>> Hi Mike, > >>> > >>> Yes, they are not zero offsets - I was instinctively avoiding > >>> "negative offsets"; but they are indeed backward offsets. 
> >>> > >>> Here is the token stream as produced by the analyzer chain indexing > >>> "THE HUBBLE constant: a summary of the hubble space telescope program" > >>> > >>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10 > >>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10 > >>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20 > >>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30 > >>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44 > >>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38 > offsetEnd=60 > >>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60 > >>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60 > >>> term=space pos=1 type=word offsetStart=45 offsetEnd=50 > >>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60 > >>> term=program pos=1 type=word offsetStart=61 offsetEnd=68 > >>> > >>> Sometimes, we'll even have a situation when synonyms overlap: for > >>> example "anti de sitter space time" > >>> > >>> "anti de sitter space time" -> "antidesitter space" (one token > >>> spanning offsets 0-26; it gets emitted with the first token "anti" > >>> right now) > >>> "space time" -> "spacetime" (synonym 16-26) > >>> "space" -> "universe" (25-26) > >>> > >>> Yes, weird, but useful if people want to search for `universe NEAR > >>> anti` -- but another usecase which would be prohibited by the "new" > >>> rule. > >>> > >>> DefaultIndexingChain checks new token offset against the last emitted > >>> token, so I don't see a way to
Re: hybrid document routing
Are you sure you need the docs in the same shard when maybe you could assume a core exists on each node and then do a query-time join? ~ David Smiley Apache Lucene/Solr Search Developer http://www.linkedin.com/in/davidwsmiley On Mon, Aug 10, 2020 at 2:34 PM Joel Bernstein wrote: > I have a situation where I'd like to have the standard compositeId router > in place for a collection. But, I'd like certain documents (ACL documents) > to be duplicated on each shard in the collection. To achieve the level of > access control performance and scalability I'm looking for I need the ACL > records to be in the same core as the main documents. > > I put together a prototype where the compositeId router accepted implicit > routing parameters and it worked in my testing. Before I open a ticket > suggesting this approach I wonder what other people thought the best > approach would be to accomplish this goal. > > >
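For readers following the suggestion above: a cross-core query-time join could look roughly like the following. The core name `acl` and the field names `acl_id`/`group` are hypothetical stand-ins; `{!join}` with `fromIndex` is Solr's mechanism for joining against another core co-located on the same node, which is why the suggestion assumes a core exists on each node.

```
# Return only main-collection docs whose acl_id matches an ACL record
# (stored in a co-located "acl" core) granting access to the user's groups.
q=*:*&fq={!join fromIndex=acl from=acl_id to=acl_id}group:(eng OR admin)
```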
[VOTE] Release Lucene/Solr 8.6.1 RC2
Please vote for release candidate 2 for Lucene/Solr 8.6.1

The artifacts can be downloaded from:
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99

You can run the smoke tester directly with this command:

python3 -u dev-tools/scripts/smokeTestRelease.py \
https://dist.apache.org/repos/dist/dev/lucene/lucene-solr-8.6.1-RC2-rev6e11a1c3f0599f1c918bc69c4f51928d23160e99

The vote will be open for at least 72 hours i.e. until 2020-08-13 20:00 UTC.

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

Here is my +1
hybrid document routing
I have a situation where I'd like to have the standard compositeId router in place for a collection. But I'd like certain documents (ACL documents) to be duplicated on each shard in the collection. To achieve the level of access control performance and scalability I'm looking for, I need the ACL records to be in the same core as the main documents.

I put together a prototype where the compositeId router accepted implicit routing parameters, and it worked in my testing. Before I open a ticket suggesting this approach, I wonder what other people think the best approach would be to accomplish this goal.
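To make the proposal concrete, here is a minimal sketch of the hybrid routing described above: regular documents are routed by their compositeId prefix, while ACL documents are duplicated onto every shard. This is illustrative only; Solr's CompositeIdRouter actually hashes with MurmurHash3 (a stand-in `zlib.crc32` is used here), and the `type` field marking ACL documents is hypothetical, standing in for whatever implicit routing parameter the prototype accepts.

```python
# Sketch of hybrid routing: compositeId-style prefix hashing for normal
# documents, plus broadcast of ACL documents to every shard so that
# shard-local ACL joins are always possible.
import zlib

NUM_SHARDS = 4

def composite_id_shard(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard the way compositeId routing does: hash the part
    before '!' if present (so 'tenant!doc42' co-locates by tenant),
    otherwise hash the whole id."""
    key = doc_id.split("!", 1)[0] if "!" in doc_id else doc_id
    return zlib.crc32(key.encode()) % num_shards

def route(doc: dict, num_shards: int = NUM_SHARDS) -> list[int]:
    # ACL documents bypass hashing and land on every shard.
    if doc.get("type") == "acl":
        return list(range(num_shards))
    return [composite_id_shard(doc["id"], num_shards)]

print(route({"id": "tenantA!doc42"}))         # a single hashed shard
print(route({"id": "acl-7", "type": "acl"}))  # every shard: [0, 1, 2, 3]
```

Documents sharing a compositeId prefix always land together, so a main document and its tenant's siblings stay co-located while the duplicated ACL records are available in every core.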
Re: Badapple report
OK, thanks. I’m not really annotating things at this point, although occasionally removing some that haven’t failed in a long time. > On Aug 10, 2020, at 1:44 PM, Tomás Fernández Löbbe > wrote: > > Hi Erick, > I've introduced and later fixed a bug in TestConfig. It hasn't failed since, > so please don't annotate it. > > On Mon, Aug 10, 2020 at 7:47 AM Erick Erickson > wrote: > We’re backsliding some. I encourage people to look at: > http://fucit.org/solr-jenkins-reports/failure-report.html, we have a number > of ill-behaved tests, particularly TestRequestRateLimiter, > TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and > TestIndexingSequenceNumbers… > > > Raw fail count by week totals, most recent week first (corresponds to bits): > Week: 0 had 100 failures > Week: 1 had 82 failures > Week: 2 had 94 failures > Week: 3 had 502 failures > > > Failures in Hoss' reports for the last 4 rollups. > > There were 585 unannotated tests that failed in Hoss' rollups. Ordered by the > date I downloaded the rollup file, newest->oldest. See above for the dates > the files were collected > These tests were NOT BadApple'd or AwaitsFix'd > > Failures in the last 4 reports.. 
>Report Pct runsfails test > 0123 4.4 1583 37 BasicDistributedZkTest.test > 0123 4.3 1727 77 CloudExitableDirectoryReaderTest.test > 0123 2.5 8598248 > CloudExitableDirectoryReaderTest.testCreepThenBite > 0123 1.9 1712 36 > CloudExitableDirectoryReaderTest.testWhitebox > 0123 0.5 1587 11 > DocValuesNotIndexedTest.testGroupingDVOnlySortLast > 0123 2.2 1679 82 HttpPartitionOnCommitTest.test > 0123 0.5 1592 16 HttpPartitionTest.test > 0123 1.0 1578 9 HttpPartitionWithTlogReplicasTest.test > 0123 1.3 1569 13 LeaderFailoverAfterPartitionTest.test > 0123 7.4 1643 59 MultiThreadedOCPTest.test > 0123 0.3 1567 8 ReplaceNodeTest.test > 0123 0.2 1588 6 ShardSplitTest.testSplitShardWithRule > 0123 100.0 38 33 SharedFSAutoReplicaFailoverTest.test > 0123 2.1 818 19 > TestCircuitBreaker.testBuildingMemoryPressure > 0123 2.6 818 13 > TestCircuitBreaker.testResponseWithCBTiming > 0123 6.2 1848104 TestContainerPlugin.testApiFromPackage > 0123 2.5 1662 33 TestDistributedGrouping.test > 0123 0.4 1448 6 TestDynamicLoading.testDynamicLoading > 0123 6.4 1614 74 TestExportWriter.testExpr > 0123 8.6 1356 70 TestHdfsCloudBackupRestore.test > 0123 9.1 1697136 TestLocalFSCloudBackupRestore.test > 0123 0.5 1607 26 TestPackages.testPluginLoading > 0123 0.7 1596 15 > TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast > 0123 1.5 1610 59 TestReRankQParserPlugin.testMinExactCount > 0123 0.3 1552 4 TestReplicaProperties.test > 0123 0.3 1556 5 > TestSolrCloudWithDelegationTokens.testDelegationTokenRenew > 0123 0.3 1565 9 TestSolrConfigHandlerCloud.test > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Badapple report
Hi Erick, I've introduced and later fixed a bug in TestConfig. It hasn't failed since, so please don't annotate it. On Mon, Aug 10, 2020 at 7:47 AM Erick Erickson wrote: > We’re backsliding some. I encourage people to look at: > http://fucit.org/solr-jenkins-reports/failure-report.html, we have a > number of ill-behaved tests, particularly TestRequestRateLimiter, > TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and > TestIndexingSequenceNumbers… > > > Raw fail count by week totals, most recent week first (corresponds to > bits): > Week: 0 had 100 failures > Week: 1 had 82 failures > Week: 2 had 94 failures > Week: 3 had 502 failures > > > Failures in Hoss' reports for the last 4 rollups. > > There were 585 unannotated tests that failed in Hoss' rollups. Ordered by > the date I downloaded the rollup file, newest->oldest. See above for the > dates the files were collected > These tests were NOT BadApple'd or AwaitsFix'd > > Failures in the last 4 reports.. >Report Pct runsfails test > 0123 4.4 1583 37 BasicDistributedZkTest.test > 0123 4.3 1727 77 CloudExitableDirectoryReaderTest.test > 0123 2.5 8598248 > CloudExitableDirectoryReaderTest.testCreepThenBite > 0123 1.9 1712 36 > CloudExitableDirectoryReaderTest.testWhitebox > 0123 0.5 1587 11 > DocValuesNotIndexedTest.testGroupingDVOnlySortLast > 0123 2.2 1679 82 HttpPartitionOnCommitTest.test > 0123 0.5 1592 16 HttpPartitionTest.test > 0123 1.0 1578 9 HttpPartitionWithTlogReplicasTest.test > 0123 1.3 1569 13 LeaderFailoverAfterPartitionTest.test > 0123 7.4 1643 59 MultiThreadedOCPTest.test > 0123 0.3 1567 8 ReplaceNodeTest.test > 0123 0.2 1588 6 ShardSplitTest.testSplitShardWithRule > 0123 100.0 38 33 SharedFSAutoReplicaFailoverTest.test > 0123 2.1 818 19 > TestCircuitBreaker.testBuildingMemoryPressure > 0123 2.6 818 13 > TestCircuitBreaker.testResponseWithCBTiming > 0123 6.2 1848104 TestContainerPlugin.testApiFromPackage > 0123 2.5 1662 33 TestDistributedGrouping.test > 0123 0.4 1448 6 
TestDynamicLoading.testDynamicLoading > 0123 6.4 1614 74 TestExportWriter.testExpr > 0123 8.6 1356 70 TestHdfsCloudBackupRestore.test > 0123 9.1 1697136 TestLocalFSCloudBackupRestore.test > 0123 0.5 1607 26 TestPackages.testPluginLoading > 0123 0.7 1596 15 > TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast > 0123 1.5 1610 59 > TestReRankQParserPlugin.testMinExactCount > 0123 0.3 1552 4 TestReplicaProperties.test > 0123 0.3 1556 5 > TestSolrCloudWithDelegationTokens.testDelegationTokenRenew > 0123 0.3 1565 9 TestSolrConfigHandlerCloud.test > > > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org
Re: When zero offsets are not bad - a.k.a. multi-token synonyms yet again
I'll have to somehow find a solution for this situation, giving up offsets seems like too big a price to pay, I see that overriding DefaultIndexingChain is not exactly easy -- the only thing I can think of is to just trick the classloader into giving it a different version of the chain (praying this can be done without compromising security, I have not followed JDK evolutions for some time...) - aside from forking lucene and editing that; which I decidedly don't want to do (monkey-patching it, ok, i can live with that... :-)) It *seems* to me that the original reason for negative offset checks stemmed from the fact that vint could have been written (and possibly vlong too) - https://issues.apache.org/jira/browse/LUCENE-3738 but the underlying issue and some of the patches seem to have been addressing those problems; but a much shorter version of the patch was committed -- despite the perf results not being indicative (i.e. it could have been good with the longer patch) -- but to really understand it, one would have to spend more than 10mins reading the comments Further to the point, I think negative offsets can be produced only on the very first token, unless there is a bug in a filter (there was/is a separate check for that in 6x and perhaps it is still there in 7x). That would be much less restrictive than the current condition which disallows all backward offsets. We never ran into an index corruption in lucene 4-6x, so I really wonder if the "forbid all backwards offsets" approach might be too restrictive. Looks like I should create an issue... On Thu, Aug 6, 2020 at 11:28 AM Gus Heck wrote: > > I've had a nearly identical experience to what Dave describes, I also chafe > under this restriction. > > On Thu, Aug 6, 2020 at 11:07 AM David Smiley wrote: >> >> I sympathize with your pain, Roman. >> >> It appears we can't really do index-time multi-word synonyms because of the >> offset ordering rule. 
But it's not just synonyms, it's other forms of >> multi-token expansion. Where I work, I've seen an interesting approach to >> mixed language text analysis in which a sophisticated Tokenizer effectively >> re-tokenizes an input multiple ways by producing a token stream that is a >> concatenation of different interpretations of the input. On a Lucene >> upgrade, we had to "coarsen" the offsets to the point of having highlights >> that point to a whole sentence instead of the words in that sentence :-(. I >> need to do something to fix this; I'm trying hard to resist modifying our >> Lucene fork for this constraint. Maybe instead of concatenating, it might >> be interleaved / overlapped but the interpretations aren't necessarily >> aligned to make this possible without risking breaking position-sensitive >> queries. >> >> So... I'm not a fan of this constraint on offsets. >> >> ~ David Smiley >> Apache Lucene/Solr Search Developer >> http://www.linkedin.com/in/davidwsmiley >> >> >> On Thu, Aug 6, 2020 at 10:49 AM Roman Chyla wrote: >>> >>> Hi Mike, >>> >>> Yes, they are not zero offsets - I was instinctively avoiding >>> "negative offsets"; but they are indeed backward offsets. 
>>>
>>> Here is the token stream as produced by the analyzer chain indexing
>>> "THE HUBBLE constant: a summary of the hubble space telescope program":
>>>
>>> term=hubble pos=2 type=word offsetStart=4 offsetEnd=10
>>> term=acr::hubble pos=0 type=ACRONYM offsetStart=4 offsetEnd=10
>>> term=constant pos=1 type=word offsetStart=11 offsetEnd=20
>>> term=summary pos=1 type=word offsetStart=23 offsetEnd=30
>>> term=hubble pos=1 type=word offsetStart=38 offsetEnd=44
>>> term=syn::hubble space telescope pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
>>> term=syn::hst pos=0 type=SYNONYM offsetStart=38 offsetEnd=60
>>> term=acr::hst pos=0 type=ACRONYM offsetStart=38 offsetEnd=60
>>> term=space pos=1 type=word offsetStart=45 offsetEnd=50
>>> term=telescope pos=1 type=word offsetStart=51 offsetEnd=60
>>> term=program pos=1 type=word offsetStart=61 offsetEnd=68
>>>
>>> Sometimes we'll even have a situation where synonyms overlap; for
>>> example, "anti de sitter space time":
>>>
>>> "anti de sitter space time" -> "antidesitter space" (one token
>>> spanning offsets 0-26; it gets emitted with the first token "anti"
>>> right now)
>>> "space time" -> "spacetime" (synonym 16-26)
>>> "space" -> "universe" (25-26)
>>>
>>> Yes, weird, but useful if people want to search for `universe NEAR
>>> anti` -- but that is another use case which would be prohibited by the
>>> "new" rule.
>>>
>>> DefaultIndexingChain checks a new token's offset against the last
>>> emitted token, so I don't see a way to emit a multi-token synonym with
>>> offsets spanning multiple tokens if even one of those tokens was
>>> already emitted. And the complement is equally true: if the multi-token
>>> synonym is emitted as the last of its group, it trips over
>>> `startOffset < invertState.lastStartOffset`:
>>>
>>> https://github.com/apache/lucene-solr/blame/master/lucene/core/src/jav
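A minimal, self-contained Java sketch of the rule being discussed above (this models only the `startOffset < invertState.lastStartOffset` comparison; the class and method names are illustrative, not Lucene's actual DefaultIndexingChain API):

```java
import java.util.Arrays;

public class OffsetCheckSketch {

    /**
     * Simulates the backward-offset check applied to a token stream.
     * Each token is {startOffset, endOffset}. Returns the index of the
     * first token that would be rejected (its start offset goes
     * backwards, or its end precedes its start), or -1 if all pass.
     * Note: an equal start offset is allowed, which is why a synonym
     * emitted first in its position group (posInc=0) survives the check.
     */
    static int firstBackwardOffset(int[][] tokens) {
        int lastStart = 0;
        for (int i = 0; i < tokens.length; i++) {
            int start = tokens[i][0], end = tokens[i][1];
            if (start < lastStart || end < start) {
                return i; // Lucene 7+ throws IllegalArgumentException here
            }
            lastStart = start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // "hubble"(38-44), "syn::hubble space telescope"(38-60),
        // "space"(45-50), "telescope"(51-60): synonym emitted first
        // in its group shares startOffset 38, so the stream passes.
        int[][] synonymFirst = {{38, 44}, {38, 60}, {45, 50}, {51, 60}};
        // The same synonym emitted last in the group: its startOffset 38
        // is behind "telescope"'s 51, so it is rejected.
        int[][] synonymLast = {{38, 44}, {45, 50}, {51, 60}, {38, 60}};

        System.out.println(firstBackwardOffset(synonymFirst)); // -1
        System.out.println(firstBackwardOffset(synonymLast));  // 3
        System.out.println(Arrays.toString(synonymLast[3]));   // the offender
    }
}
```

Running this against the "hubble space telescope" tokens from the mail shows why emission order matters: the multi-token synonym is legal only while no component token with a larger start offset has been emitted before it.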
Badapple report
We’re backsliding some. I encourage people to look at http://fucit.org/solr-jenkins-reports/failure-report.html; we have a number of ill-behaved tests, particularly TestRequestRateLimiter, TestBulkSchemaConcurrent, TestConfig, SchemaApiFailureTest and TestIndexingSequenceNumbers…

Raw fail count by week totals, most recent week first (corresponds to bits):
Week: 0 had 100 failures
Week: 1 had 82 failures
Week: 2 had 94 failures
Week: 3 had 502 failures

Failures in Hoss' reports for the last 4 rollups. There were 585 unannotated tests that failed in Hoss' rollups. Ordered by the date I downloaded the rollup file, newest->oldest. See above for the dates the files were collected.

These tests were NOT BadApple'd or AwaitsFix'd.
Failures in the last 4 reports:
 Report   Pct   runs  fails  test
  0123    4.4   1583    37   BasicDistributedZkTest.test
  0123    4.3   1727    77   CloudExitableDirectoryReaderTest.test
  0123    2.5   8598   248   CloudExitableDirectoryReaderTest.testCreepThenBite
  0123    1.9   1712    36   CloudExitableDirectoryReaderTest.testWhitebox
  0123    0.5   1587    11   DocValuesNotIndexedTest.testGroupingDVOnlySortLast
  0123    2.2   1679    82   HttpPartitionOnCommitTest.test
  0123    0.5   1592    16   HttpPartitionTest.test
  0123    1.0   1578     9   HttpPartitionWithTlogReplicasTest.test
  0123    1.3   1569    13   LeaderFailoverAfterPartitionTest.test
  0123    7.4   1643    59   MultiThreadedOCPTest.test
  0123    0.3   1567     8   ReplaceNodeTest.test
  0123    0.2   1588     6   ShardSplitTest.testSplitShardWithRule
  0123  100.0     38    33   SharedFSAutoReplicaFailoverTest.test
  0123    2.1    818    19   TestCircuitBreaker.testBuildingMemoryPressure
  0123    2.6    818    13   TestCircuitBreaker.testResponseWithCBTiming
  0123    6.2   1848   104   TestContainerPlugin.testApiFromPackage
  0123    2.5   1662    33   TestDistributedGrouping.test
  0123    0.4   1448     6   TestDynamicLoading.testDynamicLoading
  0123    6.4   1614    74   TestExportWriter.testExpr
  0123    8.6   1356    70   TestHdfsCloudBackupRestore.test
  0123    9.1   1697   136   TestLocalFSCloudBackupRestore.test
  0123    0.5   1607    26   TestPackages.testPluginLoading
  0123    0.7   1596    15   TestQueryingOnDownCollection.testQueryToDownCollectionShouldFailFast
  0123    1.5   1610    59   TestReRankQParserPlugin.testMinExactCount
  0123    0.3   1552     4   TestReplicaProperties.test
  0123    0.3   1556     5   TestSolrCloudWithDelegationTokens.testDelegationTokenRenew
  0123    0.3   1565     9   TestSolrConfigHandlerCloud.test

DO NOT ENABLE LIST:
MoveReplicaHDFSTest.testFailedMove
MoveReplicaHDFSTest.testNormalFailedMove
TestControlledRealTimeReopenThread.testCRTReopen
TestICUNormalizer2CharFilter.testRandomStrings
TestICUTokenizerCJK
TestImpersonationWithHadoopAuth.testForwarding
TestLTRReRankingPipeline.testDifferentTopN
TestRandomChains

DO NOT ANNOTATE LIST:
CdcrBidirectionalTest.testBiDir
IndexSizeTriggerTest.testMergeIntegration
IndexSizeTriggerTest.testMixedBounds
IndexSizeTriggerTest.testSplitIntegration
IndexSizeTriggerTest.testTrigger
InfixSuggestersTest.testShutdownDuringBuild
ShardSplitTest.test
ShardSplitTest.testSplitMixedReplicaTypes
ShardSplitTest.testSplitWithChaosMonkey
Test2BPostings.test
TestLatLonShapeQueries.testRandomBig
TestPackedInts.testPackedLongValues
TestRandomChains.testRandomChainsWithLargeStrings
TestTriggerIntegration.testSearchRate

SuppressWarnings count: last week: 4,825, this week: 4,819, delta: -6

*** Files with increased @SuppressWarnings annotations:
Suppress count increase in: solr/core/src/java/org/apache/solr/handler/ReplicationHandler.java. Was: 13, now: 15
Suppress count increase in: solr/core/src/java/org/apache/solr/packagemanager/PackageManager.java. Was: 7, now: 8
Suppress count increase in: solr/core/src/test/org/apache/solr/core/TestSolrConfigHandler.java. Was: 14, now: 17
Suppress count increase in: solr/solrj/src/java/org/apache/solr/client/solrj/impl/HttpSolrClient.java. Was: 12, now: 13

*** Files with decreased @SuppressWarnings annotations:
Suppress count decrease in: solr/core/src/java/org/apache/solr/core/PluginBag.java. Was: 6, now: 5

Processing file (History bit 3): HOSS-2020-08-10.csv
Processing file (History bit 2): HOSS-2020-08-03.csv
Processing file (History bit 1): HOSS-2020-07-27.csv
Processing file (History bit 0): HOSS-2020-07-20.csv

Number of AwaitsFix: 33
Number of BadApples: 4

**Annotated tests that didn't fail in the last 4 weeks.
**Tests removed from the next two lists beca
SOLR-14714 (Solr.cmd in windows loads the incorrect jetty module when using java>=9)
Could someone with a Windows machine try the patch at SOLR-14714? I looked it over and LGTM, with one nit: I would move the following up to before they’re actually used:

set JAVA_MAJOR_VERSION=0
set JAVA_VERSION_INFO=
set JAVA_BUILD=0

I don’t think it matters functionally; it's just a style thing. It seems a straightforward fix; I just can’t try it, even without SSL, ‘cause I don’t have a Windows machine. I’ll push it if someone can double-check...
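For context on why solr.cmd needs a JAVA_MAJOR_VERSION at all: the `java.version` system property changed shape in Java 9 (JEP 223) — `"1.8.0_252"` before, `"9"` / `"11.0.7"` after — which is what trips up scripts that branch on the major version. A rough Java sketch of the parsing the batch script has to mimic (this is an illustrative, hypothetical parser, not solr.cmd's actual logic):

```java
public class JavaMajorVersionSketch {

    /**
     * Extracts the major version from a java.version string.
     * Pre-JEP-223 strings ("1.8.0_252") carry the major version in the
     * second component; JEP-223 strings ("9", "11.0.7") in the first.
     */
    static int majorVersion(String version) {
        // Split on the separators both schemes use: '.', '_', and '-'.
        String[] parts = version.split("[._-]");
        int first = Integer.parseInt(parts[0]);
        return first == 1 ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        System.out.println(majorVersion("1.8.0_252")); // 8
        System.out.println(majorVersion("11.0.7"));    // 11
        System.out.println(majorVersion("9"));         // 9
    }
}
```

On Java 9+ one could instead use `Runtime.version().feature()`, but a batch script parsing `java -version` output has no such luxury, hence the string-splitting dance the patch fixes.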