[jira] [Commented] (SOLR-9830) Once IndexWriter is closed due to some RuntimeException like FileSystemException, it never returns to normal unless the Solr JVM is restarted
[ https://issues.apache.org/jira/browse/SOLR-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044048#comment-17044048 ] Vinh Le commented on SOLR-9830: --- I've seen this error when requesting the /metrics API in 7.3 as well, and it only disappears after a restart.
> Once IndexWriter is closed due to some RuntimeException like
> FileSystemException, it never returns to normal unless the Solr JVM is restarted
> ---
>
> Key: SOLR-9830
> URL: https://issues.apache.org/jira/browse/SOLR-9830
> Project: Solr
> Issue Type: Bug
> Components: update
> Affects Versions: 6.2
> Environment: Red Hat 4.4.7-3, SolrCloud
> Reporter: Daisy.Yuan
> Priority: Major
>
> 1. Collection coll_test has 9 shards, each with two replicas on different Solr instances.
> 2. While updating documents in the collection via SolrJ, inject an exhausted-file-handle fault into one Solr instance, e.g. solr1.
> 3. Updates to col_test_shard3_replica1 (the leader) fail due to a FileSystemException, and the IndexWriter is closed.
> 4. After the fault is cleared, col_test_shard3_replica1 (the leader) still cannot accept document updates, and its numDocs stays lower than the standby replica's.
> 5. After the Solr instance is restarted, it can accept updates again and numDocs is consistent between the two replicas.
> I think that in SolrCloud mode the core should recover by itself in this case; a restart should not be required to restore the update function.
> 2016-12-01 14:13:00,932 | INFO | http-nio-21101-exec-20 | [DWPT][http-nio-21101-exec-20]: now abort | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,932 | INFO | http-nio-21101-exec-20 | [DWPT][http-nio-21101-exec-20]: done abort | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,932 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: hit exception updating document | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: hit tragic FileSystemException inside updateDocument | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: rollback | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: all running merges have aborted | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,934 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: rollback: done finish merges | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,934 | INFO | http-nio-21101-exec-20 | [DW][http-nio-21101-exec-20]: abort | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,939 | INFO | commitScheduler-46-thread-1 | [DWPT][commitScheduler-46-thread-1]: flush postings as segment _4h9 numDocs=3798 | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | commitScheduler-46-thread-1 | [DWPT][commitScheduler-46-thread-1]: now abort | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | commitScheduler-46-thread-1 | [DWPT][commitScheduler-46-thread-1]: done abort | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | http-nio-21101-exec-20 | [DW][http-nio-21101-exec-20]: done abort success=true | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | commitScheduler-46-thread-1 | [DW][commitScheduler-46-thread-1]: commitScheduler-46-thread-1 finishFullFlush success=false | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | http-nio-21101-exec-20 | [IW][http-nio-21101-exec-20]: rollback: infos=_4g7(6.2.0):C59169/23684:delGen=4 _4gq(6.2.0):C67474/11636:delGen=1 _4gg(6.2.0):C64067/15664:delGen=2 _4gr(6.2.0):C13131 _4gs(6.2.0):C966 _4gt(6.2.0):C4543 _4gu(6.2.0):C6960 _4gv(6.2.0):C2544 | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO | commitScheduler-46-thread-1 | [IW][commitScheduler-46-thread-1]: hit exception during NRT reader | org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:
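For context on the log above: Lucene treats an exception like this FileSystemException as "tragic" — the IndexWriter rolls back and closes itself permanently, and the cause is exposed via IndexWriter.getTragicException(). The recovery the reporter asks for means the owning core has to discard the dead writer and open a fresh one. A stdlib-only sketch of that idea (all class and method names here are illustrative stand-ins, not Solr's or Lucene's actual code):

```java
import java.io.IOException;

// Illustrative sketch: once a writer hits a "tragic" exception it stays
// closed (as Lucene's IndexWriter does), so the owning core must replace it
// instead of waiting for a JVM restart. Names are hypothetical.
public class CoreRecoverySketch {

    /** Minimal stand-in for an IndexWriter that dies on a tragic exception. */
    static class FakeWriter {
        private Throwable tragedy;

        void updateDocument(String doc, boolean injectFault) throws IOException {
            if (tragedy != null) {
                // Every call after the tragedy fails, forever.
                throw new IOException("this writer is closed", tragedy);
            }
            if (injectFault) {
                tragedy = new IOException("simulated FileSystemException: too many open files");
                throw (IOException) tragedy;
            }
        }

        /** Mirrors IndexWriter.getTragicException(). */
        Throwable getTragicException() { return tragedy; }
    }

    FakeWriter writer = new FakeWriter();

    /** Updates a document, replacing the writer if it was tragically closed. */
    void updateWithRecovery(String doc) throws IOException {
        try {
            writer.updateDocument(doc, false);
        } catch (IOException e) {
            if (writer.getTragicException() == null) {
                throw e; // ordinary failure, nothing to recover from
            }
            writer = new FakeWriter(); // discard the dead writer, open a fresh one
            writer.updateDocument(doc, false);
        }
    }
}
```

Without the replacement step in the catch block, every subsequent update fails with "this writer is closed" — which matches the behavior reported in this issue.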
[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044009#comment-17044009 ] David Smiley commented on LUCENE-9212: --
> Automatons can be defined in both binary and unicode space, and there's no way of telling which it is when it comes to compiling them
Isn't that a problem with our API -- more of a root cause? I've been bitten by the un-typed nature of byte vs char automatons.
> Intervals.multiterm() should take a CompiledAutomaton
> ---
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 8.5
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.
> This isn't necessarily correct behaviour, however, because Automatons can be
> defined in both binary and unicode space, and there's no way of telling which
> it is when it comes to compiling them. In particular, for automatons
> produced by FuzzyTermsEnum, we need to convert them to unicode before
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we
> should deprecate the methods that take `Automaton` and remove in master.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS
[ https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043985#comment-17043985 ] Uwe Schindler commented on LUCENE-9227: ---
bq. Tested with browser and curl. The redirect works, but I know nothing about STS
Thanks. STS is Strict Transport Security (https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security). It sends a special HTTP header that instructs the browser to always use HTTPS for a domain. This lowers the risk that somebody intercepts the initial connection to the webserver over HTTP (users normally only enter the domain name, making the browser use HTTP and get redirected to HTTPS). As the redirect is not secured, a bad guy could remove the redirect and serve a (modified) page. With HSTS the browser will (except for the very first access) use HTTPS forever, even when links use HTTP or the user enters the domain name without a protocol. Basically, once you have sent this header you can no longer switch off HTTPS for the lifetime of this header. The recommendation is to send one year or more, but I initially set 300 seconds for testing. It's now deployed in production as well. I will raise it to one year next weekend.
> Make page ready for pure HTTPS
> ---
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
> Issue Type: Sub-task
> Reporter: Uwe Schindler
> Assignee: Uwe Schindler
> Priority: Blocker
>
> The web page can currently be visited using HTTPS but this brings warnings:
> - Both search providers create a form that passes USER ENTERED INPUT using no
> encryption. This is not allowed due to GDPR. We have to fix this asap.
> It looks like [~otis]'s search is working with HTTPS (if we change the domain name), but the Lucidworks one does not
> - There were some CSS files loaded with HTTP (fonts from Google - this was fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many links with HTTP, but it looks like no images, scripts or CSS anymore), I'd like to add a permanent redirect http://lucene.apache.org/ -> https://lucene.apache.org to the htaccess template file.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
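For illustration only, the redirect-plus-HSTS setup described in this thread could look roughly like the following in an Apache htaccess file. This is a hypothetical sketch, not the actual ASF template; the 300-second max-age matches the testing value mentioned above:

```apache
# Permanently redirect all plain-HTTP requests to HTTPS.
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}/$1 [R=301,L]

# Send the HSTS header so browsers keep using HTTPS for the header's lifetime.
# 300 seconds is a short testing value; raise to 31536000 (one year) once
# everything works, since the header cannot easily be undone before it expires.
Header always set Strict-Transport-Security "max-age=300"
```

Note that the header only takes effect on responses already served over HTTPS; the plain-HTTP redirect is still needed for the very first visit.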
[jira] [Commented] (SOLR-13910) Create security news feed on website with RSS/Atom feed
[ https://issues.apache.org/jira/browse/SOLR-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043981#comment-17043981 ] Uwe Schindler commented on SOLR-13910: -- I modified the styles a bit, now it's looking fine and is more flexible with responsive screen sizes. > Create security news feed on website with RSS/Atom feed > --- > > Key: SOLR-13910 > URL: https://issues.apache.org/jira/browse/SOLR-13910 > Project: Solr > Issue Type: Task > Components: website >Reporter: Adam Walz >Assignee: Jan Høydahl >Priority: Minor > Attachments: recent-security-ann.png, security-page-with-table.png, > security-page-with-table.png, solr-security-page.png > > Time Spent: 20m > Remaining Estimate: 0h > > From [~janhoy] > We're in the process of migrating our web site to Git and in that same > process we also change CMS from an ASF one to Pelican. The new site has > built-in support for news posts as individual files and also RSS feeds of > those. So I propose to add [https://lucene.apache.org/solr/security.html] > to the site, including a list of newest CVEs and an RSS/Atom feed to go > along with it. This way users have ONE place to visit to check security > announcements and they can monitor RSS to be alerted once we post a new > announcement. > We could also add RSS feeds for Lucene-core news and Solr-news sections > of course. > At the same time I propose that the news on the front-page > [lucene.apache.org|http://lucene.apache.org/] > is replaced with widgets that show the title only of the last 3 announcements > from Lucene, Solr and PyLucene sub projects. That front page is waaay > too long :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383499305
## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
```diff
@@ -18,260 +18,337 @@
 package org.apache.lucene.codecs.uniformsplit;

 import java.io.IOException;
-import java.util.Objects;
+import java.util.Arrays;

 import org.apache.lucene.codecs.PostingsReaderBase;
 import org.apache.lucene.index.TermState;
 import org.apache.lucene.index.TermsEnum;
 import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefBuilder;
 import org.apache.lucene.util.IntsRefBuilder;
-import org.apache.lucene.util.StringHelper;
 import org.apache.lucene.util.automaton.Automaton;
 import org.apache.lucene.util.automaton.ByteRunAutomaton;
 import org.apache.lucene.util.automaton.CompiledAutomaton;
-import org.apache.lucene.util.automaton.Operations;
 import org.apache.lucene.util.automaton.Transition;

 /**
  * The "intersect" {@link TermsEnum} response to {@link UniformSplitTerms#intersect(CompiledAutomaton, BytesRef)},
  * intersecting the terms with an automaton.
+ *
+ * By design of the UniformSplit block keys, it is less efficient than
+ * {@code org.apache.lucene.codecs.blocktree.IntersectTermsEnum} for {@link org.apache.lucene.search.FuzzyQuery} (-37%).
+ * It is slightly slower for {@link org.apache.lucene.search.WildcardQuery} (-5%) and slightly faster for
+ * {@link org.apache.lucene.search.PrefixQuery} (+5%).
+ *
+ * @lucene.experimental
  */
 public class IntersectBlockReader extends BlockReader {

-  protected final AutomatonNextTermCalculator nextStringCalculator;
-  protected final ByteRunAutomaton runAutomaton;
-  protected final BytesRef commonSuffixRef; // maybe null
-  protected final BytesRef commonPrefixRef;
-  protected final BytesRef startTerm; // maybe null
+  /**
+   * Block iteration order. Whether to move next block, jump to a block away, or end the iteration.
+   */
+  protected enum BlockIteration {NEXT, SEEK, END}

-  /** Set this when our current mode is seeking to this term. Set to null after. */
-  protected BytesRef seekTerm;
+  /**
+   * Threshold that controls when to attempt to jump to a block away.
+   *
+   * This counter is 0 when entering a block. It is incremented each time a term is rejected by the automaton.
+   * When the counter is greater than or equal to this threshold, then we compute the next term accepted by
+   * the automaton, with {@link AutomatonNextTermCalculator}, and we jump to a block away if the next term
+   * accepted is greater than the immediate next term in the block.
+   *
+   * A low value, for example 1, improves the performance of automatons requiring many jumps, for example
+   * {@link org.apache.lucene.search.FuzzyQuery} and most {@link org.apache.lucene.search.WildcardQuery}.
+   * A higher value improves the performance of automatons with less or no jump, for example
+   * {@link org.apache.lucene.search.PrefixQuery}.
+   * A threshold of 4 seems to be a good balance.
+   */
+  protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD = 4;

-  protected int blockPrefixRunAutomatonState;
-  protected int blockPrefixLen;
+  protected final Automaton automaton;
+  protected final ByteRunAutomaton runAutomaton;
+  protected final boolean finite;
+  protected final BytesRef commonSuffix; // maybe null
+  protected final int minTermLength;
+  protected final AutomatonNextTermCalculator nextStringCalculator;

   /**
-   * Number of bytes accepted by the last call to {@link #runAutomatonForState}.
+   * Set this when our current mode is seeking to this term. Set to null after.
+   */
+  protected BytesRef seekTerm;
+  /**
+   * Number of bytes accepted by the automaton when validating the current term.
+   */
+  protected int numMatchedBytes;
+  /**
+   * Automaton states reached when validating the current term, from 0 to {@link #numMatchedBytes} - 1.
+   */
+  protected int[] states;
+  /**
+   * Block iteration order determined when scanning the terms in the current block.
    */
-  protected int numBytesAccepted;
+  protected BlockIteration blockIteration;
   /**
-   * Whether the current term is beyond the automaton common prefix.
-   * If true this means the enumeration should stop immediately.
+   * Counter of the number of consecutively rejected terms.
+   * Depending on {@link #NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD}, this may trigger a jump to a block away.
    */
-  protected boolean beyondCommonPrefix;
+  protected int numConsecutivelyRejectedTerms;

-  public IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm,
-      IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase
```
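The consecutive-rejection heuristic documented in the javadoc above can be sketched in isolation. This is an illustrative stdlib-only toy, not the actual IntersectBlockReader code; only the constant value 4 and the reset/increment behavior come from the patch:

```java
// Toy model of the block-jump heuristic: reset a counter on block entry,
// bump it per rejected term, and signal a possible seek once it reaches the
// threshold (mirroring NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD = 4).
public class JumpHeuristic {
    static final int REJECTION_THRESHOLD = 4;

    private int consecutivelyRejected = 0;

    /** Called when the reader enters a new block. */
    public void onBlockEnter() {
        consecutivelyRejected = 0;
    }

    /**
     * Called for each term rejected by the automaton. Returns true when the
     * reader should compute the next accepted term and consider jumping to a
     * block away instead of continuing the linear scan.
     */
    public boolean onTermRejected() {
        return ++consecutivelyRejected >= REJECTION_THRESHOLD;
    }
}
```

With a threshold of 1 the jump computation runs after every rejection (good for FuzzyQuery-style automata that skip far ahead); a high threshold keeps the scan linear (good for PrefixQuery, where most terms in a block match anyway).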
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383522035 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383527391 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383526667 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383523994 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383546038 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383517461 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ##
[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader. URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383536370 ## File path: lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java ## @@ -285,64 +362,66 @@ public void seekExact(long ord) { } @Override - public SeekStatus seekCeil(BytesRef text) { + public void seekExact(BytesRef term, TermState state) { throw new UnsupportedOperationException(); } @Override - public void seekExact(BytesRef term, TermState state) { + public SeekStatus seekCeil(BytesRef text) { throw new UnsupportedOperationException(); } /** * This is a copy of AutomatonTermsEnum. Since it's an inner class, the outer class can Review comment: Well; it's _mostly_ a copy of AutomatonTermsEnum now :-/ The duplication is a shame. Just insert the word "_mostly_" and it satisfies me. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
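The jump heuristic described in the review thread above (count consecutive rejections, then seek ahead to the next term the automaton could accept) can be sketched outside of Lucene in plain Java. This is a hypothetical, self-contained illustration, not the actual IntersectBlockReader code: the `accepts` predicate stands in for `ByteRunAutomaton`, and the `nextAccepted` function stands in for `AutomatonNextTermCalculator`.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

public class JumpHeuristicSketch {

    // Mirrors NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD in the patch.
    static final int THRESHOLD = 4;

    /**
     * Scans sorted terms, collecting those the "automaton" accepts.
     * After THRESHOLD consecutive rejections, jumps ahead to the first
     * term >= the next term the automaton could accept.
     * Returns the number of terms actually examined.
     */
    static int scan(List<String> sortedTerms, Predicate<String> accepts,
                    UnaryOperator<String> nextAccepted, List<String> matches) {
        int examined = 0;
        int rejectedInARow = 0;
        int i = 0;
        while (i < sortedTerms.size()) {
            String term = sortedTerms.get(i);
            examined++;
            if (accepts.test(term)) {
                matches.add(term);
                rejectedInARow = 0;
                i++;
            } else if (++rejectedInARow >= THRESHOLD) {
                String target = nextAccepted.apply(term);
                if (target == null) {
                    break; // no further accepted term: end the iteration
                }
                // A real implementation seeks a block away via the index
                // dictionary; here we just skip terms below the target.
                i++;
                while (i < sortedTerms.size()
                        && sortedTerms.get(i).compareTo(target) < 0) {
                    i++;
                }
                rejectedInARow = 0;
            } else {
                i++;
            }
        }
        return examined;
    }
}
```

With the threshold at 4, an automaton that rejects long runs of terms (as with FuzzyQuery) skips much of the scan, while a prefix-style automaton that accepts most terms in a block rarely pays the cost of computing the next accepted term.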
[jira] [Commented] (SOLR-13910) Create security news feed on website with RSS/Atom feed
[ https://issues.apache.org/jira/browse/SOLR-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043952#comment-17043952 ] Uwe Schindler commented on SOLR-13910: -- The header line wraps now on my computer. It looks like the menu font size is a bit too large. I will reduce it a bit, so it's 0.92rem instead of 1rem. It seems to depend on your browser and screen resolution if this occurs. > Create security news feed on website with RSS/Atom feed > --- > > Key: SOLR-13910 > URL: https://issues.apache.org/jira/browse/SOLR-13910 > Project: Solr > Issue Type: Task > Components: website >Reporter: Adam Walz >Assignee: Jan Høydahl >Priority: Minor > Attachments: recent-security-ann.png, security-page-with-table.png, > security-page-with-table.png, solr-security-page.png > > Time Spent: 20m > Remaining Estimate: 0h > > From [~janhoy] > We're in the process of migrating our web site to Git and in that same > process we also change CMS from an ASF one to Pelican. The new site has > built-in support for news posts as individual files and also RSS feeds of > those. So I propose to add [https://lucene.apache.org/solr/security.html] > to the site, including a list of newest CVEs and an RSS/Atom feed to go > along with it. This way users have ONE place to visit to check security > announcements and they can monitor RSS to be alerted once we post a new > announcement. > We could also add RSS feeds for Lucene-core news and Solr-news sections > of course. > At the same time I propose that the news on the front-page > [lucene.apache.org|http://lucene.apache.org/] > is replaced with widgets that show the title only of the last 3 announcements > from Lucene, Solr and PyLucene sub projects. That front page is waaay > too long :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9201) Port documentation-lint task to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043951#comment-17043951 ] Dawid Weiss commented on LUCENE-9201: - A custom javadoc invocation is certainly possible and could possibly make things easier in the long run. You'd need to declare inputs/outputs properly though so that it is skippable. Those javadoc invocations take a long time in precommit. > Port documentation-lint task to Gradle build > > > Key: LUCENE-9201 > URL: https://issues.apache.org/jira/browse/LUCENE-9201 > Project: Lucene - Core > Issue Type: Sub-task >Affects Versions: master (9.0) >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9201-ecj-2.patch, LUCENE-9201-ecj.patch, > LUCENE-9201-missing-docs.patch, LUCENE-9201.patch, javadocGRADLE.png, > javadocHTML4.png, javadocHTML5.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > Ant build's "documentation-lint" target consists of those two sub targets. > * "-ecj-javadoc-lint" (Javadoc linting by ECJ) > * "-documentation-lint" (Missing javadocs / broken links check by python > scripts) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git
[ https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043942#comment-17043942 ] Uwe Schindler commented on LUCENE-8987: --- bq. I attempted a fix to the CSS caching issue. It is just a simple Pelican variable that gets injected for every unversioned CSS and JS in our HTML templates. See https://github.com/apache/lucene-site/pull/13 - Adding this should make the new front page load well for everyone after publishing I improved the CSS/JS caching: Whenever the {{v=X}} query string is appended, the underlying Apache is now sending a Cache-Control header. This will cache the resources for a longer time (I started with 10 days). This improves page loads, as not even If-Modified-Since requests need to be made. > Move Lucene web site from svn to git > > > Key: LUCENE-8987 > URL: https://issues.apache.org/jira/browse/LUCENE-8987 > Project: Lucene - Core > Issue Type: Task > Components: general/website >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Attachments: lucene-site-repo.png > > > INFRA just enabled [a new way of configuring website > build|https://s.apache.org/asfyaml] from a git branch, [see dev list > email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E]. > It allows for automatic builds of both staging and production site, much > like the old CMS. We can choose to auto publish the html content of an > {{output/}} folder, or to have a bot build the site using > [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder. > The goal of this issue is to explore how this can be done for > [http://lucene.apache.org|http://lucene.apache.org/], by creating a new > git repo {{lucene-site}}, copy over the site from svn, see if it can be > "Pelicanized" easily and then test staging.
Benefits are that more people > will be able to edit the web site and we can take PRs from the public (with > GitHub preview of pages). > Non-goals: > * Create a new web site or a new graphic design > * Change from Markdown to Asciidoc -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043928#comment-17043928 ] Dawid Weiss commented on LUCENE-9241: - I have reviewed it as well. :) Except for the things I mentioned I didn't think anything else was worth mentioning. Direct memory allocation may be misleading in that it is still allocation but escapes the heap... but I don't have an opinion on that (whether it's a good thing or not) so I'll just leave it up to you. > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch > > > Currently each test jvm has Xmx of 512M. With a modern macbook pro this is > 4GB which is pretty crazy. > On the other hand, if we fix a few edge cases, tests can work with lower > heaps such as 128M. This can save many gigabytes (also it finds interesting > memory waste/issues). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] andyvuong commented on a change in pull request #1223: SOLR-14213: Configuring Solr Cloud to use Shared Storage
andyvuong commented on a change in pull request #1223: SOLR-14213: Configuring Solr Cloud to use Shared Storage URL: https://github.com/apache/lucene-solr/pull/1223#discussion_r383548881 ## File path: solr/core/src/java/org/apache/solr/store/shared/SharedStoreManager.java ## @@ -43,68 +43,38 @@ public SharedStoreManager(ZkController controller) { zkController = controller; -// initialize BlobProcessUtil with the SharedStoreManager for background processes to be ready -blobProcessUtil = new BlobProcessUtil(zkController.getCoreContainer()); -blobCoreSyncer = new BlobCoreSyncer(); -sharedCoreConcurrencyController = new SharedCoreConcurrencyController(zkController.getCoreContainer()); - } - - @VisibleForTesting - public void initBlobStorageProvider(BlobStorageProvider blobStorageProvider) { -this.blobStorageProvider = blobStorageProvider; - } - - @VisibleForTesting - public void initBlobProcessUtil(BlobProcessUtil processUtil) { -if (blobProcessUtil != null) { - blobProcessUtil.shutdown(); -} -blobProcessUtil = processUtil; +blobStorageProvider = new BlobStorageProvider(); +blobDeleteManager = new BlobDeleteManager(getBlobStorageProvider().getClient()); +corePullTracker = new CorePullTracker(); +sharedShardMetadataController = new SharedShardMetadataController(zkController.getSolrCloudManager()); +sharedCoreConcurrencyController = new SharedCoreConcurrencyController(sharedShardMetadataController); } - /* - * Initiates a SharedShardMetadataController if it doesn't exist and returns one + /** + * Start blob processes that depend on an initiated SharedStoreManager */ + public void load() { +blobCoreSyncer = new BlobCoreSyncer(); Review comment: For the first problem, there are shared storage components that have corecontainer injected explicitly in their api methods or via other dependencies that have getters opening access to it (zkcontroller for example). There are also shared storage components that have it injected in the constructor. 
Thinking about and looking at it again, it's kind of a mess identifying the initialization flows/orders and I might need to refactor a bunch of things here for better consistency. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043879#comment-17043879 ] Lucene/Solr QA commented on SOLR-13965: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 10s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green} 1m 6s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 43s{color} | {color:green} core in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 51m 24s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13965 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12994461/SOLR-13965.02.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns validaterefguide | | uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / 1770797387d | | ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/688/testReport/ | | modules | C: solr/core solr/solr-ref-guide U: solr | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/688/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS
[ https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043855#comment-17043855 ] Jan Høydahl commented on LUCENE-9227: - Tested with browser and curl. The redirect works, but I know nothing about STS :) > Make page ready for pure HTTPS > -- > > Key: LUCENE-9227 > URL: https://issues.apache.org/jira/browse/LUCENE-9227 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > > The web page can currently be visited using HTTPS but this brings warning: > - Both search providers create a form that passes USER ENTERED INPUT using no > encryption. This is not allowed due to GDPR. We have to fix this asap. It > looks like [~otis] search is working with HTTPS (if we change domain name), > but the Lucidworks does not > - There were some CSS files loaded with HTTP (fonts from Google - this was > fixed) > Once those 2 problems are fixed (I grepped for HTTP and still found many > links with HTTP, but looks like no images or scripts or css anymore), I'd > like to add a permanent redirect http://lucene.apache.org/ -> > https://lucene.apache.org to the htaccess template file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14137) Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
[ https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043840#comment-17043840 ] Erick Erickson commented on SOLR-14137: --- The programs I use to generate docs and run Jmeter are here: [https://github.com/ErickErickson/index_doc_generator.] It's a bit of a mess, I was trying several different things. But if people want to work with it I can help untangle it. > Boosting by date (and perhaps others) shows a steady decline 6.6->8.3 > - > > Key: SOLR-14137 > URL: https://issues.apache.org/jira/browse/SOLR-14137 > Project: Solr > Issue Type: Improvement >Reporter: Erick Erickson >Priority: Major > Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot > 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png, > second_run.png > > > Moving a user's list discussion over here. > {color:#00}The very short form is that from Solr 6.6.1 to Solr 8.3.1, the > throughput for date boosting in my tests dropped by 40+%{color} > {color:#00}I’ve been hearing about slowdowns in successive Solr releases > with boost functions, so I dug into it a bit. The test setup is just a > boost-by-date with an additional big OR clause of 100 random words so I’d be > sure to hit a bunch of docs. I figured that if there were few hits, the > signal would be lost in the noise, but I didn’t look at the actual hit > counts.{color} > {color:#00}I saw several Solr JIRAs about this subject, but they were > slightly different, although quite possibly the same underlying issue. So I > tried to get this down to a very specific form of a query.{color} > {color:#00}I’ve also seen some cases in the wild where the response was > proportional to the number of segments, thus my optimize experiments.{color} > {color:#00}Here are the results, explanation below. O stands for > optimized to one segment. 
I spot checked pdate against 6.6, 7.1 and 8.3 and > they weren’t significantly different performance wise from tdate. All have > docValues enabled. I ran these against a multiValued=“false” field. All the > tests pegged all my CPUs. Jmeter is being run on a different machine than > Solr. Only one Solr was running for any test.{color} > {color:#00}Solr version queries/min {color} > {color:#00}6.6.1 3,400 {color} > {color:#00}6.6.1 O 4,800 {color} > {color:#00}7.1 2,800 {color} > {color:#00}7.1 O 4,200 {color} > {color:#00}7.7.1 2,400 {color} > {color:#00}7.7.1 O 3,500 {color} > {color:#00}8.3.1 2,000 {color} > {color:#00}8.3.1 O 2,600 {color} > {color:#00}The tests I’ve been running just index 20M docs into a single > core, then run the exact same 10,000 queries against them from jmeter with 24 > threads. Spot checks showed no hits on the queryResultCache.{color} > {color:#00}A query looks like this: {color} > {color:#00}rows=0&\{!boost b=recip(ms(NOW, > INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR > anyplace…97 more random words){color} > {color:#00}There is no faceting. No grouping. No sorting.{color} > {color:#00}I fill in INSERT_FIELD_HERE through jmeter magic. I’m running > the exact same queries for every test.{color} > {color:#00}One wildcard is that I did regenerate the index for each major > revision, and the chose random words from the same list of words, as well as > random times (bounded in the same range though) so the docs are not > completely identical. The index was in the native format for that major > version even if slightly different between versions. I ran the test once, > then ran it again after optimizing the index.{color} > {color:#00}I haven’t dug any farther, if anyone’s interested I can throw > a profiler at, say, 8.3 and see what I can see, although I’m not going to > have time to dive into this any time soon. I’d be glad to run some tests > though. 
> I saved the queries and the indexes, so running a test would only take a few minutes.
> While I concentrated on date fields, the docs have date, int, and long fields, both docValues=true and docValues=false, each variant with multiValued=true and multiValued=false, and both Trie and Point (where possible) variants, as well as a pretty simple text field.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
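For readers puzzling over the boost expression above: Solr's `recip(x,m,a,b)` function query evaluates to `a / (m*x + b)`, so with `m=3.16e-11` and `x = ms(NOW, date)` a document about one year old contributes roughly half the boost of a brand-new one. A minimal sketch of that arithmetic (the class and method names are illustrative, not Solr code):

```java
public class RecipBoost {
    // Solr function query semantics: recip(x, m, a, b) = a / (m * x + b)
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double msPerYear = 365.0 * 24 * 60 * 60 * 1000; // ~3.15e10 ms
        // A doc dated "now" (x = 0) gets the full boost of 1.0 ...
        System.out.println(recip(0, 3.16e-11, 1, 1));
        // ... while a one-year-old doc decays to roughly 0.5.
        System.out.println(recip(msPerYear, 3.16e-11, 1, 1));
    }
}
```

The m=3.16e-11 constant in the docs is chosen precisely so one year of milliseconds halves the score, which is why these benchmarks exercise the function on every hit.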
[jira] [Commented] (LUCENE-9237) Faster TermsEnum intersect for UniformSplit
[ https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043837#comment-17043837 ] David Smiley commented on LUCENE-9237: -- Were you able to do a comparison while keeping the term dictionary memory usage equal? This will take some repeated tweaking of the parameters that UniformSplit provides and then examining the size of the term dict files (or some similar approach). Annoying, I know. Without doing this, we allow any postings format to cheat by using memory gratuitously over its competitor. An analogy is running a Tour de France competition and not checking who is on drugs :-D. Or at least allowing an equal amount of drugs for the contestants -- LOL, I amuse myself. Also, check that the on-heap vs off-heap FST usage is equivalent amongst the contestants, as this is easily toggled by any format.
> Faster TermsEnum intersect for UniformSplit
> ---
>
> Key: LUCENE-9237
> URL: https://issues.apache.org/jira/browse/LUCENE-9237
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Bruno Roustant
> Assignee: Bruno Roustant
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
> - It is still slower for FuzzyQuery (-37%) but it is faster than the previous version (which was -65%).
> - It is slightly slower for WildcardQuery (-5%).
> - It is slightly faster for PrefixQuery (+5%). Sometimes benchmarks show more improvement (I've seen up to +17% a fourth of the time).
[jira] [Commented] (SOLR-14223) PublicKeyHandler consumes a lot of entropy during tests
[ https://issues.apache.org/jira/browse/SOLR-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043804#comment-17043804 ] ASF subversion and git services commented on SOLR-14223: Commit 1770797387d761706c6d93253a3759d885f662c4 in lucene-solr's branch refs/heads/master from Mike [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1770797 ] SOLR-14223 Create RSAKeyPair from disk (#1217) * Create properties for PublicKeyHandler to read existing keys from disk * Move pregenerated keys from core/test-files to test-framework * Update tests to use existing keys instead of new keys each run > PublicKeyHandler consumes a lot of entropy during tests > --- > > Key: SOLR-14223 > URL: https://issues.apache.org/jira/browse/SOLR-14223 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.4, 8.0 >Reporter: Mike Drob >Priority: Major > Time Spent: 2h > Remaining Estimate: 0h > > After the changes in SOLR-12354 to eagerly create a {{PublicKeyHandler}} for > the CoreContainer, the creation of the underlying {{RSAKeyPair}} uses > {{SecureRandom}} to generate primes. This eats up a lot of system entropy and > can slow down tests significantly (I observed it adding 10s to an individual > test). > Similar to what we do for SSL config for tests, we can swap in a non blocking > implementation of SecureRandom for the key pair generation to allow multiple > tests to run better in parallel. Primality testing with BigInteger is also > slow, so I'm not sure how much total speedup we can get here, maybe it's > worth checking if there are faster implementations out there in other > libraries. > In production cases, this also blocks creation of all cores. We should only > create the Handler if necessary, i.e. if the existing authn/z tell us that > they won't support internode requests. 
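The SOLR-14223 description above mentions swapping in a non-blocking SecureRandom for test key generation (the committed fix instead reads pregenerated keys from disk). A sketch of that alternative, assuming the SHA1PRNG provider is available on the JDK in use; the fixed seed makes generation deterministic and entropy-free, and is strictly test-only:

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;

public class FastTestKeys {
    // Test-only: a pre-seeded SHA1PRNG never blocks waiting on /dev/random,
    // so parallel test JVMs don't drain the system entropy pool.
    // NEVER use a fixed seed for production key material.
    public static KeyPair generateTestKeyPair() throws Exception {
        SecureRandom nonBlocking = SecureRandom.getInstance("SHA1PRNG");
        nonBlocking.setSeed(42L); // deterministic; no system entropy consumed
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048, nonBlocking);
        return gen.generateKeyPair();
    }
}
```

Note this only removes the entropy cost; as the issue says, BigInteger primality testing still dominates, which is why reusing keys from disk is the bigger win.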
[jira] [Resolved] (SOLR-14223) PublicKeyHandler consumes a lot of entropy during tests
[ https://issues.apache.org/jira/browse/SOLR-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-14223. -- Fix Version/s: master (9.0) Assignee: Mike Drob Resolution: Fixed > PublicKeyHandler consumes a lot of entropy during tests > --- > > Key: SOLR-14223 > URL: https://issues.apache.org/jira/browse/SOLR-14223 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 7.4, 8.0 >Reporter: Mike Drob >Assignee: Mike Drob >Priority: Major > Fix For: master (9.0) > > Time Spent: 2h > Remaining Estimate: 0h > > After the changes in SOLR-12354 to eagerly create a {{PublicKeyHandler}} for > the CoreContainer, the creation of the underlying {{RSAKeyPair}} uses > {{SecureRandom}} to generate primes. This eats up a lot of system entropy and > can slow down tests significantly (I observed it adding 10s to an individual > test). > Similar to what we do for SSL config for tests, we can swap in a non blocking > implementation of SecureRandom for the key pair generation to allow multiple > tests to run better in parallel. Primality testing with BigInteger is also > slow, so I'm not sure how much total speedup we can get here, maybe it's > worth checking if there are faster implementations out there in other > libraries. > In production cases, this also blocks creation of all cores. We should only > create the Handler if necessary, i.e. if the existing authn/z tell us that > they won't support internode requests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob merged pull request #1217: SOLR-14223 PublicKeyHandler consumes a lot of entropy during tests
madrob merged pull request #1217: SOLR-14223 PublicKeyHandler consumes a lot of entropy during tests URL: https://github.com/apache/lucene-solr/pull/1217 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383433653 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3132,8 +3139,9 @@ public final long prepareCommit() throws IOException { * @return true iff this method flushed at least on segment to disk. * @lucene.experimental */ + @SuppressWarnings("try") public final boolean flushNextBuffer() throws IOException { -try { +try (Closeable finalizer = acquireModificationLease()){ Review comment: nit: add a space before `{` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383431054 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -1552,19 +1551,25 @@ public long deleteDocuments(Query... queries) throws IOException { } } -try { - long seqNo = docWriter.deleteQueries(queries); - if (seqNo < 0) { -seqNo = -seqNo; -processEvents(true); - } - - return seqNo; +try (Closeable finalizer = acquireModificationLease()) { + return maybeProcessEvents(docWriter.deleteQueries(queries)); } catch (VirtualMachineError tragedy) { tragicEvent(tragedy, "deleteDocuments(Query..)"); throw tragedy; } } + private Closeable acquireModificationLease() { Review comment: nit: add a new line This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383433409 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -3560,25 +3569,19 @@ private boolean doFlush(boolean applyAllDeletes) throws IOException { doBeforeFlush(); testPoint("startDoFlush"); boolean success = false; -try { +try (Closeable finalizer = acquireModificationLease()){ Review comment: nit: add a space before `{` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383443954 ## File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java ## @@ -2417,7 +2424,7 @@ public long deleteAll() throws IOException { */ try { synchronized (fullFlushLock) { -try (Closeable finalizer = docWriter.lockAndAbortAll()) { +try (Closeable finalizer = acquireModificationLease(docWriter.lockAndAbortAll())) { Review comment: Do you have to release locks in the reverse order? Can we have two try-resources here instead of passing the lock to `acquireModificationLease` ? If so, we can remove the `in` parameter from `acquireModificationLease`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
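The review comments above all concern one lease pattern: each modification acquires a Closeable in try-with-resources so the lease is released even when the operation throws, and a graceful close is refused while leases are outstanding. Stripped of IndexWriter specifics, the shape is roughly this (all names hypothetical, not the PR's actual code):

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class ModificationGate {
    private final AtomicInteger activeLeases = new AtomicInteger();
    private volatile boolean closed = false;

    // Callers wrap each modification in
    //   try (Closeable lease = gate.acquireModificationLease()) { ... }
    // so the lease count drops even if the modification throws.
    public Closeable acquireModificationLease() {
        if (closed) {
            throw new IllegalStateException("gate is closed");
        }
        activeLeases.incrementAndGet();
        // A method reference to decrementAndGet is a valid Closeable:
        // the int return value is simply discarded on close().
        return activeLeases::decrementAndGet;
    }

    // A graceful close only succeeds once no thread holds a lease.
    public boolean tryGracefulClose() {
        if (activeLeases.get() > 0) {
            return false;
        }
        closed = true;
        return true;
    }
}
```

A real implementation must also handle the race between the closed check and the increment (IndexWriter does this under its own synchronization); this sketch only shows the try-with-resources shape the reviewers are discussing.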
[jira] [Comment Edited] (LUCENE-9227) Make page ready for pure HTTPS
[ https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794 ] Uwe Schindler edited comment on LUCENE-9227 at 2/24/20 7:56 PM: I committed the following to htaccess.template: {noformat} Header always set Strict-Transport-Security "max-age=300" RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L] {noformat} This is IMHO the most consistent way to express this. There are shorter ways, but the if/else statements are easier to read: - If user is on HTTPS, he/she gets STS header (for testing purposes limited to 300s) - If user is on HTTP, he/she gets redirect to HTTPS (permanent) {noformat} Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:40:37 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo HTTP/1.1 301 Moved Permanently Date: Mon, 24 Feb 2020 19:44:03 GMT Server: Apache Location: https://lucene.staged.apache.org/test?hallo Content-Type: text/html; charset=iso-8859-1 {noformat} I plan to merge this to master quite soon, so please test it! I will keep the STS header with 300seconds for a while and then raise to one year, if no complaints are coming. was (Author: thetaphi): I committed the following to htaccess.template: {noformat} Header always set Strict-Transport-Security "max-age=300" RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L] {noformat} This is IMHO the most consistent way to express this. 
There are shorter ways, but the if/else statements are easier to read: - If user is on HTTPS, he gets STS header (for testing purposes, limited to 300s) - If user is on HTTP, he gets redirect to HTTPS (permanent) {noformat} Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:40:37 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo HTTP/1.1 301 Moved Permanently Date: Mon, 24 Feb 2020 19:44:03 GMT Server: Apache Location: https://lucene.staged.apache.org/test?hallo Content-Type: text/html; charset=iso-8859-1 {noformat} I plan to merge this to master quite soon, so please test it! I will keep the STS header with 300seconds for a while and then raise to one year, if no complaints are coming. > Make page ready for pure HTTPS > -- > > Key: LUCENE-9227 > URL: https://issues.apache.org/jira/browse/LUCENE-9227 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > > The web page can currently be visited using HTTPS but this brings warning: > - Both search providers create a form that passes USER ENTERED INPUT using no > encryption. This is not allowed due to GDPR. We have to fix this asap. It > looks like [~otis] search is working with HTTPS (if we change domain name), > but the Lucidworks does not > - There were some CSS files loaded with HTTP (fonts from Google - this was > fixed) > Once those 2 problems are fixed (I grepped for HTTP and still found many > links with HTTP, but looks like no images or scripts or css anymore), I'd > like to add a permanent redirect http://lucene.apache.org/ -> > https://lucene.apache.org to the htaccess template file. 
[jira] [Comment Edited] (LUCENE-9227) Make page ready for pure HTTPS
[ https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794 ] Uwe Schindler edited comment on LUCENE-9227 at 2/24/20 7:55 PM: I committed the following to htaccess.template: {noformat} Header always set Strict-Transport-Security "max-age=300" RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L] {noformat} This is IMHO the most consistent way to express this. There are shorter ways, but the if/else statements are easier to read: - If user is on HTTPS, he gets STS header (for testing purposes, limited to 300s) - If user is on HTTP, he gets redirect to HTTPS (permanent) {noformat} Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:40:37 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo HTTP/1.1 301 Moved Permanently Date: Mon, 24 Feb 2020 19:44:03 GMT Server: Apache Location: https://lucene.staged.apache.org/test?hallo Content-Type: text/html; charset=iso-8859-1 {noformat} I plan to merge this to master quite soon, so please test it! I will keep the STS header with 300seconds for a while and then raise to one year, if no complaints are coming. was (Author: thetaphi): I committed the following to htaccess.template: {noformat} Header always set Strict-Transport-Security "max-age=300" RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L] {noformat} This is IMHO the most consistent way to express this. 
There are shorter ways, but the if/else statements are easier to read: - If user is on HTTPS, he gets STS header (for testing purposes, limited to 300s) - If user is on HTTP, he gets redirect to HTTPS (permanent) {noformat} Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:40:37 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo HTTP/1.1 301 Moved Permanently Date: Mon, 24 Feb 2020 19:44:03 GMT Server: Apache Location: https://lucene.staged.apache.org/test?hallo Content-Type: text/html; charset=iso-8859-1 Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:44:09 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html {noformat} I plan to merge this to master quite soon, so please test it! I will keep the STS header with 300seconds for a while and then raise to one year, if no complaints are coming. > Make page ready for pure HTTPS > -- > > Key: LUCENE-9227 > URL: https://issues.apache.org/jira/browse/LUCENE-9227 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > > The web page can currently be visited using HTTPS but this brings warning: > - Both search providers create a form that passes USER ENTERED INPUT using no > encryption. This is not allowed due to GDPR. We have to fix this asap. 
It > looks like [~otis] search is working with HTTPS (if we change domain name), > but the Lucidworks does not > - There were some CSS files loaded with HTTP (fonts from Google - this was > fixed) > Once those 2 problems are fixed (I grepped for HTTP and still found many > links with HTTP, but looks like no images or scripts or css anymore), I'd > like to add a permanent redirect http://lucene.apache.org/ -> > https://lucene.apache.org to the htaccess template file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS
[ https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794 ] Uwe Schindler commented on LUCENE-9227: --- I committed the following to htaccess.template: {noformat} Header always set Strict-Transport-Security "max-age=300" RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L] {noformat} This is IMHO the most consistent way to express this. There are shorter ways, but the if/else statements are easier to read: - If user is on HTTPS, he gets STS header (for testing purposes, limited to 300s) - If user is on HTTP, he gets redirect to HTTPS (permanent) {noformat} Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:40:37 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo HTTP/1.1 301 Moved Permanently Date: Mon, 24 Feb 2020 19:44:03 GMT Server: Apache Location: https://lucene.staged.apache.org/test?hallo Content-Type: text/html; charset=iso-8859-1 Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/ HTTP/1.1 200 OK Date: Mon, 24 Feb 2020 19:44:09 GMT Server: Apache Strict-Transport-Security: max-age=300 Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT ETag: "394a-59f1592c57599" Accept-Ranges: bytes Content-Length: 14666 Vary: Accept-Encoding Content-Type: text/html {noformat} I plan to merge this to master quite soon, so please test it! I will keep the STS header with 300seconds for a while and then raise to one year, if no complaints are coming. 
> Make page ready for pure HTTPS > -- > > Key: LUCENE-9227 > URL: https://issues.apache.org/jira/browse/LUCENE-9227 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Uwe Schindler >Assignee: Uwe Schindler >Priority: Blocker > > The web page can currently be visited using HTTPS but this brings warning: > - Both search providers create a form that passes USER ENTERED INPUT using no > encryption. This is not allowed due to GDPR. We have to fix this asap. It > looks like [~otis] search is working with HTTPS (if we change domain name), > but the Lucidworks does not > - There were some CSS files loaded with HTTP (fonts from Google - this was > fixed) > Once those 2 problems are fixed (I grepped for HTTP and still found many > links with HTTP, but looks like no images or scripts or css anymore), I'd > like to add a permanent redirect http://lucene.apache.org/ -> > https://lucene.apache.org to the htaccess template file. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043785#comment-17043785 ] David Eric Pugh commented on SOLR-13965: LGTM > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043782#comment-17043782 ] Christine Poerschke commented on SOLR-13965: Accounting for the {{StreamHandler.addExpressiblePlugins}} factoring-out (already committed above), the just-attached {{SOLR-13965.02.patch}} is what remains here from https://github.com/apache/lucene-solr/pull/1033, I think. If there are no concerns or objections I'll aim to commit it later this week.
> Adding new functions to GraphHandler should be same as Streamhandler
> ---
>
> Key: SOLR-13965
> URL: https://issues.apache.org/jira/browse/SOLR-13965
> Project: Solr
> Issue Type: Improvement
> Components: streaming expressions
> Affects Versions: 8.3
> Reporter: David Eric Pugh
> Priority: Minor
> Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Currently you add new functions to GraphHandler differently than you do in StreamHandler. We should have one way of extending the handlers that support streaming expressions.
[jira] [Updated] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler
[ https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-13965: --- Attachment: SOLR-13965.02.patch > Adding new functions to GraphHandler should be same as Streamhandler > > > Key: SOLR-13965 > URL: https://issues.apache.org/jira/browse/SOLR-13965 > Project: Solr > Issue Type: Improvement > Components: streaming expressions >Affects Versions: 8.3 >Reporter: David Eric Pugh >Priority: Minor > Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Currently you add new functions to GraphHandler differently than you do in > StreamHandler. We should have one way of extending the handlers that support > streaming expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14278) data loss during live shard split if leader dies
[ https://issues.apache.org/jira/browse/SOLR-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043780#comment-17043780 ] Yonik Seeley commented on SOLR-14278: - Testing update: I let the test loop overnight with split shard commented out. There were no failures. With the split in the test, the failure rate looks somewhere between 30-50% on my hardware. > data loss during live shard split if leader dies > > > Key: SOLR-14278 > URL: https://issues.apache.org/jira/browse/SOLR-14278 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Yonik Seeley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > While trying to develop better tests for shared storage (SOLR-13101), I ran > across another failure for normal replica types as well (one of the first > things I do when a test fails for shared storage is to try and validate that > normal NRT replicas succeed.) The PR I'll open has a test adapted from the > one in SOLR-13813 for master. > Scenario: > - indexing is happening during shard split > - leader is killed shortly after (before the split has finished) and never > brought back up > - there are often some missing documents at the end > While it's possible that the simulated killing of the node in the unit test > is imperfect, I haven't reproduced a failure if I comment out the split > command and just kill the leader. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9248) Change internal code names of postingsFormats to use 84 suffix
[ https://issues.apache.org/jira/browse/LUCENE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043766#comment-17043766 ] David Smiley commented on LUCENE-9248: -- WDYT [~jpountz]? Are all users of {{Lucene84PostingsWriter}} / Reader affected, thus nearly all formats? If we do an 8.4.1, I think this should be released in such a bug-fix version. In this issue I'd also like to update the Solr docs on the text tagger to suggest the FST format more as a tip with a caveat, and also add to the upgrade notes on the Lucene & Solr sides.
> Change internal code names of postingsFormats to use 84 suffix
> ---
>
> Key: LUCENE-9248
> URL: https://issues.apache.org/jira/browse/LUCENE-9248
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Major
>
> Some postings formats write the postings differently as of Lucene 8.4 due to changes -- LUCENE-9027 and LUCENE-9116. Blocktree was transitioned in a backwards-compatible way but some (all?) others were not. Consequently an attempt of the new version to read an old index will fail due to some non-obvious error. I propose here using a simple version suffix on these postings formats like "84" (thus "FST84" as one example). I see some already use a suffix but were not bumped for 8.4. This is a really simple change and doesn't address the problem of us not noticing future needs to version bump.
[GitHub] [lucene-solr] danmuzi commented on issue #1287: LUCENE-8954: refactor Nori analyzer
danmuzi commented on issue #1287: LUCENE-8954: refactor Nori analyzer URL: https://github.com/apache/lucene-solr/pull/1287#issuecomment-590491692 The previous PR (https://github.com/apache/lucene-solr/pull/1276) contained a lint error about an unused import statement, so I reverted it in https://github.com/apache/lucene-solr/pull/1285. Sorry for the confusion. @jimczi
[GitHub] [lucene-solr] danmuzi opened a new pull request #1287: LUCENE-8954: refactor Nori analyzer
danmuzi opened a new pull request #1287: LUCENE-8954: refactor Nori analyzer URL: https://github.com/apache/lucene-solr/pull/1287 LUCENE-8954 is an issue created in August last year (https://issues.apache.org/jira/browse/LUCENE-8954). The patch was already pushed to the master branch (#839), but I forgot to put it in branch_8x, so this PR is for that.
[jira] [Created] (LUCENE-9248) Change internal code names of postingsFormats to use 84 suffix
David Smiley created LUCENE-9248: Summary: Change internal code names of postingsFormats to use 84 suffix Key: LUCENE-9248 URL: https://issues.apache.org/jira/browse/LUCENE-9248 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: David Smiley Assignee: David Smiley Some postings formats write the postings differently as of Lucene 8.4 due to changes -- LUCENE-9027 and LUCENE-9116. Blocktree was transitioned in a backwards-compatible way but some (all?) others were not. Consequently an attempt of the new version to read an old index will fail due to some non-obvious error. I propose here using a simple version suffix on these postings formats like "84" (thus "FST84" as one example). I see some already use a suffix but were not bumped for 8.4. This is a really simple change and doesn't address the problem of us not noticing future needs to version bump. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke opened a new pull request #1286: SOLR-14279: remove CSVStrategy's deprecated setters
cpoerschke opened a new pull request #1286: SOLR-14279: remove CSVStrategy's deprecated setters URL: https://github.com/apache/lucene-solr/pull/1286 https://issues.apache.org/jira/browse/SOLR-14279
[jira] [Created] (SOLR-14279) remove CSVStrategy's deprecated setters
Christine Poerschke created SOLR-14279: -- Summary: remove CSVStrategy's deprecated setters Key: SOLR-14279 URL: https://issues.apache.org/jira/browse/SOLR-14279 Project: Solr Issue Type: Task Reporter: Christine Poerschke [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/core/src/java/org/apache/solr/internal/csv/CSVStrategy.java#L117] one possible approach: * change remaining callers to not use the deprecated setters * remove the setters * make members final * remove the deprecated {{ImmutableCSVStrategy}} class
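A minimal sketch of the end state the steps above aim for: setters removed, members final, configuration fixed at construction time. Class and field names here are illustrative only, not the actual CSVStrategy API.

```java
// Illustrative sketch of an immutable strategy class: all configuration is
// supplied at construction, so the deprecated mutators (and a separate
// "Immutable" wrapper class) are no longer needed.
public class ImmutableStrategySketch {
  private final char delimiter;
  private final char encapsulator;

  public ImmutableStrategySketch(char delimiter, char encapsulator) {
    this.delimiter = delimiter;
    this.encapsulator = encapsulator;
  }

  public char getDelimiter() { return delimiter; }
  public char getEncapsulator() { return encapsulator; }
}
```

Callers that previously mutated a shared instance would instead construct a new instance with the values they need.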
[jira] [Commented] (LUCENE-9171) Synonyms Boost by Payload
[ https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043737#comment-17043737 ] David Smiley commented on LUCENE-9171: -- Alan, I think you forgot CHANGES.txt entries. Please ensure you add suitable entries in _both_ Lucene's and Solr's CHANGES.txt. Personally I would have committed the work under this Lucene issue and not Solr, but it's debatable I suppose. Also, please add "@lucene.experimental" to some of QueryBuilder's methods since we want the freedom to change this API at minor release boundaries. > Synonyms Boost by Payload > - > > Key: LUCENE-9171 > URL: https://issues.apache.org/jira/browse/LUCENE-9171 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser >Reporter: Alessandro Benedetti >Priority: Major > Fix For: 8.5 > > Time Spent: 10m > Remaining Estimate: 0h > > I have been working on the additional capability of boosting queries by term > payloads through a parameter to enable it in the Lucene QueryBuilder. > This has been done targeting the Synonyms Query. > It is parametric, so it is meant to cause no difference unless the feature is > enabled. > Solr has its bits to comply through its SynonymsQueryStyles
[jira] [Commented] (LUCENE-9234) Keep write support for old codecs?
[ https://issues.apache.org/jira/browse/LUCENE-9234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043730#comment-17043730 ] David Smiley commented on LUCENE-9234: -- I tend to agree with Rob. Distributed systems on top of Lucene should be able to cope with the status quo, and this may mean more work for replica placement to consider the version if this wasn't thought of in the past. And a truly big/hard-core user could do some relatively basic Lucene re-packaging to ship the previous version if they were sufficiently motivated to care. Not all big search users would even care about this since a re-index or backup/restore may be feasible (it is where I work). > Keep write support for old codecs? > -- > > Key: LUCENE-9234 > URL: https://issues.apache.org/jira/browse/LUCENE-9234 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Priority: Minor > > Currently we maintain read/write support for the latest codec in lucene/core, > and read-only support for codecs of previous versions (up to {N-1}.0) in > lucene/backward-codecs. We often keep write support in test-framework for > testing purposes only. > This raises challenges for Elasticsearch with regard to rolling upgrades: we > have some users who index very large amounts of data on clusters that are > quite large, so that rolling upgrades take significant time. Meanwhile, > several indices may be created. > Allocating indices when the cluster has nodes of different versions requires > care as Lucene indices created on nodes with a newer version cannot be read > by the nodes running the older version. It is possible to force primary > replicas to be allocated on the older nodes, but this brings other problems > like availability, uneven disk usage across nodes, or moving a lot of data > around. 
> If Lucene could write data using the minimum version that exists in the > cluster, this would avoid this problem as the written data could be read by > any node of the cluster. I understand this change would not come for free, > especially when it comes to testing as we'd need to make sure that older > Lucene versions can read indices created by this "compatibility mode". > I'd be curious to understand whether this is a problem for Solr too, if not > how this problem is being handled, and maybe whether there are other problems > that you have encountered that would also benefit from the ability to write > data with an older format.
[jira] [Resolved] (SOLR-14272) Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
[ https://issues.apache.org/jira/browse/SOLR-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-14272. - Resolution: Fixed > Remove autoReplicaFailoverBadNodeExpiration and > autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 > > > Key: SOLR-14272 > URL: https://issues.apache.org/jira/browse/SOLR-14272 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Anshum Gupta >Assignee: Anshum Gupta >Priority: Major > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > 'autoReplicaFailoverBadNodeExpiration' and 'autoReplicaFailoverWorkLoopDelay' > parameters were deprecated in 7.1 after the 'autoAddReplicas' feature was > ported to autoscaling. > We should remove them from the code to get rid of the cruft. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14272) Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
[ https://issues.apache.org/jira/browse/SOLR-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043712#comment-17043712 ] ASF subversion and git services commented on SOLR-14272: Commit 7ba9d4d756e50680b88ee10af2f13a8791588fe4 in lucene-solr's branch refs/heads/master from Anshum Gupta [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7ba9d4d ] SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 (#1269) * SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 > Remove autoReplicaFailoverBadNodeExpiration and > autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 > > > Key: SOLR-14272 > URL: https://issues.apache.org/jira/browse/SOLR-14272 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Anshum Gupta >Assignee: Anshum Gupta >Priority: Major > Fix For: master (9.0) > > Time Spent: 10m > Remaining Estimate: 0h > > 'autoReplicaFailoverBadNodeExpiration' and 'autoReplicaFailoverWorkLoopDelay' > parameters were deprecated in 7.1 after the 'autoAddReplicas' feature was > ported to autoscaling. > We should remove them from the code to get rid of the cruft. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] anshumg merged pull request #1269: SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
anshumg merged pull request #1269: SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 URL: https://github.com/apache/lucene-solr/pull/1269
[jira] [Reopened] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namgyu Kim reopened LUCENE-8954: There is a lint error in the patch. Sorry for the confusion. > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Fix For: 8.x, master (9.0) > > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple.
[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043710#comment-17043710 ] ASF subversion and git services commented on LUCENE-8954: - Commit 80372341426344f7d89a36adefbd178fb0e2548a in lucene-solr's branch refs/heads/branch_8x from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8037234 ] Revert "LUCENE-8954: refactor Nori analyzer" This reverts commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0. > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Fix For: 8.x, master (9.0) > > Time Spent: 1h 20m > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi merged pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer"
danmuzi merged pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer" URL: https://github.com/apache/lucene-solr/pull/1285
[GitHub] [lucene-solr] danmuzi opened a new pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer"
danmuzi opened a new pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer" URL: https://github.com/apache/lucene-solr/pull/1285 There is a lint error in the patch. Sorry for the confusion.
[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043706#comment-17043706 ] ASF subversion and git services commented on LUCENE-8954: - Commit 904ba2540b3c7b9a1d19f70941bac62d822b2926 in lucene-solr's branch refs/heads/revert-1276-branch_8x from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=904ba25 ] Revert "LUCENE-8954: refactor Nori analyzer" This reverts commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0. > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Fix For: 8.x, master (9.0) > > Time Spent: 1h > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14274) Multiple CoreContainers will register the same JVM Metrics
[ https://issues.apache.org/jira/browse/SOLR-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043684#comment-17043684 ] Mike Drob commented on SOLR-14274: -- I think the behavior that we want varies with what kind of metric we are registering. If it is a core-specific metric then replacing makes sense. If it is a JVM or OS metric, then replacing might not make as much sense. I'm looking at replacing the binary force flag with an enum dictating what to do in case of conflict: replace, skip, or fail. > Multiple CoreContainers will register the same JVM Metrics > -- > > Key: SOLR-14274 > URL: https://issues.apache.org/jira/browse/SOLR-14274 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Mike Drob >Priority: Major > > When running multiple CoreContainer in the same JVM, either because we called > {{SolrCloudTestCase.configureCluster(int n)}} with {{n > 1}} or because we > have multiple tests running in the same JVM in succession, we will have > contention on the shared JVM {{metricsRegistry}} as they each replace the > existing metrics with their own. Further, with multiple nodes at the same > time, some of these metrics will be incorrect anyway, since they will only > reflect a single core container. Others will be fine since I think they are > reading system-level information so it doesn't matter where it comes from. > I think this is a test-only issue, since the circumstances where somebody is > running multiple core containers in a single JVM in production should be > rare, but maybe there are edge cases affected with EmbeddedSolrServer and > MapReduce or Spark, or other unusual deployment patterns. > Removing the metrics registration entirely can speed up > {{configureCluster(100).build()}} on my machine from 2 minutes to 30 seconds, > so I'm optimistic that there can be gains here without sacrificing the > feature entirely. 
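The conflict-policy enum Mike Drob describes could look roughly like this hypothetical sketch. All names here are invented for illustration; this is not Solr's actual metrics API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: registration takes an explicit policy instead of a boolean "force"
// flag, so callers can state what should happen when a metric name collides.
public class MetricRegistrySketch {
  public enum OnConflict { REPLACE, SKIP, FAIL }

  private final Map<String, Object> metrics = new ConcurrentHashMap<>();

  // Returns true if this call's metric ended up registered under the name.
  public boolean register(String name, Object metric, OnConflict policy) {
    Object existing = metrics.putIfAbsent(name, metric);
    if (existing == null) {
      return true; // no conflict
    }
    switch (policy) {
      case REPLACE:
        metrics.put(name, metric); // e.g. core-specific metrics
        return true;
      case SKIP:
        return false; // e.g. shared JVM/OS metrics: keep the existing one
      case FAIL:
      default:
        throw new IllegalArgumentException("Metric already registered: " + name);
    }
  }
}
```

A core-specific gauge would register with REPLACE, while a second CoreContainer registering JVM-wide metrics would use SKIP and leave the first registration in place.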
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383377648 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java ## @@ -1091,25 +1091,33 @@ public static String getCommonPrefix(Automaton a) { * @return common prefix, which can be an empty (length 0) BytesRef (never null) */ public static BytesRef getCommonPrefixBytesRef(Automaton a) { Review comment: Ok, removed.
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043654#comment-17043654 ] Lucene/Solr QA commented on SOLR-13411: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 1s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 4s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 49m 29s{color} | {color:green} core in the patch passed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 53m 25s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13411 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12994329/SOLR-13411.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / b4c2e279a94 | | ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 | | Default Java | LTS | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/687/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/687/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If collection is created with router.field -parameter to define some other > field than uniqueField as route field and document update comes containing > route field updated using atomic update syntax (for example set=123), hash > for document routing is calculated from "set=123" and not from 123 which is > the real value which may lead into routing document to wrong shard. > > This happens in CompositeIdRouter#sliceHash, where field value is used as is > for hash calculation. 
> > I think there are two possible solutions to fix this: > a) Allow use of atomic update also for route.field, but use real value > instead of atomic update syntax to route document into right shard. > b) Deny atomic update for route.field and throw exception.
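Option (a) above, hashing the real value rather than the atomic-update syntax, could be sketched as follows. The helper name is invented for illustration; this is not the actual CompositeIdRouter code.

```java
import java.util.Map;

// Sketch: if the route field arrives as an atomic update (e.g. {"set": 123}),
// unwrap the inner value before computing the routing hash, so the document
// hashes on 123 rather than on the string form of the update map.
public class RouteFieldSketch {
  public static Object resolveRouteValue(Object fieldValue) {
    if (fieldValue instanceof Map) {
      Map<?, ?> atomic = (Map<?, ?>) fieldValue;
      Object set = atomic.get("set");
      if (set != null) {
        return set; // the real route value
      }
    }
    return fieldValue; // plain value: use as-is
  }
}
```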
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383351481 ## File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java ## @@ -54,18 +56,20 @@ private final boolean finite; // array of sorted transitions for each state, indexed by state number private final Automaton automaton; - // for path tracking: each long records gen when we last + // for path tracking: each short records gen when we last // visited the state; we use gens to avoid having to clear - private final long[] visited; Review comment: Good catch.
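The quoted comment's "we use gens to avoid having to clear" refers to a generation-stamping trick: instead of zeroing the visited array before each traversal, bump a generation counter and treat only entries stamped with the current generation as visited. A stand-alone illustration (not the actual AutomatonTermsEnum code) might look like:

```java
// Sketch of generation-stamped visited tracking. Call newGeneration() before
// each traversal; one increment replaces an O(numStates) array clear.
public class VisitedGens {
  private final short[] visited; // gen when each state was last visited
  private short curGen = 0;      // 0 means "no traversal started yet"

  public VisitedGens(int numStates) {
    visited = new short[numStates];
  }

  // Start a new traversal (must be called before the first markVisited).
  public void newGeneration() {
    curGen++;
  }

  // Returns true the first time a state is seen in the current generation.
  public boolean markVisited(int state) {
    if (visited[state] == curGen) {
      return false; // already seen during this traversal
    }
    visited[state] = curGen;
    return true;
  }
}
```

Narrowing the stamp from long to short, as the diff above does, shrinks the per-state memory at the cost of the counter wrapping sooner, which the real code has to account for.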
[GitHub] [lucene-solr] jpountz opened a new pull request #1284: LUCENE-9247: Add tests for `checkIntegrity`.
jpountz opened a new pull request #1284: LUCENE-9247: Add tests for `checkIntegrity`. URL: https://github.com/apache/lucene-solr/pull/1284 This adds a test to `BaseIndexFileFormatTestCase` that the combination of opening a reader and calling `checkIntegrity` on it reads all bytes of all files (including index headers and footers). This would help detect most cases when `checkIntegrity` is not implemented correctly.
[jira] [Created] (LUCENE-9247) Test that checkIntegrity doesn't miss any file
Adrien Grand created LUCENE-9247: Summary: Test that checkIntegrity doesn't miss any file Key: LUCENE-9247 URL: https://issues.apache.org/jira/browse/LUCENE-9247 Project: Lucene - Core Issue Type: Test Reporter: Adrien Grand An Elasticsearch test found out that CompressingStoredFieldsReader neither checks the integrity of its index at open time nor when checkIntegrity is called. We should have a test that detects this kind of bug.
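The testing idea here, verifying that opening a reader plus calling checkIntegrity touches every byte of every file, can be illustrated with a simplified read-coverage tracker. This sketch is invented for illustration and is not Lucene's BaseIndexFileFormatTestCase machinery.

```java
// Sketch: wrap reads of an in-memory "file" so a test can assert afterwards
// that the consumer (e.g. an integrity check) read every byte, including the
// header and footer regions that a buggy implementation might skip.
public class ReadCoverage {
  private final byte[] data;
  private final boolean[] touched;

  public ReadCoverage(byte[] data) {
    this.data = data;
    this.touched = new boolean[data.length];
  }

  public byte readByte(int pos) {
    touched[pos] = true; // record coverage on every read
    return data[pos];
  }

  // True only if every byte of the file was read at least once.
  public boolean fullyRead() {
    for (boolean b : touched) {
      if (!b) return false;
    }
    return true;
  }
}
```

A format-level test would wrap each index file this way, open the reader, call checkIntegrity, and then assert fullyRead() on every file.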
[jira] [Resolved] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namgyu Kim resolved LUCENE-8954. Resolution: Fixed > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Fix For: 8.x, master (9.0) > > Time Spent: 1h > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namgyu Kim updated LUCENE-8954: --- Fix Version/s: master (9.0) 8.x > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Fix For: 8.x, master (9.0) > > Time Spent: 1h > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer
[ https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043592#comment-17043592 ] ASF subversion and git services commented on LUCENE-8954: - Commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0 in lucene-solr's branch refs/heads/branch_8x from Namgyu Kim [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=29b7e1a ] LUCENE-8954: refactor Nori analyzer Signed-off-by: Namgyu Kim > Refactor Nori(Korean) Analyzer > -- > > Key: LUCENE-8954 > URL: https://issues.apache.org/jira/browse/LUCENE-8954 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Namgyu Kim >Assignee: Namgyu Kim >Priority: Minor > Time Spent: 1h > Remaining Estimate: 0h > > There are many codes that can be refactored in the Nori analyzer. > (whitespace, wrong type casting, unnecessary throws, C-style array, ...) > I think it's good to proceed if we can. > It has nothing to do with the actual working of Nori. > I'll just remove unnecessary code and make the code simple. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] danmuzi merged pull request #1276: LUCENE-8954: refactor Nori analyzer
danmuzi merged pull request #1276: LUCENE-8954: refactor Nori analyzer URL: https://github.com/apache/lucene-solr/pull/1276
[GitHub] [lucene-solr] danmuzi commented on issue #1276: LUCENE-8954: refactor Nori analyzer
danmuzi commented on issue #1276: LUCENE-8954: refactor Nori analyzer URL: https://github.com/apache/lucene-solr/pull/1276#issuecomment-590368344 Thanks for checking, @jimczi I'll merge this commit :D
[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383312415 ## File path: lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java ## @@ -1091,25 +1091,33 @@ public static String getCommonPrefix(Automaton a) { * @return common prefix, which can be an empty (length 0) BytesRef (never null) */ public static BytesRef getCommonPrefixBytesRef(Automaton a) { Review comment: I don't think this is the right tradeoff. It makes the code more complex while only saving the cost of creating a few simple ordinary objects. I hate to say it, but I don't trust that your benchmark represents a typical case here. We should keep this code simple; there are other ways it can be improved.
[jira] [Updated] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dr Oleg Savrasov updated SOLR-13411: Attachment: SOLR-13411.patch > CompositeIdRouter calculates wrong route hash if atomic update is used for > route.field > -- > > Key: SOLR-13411 > URL: https://issues.apache.org/jira/browse/SOLR-13411 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 7.5 >Reporter: Niko Himanen >Assignee: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-13411.patch, SOLR-13411.patch > > > If collection is created with router.field -parameter to define some other > field than uniqueField as route field and document update comes containing > route field updated using atomic update syntax (for example set=123), hash > for document routing is calculated from "set=123" and not from 123 which is > the real value which may lead into routing document to wrong shard. > > This happens in CompositeIdRouter#sliceHash, where field value is used as is > for hash calculation. > > I think there are two possible solutions to fix this: > a) Allow use of atomic update also for route.field, but use real value > instead of atomic update syntax to route document into right shard. > b) Deny atomic update for route.field and throw exception.
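The report above says the route hash is computed from the literal string "set=123" rather than the real value 123. A minimal, hypothetical sketch of option (a), unwrapping the real value before hashing, might look like the following; `RouteHashSketch` and `routeValue` are illustrative names, not Solr's actual `CompositeIdRouter` code:

```java
import java.util.Map;

// Hypothetical sketch, not Solr's CompositeIdRouter: when the route field
// arrives wrapped in atomic-update syntax (a map like {set=123}), unwrap
// the real value before it is used for hash calculation.
class RouteHashSketch {
    static String routeValue(Object fieldValue) {
        if (fieldValue instanceof Map) {
            Object set = ((Map<?, ?>) fieldValue).get("set");
            if (set != null) {
                return set.toString(); // hash "123", not "set=123"
            }
        }
        return fieldValue.toString(); // plain value: use as-is
    }
}
```

Option (b) from the report would instead throw an exception here whenever the route field carries atomic-update syntax.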
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043562#comment-17043562 ] Dr Oleg Savrasov commented on SOLR-13411: - Minor fix for failed test.
[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043554#comment-17043554 ] ASF subversion and git services commented on LUCENE-9212: - Commit b4c2e279a94988c26b61d4fb95ec208081f0448a in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4c2e27 ] LUCENE-9212: Fix precommit > Intervals.multiterm() should take a CompiledAutomaton > - > > Key: LUCENE-9212 > URL: https://issues.apache.org/jira/browse/LUCENE-9212 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > LUCENE-9028 added a `multiterm` factory method for intervals that accepts an > arbitrary Automaton, and converts it internally into a CompiledAutomaton. > This isn't necessarily correct behaviour, however, because Automatons can be > defined in both binary and unicode space, and there's no way of telling which > it is when it comes to compiling them. In particular, for automatons > produced by FuzzyTermsEnum, we need to convert them to unicode before > compilation. > The `multiterm` factory should just take `CompiledAutomaton` directly, and we > should deprecate the methods that take `Automaton` and remove in master.
[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383289307 ## File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java ## @@ -54,18 +56,20 @@ private final boolean finite; // array of sorted transitions for each state, indexed by state number private final Automaton automaton; - // for path tracking: each long records gen when we last + // for path tracking: each short records gen when we last // visited the state; we use gens to avoid having to clear - private final long[] visited; Review comment: Visited-state tracking is only needed when the automaton accepts an infinite language. We use it for loop detection. I think before we get too fancy with how we clear it, we should first stop being wasteful about it: when `finite == true` (example: fuzzy query) we will never even look for a loop, yet the current code unconditionally records the states it visited. So first, in the ctor when `finite == true`, `visited[]` can be initialized to `null` or `new long[0]` or something, and we change this line: ``` visited[state] = curGen; ``` to something like this: ``` if (!finite) visited[state] = curGen; ``` I agree we should separately avoid tracking 64 bits per state when only 1 is needed. But before optimizing the storage, let's first avoid doing this work at all for cases like complex fuzzy queries.
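The generation-counter idea described in the review above can be sketched as follows. This is an illustrative toy, not Lucene's actual `AutomatonTermsEnum`; it shows both the O(1) "clear" via a counter bump and the suggested skip of all tracking when the language is finite:

```java
// Illustrative sketch of gen-based visited tracking (not Lucene code):
// visited[s] holds the generation at which state s was last seen, so
// clearing every mark is a single counter increment instead of an O(n) fill.
class VisitedGenSketch {
    private final long[] visited; // null when finite: no loop detection needed
    private final boolean finite;
    private long curGen;

    VisitedGenSketch(int numStates, boolean finite) {
        this.finite = finite;
        this.visited = finite ? null : new long[numStates];
    }

    void startNewPath() {
        curGen++; // O(1) "clear" of every visited mark
    }

    /** Returns false if the state was already seen on this path (a loop). */
    boolean visit(int state) {
        if (finite) {
            return true; // finite language: loops are impossible, track nothing
        }
        if (visited[state] == curGen) {
            return false; // seen in this generation already
        }
        visited[state] = curGen;
        return true;
    }
}
```

The finite case never touches the array at all, which is the "stop doing this work for fuzzy queries" point made in the review.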
[GitHub] [lucene-solr] jpountz opened a new pull request #1283: LUCENE-9246: Remove `dOff` argument from `LZ4#decompress`.
jpountz opened a new pull request #1283: LUCENE-9246: Remove `dOff` argument from `LZ4#decompress`. URL: https://github.com/apache/lucene-solr/pull/1283 It is always set to 0 at call sites.
[jira] [Created] (LUCENE-9246) Remove "destOff" argument from LZ4#decompress
Adrien Grand created LUCENE-9246: Summary: Remove "destOff" argument from LZ4#decompress Key: LUCENE-9246 URL: https://issues.apache.org/jira/browse/LUCENE-9246 Project: Lucene - Core Issue Type: Task Reporter: Adrien Grand All call sites set it to 0, and it appears not to be handled properly when set to a value other than 0.
[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383282576 ## File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java ## @@ -188,7 +188,11 @@ private boolean nextString() { savedStates.setIntAt(0, 0); while (true) { - curGen++; + if (++curGen == 0) { +// Clear the visited states every time curGen overflows (so very infrequently to not impact average perf). +curGen++; Review comment: Can we remove this unnecessary increment? Also I'd change the comment from `overflows` to `wraps`.
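One possible wrap-handling scheme under discussion can be sketched like this. It is a hedged toy under the assumption that gen 0 means "never visited"; the committed patch may differ in the details being debated above:

```java
import java.util.Arrays;

// Toy sketch of generation-counter wrap handling (not the actual patch):
// when the counter wraps back to 0, stale marks from the previous cycle
// would collide with it, so clear the array once and skip gen 0.
class GenWrapSketch {
    static long nextGen(long curGen, long[] visited) {
        if (++curGen == 0) { // wraps only once per 2^64 increments
            Arrays.fill(visited, 0); // drop stale marks from the old cycle
            curGen = 1; // 0 stays reserved for "never visited"
        }
        return curGen;
    }
}
```

Because a long wraps only after 2^64 increments, the fill is effectively never executed in practice, which is why the comment in the diff stresses that average performance is unaffected.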
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383260144 ## File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java ## @@ -160,17 +161,18 @@ private void setLinear(int position) { if (maxInterval != 0xff) maxInterval++; int length = position + 1; /* position + maxTransition */ -if (linearUpperBound.bytes.length < length) - linearUpperBound.bytes = new byte[length]; +if (linearUpperBound == null) { + linearUpperBound = new BytesRef(ArrayUtil.oversize(Math.max(length, 16), Byte.BYTES)); +} else if (linearUpperBound.bytes.length < length) { + linearUpperBound.bytes = new byte[ArrayUtil.oversize(length, Byte.BYTES)]; Review comment: +1 thanks
[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef. URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383259436 ## File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java ## @@ -160,17 +161,18 @@ private void setLinear(int position) { if (maxInterval != 0xff) maxInterval++; int length = position + 1; /* position + maxTransition */ -if (linearUpperBound.bytes.length < length) - linearUpperBound.bytes = new byte[length]; +if (linearUpperBound == null) { + linearUpperBound = new BytesRef(ArrayUtil.oversize(Math.max(length, 16), Byte.BYTES)); +} else if (linearUpperBound.bytes.length < length) { + linearUpperBound.bytes = new byte[ArrayUtil.oversize(length, Byte.BYTES)]; Review comment: I don't think we should have the additional null check path here. It is not worth it to save 10 bytes :). Let's make linearUpperBound final again. Better to initialize it to `new BytesRef()` if you really want to save 10 bytes for the case that it's not used; that requires no additional branches in the code, it will just get extended by the length check.
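The `ArrayUtil.oversize` call in the diff grows arrays with headroom so that repeated extensions cost amortized O(1) per element. A simplified stand-in for that pattern might look like this; `GrowSketch` is illustrative only, and Lucene's real `oversize` additionally rounds sizes for object alignment:

```java
// Simplified stand-in for the amortized-growth pattern in the diff above
// (illustrative only; Lucene's ArrayUtil.oversize also rounds for alignment).
class GrowSketch {
    static int oversize(int minLength) {
        return minLength + (minLength >>> 3); // ~12.5% headroom over the minimum
    }

    static byte[] grow(byte[] current, int minLength) {
        if (current.length >= minLength) {
            return current; // already big enough: no allocation, no copy
        }
        byte[] bigger = new byte[oversize(minLength)];
        System.arraycopy(current, 0, bigger, 0, current.length);
        return bigger;
    }
}
```

Growing proportionally rather than to the exact requested length is what makes a sequence of extensions cheap; growing to exactly `length` each time, as the pre-patch code did, can trigger a reallocation on every call.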
[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests
[ https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043486#comment-17043486 ] Bruno Roustant commented on LUCENE-9241: As expected I saw no noticeable impact in the luceneutil benchmarks. > fix most memory-hungry tests > > > Key: LUCENE-9241 > URL: https://issues.apache.org/jira/browse/LUCENE-9241 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9241.patch > > > Currently each test jvm has Xmx of 512M. With a modern macbook pro this is > 4GB which is pretty crazy. > On the other hand, if we fix a few edge cases, tests can work with lower > heaps such as 128M. This can save many gigabytes (also it finds interesting > memory waste/issues).
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043465#comment-17043465 ] Lucene/Solr QA commented on SOLR-13411: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 4s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 1m 2s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 53s{color} | {color:red} core in the patch failed. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 52m 51s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | solr.update.TestUpdate | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-13411 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12994301/SOLR-13411.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / 19fe1eee68d | | ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 | | Default Java | LTS | | unit | https://builds.apache.org/job/PreCommit-SOLR-Build/686/artifact/out/patch-unit-solr_core.txt | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/686/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/686/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. 
[jira] [Assigned] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev reassigned SOLR-13411: --- Assignee: Mikhail Khludnev
[jira] [Resolved] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9212. --- Fix Version/s: 8.5 Resolution: Fixed
[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043427#comment-17043427 ] ASF subversion and git services commented on LUCENE-9212: - Commit 19fe1eee68d83f73c8416b319bd1b38c6e73f053 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=19fe1ee ] LUCENE-9212: Remove deprecated Intervals.multiterm() methods
[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043425#comment-17043425 ] ASF subversion and git services commented on LUCENE-9212: - Commit 90028a7b935ad3205a8a6837cbb7ce1e9dbb6dff in lucene-solr's branch refs/heads/branch_8x from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=90028a7 ] LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton
[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton
[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043422#comment-17043422 ] ASF subversion and git services commented on LUCENE-9212: - Commit ffb7cafe9351cd6cd5181bc06dd053d586f6d63f in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ffb7caf ] LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton
[GitHub] [lucene-solr] romseygeek commented on issue #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton
romseygeek commented on issue #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton URL: https://github.com/apache/lucene-solr/pull/1243#issuecomment-590270502 Merged as ffb7cafe9351cd6cd5181bc06dd053d586f6d63f
[GitHub] [lucene-solr] romseygeek closed pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton
romseygeek closed pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton URL: https://github.com/apache/lucene-solr/pull/1243
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043408#comment-17043408 ] Mikhail Khludnev commented on SOLR-13411: - Appreciated, [~osavrasov]. Let's open a go/no-go vote. I'll push it this week.
[jira] [Updated] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-13411: Status: Patch Available (was: Open)
[jira] [Commented] (LUCENE-9207) Don't build SpanQuery in QueryBuilder
[ https://issues.apache.org/jira/browse/LUCENE-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043401#comment-17043401 ] Alan Woodward commented on LUCENE-9207: --- I think this should probably be a 9.0-only change, particularly given that the parent issue is not going to be backported. Will commit to master presently. > Don't build SpanQuery in QueryBuilder > - > > Key: LUCENE-9207 > URL: https://issues.apache.org/jira/browse/LUCENE-9207 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > Subtask of LUCENE-9204. QueryBuilder currently has special logic for graph > phrase queries with no slop, constructing a spanquery that attempts to follow > all paths using a combination of OR and NEAR queries. Given the known bugs > in this type of query (LUCENE-7398) and that we would like to move span > queries out of core in any case, we should remove this logic and just build a > disjunction of phrase queries, one phrase per path. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
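The replacement strategy described above — enumerate every path through the token graph and emit one phrase per path — can be sketched without Lucene's types. This is illustrative code only (names and graph representation invented, not QueryBuilder's actual API); the real QueryBuilder would wrap each resulting term list in a PhraseQuery inside a disjunction:

```java
import java.util.*;

// Hypothetical sketch: given a token graph (each token spans [from, to)
// positions, so a multi-position synonym like "wifi" covers two slots),
// enumerate all paths from position 0 to the end; each path is one phrase.
public class GraphPhrasePaths {
    record Token(String term, int from, int to) {}

    static List<List<String>> phrasesPerPath(List<Token> graph, int end) {
        List<List<String>> phrases = new ArrayList<>();
        walk(graph, 0, end, new ArrayDeque<>(), phrases);
        return phrases;
    }

    // Depth-first walk: follow every token starting at the current position.
    static void walk(List<Token> graph, int pos, int end,
                     Deque<String> path, List<List<String>> out) {
        if (pos == end) {
            out.add(new ArrayList<>(path));
            return;
        }
        for (Token t : graph) {
            if (t.from == pos) {
                path.addLast(t.term);
                walk(graph, t.to, end, path, out);
                path.removeLast();
            }
        }
    }

    public static void main(String[] args) {
        // "wifi network" with the multi-term synonym "wi fi" injected as a graph.
        List<Token> graph = List.of(
            new Token("wifi", 0, 2),               // spans both positions
            new Token("wi", 0, 1), new Token("fi", 1, 2),
            new Token("network", 2, 3));
        // → [[wifi, network], [wi, fi, network]]: one phrase per path
        System.out.println(phrasesPerPath(graph, 3));
    }
}
```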
[jira] [Updated] (LUCENE-9171) Synonyms Boost by Payload
[ https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-9171: -- Fix Version/s: 8.5 > Synonyms Boost by Payload > - > > Key: LUCENE-9171 > URL: https://issues.apache.org/jira/browse/LUCENE-9171 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser >Reporter: Alessandro Benedetti >Priority: Major > Fix For: 8.5 > > Time Spent: 10m > Remaining Estimate: 0h > > I have been working on the additional capability of boosting queries by term > payloads, through a parameter that enables it in the Lucene QueryBuilder. > This has been done targeting the synonyms query. > It is parametric, so you should see no difference unless the feature is > enabled. > Solr has its own bits to comply through its SynonymQueryStyles -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-12238) Synonym Query Style Boost By Payload
[ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-12238: - Fix Version/s: 8.5 Assignee: Alan Woodward Resolution: Fixed Status: Resolved (was: Patch Available) > Synonym Query Style Boost By Payload > > > Key: SOLR-12238 > URL: https://issues.apache.org/jira/browse/SOLR-12238 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 7.2 >Reporter: Alessandro Benedetti >Assignee: Alan Woodward >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, > SOLR-12238.patch > > Time Spent: 8h > Remaining Estimate: 0h > > This improvement is built on top of the Synonym Query Style feature and > brings the possibility of boosting synonym queries using the associated > payload. > It introduces two new modalities for the Synonym Query Style: > PICK_BEST_BOOST_BY_PAYLOAD -> builds a disjunction query with the clauses > boosted by payload > AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> builds a Boolean query with the clauses > boosted by payload > These new synonym query styles assume payloads are available, so they must > be used in conjunction with a token filter able to produce payloads. > A synonym.txt example could be: > # Synonyms used by Payload Boost > tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9 > leopard => leopard, Big_Cat|0.8, Bagheera|0.9 > lion => lion|1.0, panthera leo|0.99, Simba|0.8 > snow_leopard => panthera uncia|0.99, snow leopard|1.0 > A simple token filter to populate the payloads from such a synonym.txt is : > delimiter="|"/> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
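The analyzer element at the end of the description appears to have been stripped by Jira's HTML rendering, leaving only the `delimiter="|"/>` fragment. Based on that fragment and the "populate the payloads" wording, it was likely a payload-delimiter filter in a field type along these lines — a guess at the original, not a verified snippet from the issue:

```xml
<!-- Hypothetical reconstruction: analyzer chain that injects weighted
     synonyms and turns the "|0.8"-style suffixes into payloads. -->
<fieldType name="text_payload_boost" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonym.txt"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>
  </analyzer>
</fieldType>
```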
[jira] [Updated] (LUCENE-9171) Synonyms Boost by Payload
[ https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-9171: -- Resolution: Fixed Status: Resolved (was: Patch Available) Resolved by SOLR-12238 > Synonyms Boost by Payload > - > > Key: LUCENE-9171 > URL: https://issues.apache.org/jira/browse/LUCENE-9171 > Project: Lucene - Core > Issue Type: New Feature > Components: core/queryparser >Reporter: Alessandro Benedetti >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > I have been working on the additional capability of boosting queries by term > payloads, through a parameter that enables it in the Lucene QueryBuilder. > This has been done targeting the synonyms query. > It is parametric, so you should see no difference unless the feature is > enabled. > Solr has its own bits to comply through its SynonymQueryStyles -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload
[ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043390#comment-17043390 ] ASF subversion and git services commented on SOLR-12238: Commit 2752d50dd1dcf758a32dc573d02967612a2cf1ff in lucene-solr's branch refs/heads/branch_8x from Alessandro Benedetti [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2752d50 ] SOLR-12238: Handle boosts in QueryBuilder QueryBuilder now detects per-term boosts supplied by a BoostAttribute when building queries using a TokenStream. This commit also adds a DelimitedBoostTokenFilter that parses boosts from tokens using a delimiter token, and exposes this in Solr > Synonym Query Style Boost By Payload > > > Key: SOLR-12238 > URL: https://issues.apache.org/jira/browse/SOLR-12238 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 7.2 >Reporter: Alessandro Benedetti >Priority: Major > Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, > SOLR-12238.patch > > Time Spent: 8h > Remaining Estimate: 0h > > This improvement is built on top of the Synonym Query Style feature and > brings the possibility of boosting synonym queries using the payload > associated. > It introduces two new modalities for the Synonym Query Style : > PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses > boosted by payload > AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses > boosted by payload > This new synonym query styles will assume payloads are available so they must > be used in conjunction with a token filter able to produce payloads. 
> A synonym.txt example could be: > # Synonyms used by Payload Boost > tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9 > leopard => leopard, Big_Cat|0.8, Bagheera|0.9 > lion => lion|1.0, panthera leo|0.99, Simba|0.8 > snow_leopard => panthera uncia|0.99, snow leopard|1.0 > A simple token filter to populate the payloads from such a synonym.txt is : > delimiter="|"/> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on issue #1282: Lucene 9236
juanka588 commented on issue #1282: Lucene 9236 URL: https://github.com/apache/lucene-solr/pull/1282#issuecomment-590257020 Please review each commit separately. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236
juanka588 commented on a change in pull request #1282: Lucene 9236 URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383186292 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80BinaryConsumer.java ## @@ -48,6 +53,16 @@ public Lucene80BinaryConsumer(SegmentWriteState state) { this.maxDoc = state.segmentInfo.maxDoc(); } + @Override + public CompositeFieldMetadata addBinary(FieldInfo field, DocValuesProducer valuesProducer, IndexOutput indexOutput) throws IOException { +ByteBuffersDataOutput delegate = ByteBuffersDataOutput.newResettableInstance(); Review comment: this can be replaced with a BinaryEntry Object This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload
[ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043374#comment-17043374 ] ASF subversion and git services commented on SOLR-12238: Commit 663611c99c7d48dd31d53ea17644fcecd5e0fad7 in lucene-solr's branch refs/heads/master from Alessandro Benedetti [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=663611c ] [SOLR-12238] Synonym Queries boost (#357) SOLR-12238: Handle boosts in QueryBuilder QueryBuilder now detects per-term boosts supplied by a BoostAttribute when building queries using a TokenStream. This commit also adds a DelimitedBoostTokenFilter that parses boosts from tokens using a delimiter token, and exposes this in Solr > Synonym Query Style Boost By Payload > > > Key: SOLR-12238 > URL: https://issues.apache.org/jira/browse/SOLR-12238 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 7.2 >Reporter: Alessandro Benedetti >Priority: Major > Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, > SOLR-12238.patch > > Time Spent: 8h > Remaining Estimate: 0h > > This improvement is built on top of the Synonym Query Style feature and > brings the possibility of boosting synonym queries using the payload > associated. > It introduces two new modalities for the Synonym Query Style : > PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses > boosted by payload > AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses > boosted by payload > This new synonym query styles will assume payloads are available so they must > be used in conjunction with a token filter able to produce payloads. 
> A synonym.txt example could be: > # Synonyms used by Payload Boost > tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9 > leopard => leopard, Big_Cat|0.8, Bagheera|0.9 > lion => lion|1.0, panthera leo|0.99, Simba|0.8 > snow_leopard => panthera uncia|0.99, snow leopard|1.0 > A simple token filter to populate the payloads from such a synonym.txt is : > delimiter="|"/> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236
juanka588 commented on a change in pull request #1282: Lucene 9236 URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383185344 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80NumericProducer.java ## @@ -0,0 +1,541 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.codecs.lucene80; + +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +import org.apache.lucene.codecs.composite.CompositeDocValuesProducer; +import org.apache.lucene.codecs.composite.CompositeFieldMetadata; +import org.apache.lucene.index.CorruptIndexException; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.NumericDocValues; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.store.IndexInput; +import org.apache.lucene.store.RandomAccessInput; +import org.apache.lucene.util.Accountable; +import org.apache.lucene.util.LongValues; +import org.apache.lucene.util.RamUsageEstimator; +import org.apache.lucene.util.packed.DirectMonotonicReader; +import org.apache.lucene.util.packed.DirectReader; + +public class Lucene80NumericProducer{ + private final int maxDoc; + + public Lucene80NumericProducer(int maxDoc) { +this.maxDoc = maxDoc; + } + + static class NumericEntry implements Accountable { +long[] table; +int blockShift; +byte bitsPerValue; +long docsWithFieldOffset; +long docsWithFieldLength; +short jumpTableEntryCount; +byte denseRankPower; +long numValues; +long minValue; +long gcd; +long valuesOffset; +long valuesLength; +long valueJumpTableOffset; // -1 if no jump-table + +@Override +public long ramBytesUsed() { + return table == null ? 0L : RamUsageEstimator.sizeOf(table); +} + } + + static class SortedNumericEntry extends NumericEntry { +int numDocsWithField; +DirectMonotonicReader.Meta addressesMeta; +long addressesOffset; +long addressesLength; + +@Override +public long ramBytesUsed() { + long ramBytesUsed = super.ramBytesUsed(); + ramBytesUsed += addressesMeta == null ? 
0L : addressesMeta.ramBytesUsed(); + return ramBytesUsed; +} + } + + static NumericEntry readNumeric(IndexInput meta) throws IOException { +NumericEntry entry = new NumericEntry(); +readNumeric(meta, entry); +return entry; + } + + static void readNumeric(IndexInput meta, NumericEntry entry) throws IOException { +entry.docsWithFieldOffset = meta.readLong(); +entry.docsWithFieldLength = meta.readLong(); +entry.jumpTableEntryCount = meta.readShort(); +entry.denseRankPower = meta.readByte(); +entry.numValues = meta.readLong(); +int tableSize = meta.readInt(); +if (tableSize > 256) { + throw new CorruptIndexException("invalid table size: " + tableSize, meta); +} +if (tableSize >= 0) { + entry.table = new long[tableSize]; + for (int i = 0; i < tableSize; ++i) { +entry.table[i] = meta.readLong(); + } +} +if (tableSize < -1) { + entry.blockShift = -2 - tableSize; +} else { + entry.blockShift = -1; +} +entry.bitsPerValue = meta.readByte(); +entry.minValue = meta.readLong(); +entry.gcd = meta.readLong(); +entry.valuesOffset = meta.readLong(); +entry.valuesLength = meta.readLong(); +entry.valueJumpTableOffset = meta.readLong(); + } + + static SortedNumericEntry readSortedNumeric(IndexInput meta) throws IOException { +SortedNumericEntry entry = new SortedNumericEntry(); +readNumeric(meta, entry); +entry.numDocsWithField = meta.readInt(); +if (entry.numDocsWithField != entry.numValues) { + entry.addressesOffset = meta.readLong(); + final int blockShift = meta.readVInt(); + entry.addressesMeta = DirectMonotonicReader.loadMeta(meta, entry.numDocsWithField + 1, blockShift); + entry.addressesLength = meta.readLong(); +} +return entry; + } + + public SortedNumericDocValues getSortedNumeric(SortedNumericEntry entry, IndexInput data) throws IOException { +if (entry.numValues == entry.numDocsWithField) { +
[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload
[ https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043373#comment-17043373 ] ASF subversion and git services commented on SOLR-12238: Commit 663611c99c7d48dd31d53ea17644fcecd5e0fad7 in lucene-solr's branch refs/heads/master from Alessandro Benedetti [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=663611c ] [SOLR-12238] Synonym Queries boost (#357) SOLR-12238: Handle boosts in QueryBuilder QueryBuilder now detects per-term boosts supplied by a BoostAttribute when building queries using a TokenStream. This commit also adds a DelimitedBoostTokenFilter that parses boosts from tokens using a delimiter token, and exposes this in Solr > Synonym Query Style Boost By Payload > > > Key: SOLR-12238 > URL: https://issues.apache.org/jira/browse/SOLR-12238 > Project: Solr > Issue Type: Improvement > Components: query parsers >Affects Versions: 7.2 >Reporter: Alessandro Benedetti >Priority: Major > Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, > SOLR-12238.patch > > Time Spent: 8h > Remaining Estimate: 0h > > This improvement is built on top of the Synonym Query Style feature and > brings the possibility of boosting synonym queries using the payload > associated. > It introduces two new modalities for the Synonym Query Style : > PICK_BEST_BOOST_BY_PAYLOAD -> build a Disjunction query with the clauses > boosted by payload > AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses > boosted by payload > This new synonym query styles will assume payloads are available so they must > be used in conjunction with a token filter able to produce payloads. 
> A synonym.txt example could be: > # Synonyms used by Payload Boost > tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9 > leopard => leopard, Big_Cat|0.8, Bagheera|0.9 > lion => lion|1.0, panthera leo|0.99, Simba|0.8 > snow_leopard => panthera uncia|0.99, snow leopard|1.0 > A simple token filter to populate the payloads from such a synonym.txt is : > delimiter="|"/> -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236
juanka588 commented on a change in pull request #1282: Lucene 9236 URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383184968 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80NumericConsumer.java ## @@ -0,0 +1,319 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.lucene.codecs.lucene80; + +import java.io.IOException; +import java.util.Arrays; +import java.util.HashMap; +import java.util.HashSet; +import java.util.Map; +import java.util.Set; + +import org.apache.lucene.codecs.DocValuesProducer; +import org.apache.lucene.index.FieldInfo; +import org.apache.lucene.index.SegmentWriteState; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.store.ByteBuffersDataOutput; +import org.apache.lucene.store.IndexOutput; +import org.apache.lucene.util.ArrayUtil; +import org.apache.lucene.util.MathUtil; +import org.apache.lucene.util.packed.DirectMonotonicWriter; +import org.apache.lucene.util.packed.DirectWriter; + +import static org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.DIRECT_MONOTONIC_BLOCK_SHIFT; +import static org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.NUMERIC_BLOCK_SHIFT; +import static org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.NUMERIC_BLOCK_SIZE; + +public class Lucene80NumericConsumer{ + + private final int maxDoc; + + public Lucene80NumericConsumer(SegmentWriteState state) { +this.maxDoc = state.segmentInfo.maxDoc(); + } + + public void addSortedNumericField(FieldInfo field, DocValuesProducer valuesProducer, IndexOutput data, IndexOutput meta) throws IOException { +long[] stats = writeValues(field, valuesProducer, data, meta); +int numDocsWithField = Math.toIntExact(stats[0]); +long numValues = stats[1]; +assert numValues >= numDocsWithField; + +meta.writeInt(numDocsWithField); +if (numValues > numDocsWithField) { + long start = data.getFilePointer(); + meta.writeLong(start); + meta.writeVInt(DIRECT_MONOTONIC_BLOCK_SHIFT); + + final DirectMonotonicWriter addressesWriter = DirectMonotonicWriter.getInstance(meta, data, numDocsWithField + 1L, DIRECT_MONOTONIC_BLOCK_SHIFT); + long addr = 0; + addressesWriter.add(addr); + SortedNumericDocValues values = 
valuesProducer.getSortedNumeric(field); + for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = values.nextDoc()) { +addr += values.docValueCount(); +addressesWriter.add(addr); + } + addressesWriter.finish(); + meta.writeLong(data.getFilePointer() - start); +} + } + + private static class MinMaxTracker { +long min, max, numValues, spaceInBits; + +MinMaxTracker() { + reset(); + spaceInBits = 0; +} + +private void reset() { + min = Long.MAX_VALUE; + max = Long.MIN_VALUE; + numValues = 0; +} + +/** + * Accumulate a new value. + */ +void update(long v) { + min = Math.min(min, v); + max = Math.max(max, v); + ++numValues; +} + +/** + * Update the required space. + */ +void finish() { + if (max > min) { +spaceInBits += DirectWriter.unsignedBitsRequired(max - min) * numValues; + } +} + +/** + * Update space usage and get ready for accumulating values for the next block. + */ +void nextBlock() { + finish(); + reset(); +} + } + + public long[] writeValues(FieldInfo field, DocValuesProducer valuesProducer, IndexOutput data, IndexOutput meta) throws IOException { Review comment: added data and meta index output This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands,
[GitHub] [lucene-solr] romseygeek merged pull request #357: [SOLR-12238] Synonym Queries boost
romseygeek merged pull request #357: [SOLR-12238] Synonym Queries boost URL: https://github.com/apache/lucene-solr/pull/357 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236
juanka588 commented on a change in pull request #1282: Lucene 9236 URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383184671 ## File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java ## @@ -121,1445 +116,80 @@ private void readFields(ChecksumIndexInput meta, FieldInfos infos) throws IOExce } byte type = meta.readByte(); if (type == Lucene80DocValuesFormat.NUMERIC) { -numerics.put(info.name, readNumeric(meta)); +numerics.put(info.name, Lucene80NumericProducer.readNumeric(meta)); } else if (type == Lucene80DocValuesFormat.BINARY) { -binaries.put(info.name, readBinary(meta)); +binaries.put(info.name, Lucene80BinaryProducer.readBinary(meta, version)); } else if (type == Lucene80DocValuesFormat.SORTED) { -sorted.put(info.name, readSorted(meta)); +sorted.put(info.name, Lucene80SortedSetProducer.readSorted(meta)); } else if (type == Lucene80DocValuesFormat.SORTED_SET) { -sortedSets.put(info.name, readSortedSet(meta)); +sortedSets.put(info.name, Lucene80SortedSetProducer.readSortedSet(meta)); } else if (type == Lucene80DocValuesFormat.SORTED_NUMERIC) { -sortedNumerics.put(info.name, readSortedNumeric(meta)); +sortedNumerics.put(info.name, Lucene80NumericProducer.readSortedNumeric(meta)); } else { throw new CorruptIndexException("invalid type: " + type, meta); } } } - private NumericEntry readNumeric(ChecksumIndexInput meta) throws IOException { -NumericEntry entry = new NumericEntry(); -readNumeric(meta, entry); -return entry; - } - - private void readNumeric(ChecksumIndexInput meta, NumericEntry entry) throws IOException { -entry.docsWithFieldOffset = meta.readLong(); -entry.docsWithFieldLength = meta.readLong(); -entry.jumpTableEntryCount = meta.readShort(); -entry.denseRankPower = meta.readByte(); -entry.numValues = meta.readLong(); -int tableSize = meta.readInt(); -if (tableSize > 256) { - throw new CorruptIndexException("invalid table size: " + tableSize, meta); -} -if (tableSize >= 0) { - entry.table = new 
long[tableSize]; - ramBytesUsed += RamUsageEstimator.sizeOf(entry.table); - for (int i = 0; i < tableSize; ++i) { -entry.table[i] = meta.readLong(); - } -} -if (tableSize < -1) { - entry.blockShift = -2 - tableSize; -} else { - entry.blockShift = -1; -} -entry.bitsPerValue = meta.readByte(); -entry.minValue = meta.readLong(); -entry.gcd = meta.readLong(); -entry.valuesOffset = meta.readLong(); -entry.valuesLength = meta.readLong(); -entry.valueJumpTableOffset = meta.readLong(); - } - - private BinaryEntry readBinary(ChecksumIndexInput meta) throws IOException { -BinaryEntry entry = new BinaryEntry(); -entry.dataOffset = meta.readLong(); -entry.dataLength = meta.readLong(); -entry.docsWithFieldOffset = meta.readLong(); -entry.docsWithFieldLength = meta.readLong(); -entry.jumpTableEntryCount = meta.readShort(); -entry.denseRankPower = meta.readByte(); -entry.numDocsWithField = meta.readInt(); -entry.minLength = meta.readInt(); -entry.maxLength = meta.readInt(); -if ((version >= Lucene80DocValuesFormat.VERSION_BIN_COMPRESSED && entry.numDocsWithField > 0) || entry.minLength < entry.maxLength) { - entry.addressesOffset = meta.readLong(); - - // Old count of uncompressed addresses - long numAddresses = entry.numDocsWithField + 1L; - // New count of compressed addresses - the number of compresseed blocks - if (version >= Lucene80DocValuesFormat.VERSION_BIN_COMPRESSED) { -entry.numCompressedChunks = meta.readVInt(); -entry.docsPerChunkShift = meta.readVInt(); -entry.maxUncompressedChunkSize = meta.readVInt(); -numAddresses = entry.numCompressedChunks; - } - - final int blockShift = meta.readVInt(); - entry.addressesMeta = DirectMonotonicReader.loadMeta(meta, numAddresses, blockShift); - ramBytesUsed += entry.addressesMeta.ramBytesUsed(); - entry.addressesLength = meta.readLong(); -} -return entry; - } - - private SortedEntry readSorted(ChecksumIndexInput meta) throws IOException { -SortedEntry entry = new SortedEntry(); -entry.docsWithFieldOffset = meta.readLong(); 
-entry.docsWithFieldLength = meta.readLong(); -entry.jumpTableEntryCount = meta.readShort(); -entry.denseRankPower = meta.readByte(); -entry.numDocsWithField = meta.readInt(); -entry.bitsPerValue = meta.readByte(); -entry.ordsOffset = meta.readLong(); -entry.ordsLength = meta.readLong(); -readTermDict(meta, entry); -return entry; - } - - private SortedSetEntry readSortedSet(ChecksumIndexInput meta) throws IOException { -SortedSetEntry entry = new SortedSetEntry(); -b
[jira] [Updated] (LUCENE-9236) Having a modular Doc Values format
[ https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] juan camilo rodriguez duran updated LUCENE-9236: Description: Today the DocValues Consumer/Producer requires overriding 5 different methods, even if you only want to use one, given that a field can only support one doc values type at a time. In the attached PR I’ve implemented a new modular version of those classes (consumer/producer), each one having a single responsibility and writing to the same single file. This is mainly a refactor of the existing format, opening the possibility to override or implement the sub-format you need. I’ll do it in 3 steps: # Create a CompositeDocValuesFormat and move the code of Lucene80DocValuesFormat into separate classes, without modifying the inner code. At the same time I created a Lucene85CompositeDocValuesFormat based on these changes. # I’ll introduce some basic components for writing doc values in general, such as: ## DocumentIdSetIterator Serializer: used in each type of field, based on an IndexedDISI. ## Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate values using a dictionary. ## Document Boundaries Serializer (optional, used only for multivalued fields: SortedNumeric and SortedSet) ## TermsEnum Serializer: useful to write and read the terms dictionary for sorted and sorted set doc values. # I’ll create the new Sub-DocValues formats using the previous components. PR: [https://github.com/apache/lucene-solr/pull/1282] was: Today DocValues Consumer/Producer require override 5 different methods, even if you only want to use one and given that one given field can only support one doc values type at same time. In the attached PR I’ve implemented a new modular version of those classes (consumer/producer) each one having a single responsibility and writing in the same unique file. 
This is mainly a refactor of the existing format opening the possibility to override or implement the sub-format you need. I’ll do in 3 steps: # Create a CompositeDocValuesFormat and moving the code of Lucene80DocValuesFormat in separate classes, without modifying the inner code. At same time I created a Lucene85CompositeDocValuesFormat based on these changes. # I’ll introduce some basic components for writing doc values in general such as: ## DocumentIdSetIterator Serializer: used in each type of field based on an IndexedDISI. ## Document Ordinals Serializer: Used in Sorted and SortedSet for deduplicate values using a dictionary. ## Document Boundaries Serializer (optional used only for multivalued fields: SortedNumeric and SortedSet) ## TermsEnum Serializer: useful to write and read the terms dictionary for sorted and sorted set doc values. # I’ll create the new Sub-DocValues format using the previous components. > Having a modular Doc Values format > -- > > Key: LUCENE-9236 > URL: https://issues.apache.org/jira/browse/LUCENE-9236 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: juan camilo rodriguez duran >Priority: Minor > Labels: docValues > > Today DocValues Consumer/Producer require override 5 different methods, even > if you only want to use one and given that one given field can only support > one doc values type at same time. > > In the attached PR I’ve implemented a new modular version of those classes > (consumer/producer) each one having a single responsibility and writing in > the same unique file. > This is mainly a refactor of the existing format opening the possibility to > override or implement the sub-format you need. > > I’ll do in 3 steps: > # Create a CompositeDocValuesFormat and moving the code of > Lucene80DocValuesFormat in separate classes, without modifying the inner > code. At same time I created a Lucene85CompositeDocValuesFormat based on > these changes. 
> # I’ll introduce some basic components for writing doc values in general > such as: > ## DocumentIdSetIterator Serializer: used in each type of field based on an > IndexedDISI. > ## Document Ordinals Serializer: Used in Sorted and SortedSet for > deduplicate values using a dictionary. > ## Document Boundaries Serializer (optional used only for multivalued > fields: SortedNumeric and SortedSet) > ## TermsEnum Serializer: useful to write and read the terms dictionary for > sorted and sorted set doc values. > # I’ll create the new Sub-DocValues format using the previous components. > > PR: [https://github.com/apache/lucene-solr/pull/1282] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands,
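The single-responsibility design the issue describes can be sketched as follows. This is an illustrative sketch only, not the actual Lucene API: the type names (`SubDocValuesWriter`, `CompositeDocValuesConsumer`, `NumericWriter`) are hypothetical, and the real consumer deals in `FieldInfo` and per-type producers rather than raw arrays. The point is the dispatch shape: one registered writer per doc-values type, so an implementor overrides only the sub-format they need instead of all five `add*Field` methods.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-responsibility writer for one doc-values type.
interface SubDocValuesWriter {
    void write(String field, long[] values);
}

// One concrete sub-format; a real one would encode to the index file.
class NumericWriter implements SubDocValuesWriter {
    int calls = 0; // instrumentation so the sketch is observable
    @Override
    public void write(String field, long[] values) {
        calls++;
    }
}

// Composite consumer: delegates each field to the writer registered
// for its doc-values type, all writing into the same logical file.
class CompositeDocValuesConsumer {
    private final Map<String, SubDocValuesWriter> writersByType = new HashMap<>();

    void register(String type, SubDocValuesWriter writer) {
        writersByType.put(type, writer);
    }

    void addField(String type, String field, long[] values) {
        SubDocValuesWriter w = writersByType.get(type);
        if (w == null) {
            throw new IllegalArgumentException("no sub-format registered for " + type);
        }
        w.write(field, values);
    }
}
```

A format built this way only needs to register the sub-formats it actually supports; an unregistered type fails fast instead of forcing an empty override.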
[GitHub] [lucene-solr] juanka588 opened a new pull request #1282: Lucene 9236
juanka588 opened a new pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282

# Description
Please provide a short description of the changes you're making with this pull request.

# Solution
Please provide a short description of the approach taken to implement your solution.

# Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

# Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
[ https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043361#comment-17043361 ] Dr Oleg Savrasov commented on SOLR-13411:
A patch for option
> b) Deny atomic update for route.field and throw exception.
is provided.
> CompositeIdRouter calculates wrong route hash if atomic update is used for route.field
> --
>
> Key: SOLR-13411
> URL: https://issues.apache.org/jira/browse/SOLR-13411
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 7.5
> Reporter: Niko Himanen
> Priority: Minor
> Attachments: SOLR-13411.patch
>
> If a collection is created with the router.field parameter to define some field other than uniqueField as the route field, and a document update arrives with the route field updated using atomic update syntax (for example set=123), the hash for document routing is calculated from "set=123" rather than from 123, the real value, which may route the document to the wrong shard.
>
> This happens in CompositeIdRouter#sliceHash, where the field value is used as-is for the hash calculation.
>
> I think there are two possible solutions to fix this:
> a) Allow atomic updates also for route.field, but use the real value instead of the atomic update syntax to route the document to the right shard.
> b) Deny atomic updates for route.field and throw an exception.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
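The routing bug above, and the issue's option (a) fix, can be illustrated with a minimal sketch. Assumptions are flagged in the comments: a plain `String.hashCode()` stands in for the MurmurHash3-based hash Solr actually uses in `CompositeIdRouter#sliceHash`, and the class and method names (`RouteHashSketch`, `buggyHash`, `fixedHash`) are hypothetical. What it shows is the core problem: hashing the stringified atomic-update payload (something like `{set=123}`) instead of the real field value `123` yields a different hash, so the document can be routed to the wrong shard.

```java
import java.util.Map;

class RouteHashSketch {
    // Stand-in for the router's hash of a route-field value
    // (Solr really uses MurmurHash3; any consistent hash shows the mismatch).
    static int routeHash(String routeFieldValue) {
        return routeFieldValue.hashCode();
    }

    // Buggy path: the atomic-update payload is stringified and hashed as-is,
    // so "{set=123}" is hashed instead of "123".
    static int buggyHash(Object fieldValue) {
        return routeHash(String.valueOf(fieldValue));
    }

    // Option (a) from the issue: unwrap the atomic-update map and hash the
    // real value, so routing matches documents indexed without atomic syntax.
    static int fixedHash(Object fieldValue) {
        if (fieldValue instanceof Map) {
            Object real = ((Map<?, ?>) fieldValue).get("set");
            if (real != null) {
                return routeHash(String.valueOf(real));
            }
        }
        return routeHash(String.valueOf(fieldValue));
    }
}
```

With an atomic update `{set=123}`, `fixedHash` agrees with the hash of the plain value `"123"` while `buggyHash` does not, which is exactly the divergence that sends the update to a different shard than the original document.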