[jira] [Commented] (SOLR-9830) Once IndexWriter is closed due to some RunTimeException like FileSystemException, It never return to normal unless restart the Solr JVM

2020-02-24 Thread Vinh Le (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044048#comment-17044048
 ] 

Vinh Le commented on SOLR-9830:
---

I've seen this error when requesting the /metrics APIs in 7.3 as well, and it 
only disappears after a restart.

> Once IndexWriter is closed due to some RunTimeException like 
> FileSystemException, It never return to normal unless restart the Solr JVM
> ---
>
> Key: SOLR-9830
> URL: https://issues.apache.org/jira/browse/SOLR-9830
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 6.2
> Environment: Red Hat 4.4.7-3,SolrCloud 
>Reporter: Daisy.Yuan
>Priority: Major
>
> 1. Collection coll_test has 9 shards, each with two replicas on different 
> Solr instances.
> 2. While updating documents to the collection using SolrJ, inject an 
> exhausted-file-handle fault into one Solr instance, e.g. solr1.
> 3. Updates to col_test_shard3_replica1 (the leader) fail due to a 
> FileSystemException, and the IndexWriter is closed.
> 4. After clearing the fault, col_test_shard3_replica1 (the leader) still 
> cannot accept document updates, and its numDocs stays lower than the standby 
> replica's.
> 5. After the Solr instance restarts, it can update documents again and 
> numDocs is consistent between the two replicas.
> I think that in SolrCloud mode the core should recover by itself in this 
> case; a restart should not be required to restore the update function.
>  2016-12-01 14:13:00,932 | INFO  | http-nio-21101-exec-20 | 
> [DWPT][http-nio-21101-exec-20]: now abort | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,932 | INFO  | http-nio-21101-exec-20 | 
> [DWPT][http-nio-21101-exec-20]: done abort | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,932 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: hit exception updating document | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: hit tragic FileSystemException inside 
> updateDocument | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: rollback | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,933 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: all running merges have aborted | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,934 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: rollback: done finish merges | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,934 | INFO  | http-nio-21101-exec-20 | 
> [DW][http-nio-21101-exec-20]: abort | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,939 | INFO  | commitScheduler-46-thread-1 | 
> [DWPT][commitScheduler-46-thread-1]: flush postings as segment _4h9 
> numDocs=3798 | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | commitScheduler-46-thread-1 | 
> [DWPT][commitScheduler-46-thread-1]: now abort | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | commitScheduler-46-thread-1 | 
> [DWPT][commitScheduler-46-thread-1]: done abort | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | http-nio-21101-exec-20 | 
> [DW][http-nio-21101-exec-20]: done abort success=true | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | commitScheduler-46-thread-1 | 
> [DW][commitScheduler-46-thread-1]: commitScheduler-46-thread-1 
> finishFullFlush success=false | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | http-nio-21101-exec-20 | 
> [IW][http-nio-21101-exec-20]: rollback: 
> infos=_4g7(6.2.0):C59169/23684:delGen=4 _4gq(6.2.0):C67474/11636:delGen=1 
> _4gg(6.2.0):C64067/15664:delGen=2 _4gr(6.2.0):C13131 _4gs(6.2.0):C966 
> _4gt(6.2.0):C4543 _4gu(6.2.0):C6960 _4gv(6.2.0):C2544 | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:13:00,940 | INFO  | commitScheduler-46-thread-1 | 
> [IW][commitScheduler-46-thread-1]: hit exception during NRT reader | 
> org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
> 2016-12-01 14:

[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044009#comment-17044009
 ] 

David Smiley commented on LUCENE-9212:
--

>  Automatons can be defined in both binary and unicode space, and there's no 
>way of telling which it is when it comes to compiling them

Isn't that a problem with our API -- more of a root cause?  I've been bitten by 
the un-typed nature of byte vs char automatons.

> Intervals.multiterm() should take a CompiledAutomaton
> -
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we 
> should deprecate the methods that take `Automaton` and remove in master.
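
The proposed change can be sketched as follows. These are hypothetical 
signatures for illustration only; the actual `Intervals` API may differ:

```java
// Hypothetical sketch of the proposal: the caller compiles the automaton,
// so the binary-vs-unicode decision is made where that information is known.
public static IntervalsSource multiterm(CompiledAutomaton ca, String pattern) {
  /* ... */
}

/** @deprecated an arbitrary Automaton carries no indication of whether it
 *  is defined over binary or unicode space. */
@Deprecated
public static IntervalsSource multiterm(Automaton a, String pattern) {
  /* ... */
}
```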



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043985#comment-17043985
 ] 

Uwe Schindler commented on LUCENE-9227:
---

bq. Tested with browser and curl. The redirect works, but I know nothing about 
STS 

Thanks. STS is Strict Transport Security 
(https://en.wikipedia.org/wiki/HTTP_Strict_Transport_Security). It sends a 
special HTTP header that instructs the browser to always use HTTPS for a 
domain. This lowers the risk that somebody intercepts the initial plain-HTTP 
connection to the webserver (users normally only enter the domain name, making 
the browser use HTTP and get redirected to HTTPS). As the redirect itself is 
not secured, a bad guy could remove the redirect and serve a (modified) page. 
With HSTS the browser will (except for the very first access) use HTTPS 
forever, even when links use HTTP or the user enters the domain name without a 
protocol. Basically, once you have sent this header you can no longer switch 
off HTTPS for the lifetime of the header. The recommendation is to send one 
year or more, but I initially set 300 seconds for testing.

It's now also deployed in production. I will raise it to one year next weekend.
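
The combination described above (HTTP-to-HTTPS redirect plus an HSTS header) 
can be sketched in an .htaccess file roughly like this. This is a hedged 
illustration, not the project's actual template file; max-age is in seconds, 
so 300 for testing and 31536000 for one year:

```apache
# Permanently redirect all plain-HTTP requests to HTTPS
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

# Tell browsers to use HTTPS only, for max-age seconds after each visit
Header always set Strict-Transport-Security "max-age=300"
</IfModule>
```

Note that the header must be sent on the HTTPS responses; per the HSTS 
specification, browsers ignore it when received over plain HTTP.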

> Make page ready for pure HTTPS
> --
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
>
> The web page can currently be visited using HTTPS, but this brings warnings:
> - Both search providers create a form that passes USER ENTERED INPUT using no 
> encryption. This is not allowed under GDPR. We have to fix this asap. It 
> looks like [~otis]'s search works with HTTPS (if we change the domain name), 
> but the Lucidworks one does not
> - Some CSS files were loaded over HTTP (fonts from Google - this was fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many 
> links with HTTP, but it looks like no images, scripts, or CSS anymore), I'd 
> like to add a permanent redirect http://lucene.apache.org/ -> 
> https://lucene.apache.org to the htaccess template file.






[jira] [Commented] (SOLR-13910) Create security news feed on website with RSS/Atom feed

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043981#comment-17043981
 ] 

Uwe Schindler commented on SOLR-13910:
--

I modified the styles a bit; now it looks fine and is more flexible with 
responsive screen sizes.

> Create security news feed on website with RSS/Atom feed
> ---
>
> Key: SOLR-13910
> URL: https://issues.apache.org/jira/browse/SOLR-13910
> Project: Solr
>  Issue Type: Task
>  Components: website
>Reporter: Adam Walz
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: recent-security-ann.png, security-page-with-table.png, 
> solr-security-page.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> From [~janhoy]
> We're in the process of migrating our web site to Git, and in that same
> process we are also changing the CMS from an ASF one to Pelican. The new site
> has built-in support for news posts as individual files and also RSS feeds of
> those. So I propose to add [https://lucene.apache.org/solr/security.html]
> to the site, including a list of the newest CVEs and an RSS/Atom feed to go
> along with it. This way users have ONE place to visit to check security
> announcements, and they can monitor the RSS feed to be alerted whenever we
> post a new announcement.
> We could also add RSS feeds for the Lucene-core news and Solr news sections,
> of course.
> At the same time I propose that the news on the front page 
> [lucene.apache.org|http://lucene.apache.org/]
> is replaced with widgets that show only the titles of the last 3 announcements
> from the Lucene, Solr and PyLucene sub-projects. That front page is waaay
> too long :)






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383499305
 
 

 ##
 File path: 
lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java
 ##
 @@ -18,260 +18,337 @@
 package org.apache.lucene.codecs.uniformsplit;
 
 import java.io.IOException;
-import java.util.Objects;
+import java.util.Arrays;
 
 import org.apache.lucene.codecs.PostingsReaderBase;
 import org.apache.lucene.index.TermState;
 import org.apache.lucene.index.TermsEnum;
 import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.util.ArrayUtil;
 import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.BytesRefBuilder;
 import org.apache.lucene.util.IntsRefBuilder;
-import org.apache.lucene.util.StringHelper;
 import org.apache.lucene.util.automaton.Automaton;
 import org.apache.lucene.util.automaton.ByteRunAutomaton;
 import org.apache.lucene.util.automaton.CompiledAutomaton;
-import org.apache.lucene.util.automaton.Operations;
 import org.apache.lucene.util.automaton.Transition;
 
 /**
  * The "intersect" {@link TermsEnum} response to {@link 
UniformSplitTerms#intersect(CompiledAutomaton, BytesRef)},
  * intersecting the terms with an automaton.
+ * 
+ * By design of the UniformSplit block keys, it is less efficient than
+ * {@code org.apache.lucene.codecs.blocktree.IntersectTermsEnum} for {@link 
org.apache.lucene.search.FuzzyQuery} (-37%).
+ * It is slightly slower for {@link org.apache.lucene.search.WildcardQuery} 
(-5%) and slightly faster for
+ * {@link org.apache.lucene.search.PrefixQuery} (+5%).
+ *
+ * @lucene.experimental
  */
 public class IntersectBlockReader extends BlockReader {
 
-  protected final AutomatonNextTermCalculator nextStringCalculator;
-  protected final ByteRunAutomaton runAutomaton;
-  protected final BytesRef commonSuffixRef; // maybe null
-  protected final BytesRef commonPrefixRef;
-  protected final BytesRef startTerm; // maybe null
+  /**
+   * Block iteration order. Whether to move next block, jump to a block away, 
or end the iteration.
+   */
+  protected enum BlockIteration {NEXT, SEEK, END}
 
-  /** Set this when our current mode is seeking to this term.  Set to null 
after. */
-  protected BytesRef seekTerm;
+  /**
+   * Threshold that controls when to attempt to jump to a block away.
+   * 
+   * This counter is 0 when entering a block. It is incremented each time a 
term is rejected by the automaton.
+   * When the counter is greater than or equal to this threshold, then we 
compute the next term accepted by
+   * the automaton, with {@link AutomatonNextTermCalculator}, and we jump to a 
block away if the next term
+   * accepted is greater than the immediate next term in the block.
+   * 
+   * A low value, for example 1, improves the performance of automatons 
requiring many jumps, for example
+   * {@link org.apache.lucene.search.FuzzyQuery} and most {@link 
org.apache.lucene.search.WildcardQuery}.
+   * A higher value improves the performance of automatons with less or no 
jump, for example
+   * {@link org.apache.lucene.search.PrefixQuery}.
+   * A threshold of 4 seems to be a good balance.
+   */
+  protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD = 4;
 
-  protected int blockPrefixRunAutomatonState;
-  protected int blockPrefixLen;
+  protected final Automaton automaton;
+  protected final ByteRunAutomaton runAutomaton;
+  protected final boolean finite;
+  protected final BytesRef commonSuffix; // maybe null
+  protected final int minTermLength;
+  protected final AutomatonNextTermCalculator nextStringCalculator;
 
   /**
-   * Number of bytes accepted by the last call to {@link 
#runAutomatonForState}.
+   * Set this when our current mode is seeking to this term.  Set to null 
after.
+   */
+  protected BytesRef seekTerm;
+  /**
+   * Number of bytes accepted by the automaton when validating the current 
term.
+   */
+  protected int numMatchedBytes;
+  /**
+   * Automaton states reached when validating the current term, from 0 to 
{@link #numMatchedBytes} - 1.
+   */
+  protected int[] states;
+  /**
+   * Block iteration order determined when scanning the terms in the current 
block.
*/
-  protected int numBytesAccepted;
+  protected BlockIteration blockIteration;
   /**
-   * Whether the current term is beyond the automaton common prefix.
-   * If true this means the enumeration should stop immediately.
+   * Counter of the number of consecutively rejected terms.
+   * Depending on {@link #NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD}, this 
may trigger a jump to a block away.
*/
-  protected boolean beyondCommonPrefix;
+  protected int numConsecutivelyRejectedTerms;
 
-  public IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm,
-  IndexDictionary.BrowserSupplier 
dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase 

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383522035
 
 

 (Diff context identical to the first review comment above; omitted.)

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383527391
 
 

 (Diff context identical to the first review comment above; omitted.)

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383526667
 
 

 (Diff context identical to the first review comment above; omitted.)

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383523994
 
 

 (Diff context identical to the first review comment above; omitted.)
-  protected BytesRef seekTerm;
+  /**
+   * Threshold that controls when to attempt to jump to a block away.
+   * 
+   * This counter is 0 when entering a block. It is incremented each time a 
term is rejected by the automaton.
+   * When the counter is greater than or equal to this threshold, then we 
compute the next term accepted by
+   * the automaton, with {@link AutomatonNextTermCalculator}, and we jump to a 
block away if the next term
+   * accepted is greater than the immediate next term in the block.
+   * 
+   * A low value, for example 1, improves the performance of automatons 
requiring many jumps, for example
+   * {@link org.apache.lucene.search.FuzzyQuery} and most {@link 
org.apache.lucene.search.WildcardQuery}.
+   * A higher value improves the performance of automatons with less or no 
jump, for example
+   * {@link org.apache.lucene.search.PrefixQuery}.
+   * A threshold of 4 seems to be a good balance.
+   */
+  protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD = 4;
 
-  protected int blockPrefixRunAutomatonState;
-  protected int blockPrefixLen;
+  protected final Automaton automaton;
+  protected final ByteRunAutomaton runAutomaton;
+  protected final boolean finite;
+  protected final BytesRef commonSuffix; // maybe null
+  protected final int minTermLength;
+  protected final AutomatonNextTermCalculator nextStringCalculator;
 
   /**
-   * Number of bytes accepted by the last call to {@link 
#runAutomatonForState}.
+   * Set this when our current mode is seeking to this term.  Set to null 
after.
+   */
+  protected BytesRef seekTerm;
+  /**
+   * Number of bytes accepted by the automaton when validating the current 
term.
+   */
+  protected int numMatchedBytes;
+  /**
+   * Automaton states reached when validating the current term, from 0 to 
{@link #numMatchedBytes} - 1.
+   */
+  protected int[] states;
+  /**
+   * Block iteration order determined when scanning the terms in the current 
block.
*/
-  protected int numBytesAccepted;
+  protected BlockIteration blockIteration;
   /**
-   * Whether the current term is beyond the automaton common prefix.
-   * If true this means the enumeration should stop immediately.
+   * Counter of the number of consecutively rejected terms.
+   * Depending on {@link #NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD}, this 
may trigger a jump to a block away.
*/
-  protected boolean beyondCommonPrefix;
+  protected int numConsecutivelyRejectedTerms;
 
-  public IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm,
-  IndexDictionary.BrowserSupplier 
dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase 
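
The jump heuristic described in the quoted javadoc above (count consecutive rejections; at the threshold, compute the next accepted term and seek past the current position if that term is farther away than the immediate next term) can be sketched as a standalone toy. This is not the actual Lucene implementation: the names `intersect` and `nextAccepted` are hypothetical, the automaton is modeled as a plain `Predicate`, and the block is a sorted in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class JumpHeuristicSketch {

  // Mirrors NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD in the quoted diff.
  static final int THRESHOLD = 4;

  /**
   * Scans sorted blockTerms, collecting terms accepted by the "automaton"
   * (modeled as a Predicate). After THRESHOLD consecutive rejections it asks
   * for the next accepted term (modeled as a Function) and jumps ahead if
   * that term lies beyond the immediate next term (the SEEK case); a null
   * next term ends the iteration (the END case).
   */
  static List<String> intersect(List<String> blockTerms,
                                Predicate<String> accepts,
                                Function<String, String> nextAccepted) {
    List<String> matches = new ArrayList<>();
    int rejected = 0; // analogue of numConsecutivelyRejectedTerms
    for (int i = 0; i < blockTerms.size(); i++) {
      String term = blockTerms.get(i);
      if (accepts.test(term)) {
        matches.add(term);
        rejected = 0;
      } else if (++rejected >= THRESHOLD) {
        String next = nextAccepted.apply(term);
        if (next == null) {
          break; // no accepted term remains: END
        }
        int idx = Collections.binarySearch(blockTerms, next);
        int target = idx >= 0 ? idx : -idx - 1;
        if (target > i + 1) { // worth jumping: SEEK instead of NEXT
          i = target - 1;     // loop increment lands on the jump target
          rejected = 0;
        }
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    List<String> terms = Arrays.asList("aa", "ba", "bb", "bc", "bd", "be", "bf", "ga");
    // Four b-terms are rejected in a row, triggering a jump straight to "ga".
    System.out.println(intersect(terms, s -> s.startsWith("a") || s.startsWith("g"), t -> "ga"));
    // prints [aa, ga]
  }
}
```

A low threshold makes the jump fire sooner (good for fuzzy/wildcard-style automatons with many gaps); a high threshold keeps the scan sequential (good for prefix-style automatons), matching the trade-off the javadoc describes.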

[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383546038
 
 


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383517461
 
 


[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster UniformSplit IntersectBlockReader.

2020-02-24 Thread GitBox
dsmiley commented on a change in pull request #1270: LUCENE-9237: Faster 
UniformSplit IntersectBlockReader.
URL: https://github.com/apache/lucene-solr/pull/1270#discussion_r383536370
 
 

 ##
 File path: 
lucene/codecs/src/java/org/apache/lucene/codecs/uniformsplit/IntersectBlockReader.java
 ##
 @@ -285,64 +362,66 @@ public void seekExact(long ord) {
   }
 
   @Override
-  public SeekStatus seekCeil(BytesRef text) {
+  public void seekExact(BytesRef term, TermState state) {
 throw new UnsupportedOperationException();
   }
 
   @Override
-  public void seekExact(BytesRef term, TermState state) {
+  public SeekStatus seekCeil(BytesRef text) {
 throw new UnsupportedOperationException();
   }
 
   /**
* This is a copy of AutomatonTermsEnum.  Since it's an inner class, the 
outer class can
 
 Review comment:
   Well; it's _mostly_ a copy of AutomatonTermsEnum now :-/  The duplication is 
a shame.  Just insert the word "_mostly_" and it satisfies me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13910) Create security news feed on website with RSS/Atom feed

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043952#comment-17043952
 ] 

Uwe Schindler commented on SOLR-13910:
--

The header line now wraps on my computer. It looks like the menu font size is a 
bit too large. I will reduce it a bit, to 0.92rem instead of 1rem. Whether this 
occurs seems to depend on your browser and screen resolution.

> Create security news feed on website with RSS/Atom feed
> ---
>
> Key: SOLR-13910
> URL: https://issues.apache.org/jira/browse/SOLR-13910
> Project: Solr
>  Issue Type: Task
>  Components: website
>Reporter: Adam Walz
>Assignee: Jan Høydahl
>Priority: Minor
> Attachments: recent-security-ann.png, security-page-with-table.png, 
> security-page-with-table.png, solr-security-page.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> From [~janhoy]
> We're in the process of migrating our web site to Git and in that same
> process we also change CMS from an ASF one to Pelican. The new site has
> built-in support for news posts as individual files and also RSS feeds of
> those. So I propose to add [https://lucene.apache.org/solr/security.html]
> to the site, including a list of newest CVEs and an RSS/Atom feed to go
> along with it. This way users have ONE place to visit to check security
> announcements and they can monitor RSS to be alerted once we post a new
> announcement.
> We could also add RSS feeds for Lucene-core news and Solr-news sections
> of course.
> At the same time I propose that the news on the front-page 
> [lucene.apache.org|http://lucene.apache.org/]
> is replaced with widgets that show the title only of the last 3 announcements
> from Lucene, Solr and PyLucene sub projects. That front page is waaay
> too long :)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (LUCENE-9201) Port documentation-lint task to Gradle build

2020-02-24 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043951#comment-17043951
 ] 

Dawid Weiss commented on LUCENE-9201:
-

A custom javadoc invocation is certainly possible and could make things easier 
in the long run. You'd need to declare inputs/outputs properly though so that it 
is skippable. Those javadoc invocations take a long time in precommit.

> Port documentation-lint task to Gradle build
> 
>
> Key: LUCENE-9201
> URL: https://issues.apache.org/jira/browse/LUCENE-9201
> Project: Lucene - Core
>  Issue Type: Sub-task
>Affects Versions: master (9.0)
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: LUCENE-9201-ecj-2.patch, LUCENE-9201-ecj.patch, 
> LUCENE-9201-missing-docs.patch, LUCENE-9201.patch, javadocGRADLE.png, 
> javadocHTML4.png, javadocHTML5.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Ant build's "documentation-lint" target consists of those two sub targets.
>  * "-ecj-javadoc-lint" (Javadoc linting by ECJ)
>  * "-documentation-lint"(Missing javadocs / broken links check by python 
> scripts)






[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043942#comment-17043942
 ] 

Uwe Schindler commented on LUCENE-8987:
---

bq. I attempted a fix to the CSS caching issue. It is just a simple Pelican 
variable that gets injected for every unversioned CSS and JS in our HTML 
templates. See https://github.com/apache/lucene-site/pull/13 - Adding this 
should make the new front page load well for everyone after publishing 

I improved the CSS/JS caching: whenever the {{v=X}} query string is 
appended, the underlying Apache is now sending a Cache-Control header. This 
will cache the resources for a longer time (I started with 10 days). This 
improves page loads, as not even If-Modified-Since requests need to be made.
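
The caching rule described above (version-stamped URLs are immutable per version, so they can be cached for a long time, while unversioned resources must revalidate) can be sketched as a small helper. This is a hypothetical Java illustration, not the actual Apache httpd configuration; the 10-day max-age mirrors the value mentioned in the comment.

```java
class VersionedAssetCaching {

  static final long TEN_DAYS_SECONDS = 10L * 24 * 3600; // the "10 days" above

  /**
   * Returns a Cache-Control value for a static asset. URLs carrying a v=X
   * version query string get a long lifetime, because a content change also
   * changes the version stamp; everything else falls back to revalidation.
   */
  static String cacheControlFor(String queryString) {
    boolean versioned = queryString != null && queryString.matches("(^|.*&)v=[^&]+.*");
    return versioned ? "public, max-age=" + TEN_DAYS_SECONDS : "no-cache";
  }

  public static void main(String[] args) {
    System.out.println(cacheControlFor("v=7"));    // prints "public, max-age=864000"
    System.out.println(cacheControlFor("page=1")); // prints "no-cache"
  }
}
```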

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by, by creating a new 
> git repo {{lucene-site}}, copy over the site from svn, see if it can be 
> "Pelicanized" easily and then test staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc






[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-24 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043928#comment-17043928
 ] 

Dawid Weiss commented on LUCENE-9241:
-

I have reviewed it as well. :) Except for the things I mentioned, I didn't think 
anything else was worth noting. Direct memory allocation may be misleading 
in that it is still allocation but escapes the heap... but I don't have an 
opinion on that (whether it's a good thing or not), so I'll just leave it up to 
you.
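
The point about direct allocation escaping the heap can be shown in two lines of plain Java: a direct buffer consumes real memory that a heap-size limit like -Xmx (the 512M/128M test setting discussed in the issue) does not account for.

```java
import java.nio.ByteBuffer;

class DirectVsHeap {
  public static void main(String[] args) {
    // Heap buffer: counts against -Xmx and shows up in heap profiling.
    ByteBuffer heap = ByteBuffer.allocate(1 << 20);
    // Direct buffer: allocated outside the Java heap (bounded separately by
    // -XX:MaxDirectMemorySize), so a test can look "small" on the heap while
    // still consuming real memory.
    ByteBuffer direct = ByteBuffer.allocateDirect(1 << 20);
    System.out.println(heap.isDirect());   // prints "false"
    System.out.println(direct.isDirect()); // prints "true"
  }
}
```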

> fix most memory-hungry tests
> 
>
> Key: LUCENE-9241
> URL: https://issues.apache.org/jira/browse/LUCENE-9241
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9241.patch
>
>
> Currently each test jvm has Xmx of 512M. With a modern macbook pro this is 
> 4GB which is pretty crazy.
> On the other hand, if we fix a few edge cases, tests can work with lower 
> heaps such as 128M. This can save many gigabytes (also it finds interesting 
> memory waste/issues).






[GitHub] [lucene-solr] andyvuong commented on a change in pull request #1223: SOLR-14213: Configuring Solr Cloud to use Shared Storage

2020-02-24 Thread GitBox
andyvuong commented on a change in pull request #1223: SOLR-14213: Configuring 
Solr Cloud to use Shared Storage
URL: https://github.com/apache/lucene-solr/pull/1223#discussion_r383548881
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/store/shared/SharedStoreManager.java
 ##
 @@ -43,68 +43,38 @@
 
   public SharedStoreManager(ZkController controller) {
 zkController = controller;
-// initialize BlobProcessUtil with the SharedStoreManager for background 
processes to be ready
-blobProcessUtil = new BlobProcessUtil(zkController.getCoreContainer());
-blobCoreSyncer = new BlobCoreSyncer();
-sharedCoreConcurrencyController = new 
SharedCoreConcurrencyController(zkController.getCoreContainer());
-  }
-  
-  @VisibleForTesting
-  public void initBlobStorageProvider(BlobStorageProvider blobStorageProvider) 
{
-this.blobStorageProvider = blobStorageProvider;
-  }
-  
-  @VisibleForTesting
-  public void initBlobProcessUtil(BlobProcessUtil processUtil) {
-if (blobProcessUtil != null) {
-  blobProcessUtil.shutdown();
-}
-blobProcessUtil = processUtil;
+blobStorageProvider = new BlobStorageProvider();
+blobDeleteManager = new 
BlobDeleteManager(getBlobStorageProvider().getClient());
+corePullTracker = new CorePullTracker();
+sharedShardMetadataController = new 
SharedShardMetadataController(zkController.getSolrCloudManager());
+sharedCoreConcurrencyController = new 
SharedCoreConcurrencyController(sharedShardMetadataController);
   }
   
-  /*
-   * Initiates a SharedShardMetadataController if it doesn't exist and returns 
one 
+  /**
+   * Start blob processes that depend on an initiated SharedStoreManager
*/
+  public void load() {
+blobCoreSyncer = new BlobCoreSyncer();
 
 Review comment:
For the first problem, there are shared storage components that have 
corecontainer injected explicitly in their api methods or via other 
dependencies that have getters opening access to it (zkcontroller for example). 
There are also shared storage components that have it injected in the 
constructor. Thinking about and looking at it again, it's kind of a mess to 
identify the initialization flows/orders, and I might need to refactor a 
bunch of things here for better consistency.





[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler

2020-02-24 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043879#comment-17043879
 ] 

Lucene/Solr QA commented on SOLR-13965:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  6s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | 
{color:green}  1m  6s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 46m 
43s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 51m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13965 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12994461/SOLR-13965.02.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  validaterefguide  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 1770797387d |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/688/testReport/ |
| modules | C: solr/core solr/solr-ref-guide U: solr |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/688/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Adding new functions to GraphHandler should be same as Streamhandler
> 
>
> Key: SOLR-13965
> URL: https://issues.apache.org/jira/browse/SOLR-13965
> Project: Solr
>  Issue Type: Improvement
>  Components: streaming expressions
>Affects Versions: 8.3
>Reporter: David Eric Pugh
>Priority: Minor
> Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently you add new functions to GraphHandler differently than you do in 
> StreamHandler.  We should have one way of extending the handlers that support 
> streaming expressions.






[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS

2020-02-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043855#comment-17043855
 ] 

Jan Høydahl commented on LUCENE-9227:
-

Tested with browser and curl. The redirect works, but I know nothing about STS 
:) 

> Make page ready for pure HTTPS
> --
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
>
> The web page can currently be visited using HTTPS but this brings warning:
> - Both search providers create a form that passes USER ENTERED INPUT using no 
> encryption. This is not allowed due to GDPR. We have to fix this asap. It 
> looks like [~otis] search is working with HTTPS (if we change domain name), 
> but the Lucidworks does not
> - There were some CSS files loaded with HTTP (fonts from Google - this was 
> fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many 
> links with HTTP, but looks like no images or scripts or css anymore), I'd 
> like to add a permanent redirect http://lucene.apache.org/ -> 
> https://lucene.apache.org to the htaccess template file.






[jira] [Commented] (SOLR-14137) Boosting by date (and perhaps others) shows a steady decline 6.6->8.3

2020-02-24 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043840#comment-17043840
 ] 

Erick Erickson commented on SOLR-14137:
---

The programs I use to generate docs and run JMeter are here: 
[https://github.com/ErickErickson/index_doc_generator]. It's a bit of a mess; I 
was trying several different things. But if people want to work with it I can 
help untangle it.

> Boosting by date (and perhaps others) shows a steady decline 6.6->8.3
> -
>
> Key: SOLR-14137
> URL: https://issues.apache.org/jira/browse/SOLR-14137
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erick Erickson
>Priority: Major
> Attachments: Screen Shot 2019-12-19 at 2.35.41 PM.png, Screen Shot 
> 2019-12-19 at 3.09.37 PM.png, Screen Shot 2019-12-19 at 3.31.16 PM.png, 
> second_run.png
>
>
> Moving a user's list discussion over here.
> The very short form is that from Solr 6.6.1 to Solr 8.3.1, the throughput for 
> date boosting in my tests dropped by 40+%.
> I've been hearing about slowdowns in successive Solr releases with boost 
> functions, so I dug into it a bit. The test setup is just a boost-by-date with 
> an additional big OR clause of 100 random words so I'd be sure to hit a bunch 
> of docs. I figured that if there were few hits, the signal would be lost in 
> the noise, but I didn't look at the actual hit counts.
> I saw several Solr JIRAs about this subject, but they were slightly different, 
> although quite possibly the same underlying issue. So I tried to get this down 
> to a very specific form of a query.
> I've also seen some cases in the wild where the response was proportional to 
> the number of segments, thus my optimize experiments.
> Here are the results, explanation below. "O" stands for optimized to one 
> segment. I spot-checked pdate against 6.6, 7.1 and 8.3 and they weren't 
> significantly different performance-wise from tdate. All have docValues 
> enabled. I ran these against a multiValued="false" field. All the tests pegged 
> all my CPUs. JMeter is being run on a different machine than Solr. Only one 
> Solr was running for any test.
> ||Solr version||queries/min||
> |6.6.1|3,400|
> |6.6.1 O|4,800|
> |7.1|2,800|
> |7.1 O|4,200|
> |7.7.1|2,400|
> |7.7.1 O|3,500|
> |8.3.1|2,000|
> |8.3.1 O|2,600|
> The tests I've been running just index 20M docs into a single core, then run 
> the exact same 10,000 queries against them from JMeter with 24 threads. Spot 
> checks showed no hits on the queryResultCache.
> A query looks like this:
> rows=0&{!boost b=recip(ms(NOW, 
> INSERT_FIELD_HERE),3.16e-11,1,1)}text_txt:(campaigners OR adjourned OR 
> anyplace…97 more random words)
> There is no faceting. No grouping. No sorting.
> I fill in INSERT_FIELD_HERE through JMeter magic. I'm running the exact same 
> queries for every test.
> One wildcard is that I did regenerate the index for each major revision, and 
> chose random words from the same list of words, as well as random times 
> (bounded in the same range though), so the docs are not completely identical. 
> The index was in the native format for that major version even if slightly 
> different between versions. I ran the test once, then ran it again after 
> optimizing the index.
> I haven't dug any farther; if anyone's interested I can throw a profiler at, 
> say, 8.3 and see what I can see, although I'm not going to have time to dive 
> into this any time soon. I'd be glad to run some tests though. I saved the 
> queries and the indexes so running a test would only take a few minutes.
> While I concentrated on date fields, the docs have date, int, and long 
> fields, both docValues=true and docValues=false, each variant with 
> multiValued=true and multiValued=false, and both Trie and Point (where 
> possible) variants, as well as a pretty simple text field.
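For context on what this benchmark is exercising: Solr's `recip(x,m,a,b)` function computes `a / (m*x + b)`, so `recip(ms(NOW,field),3.16e-11,1,1)` is `1 / (3.16e-11 * ageMs + 1)`, and 3.16e-11 is roughly 1 / (milliseconds in a year), so a document about a year old gets a boost of about 0.5. A standalone sketch of that curve (plain Java, not Solr code):

```java
public class RecipBoostSketch {

    /** Solr's recip(x, m, a, b) function query: a / (m * x + b). */
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        double msPerYear = 365.25 * 24 * 60 * 60 * 1000; // ≈ 3.156e10
        double m = 3.16e-11;                             // ≈ 1 / msPerYear

        System.out.println(recip(0, m, 1, 1));               // brand-new doc: boost 1.0
        System.out.println(recip(msPerYear, m, 1, 1));       // one-year-old doc: ≈ 0.50
        System.out.println(recip(10 * msPerYear, m, 1, 1));  // ten-year-old doc: ≈ 0.09
    }
}
```

Because every matching document needs the `ms(NOW,field)` value, the function touches the date docValues for the whole result set, which is why the big 100-word OR clause makes the boost cost visible.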






[jira] [Commented] (LUCENE-9237) Faster TermsEnum intersect for UniformSplit

2020-02-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043837#comment-17043837
 ] 

David Smiley commented on LUCENE-9237:
--

Were you able to do a comparison while keeping the term dictionary memory usage 
equal? This will take some repeated tweaking of the parameters that 
UniformSplit provides, then examining the size of the term dict files (or some 
similar approach). Annoying, I know. Without doing this, we allow any postings 
format to cheat by using memory gratuitously over its competitor. An analogy is 
running the Tour de France and not checking who is on drugs :-D. Or at least 
allowing an equal amount of drugs for the contestants -- LOL I amuse myself. 
Also, check that the on-heap vs off-heap FST usage is equivalent amongst the 
contestants, as this is easily toggled by any format.

> Faster TermsEnum intersect for UniformSplit
> ---
>
> Key: LUCENE-9237
> URL: https://issues.apache.org/jira/browse/LUCENE-9237
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bruno Roustant
>Assignee: Bruno Roustant
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> New version of TermsEnum intersect for UniformSplit. It is 75% more efficient 
> than the previous version for FuzzyQuery.
> Compared to BlockTree IntersectTermsEnum:
>  - It is still slower for FuzzyQuery (-37%) but it is faster than the 
> previous version (which was -65%).
>  - It is slightly slower for WildcardQuery (-5%).
>  - It is slightly faster for PrefixQuery (+5%). Sometimes benchmarks show 
> more improvement (I've seen up to +17% a fourth of the time).






[jira] [Commented] (SOLR-14223) PublicKeyHandler consumes a lot of entropy during tests

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043804#comment-17043804
 ] 

ASF subversion and git services commented on SOLR-14223:


Commit 1770797387d761706c6d93253a3759d885f662c4 in lucene-solr's branch 
refs/heads/master from Mike
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1770797 ]

SOLR-14223 Create RSAKeyPair from disk (#1217)

* Create properties for PublicKeyHandler to read existing keys from disk
* Move pregenerated keys from core/test-files to test-framework
* Update tests to use existing keys instead of new keys each run

> PublicKeyHandler consumes a lot of entropy during tests
> ---
>
> Key: SOLR-14223
> URL: https://issues.apache.org/jira/browse/SOLR-14223
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.4, 8.0
>Reporter: Mike Drob
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> After the changes in SOLR-12354 to eagerly create a {{PublicKeyHandler}} for 
> the CoreContainer, the creation of the underlying {{RSAKeyPair}} uses 
> {{SecureRandom}} to generate primes. This eats up a lot of system entropy and 
> can slow down tests significantly (I observed it adding 10s to an individual 
> test).
> Similar to what we do for SSL config for tests, we can swap in a non blocking 
> implementation of SecureRandom for the key pair generation to allow multiple 
> tests to run better in parallel. Primality testing with BigInteger is also 
> slow, so I'm not sure how much total speedup we can get here, maybe it's 
> worth checking if there are faster implementations out there in other 
> libraries.
> In production cases, this also blocks creation of all cores. We should only 
> create the Handler if necessary, i.e. if the existing authn/z tell us that 
> they won't support internode requests.
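A sketch of the kind of swap the description suggests (illustrative only, not the actual Solr patch): generate the RSA pair from a caller-supplied `SecureRandom`, so tests can select a cheap or non-blocking source instead of draining system entropy:

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

public class KeyPairSourceSketch {

    /** Generate an RSA pair from a caller-supplied entropy source. */
    static KeyPair generate(int bits, SecureRandom random) throws NoSuchAlgorithmException {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(bits, random);
        return kpg.generateKeyPair();
    }

    public static void main(String[] args) throws Exception {
        // Tests could pass a cheap PRNG such as SHA1PRNG with a fixed seed, or
        // on Linux SecureRandom.getInstance("NativePRNGNonBlocking"), instead
        // of the default source that may block waiting for system entropy.
        SecureRandom testRandom = SecureRandom.getInstance("SHA1PRNG");
        testRandom.setSeed(42L); // fixed seed: fine for tests, never for production
        KeyPair pair = generate(512, testRandom); // small key keeps the test fast
        System.out.println(pair.getPublic().getAlgorithm()); // prints "RSA"
    }
}
```

The prime search inside `generateKeyPair()` is still BigInteger-bound, which matches the comment above that swapping the randomness source only buys back part of the time.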






[jira] [Resolved] (SOLR-14223) PublicKeyHandler consumes a lot of entropy during tests

2020-02-24 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-14223.
--
Fix Version/s: master (9.0)
 Assignee: Mike Drob
   Resolution: Fixed

> PublicKeyHandler consumes a lot of entropy during tests
> ---
>
> Key: SOLR-14223
> URL: https://issues.apache.org/jira/browse/SOLR-14223
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 7.4, 8.0
>Reporter: Mike Drob
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> After the changes in SOLR-12354 to eagerly create a {{PublicKeyHandler}} for 
> the CoreContainer, the creation of the underlying {{RSAKeyPair}} uses 
> {{SecureRandom}} to generate primes. This eats up a lot of system entropy and 
> can slow down tests significantly (I observed it adding 10s to an individual 
> test).
> Similar to what we do for SSL config for tests, we can swap in a non blocking 
> implementation of SecureRandom for the key pair generation to allow multiple 
> tests to run better in parallel. Primality testing with BigInteger is also 
> slow, so I'm not sure how much total speedup we can get here, maybe it's 
> worth checking if there are faster implementations out there in other 
> libraries.
> In production cases, this also blocks creation of all cores. We should only 
> create the Handler if necessary, i.e. if the existing authn/z tell us that 
> they won't support internode requests.






[GitHub] [lucene-solr] madrob merged pull request #1217: SOLR-14223 PublicKeyHandler consumes a lot of entropy during tests

2020-02-24 Thread GitBox
madrob merged pull request #1217: SOLR-14223 PublicKeyHandler consumes a lot of 
entropy during tests
URL: https://github.com/apache/lucene-solr/pull/1217
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying

2020-02-24 Thread GitBox
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW 
from closing gracefully if threads are still modifying
URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383433653
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
 ##
 @@ -3132,8 +3139,9 @@ public final long prepareCommit() throws IOException {
* @return true iff this method flushed at least on segment to 
disk.
* @lucene.experimental
*/
+  @SuppressWarnings("try")
   public final boolean flushNextBuffer() throws IOException {
-try {
+try (Closeable finalizer = acquireModificationLease()){
 
 Review comment:
   nit: add a space before `{`





[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying

2020-02-24 Thread GitBox
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW 
from closing gracefully if threads are still modifying
URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383431054
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
 ##
 @@ -1552,19 +1551,25 @@ public long deleteDocuments(Query... queries) throws 
IOException {
   }
 }
 
-try {
-  long seqNo = docWriter.deleteQueries(queries);
-  if (seqNo < 0) {
-seqNo = -seqNo;
-processEvents(true);
-  }
-
-  return seqNo;
+try (Closeable finalizer = acquireModificationLease()) {
+  return maybeProcessEvents(docWriter.deleteQueries(queries));
 } catch (VirtualMachineError tragedy) {
   tragicEvent(tragedy, "deleteDocuments(Query..)");
   throw tragedy;
 }
   }
+  private Closeable acquireModificationLease() {
 
 Review comment:
   nit: add a new line
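For readers following along: the lease pattern under review hands back a `Closeable` so that try-with-resources guarantees release on every exit path. A minimal standalone sketch of the idea (a read/write lock stands in for whatever mechanism the PR actually uses inside IndexWriter; names here are illustrative):

```java
import java.io.Closeable;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ModificationLeaseSketch {

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Modifying threads take a shared lease; close() releases it. */
    Closeable acquireModificationLease() {
        lock.readLock().lock();
        return () -> lock.readLock().unlock();
    }

    /** Number of leases currently outstanding. */
    int activeLeases() {
        return lock.getReadLockCount();
    }

    /** A graceful close takes the exclusive side, so it cannot proceed
     *  while any modification lease is still outstanding. */
    void closeGracefully() {
        lock.writeLock().lock();
        try {
            // ... flush and close resources here ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ModificationLeaseSketch writer = new ModificationLeaseSketch();
        try (Closeable lease = writer.acquireModificationLease()) {
            System.out.println(writer.activeLeases()); // prints 1
        }
        System.out.println(writer.activeLeases()); // prints 0
        writer.closeGracefully(); // no leases held, so this proceeds
    }
}
```

This is why the reviewed hunks wrap each mutating method body in `try (Closeable finalizer = acquireModificationLease()) { ... }`: the lease is dropped even if the body throws.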





[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying

2020-02-24 Thread GitBox
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW 
from closing gracefully if threads are still modifying
URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383433409
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
 ##
 @@ -3560,25 +3569,19 @@ private boolean doFlush(boolean applyAllDeletes) 
throws IOException {
 doBeforeFlush();
 testPoint("startDoFlush");
 boolean success = false;
-try {
+try (Closeable finalizer = acquireModificationLease()){
 
 Review comment:
   nit: add a space before `{`





[GitHub] [lucene-solr] dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW from closing gracefully if threads are still modifying

2020-02-24 Thread GitBox
dnhatn commented on a change in pull request #1274: LUCENE-9164: Prevent IW 
from closing gracefully if threads are still modifying
URL: https://github.com/apache/lucene-solr/pull/1274#discussion_r383443954
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/IndexWriter.java
 ##
 @@ -2417,7 +2424,7 @@ public long deleteAll() throws IOException {
  */
 try {
   synchronized (fullFlushLock) {
-try (Closeable finalizer = docWriter.lockAndAbortAll()) {
+try (Closeable finalizer = 
acquireModificationLease(docWriter.lockAndAbortAll())) {
 
 Review comment:
Do you have to release the locks in reverse order? Could we have two 
try-with-resources blocks here instead of passing the lock to 
`acquireModificationLease`? If so, we can remove the `in` parameter from 
`acquireModificationLease`.





[jira] [Comment Edited] (LUCENE-9227) Make page ready for pure HTTPS

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794
 ] 

Uwe Schindler edited comment on LUCENE-9227 at 2/24/20 7:56 PM:


I committed the following to htaccess.template:

{noformat}

  Header always set Strict-Transport-Security "max-age=300"


  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

{noformat}

This is IMHO the most consistent way to express this. There are shorter ways, 
but the if/else statements are easier to read:
- If user is on HTTPS, he/she gets STS header (for testing purposes limited to 
300s)
- If user is on HTTP, he/she gets redirect to HTTPS (permanent)

{noformat}
Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1
{noformat}

I plan to merge this to master quite soon, so please test it! I will keep the 
STS header at 300 seconds for a while and then raise it to one year if no 
complaints come in.


was (Author: thetaphi):
I committed the following to htaccess.template:

{noformat}

  Header always set Strict-Transport-Security "max-age=300"


  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

{noformat}

This is IMHO the most consistent way to express this. There are shorter ways, 
but the if/else statements are easier to read:
- If user is on HTTPS, he gets STS header (for testing purposes, limited to 
300s)
- If user is on HTTP, he gets redirect to HTTPS (permanent)

{noformat}
Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1
{noformat}

I plan to merge this to master quite soon, so please test it! I will keep the 
STS header with 300seconds for a while and then raise to one year, if no 
complaints are coming.

> Make page ready for pure HTTPS
> --
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
>
> The web page can currently be visited using HTTPS but this brings warning:
> - Both search providers create a form that passes USER ENTERED INPUT using no 
> encryption. This is not allowed due to GDPR. We have to fix this asap. It 
> looks like [~otis] search is working with HTTPS (if we change domain name), 
> but the Lucidworks does not
> - There were some CSS files loaded with HTTP (fonts from Google - this was 
> fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many 
> links with HTTP, but looks like no images or scripts or css anymore), I'd 
> like to add a permanent redirect http://lucene.apache.org/ -> 
> https://lucene.apache.org to the htaccess template file.






[jira] [Comment Edited] (LUCENE-9227) Make page ready for pure HTTPS

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794
 ] 

Uwe Schindler edited comment on LUCENE-9227 at 2/24/20 7:55 PM:


I committed the following to htaccess.template:

{noformat}

  Header always set Strict-Transport-Security "max-age=300"


  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

{noformat}

This is IMHO the most consistent way to express this. There are shorter ways, 
but the if/else statements are easier to read:
- If user is on HTTPS, he gets STS header (for testing purposes, limited to 
300s)
- If user is on HTTP, he gets redirect to HTTPS (permanent)

{noformat}
Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1
{noformat}

I plan to merge this to master quite soon, so please test it! I will keep the 
STS header with 300seconds for a while and then raise to one year, if no 
complaints are coming.


was (Author: thetaphi):
I committed the following to htaccess.template:

{noformat}

  Header always set Strict-Transport-Security "max-age=300"


  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

{noformat}

This is IMHO the most consistent way to express this. There are shorter ways, 
but the if/else statements are easier to read:
- If user is on HTTPS, he gets STS header (for testing purposes, limited to 
300s)
- If user is on HTTP, he gets redirect to HTTPS (permanent)

{noformat}
Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1

Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:44:09 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html
{noformat}

I plan to merge this to master quite soon, so please test it! I will keep the 
STS header with 300seconds for a while and then raise to one year, if no 
complaints are coming.

> Make page ready for pure HTTPS
> --
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
>
> The web page can currently be visited using HTTPS but this brings warning:
> - Both search providers create a form that passes USER ENTERED INPUT using no 
> encryption. This is not allowed due to GDPR. We have to fix this asap. It 
> looks like [~otis] search is working with HTTPS (if we change domain name), 
> but the Lucidworks does not
> - There were some CSS files loaded with HTTP (fonts from Google - this was 
> fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many 
> links with HTTP, but looks like no images or scripts or css anymore), I'd 
> like to add a permanent redirect http://lucene.apache.org/ -> 
> https://lucene.apache.org to the htaccess template file.






[jira] [Commented] (LUCENE-9227) Make page ready for pure HTTPS

2020-02-24 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043794#comment-17043794
 ] 

Uwe Schindler commented on LUCENE-9227:
---

I committed the following to htaccess.template:

{noformat}

  Header always set Strict-Transport-Security "max-age=300"


  RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

{noformat}

This is IMHO the most consistent way to express this. There are shorter ways, 
but the if/else statements are easier to read:
- If user is on HTTPS, he gets STS header (for testing purposes, limited to 
300s)
- If user is on HTTP, he gets redirect to HTTPS (permanent)

{noformat}
Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:40:37 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html

Uwe Schindler@VEGA:~ > curl -I http://lucene.staged.apache.org/test?hallo
HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Feb 2020 19:44:03 GMT
Server: Apache
Location: https://lucene.staged.apache.org/test?hallo
Content-Type: text/html; charset=iso-8859-1

Uwe Schindler@VEGA:~ > curl -I https://lucene.staged.apache.org/
HTTP/1.1 200 OK
Date: Mon, 24 Feb 2020 19:44:09 GMT
Server: Apache
Strict-Transport-Security: max-age=300
Last-Modified: Fri, 21 Feb 2020 12:58:09 GMT
ETag: "394a-59f1592c57599"
Accept-Ranges: bytes
Content-Length: 14666
Vary: Accept-Encoding
Content-Type: text/html
{noformat}

I plan to merge this to master quite soon, so please test it! I will keep the 
STS header with 300seconds for a while and then raise to one year, if no 
complaints are coming.

> Make page ready for pure HTTPS
> --
>
> Key: LUCENE-9227
> URL: https://issues.apache.org/jira/browse/LUCENE-9227
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
>
> The web page can currently be visited using HTTPS but this brings warning:
> - Both search providers create a form that passes USER ENTERED INPUT using no 
> encryption. This is not allowed due to GDPR. We have to fix this asap. It 
> looks like [~otis] search is working with HTTPS (if we change domain name), 
> but the Lucidworks does not
> - There were some CSS files loaded with HTTP (fonts from Google - this was 
> fixed)
> Once those 2 problems are fixed (I grepped for HTTP and still found many 
> links with HTTP, but looks like no images or scripts or css anymore), I'd 
> like to add a permanent redirect http://lucene.apache.org/ -> 
> https://lucene.apache.org to the htaccess template file.






[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler

2020-02-24 Thread David Eric Pugh (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043785#comment-17043785
 ] 

David Eric Pugh commented on SOLR-13965:


LGTM

> Adding new functions to GraphHandler should be same as Streamhandler
> 
>
> Key: SOLR-13965
> URL: https://issues.apache.org/jira/browse/SOLR-13965
> Project: Solr
>  Issue Type: Improvement
>  Components: streaming expressions
>Affects Versions: 8.3
>Reporter: David Eric Pugh
>Priority: Minor
> Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently you add new functions to GraphHandler differently than you do in 
> StreamHandler.  We should have one way of extending the handlers that support 
> streaming expressions.






[jira] [Commented] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler

2020-02-24 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043782#comment-17043782
 ] 

Christine Poerschke commented on SOLR-13965:


Accounting for the {{StreamHandler.addExpressiblePlugins}} factoring out 
(already committed above), the just-attached {{SOLR-13965.02.patch}} is what 
remains here from https://github.com/apache/lucene-solr/pull/1033, I think.

If there are no concerns or objections, I'll aim to commit it later this week.

> Adding new functions to GraphHandler should be same as Streamhandler
> 
>
> Key: SOLR-13965
> URL: https://issues.apache.org/jira/browse/SOLR-13965
> Project: Solr
>  Issue Type: Improvement
>  Components: streaming expressions
>Affects Versions: 8.3
>Reporter: David Eric Pugh
>Priority: Minor
> Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently you add new functions to GraphHandler differently than you do in 
> StreamHandler.  We should have one way of extending the handlers that support 
> streaming expressions.






[jira] [Updated] (SOLR-13965) Adding new functions to GraphHandler should be same as Streamhandler

2020-02-24 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-13965:
---
Attachment: SOLR-13965.02.patch

> Adding new functions to GraphHandler should be same as Streamhandler
> 
>
> Key: SOLR-13965
> URL: https://issues.apache.org/jira/browse/SOLR-13965
> Project: Solr
>  Issue Type: Improvement
>  Components: streaming expressions
>Affects Versions: 8.3
>Reporter: David Eric Pugh
>Priority: Minor
> Attachments: SOLR-13965.01.patch, SOLR-13965.02.patch
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Currently you add new functions to GraphHandler differently than you do in 
> StreamHandler.  We should have one way of extending the handlers that support 
> streaming expressions.






[jira] [Commented] (SOLR-14278) data loss during live shard split if leader dies

2020-02-24 Thread Yonik Seeley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043780#comment-17043780
 ] 

Yonik Seeley commented on SOLR-14278:
-

Testing update: I let the test loop overnight with the shard split commented 
out.  There were no failures.  With the split in the test, the failure rate 
looks to be somewhere between 30% and 50% on my hardware. 

> data loss during live shard split if leader dies
> 
>
> Key: SOLR-14278
> URL: https://issues.apache.org/jira/browse/SOLR-14278
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While trying to develop better tests for shared storage (SOLR-13101), I ran 
> across another failure for normal replica types as well (one of the first 
> things I do when a test fails for shared storage is to try and validate that 
> normal NRT replicas succeed.)  The PR I'll open has a test adapted from the 
> one in SOLR-13813 for master.
> Scenario:
>   - indexing is happening during shard split
>   - leader is killed shortly after (before the split has finished) and never 
> brought back up
>   - there are often some missing documents at the end
> While it's possible that the simulated killing of the node in the unit test 
> is imperfect, I haven't reproduced a failure if I comment out the split 
> command and just kill the leader.






[jira] [Commented] (LUCENE-9248) Change internal code names of postingsFormats to use 84 suffix

2020-02-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043766#comment-17043766
 ] 

David Smiley commented on LUCENE-9248:
--

WDYT [~jpountz]? Are all users of {{Lucene84PostingsWriter}} / Reader affected, 
thus nearly all formats?

If we do an 8.4.1, I think this should be released in such a bug-fix version.

In this issue I'd also like to update the Solr docs on the text tagger to 
suggest the FST format as more of a tip with a caveat.  And also add to the 
upgrade notes on Lucene & Solr sides.

> Change internal code names of postingsFormats to use 84 suffix
> --
>
> Key: LUCENE-9248
> URL: https://issues.apache.org/jira/browse/LUCENE-9248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> Some postings formats write the postings differently as of Lucene 8.4 due to 
> changes -- LUCENE-9027 and LUCENE-9116.  Blocktree was transitioned in a 
> backwards-compatible way but some (all?) others were not.  Consequently an 
> attempt of the new version to read an old index will fail due to some 
> non-obvious error.  I propose here using a simple version suffix on these 
> postings formats like "84" (thus "FST84" as one example).  I see some already 
> use a suffix but were not bumped for 8.4.  This is a really simple change and 
> doesn't address the problem of us not noticing future needs to version bump.






[GitHub] [lucene-solr] danmuzi commented on issue #1287: LUCENE-8954: refactor Nori analyzer

2020-02-24 Thread GitBox
danmuzi commented on issue #1287: LUCENE-8954: refactor Nori analyzer
URL: https://github.com/apache/lucene-solr/pull/1287#issuecomment-590491692
 
 
   The previous PR (https://github.com/apache/lucene-solr/pull/1276) contained 
a lint error about an unused import statement,
   so I reverted it in https://github.com/apache/lucene-solr/pull/1285.
   Sorry for the confusion. @jimczi 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] danmuzi opened a new pull request #1287: LUCENE-8954: refactor Nori analyzer

2020-02-24 Thread GitBox
danmuzi opened a new pull request #1287: LUCENE-8954: refactor Nori analyzer
URL: https://github.com/apache/lucene-solr/pull/1287
 
 
   LUCENE-8954 is an issue created in August last year.
   (https://issues.apache.org/jira/browse/LUCENE-8954)
   The patch was already pushed to the master branch (#839),
   but I forgot to apply it to branch_8x.
   This PR takes care of that.
   





[jira] [Created] (LUCENE-9248) Change internal code names of postingsFormats to use 84 suffix

2020-02-24 Thread David Smiley (Jira)
David Smiley created LUCENE-9248:


 Summary: Change internal code names of postingsFormats to use 84 
suffix
 Key: LUCENE-9248
 URL: https://issues.apache.org/jira/browse/LUCENE-9248
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: David Smiley
Assignee: David Smiley


Some postings formats write the postings differently as of Lucene 8.4 due to 
changes -- LUCENE-9027 and LUCENE-9116.  Blocktree was transitioned in a 
backwards-compatible way but some (all?) others were not.  Consequently an 
attempt of the new version to read an old index will fail due to some 
non-obvious error.  I propose here using a simple version suffix on these 
postings formats like "84" (thus "FST84" as one example).  I see some already 
use a suffix but were not bumped for 8.4.  This is a really simple change and 
doesn't address the problem of us not noticing future needs to version bump.






[GitHub] [lucene-solr] cpoerschke opened a new pull request #1286: SOLR-14279: remove CSVStrategy's deprecated setters

2020-02-24 Thread GitBox
cpoerschke opened a new pull request #1286: SOLR-14279: remove CSVStrategy's 
deprecated setters
URL: https://github.com/apache/lucene-solr/pull/1286
 
 
   https://issues.apache.org/jira/browse/SOLR-14279





[jira] [Created] (SOLR-14279) remove CSVStrategy's deprecated setters

2020-02-24 Thread Christine Poerschke (Jira)
Christine Poerschke created SOLR-14279:
--

 Summary: remove CSVStrategy's deprecated setters
 Key: SOLR-14279
 URL: https://issues.apache.org/jira/browse/SOLR-14279
 Project: Solr
  Issue Type: Task
Reporter: Christine Poerschke


[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.4.1/solr/core/src/java/org/apache/solr/internal/csv/CSVStrategy.java#L117]

one possible approach:
* change remaining callers to not use the deprecated setters
* remove the setters
* make the members final
* remove the deprecated {{ImmutableCSVStrategy}} class
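A minimal sketch of the direction described above (the class and field names here are illustrative stand-ins, not Solr's actual CSVStrategy API): configuration is supplied once at construction time and held in final fields, so the deprecated mutable setters and a separate immutable wrapper class both become unnecessary.

```java
// Hypothetical, simplified stand-in for a post-cleanup CSVStrategy:
// all configuration is fixed at construction time via final fields,
// so no setters (and no separate ImmutableCSVStrategy wrapper) are needed.
public final class CsvStrategySketch {
    private final char delimiter;
    private final char encapsulator;

    public CsvStrategySketch(char delimiter, char encapsulator) {
        this.delimiter = delimiter;
        this.encapsulator = encapsulator;
    }

    public char getDelimiter() { return delimiter; }
    public char getEncapsulator() { return encapsulator; }

    public static void main(String[] args) {
        // once constructed, the strategy cannot be mutated
        CsvStrategySketch strategy = new CsvStrategySketch(',', '"');
        System.out.println("delimiter=" + strategy.getDelimiter());
    }
}
```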






[jira] [Commented] (LUCENE-9171) Synonyms Boost by Payload

2020-02-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043737#comment-17043737
 ] 

David Smiley commented on LUCENE-9171:
--

Alan, I think you forgot CHANGES.txt entries. Please ensure you add suitable 
entries in _both_ Lucene's and Solr's CHANGES.txt.  Personally I would have 
committed the work under this Lucene issue and not Solr, but it's debatable I 
suppose.  Also, please add "@lucene.experimental" to some of QueryBuilder's 
methods since we want the freedom to change this API at minor release 
boundaries.

> Synonyms Boost by Payload
> -
>
> Key: LUCENE-9171
> URL: https://issues.apache.org/jira/browse/LUCENE-9171
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser
>Reporter: Alessandro Benedetti
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have been working on the additional capability of boosting queries by term 
> payloads, through a parameter that enables it in Lucene's QueryBuilder.
> This has been done targeting the synonyms query.
> It is parametric, so no behavior change should be seen unless the feature is 
> enabled.
> Solr has its own bits to comply through its SynonymsQueryStyles.






[jira] [Commented] (LUCENE-9234) Keep write support for old codecs?

2020-02-24 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043730#comment-17043730
 ] 

David Smiley commented on LUCENE-9234:
--

I tend to agree with Rob.  Distributed systems on top of Lucene should be able 
to cope with the status quo, and this may mean more work for replica placement 
to consider the version if this wasn't thought of in the past.  And a truly 
big/hard-core user could do some relatively basic Lucene re-packaging to ship 
the previous version if they were sufficiently motivated to care.  Not all big 
search users would even care about this since a re-index or backup/restore may 
be feasible (it is where I work).

> Keep write support for old codecs?
> --
>
> Key: LUCENE-9234
> URL: https://issues.apache.org/jira/browse/LUCENE-9234
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently we maintain read/write support for the latest codec in lucene/core, 
> and read-only support for codecs of previous versions (up to {N-1}.0) in 
> lucene/backward-codecs. We often keep write support in test-framework for 
> testing purposes only.
> This raises challenges for Elasticsearch with regard to rolling upgrades: we 
> have some users who index very large amounts of data on clusters that are 
> quite large, so that rolling upgrades take significant time. Meanwhile, 
> several indices may be created.
> Allocating indices when the cluster has nodes of different versions requires 
> care as Lucene indices created on nodes with a newer version cannot be read 
> by the nodes running the older version. It is possible to force primary 
> replicas to be allocated on the older nodes, but this brings other problems 
> like availability, uneven disk usage across nodes, or moving a lot of data 
> around.
> If Lucene could write data using the minimum version that exists in the 
> cluster, this would avoid this problem as the written data could be read by 
> any node of the cluster. I understand this change would not come for free, 
> especially when it comes to testing as we'd need to make sure that older 
> Lucene versions can read indices created by this "compatibility mode".
> I'd be curious to understand whether this is a problem for Solr too, if not 
> how this problem is being handled, and maybe whether there are other problems 
> that you have encountered that would also benefit from the ability to write 
> data with an older format.






[jira] [Resolved] (SOLR-14272) Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1

2020-02-24 Thread Anshum Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta resolved SOLR-14272.
-
Resolution: Fixed

> Remove autoReplicaFailoverBadNodeExpiration and 
> autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
> 
>
> Key: SOLR-14272
> URL: https://issues.apache.org/jira/browse/SOLR-14272
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 'autoReplicaFailoverBadNodeExpiration' and 'autoReplicaFailoverWorkLoopDelay' 
> parameters were deprecated in 7.1 after the 'autoAddReplicas' feature was 
> ported to autoscaling.
> We should remove them from the code to get rid of the cruft.






[jira] [Commented] (SOLR-14272) Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043712#comment-17043712
 ] 

ASF subversion and git services commented on SOLR-14272:


Commit 7ba9d4d756e50680b88ee10af2f13a8791588fe4 in lucene-solr's branch 
refs/heads/master from Anshum Gupta
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7ba9d4d ]

SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and 
autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 (#1269)

* SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and 
autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1


> Remove autoReplicaFailoverBadNodeExpiration and 
> autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
> 
>
> Key: SOLR-14272
> URL: https://issues.apache.org/jira/browse/SOLR-14272
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 'autoReplicaFailoverBadNodeExpiration' and 'autoReplicaFailoverWorkLoopDelay' 
> parameters were deprecated in 7.1 after the 'autoAddReplicas' feature was 
> ported to autoscaling.
> We should remove them from the code to get rid of the cruft.






[jira] [Commented] (SOLR-14272) Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043713#comment-17043713
 ] 

ASF subversion and git services commented on SOLR-14272:


Commit 7ba9d4d756e50680b88ee10af2f13a8791588fe4 in lucene-solr's branch 
refs/heads/master from Anshum Gupta
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7ba9d4d ]

SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and 
autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1 (#1269)

* SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and 
autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1


> Remove autoReplicaFailoverBadNodeExpiration and 
> autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1
> 
>
> Key: SOLR-14272
> URL: https://issues.apache.org/jira/browse/SOLR-14272
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 'autoReplicaFailoverBadNodeExpiration' and 'autoReplicaFailoverWorkLoopDelay' 
> parameters were deprecated in 7.1 after the 'autoAddReplicas' feature was 
> ported to autoscaling.
> We should remove them from the code to get rid of the cruft.






[GitHub] [lucene-solr] anshumg merged pull request #1269: SOLR-14272: Remove autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 9.0 as it was deprecated in 7.1

2020-02-24 Thread GitBox
anshumg merged pull request #1269: SOLR-14272: Remove 
autoReplicaFailoverBadNodeExpiration and autoReplicaFailoverWorkLoopDelay for 
9.0 as it was deprecated in 7.1
URL: https://github.com/apache/lucene-solr/pull/1269
 
 
   





[jira] [Reopened] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread Namgyu Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namgyu Kim reopened LUCENE-8954:


There is a lint error in the patch.
Sorry for the confusion.

> Refactor Nori(Korean) Analyzer
> --
>
> Key: LUCENE-8954
> URL: https://issues.apache.org/jira/browse/LUCENE-8954
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
> Fix For: 8.x, master (9.0)
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are many codes that can be refactored in the Nori analyzer.
> (whitespace, wrong type casting, unnecessary throws, C-style array, ...)
> I think it's good to proceed if we can.
> It has nothing to do with the actual working of Nori.
> I'll just remove unnecessary code and make the code simple.






[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043710#comment-17043710
 ] 

ASF subversion and git services commented on LUCENE-8954:
-

Commit 80372341426344f7d89a36adefbd178fb0e2548a in lucene-solr's branch 
refs/heads/branch_8x from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8037234 ]

Revert "LUCENE-8954: refactor Nori analyzer"

This reverts commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0.

> Refactor Nori(Korean) Analyzer
> --
>
> Key: LUCENE-8954
> URL: https://issues.apache.org/jira/browse/LUCENE-8954
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
> Fix For: 8.x, master (9.0)
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> There are many codes that can be refactored in the Nori analyzer.
> (whitespace, wrong type casting, unnecessary throws, C-style array, ...)
> I think it's good to proceed if we can.
> It has nothing to do with the actual working of Nori.
> I'll just remove unnecessary code and make the code simple.






[GitHub] [lucene-solr] danmuzi merged pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer"

2020-02-24 Thread GitBox
danmuzi merged pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer"
URL: https://github.com/apache/lucene-solr/pull/1285
 
 
   





[GitHub] [lucene-solr] danmuzi opened a new pull request #1285: Revert "LUCENE-8954: refactor Nori analyzer"

2020-02-24 Thread GitBox
danmuzi opened a new pull request #1285: Revert "LUCENE-8954: refactor Nori 
analyzer"
URL: https://github.com/apache/lucene-solr/pull/1285
 
 
   There is a lint error in the patch.
   Sorry for the confusion.





[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043706#comment-17043706
 ] 

ASF subversion and git services commented on LUCENE-8954:
-

Commit 904ba2540b3c7b9a1d19f70941bac62d822b2926 in lucene-solr's branch 
refs/heads/revert-1276-branch_8x from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=904ba25 ]

Revert "LUCENE-8954: refactor Nori analyzer"

This reverts commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0.


> Refactor Nori(Korean) Analyzer
> --
>
> Key: LUCENE-8954
> URL: https://issues.apache.org/jira/browse/LUCENE-8954
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
> Fix For: 8.x, master (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There are many codes that can be refactored in the Nori analyzer.
> (whitespace, wrong type casting, unnecessary throws, C-style array, ...)
> I think it's good to proceed if we can.
> It has nothing to do with the actual working of Nori.
> I'll just remove unnecessary code and make the code simple.






[jira] [Commented] (SOLR-14274) Multiple CoreContainers will register the same JVM Metrics

2020-02-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043684#comment-17043684
 ] 

Mike Drob commented on SOLR-14274:
--

I think the behavior that we want varies with what kind of metric we are 
registering. If it is a core-specific metric then replacing makes sense. If it 
is a JVM or OS metric, then replacing might not make as much sense.

I'm looking at replacing the binary force flag with an enum dictating what to 
do in case of a conflict: replace, skip, or fail.
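A rough sketch of that enum idea (the ConflictPolicy enum and registry class here are hypothetical names for illustration, not Solr's actual metrics API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: replace a boolean "force" flag with an enum that
// dictates behavior when a metric name is already registered.
public class MetricRegistrySketch {
    public enum ConflictPolicy { REPLACE, SKIP, FAIL }

    private final Map<String, Object> metrics = new HashMap<>();

    // Returns true if the metric was (re)registered, false if skipped.
    public boolean register(String name, Object metric, ConflictPolicy policy) {
        if (metrics.containsKey(name)) {
            switch (policy) {
                case SKIP:
                    return false; // keep the existing metric
                case FAIL:
                    throw new IllegalStateException("metric already registered: " + name);
                case REPLACE:
                    break; // fall through and overwrite
            }
        }
        metrics.put(name, metric);
        return true;
    }

    public Object get(String name) { return metrics.get(name); }

    public static void main(String[] args) {
        MetricRegistrySketch registry = new MetricRegistrySketch();
        registry.register("jvm.memory", 1, ConflictPolicy.REPLACE);
        registry.register("jvm.memory", 2, ConflictPolicy.SKIP);    // ignored
        registry.register("jvm.memory", 3, ConflictPolicy.REPLACE); // overwrites
        System.out.println(registry.get("jvm.memory"));
    }
}
```

Per the comment above, core-specific metrics might default to REPLACE while shared JVM/OS metrics default to SKIP.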

> Multiple CoreContainers will register the same JVM Metrics
> --
>
> Key: SOLR-14274
> URL: https://issues.apache.org/jira/browse/SOLR-14274
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mike Drob
>Priority: Major
>
> When running multiple CoreContainer in the same JVM, either because we called 
> {{SolrCloudTestCase.configureCluster(int n)}} with {{n > 1}} or because we 
> have multiple tests running in the same JVM in succession, we will have 
> contention on the shared JVM {{metricsRegistry}} as they each replace the 
> existing metrics with their own. Further, with multiple nodes at the same 
> time, some of these metrics will be incorrect anyway, since they will only 
> reflect a single core container. Others will be fine since I think they are 
> reading system-level information so it doesn't matter where it comes from.
> I think this is a test-only issue, since the circumstances where somebody is 
> running multiple core containers in a single JVM in production should be 
> rare, but maybe there are edge cases affected with EmbeddedSolrServer and 
> MapReduce or Spark, or other unusual deployment patterns.
> Removing the metrics registration entirely can speed up 
> {{configureCluster(100).build()}} on my machine from 2 minutes to 30 seconds, 
> so I'm optimistic that there can be gains here without sacrificing the 
> feature entirely.






[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: 
Optimize AutomatonTermsEnum memory and automaton 
Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383377648
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
 ##
 @@ -1091,25 +1091,33 @@ public static String getCommonPrefix(Automaton a) {
* @return common prefix, which can be an empty (length 0) BytesRef (never 
null)
*/
   public static BytesRef getCommonPrefixBytesRef(Automaton a) {
 
 Review comment:
   Ok, removed.





[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043654#comment-17043654
 ] 

Lucene/Solr QA commented on SOLR-13411:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 2 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  4s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  4s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 49m 
29s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 53m 25s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13411 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12994329/SOLR-13411.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / b4c2e279a94 |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/687/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/687/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> CompositeIdRouter calculates wrong route hash if atomic update is used for 
> route.field
> --
>
> Key: SOLR-13411
> URL: https://issues.apache.org/jira/browse/SOLR-13411
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 7.5
>Reporter: Niko Himanen
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: SOLR-13411.patch, SOLR-13411.patch
>
>
> If a collection is created with the router.field parameter to define some 
> other field than the uniqueKey field as the route field, and a document 
> update comes in with the route field updated using atomic update syntax (for 
> example set=123), the hash for document routing is calculated from "set=123" 
> and not from 123, the real value, which may lead to routing the document to 
> the wrong shard.
>  
> This happens in CompositeIdRouter#sliceHash, where the field value is used 
> as is for the hash calculation.
>  
> I think there are two possible solutions to fix this:
> a) Allow atomic updates also for route.field, but use the real value instead 
> of the atomic update syntax to route the document to the right shard.
> b) Deny atomic updates for route.field and throw an exception.
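A toy illustration of the failure mode described above. This is not Solr's actual routing code: CompositeIdRouter uses MurmurHash3 over a hash range, whereas here String.hashCode() and a simple modulus stand in for it. The point is only that hashing the literal atomic-update syntax instead of the resolved value can select a different shard.

```java
// Toy sketch of the routing bug: hashing the raw atomic-update syntax
// "set=123" instead of the resolved route value "123" can pick a
// different shard. String.hashCode()/floorMod stand in for Solr's real
// MurmurHash3-based CompositeIdRouter hashing.
public class RouteHashSketch {
    static int shardFor(String routeValue, int numShards) {
        return Math.floorMod(routeValue.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int numShards = 8;
        System.out.println("resolved value \"123\"    -> shard "
            + shardFor("123", numShards));
        System.out.println("atomic syntax \"set=123\" -> shard "
            + shardFor("set=123", numShards));
    }
}
```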






[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: 
Optimize AutomatonTermsEnum memory and automaton 
Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383351481
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java
 ##
 @@ -54,18 +56,20 @@
   private final boolean finite;
   // array of sorted transitions for each state, indexed by state number
   private final Automaton automaton;
-  // for path tracking: each long records gen when we last
+  // for path tracking: each short records gen when we last
   // visited the state; we use gens to avoid having to clear
-  private final long[] visited;
 
 Review comment:
   Good catch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz opened a new pull request #1284: LUCENE-9247: Add tests for `checkIntegrity`.

2020-02-24 Thread GitBox
jpountz opened a new pull request #1284: LUCENE-9247: Add tests for 
`checkIntegrity`.
URL: https://github.com/apache/lucene-solr/pull/1284
 
 
   This adds a test to `BaseIndexFileFormatTestCase` that the combination
   of opening a reader and calling `checkIntegrity` on it reads all bytes
   of all files (including index headers and footers). This would help
   detect most cases when `checkIntegrity` is not implemented correctly.
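The byte-coverage idea behind the test can be sketched without Lucene's APIs. The real test instruments `Directory`/`IndexInput`; `ReadCoverage` below is a hypothetical stdlib-only stand-in for the same bookkeeping:

```java
import java.util.BitSet;

// Sketch of the test's bookkeeping: record every byte offset a reader
// consumes, then check that an "integrity pass" touched all of them,
// headers and footers included. ReadCoverage is illustrative only.
public class ReadCoverage {

    private final byte[] file;
    private final BitSet touched;
    private int pos = 0;

    ReadCoverage(byte[] file) {
        this.file = file;
        this.touched = new BitSet(file.length);
    }

    // Return the next byte (or -1 at EOF) and record that it was read.
    int readNext() {
        if (pos >= file.length) {
            return -1;
        }
        touched.set(pos);
        return file[pos++] & 0xFF;
    }

    // True only if every byte of the file was consumed.
    boolean fullyRead() {
        return touched.cardinality() == file.length;
    }

    public static void main(String[] args) {
        ReadCoverage cov = new ReadCoverage(new byte[] {1, 2, 3, 4});
        while (cov.readNext() != -1) {
            // simulate an integrity pass that reads the whole file
        }
        System.out.println("all bytes read: " + cov.fullyRead());
    }
}
```

A codec whose `checkIntegrity` skips a file (or a header/footer) would leave some offsets untouched, and a `fullyRead()`-style assertion catches it.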
   





[jira] [Created] (LUCENE-9247) Test that checkIntegrity doesn't miss any file

2020-02-24 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9247:


 Summary: Test that checkIntegrity doesn't miss any file
 Key: LUCENE-9247
 URL: https://issues.apache.org/jira/browse/LUCENE-9247
 Project: Lucene - Core
  Issue Type: Test
Reporter: Adrien Grand


An Elasticsearch test found that CompressingStoredFieldsReader checks the 
integrity of its index neither at open time nor when checkIntegrity is 
called. We should have a test that detects this kind of bug.






[jira] [Resolved] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread Namgyu Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namgyu Kim resolved LUCENE-8954.

Resolution: Fixed

> Refactor Nori(Korean) Analyzer
> --
>
> Key: LUCENE-8954
> URL: https://issues.apache.org/jira/browse/LUCENE-8954
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
> Fix For: 8.x, master (9.0)
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> There is a lot of code in the Nori analyzer that can be refactored
> (whitespace, wrong type casts, unnecessary throws clauses, C-style arrays, ...).
> I think it's good to proceed where we can.
> It has nothing to do with Nori's actual behavior; I'll just remove
> unnecessary code and keep things simple.






[jira] [Updated] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread Namgyu Kim (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namgyu Kim updated LUCENE-8954:
---
Fix Version/s: master (9.0)
   8.x







[jira] [Commented] (LUCENE-8954) Refactor Nori(Korean) Analyzer

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043592#comment-17043592
 ] 

ASF subversion and git services commented on LUCENE-8954:
-

Commit 29b7e1a95c3a8857ef8ce05c0679c66e04b1f3e0 in lucene-solr's branch 
refs/heads/branch_8x from Namgyu Kim
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=29b7e1a ]

LUCENE-8954: refactor Nori analyzer

Signed-off-by: Namgyu Kim 







[GitHub] [lucene-solr] danmuzi merged pull request #1276: LUCENE-8954: refactor Nori analyzer

2020-02-24 Thread GitBox
danmuzi merged pull request #1276: LUCENE-8954: refactor Nori analyzer
URL: https://github.com/apache/lucene-solr/pull/1276
 
 
   





[GitHub] [lucene-solr] danmuzi commented on issue #1276: LUCENE-8954: refactor Nori analyzer

2020-02-24 Thread GitBox
danmuzi commented on issue #1276: LUCENE-8954: refactor Nori analyzer
URL: https://github.com/apache/lucene-solr/pull/1276#issuecomment-590368344
 
 
   Thanks for checking, @jimczi
   I'll merge this commit :D





[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize 
AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383312415
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/util/automaton/Operations.java
 ##
 @@ -1091,25 +1091,33 @@ public static String getCommonPrefix(Automaton a) {
* @return common prefix, which can be an empty (length 0) BytesRef (never 
null)
*/
   public static BytesRef getCommonPrefixBytesRef(Automaton a) {
 
 Review comment:
   I don't think this is the right tradeoff. It makes the code more complex to 
save the cost of creating a few simple, ordinary objects. I hate to say I 
don't trust your benchmark, but I don't trust that your benchmark represents a 
typical case here. We should keep this code simple; there are other ways it 
can be improved.





[jira] [Updated] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Dr Oleg Savrasov (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dr Oleg Savrasov updated SOLR-13411:

Attachment: SOLR-13411.patch







[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Dr Oleg Savrasov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043562#comment-17043562
 ] 

Dr Oleg Savrasov commented on SOLR-13411:
-

Minor fix for failed test.







[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043554#comment-17043554
 ] 

ASF subversion and git services commented on LUCENE-9212:
-

Commit b4c2e279a94988c26b61d4fb95ec208081f0448a in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b4c2e27 ]

LUCENE-9212: Fix precommit


> Intervals.multiterm() should take a CompiledAutomaton
> -
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we 
> should deprecate the methods that take `Automaton` and remove in master.






[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize 
AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383289307
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java
 ##
 @@ -54,18 +56,20 @@
   private final boolean finite;
   // array of sorted transitions for each state, indexed by state number
   private final Automaton automaton;
-  // for path tracking: each long records gen when we last
+  // for path tracking: each short records gen when we last
   // visited the state; we use gens to avoid having to clear
-  private final long[] visited;
 
 Review comment:
   visited-state-tracking is only needed when the automaton accepts an infinite 
language. We use it for loop detection. I think before we get too fancy with 
how we clear it, we should first stop being stupid about it?
   
   So it is wasteful that we do this stuff when `finite == true` (example: 
fuzzy query) because we will never even look for a loop. It's just that the 
current code unconditionally records the states it visited.
   
   I think first, in the ctor when `finite == true`, `visited[]` can be 
initialized to `null` or `new long[0]` or something, and we change this line:
   ```
   visited[state] = curGen;
   ```
   to something like this:
   ```
   if (!finite)
 visited[state] = curGen;
   ```
   
   I agree we should separately avoid tracking 64 bits per state when only 1 is 
needed. But before optimizing the storage, first lets avoid doing this stuff at 
all for ones like complex fuzzy queries?
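The generation trick being discussed can be reduced to a small stand-alone sketch. Field and method names here are illustrative, not Lucene's `AutomatonTermsEnum` internals:

```java
// Sketch of gen-based visited tracking plus the finite shortcut suggested
// above: bump curGen once per walk instead of clearing the array, and skip
// the bookkeeping entirely when the automaton is finite (no loops possible).
public class VisitedGens {

    private final long[] visited; // per state: gen when last visited
    private final boolean finite;
    private long curGen;

    VisitedGens(int numStates, boolean finite) {
        this.finite = finite;
        // A finite automaton never needs loop detection, so no array at all.
        this.visited = finite ? new long[0] : new long[numStates];
    }

    // Must be called before each walk; one increment replaces an
    // O(numStates) clear of the visited array.
    void startWalk() {
        curGen++;
    }

    // Mark the state as visited in this walk; return true if it was already
    // visited this walk, i.e. a loop was found.
    boolean markAndCheckLoop(int state) {
        if (finite) {
            return false; // never even look for a loop
        }
        if (visited[state] == curGen) {
            return true;
        }
        visited[state] = curGen;
        return false;
    }
}
```

With `finite == true` (e.g. a fuzzy query's automaton) the marking branch is never taken and no per-state storage is allocated, which is the "stop doing this stuff at all" half of the suggestion.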





[GitHub] [lucene-solr] jpountz opened a new pull request #1283: LUCENE-9246: Remove `dOff` argument from `LZ4#decompress`.

2020-02-24 Thread GitBox
jpountz opened a new pull request #1283: LUCENE-9246: Remove `dOff` argument 
from `LZ4#decompress`.
URL: https://github.com/apache/lucene-solr/pull/1283
 
 
   It is always set to 0 at call sites.
   





[jira] [Created] (LUCENE-9246) Remove "destOff" argument from LZ4#decompress

2020-02-24 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9246:


 Summary: Remove "destOff" argument from LZ4#decompress
 Key: LUCENE-9246
 URL: https://issues.apache.org/jira/browse/LUCENE-9246
 Project: Lucene - Core
  Issue Type: Task
Reporter: Adrien Grand


All call sites set it to 0, and it appears not to be handled properly when set 
to a value other than 0.






[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize 
AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383282576
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java
 ##
 @@ -188,7 +188,11 @@ private boolean nextString() {
 savedStates.setIntAt(0, 0);
 
 while (true) {
-  curGen++;
+  if (++curGen == 0) {
+// Clear the visited states every time curGen overflows (so very 
infrequently to not impact average perf).
+curGen++;
 
 Review comment:
   Can we remove this unnecessary increment? Also I'd change the comment from 
`overflows` to `wraps`.
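One way the wrap guard can look is sketched below with a deliberately narrow counter so the wrap is reachable in a test. This is a hedged illustration of the idea under review, not the patch's actual code:

```java
import java.util.Arrays;

// Sketch of wrap-around handling for a gen counter: with a narrow counter,
// an entry written 65536 walks ago could equal the current gen again, so the
// visited array is cleared on the rare wrap. Names are illustrative.
public class GenWrap {

    final short[] visited; // 0 means "never visited since the last clear"
    short curGen;          // current walk's generation, starts at 0

    GenWrap(int numStates) {
        visited = new short[numStates];
    }

    void startWalk() {
        curGen++;
        if (curGen == 0) {
            // Wrapped: stale entries could collide with reused gen values.
            // This happens very infrequently, so the common path stays a
            // single increment.
            Arrays.fill(visited, (short) 0);
            curGen = 1; // keep 0 reserved as the "never visited" sentinel
        }
    }

    void mark(int state) {
        visited[state] = curGen;
    }

    boolean seenThisWalk(int state) {
        return visited[state] == curGen;
    }
}
```

Whether a post-clear bump is needed at all depends on what sentinel the cleared array uses, which is exactly the kind of detail the review comment is poking at.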





[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
bruno-roustant commented on a change in pull request #1281: LUCENE-9245: 
Optimize AutomatonTermsEnum memory and automaton 
Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383260144
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java
 ##
 @@ -160,17 +161,18 @@ private void setLinear(int position) {
 if (maxInterval != 0xff)
   maxInterval++;
 int length = position + 1; /* position + maxTransition */
-if (linearUpperBound.bytes.length < length)
-  linearUpperBound.bytes = new byte[length];
+if (linearUpperBound == null) {
+  linearUpperBound = new BytesRef(ArrayUtil.oversize(Math.max(length, 16), 
Byte.BYTES));
+} else if (linearUpperBound.bytes.length < length) {
+  linearUpperBound.bytes = new byte[ArrayUtil.oversize(length, 
Byte.BYTES)];
 
 Review comment:
   +1 thanks





[GitHub] [lucene-solr] rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.

2020-02-24 Thread GitBox
rmuir commented on a change in pull request #1281: LUCENE-9245: Optimize 
AutomatonTermsEnum memory and automaton Operations.getCommonPrefixBytesRef.
URL: https://github.com/apache/lucene-solr/pull/1281#discussion_r383259436
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/index/AutomatonTermsEnum.java
 ##
 @@ -160,17 +161,18 @@ private void setLinear(int position) {
 if (maxInterval != 0xff)
   maxInterval++;
 int length = position + 1; /* position + maxTransition */
-if (linearUpperBound.bytes.length < length)
-  linearUpperBound.bytes = new byte[length];
+if (linearUpperBound == null) {
+  linearUpperBound = new BytesRef(ArrayUtil.oversize(Math.max(length, 16), 
Byte.BYTES));
+} else if (linearUpperBound.bytes.length < length) {
+  linearUpperBound.bytes = new byte[ArrayUtil.oversize(length, 
Byte.BYTES)];
 
 Review comment:
   I don't think we should have the additional null-check path here. It is not 
worth it to save 10 bytes :). Let's make linearUpperBound final again. Better 
to initialize it to `new BytesRef()` if you really want to save 10 bytes for 
the case where it's not used; that requires no additional branches in the 
code, and it will just get extended by the length check.





[jira] [Commented] (LUCENE-9241) fix most memory-hungry tests

2020-02-24 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043486#comment-17043486
 ] 

Bruno Roustant commented on LUCENE-9241:


As expected I saw no noticeable impact in the luceneutil benchmarks.

> fix most memory-hungry tests
> 
>
> Key: LUCENE-9241
> URL: https://issues.apache.org/jira/browse/LUCENE-9241
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
>Priority: Major
> Attachments: LUCENE-9241.patch
>
>
> Currently each test JVM has an Xmx of 512M. With a modern MacBook Pro this 
> adds up to 4GB, which is pretty crazy.
> On the other hand, if we fix a few edge cases, tests can work with lower 
> heaps such as 128M. This can save many gigabytes (it also finds interesting 
> memory waste/issues).






[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043465#comment-17043465
 ] 

Lucene/Solr QA commented on SOLR-13411:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 1 new or modified test 
files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
4s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  
2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | 
{color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  1m  2s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 48m 53s{color} 
| {color:red} core in the patch failed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 52m 51s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | solr.update.TestUpdate |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13411 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12994301/SOLR-13411.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  
validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 
10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / 19fe1eee68d |
| ant | version: Apache Ant(TM) version 1.10.5 compiled on March 28 2019 |
| Default Java | LTS |
| unit | 
https://builds.apache.org/job/PreCommit-SOLR-Build/686/artifact/out/patch-unit-solr_core.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-SOLR-Build/686/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/686/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.









[jira] [Assigned] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev reassigned SOLR-13411:
---

Assignee: Mikhail Khludnev







[jira] [Resolved] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9212.
---
Fix Version/s: 8.5
   Resolution: Fixed







[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043427#comment-17043427
 ] 

ASF subversion and git services commented on LUCENE-9212:
-

Commit 19fe1eee68d83f73c8416b319bd1b38c6e73f053 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=19fe1ee ]

LUCENE-9212: Remove deprecated Intervals.multiterm() methods


> Intervals.multiterm() should take a CompiledAutomaton
> -
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we 
> should deprecate the methods that take `Automaton` and remove in master.






[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043425#comment-17043425
 ] 

ASF subversion and git services commented on LUCENE-9212:
-

Commit 90028a7b935ad3205a8a6837cbb7ce1e9dbb6dff in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=90028a7 ]

LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton


> Intervals.multiterm() should take a CompiledAutomaton
> -
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we 
> should deprecate the methods that take `Automaton` and remove in master.






[jira] [Commented] (LUCENE-9212) Intervals.multiterm() should take a CompiledAutomaton

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043422#comment-17043422
 ] 

ASF subversion and git services commented on LUCENE-9212:
-

Commit ffb7cafe9351cd6cd5181bc06dd053d586f6d63f in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ffb7caf ]

LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton


> Intervals.multiterm() should take a CompiledAutomaton
> -
>
> Key: LUCENE-9212
> URL: https://issues.apache.org/jira/browse/LUCENE-9212
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we 
> should deprecate the methods that take `Automaton` and remove in master.






[GitHub] [lucene-solr] romseygeek commented on issue #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton

2020-02-24 Thread GitBox
romseygeek commented on issue #1243: LUCENE-9212: Intervals.multiterm() should 
take CompiledAutomaton
URL: https://github.com/apache/lucene-solr/pull/1243#issuecomment-590270502
 
 
   Merged as ffb7cafe9351cd6cd5181bc06dd053d586f6d63f


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[GitHub] [lucene-solr] romseygeek closed pull request #1243: LUCENE-9212: Intervals.multiterm() should take CompiledAutomaton

2020-02-24 Thread GitBox
romseygeek closed pull request #1243: LUCENE-9212: Intervals.multiterm() should 
take CompiledAutomaton
URL: https://github.com/apache/lucene-solr/pull/1243
 
 
   





[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043408#comment-17043408
 ] 

Mikhail Khludnev commented on SOLR-13411:
-

Appreciated, [~osavrasov]. Let's open a go/no-go vote; I'll push it this week. 

> CompositeIdRouter calculates wrong route hash if atomic update is used for 
> route.field
> --
>
> Key: SOLR-13411
> URL: https://issues.apache.org/jira/browse/SOLR-13411
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 7.5
>Reporter: Niko Himanen
>Priority: Minor
> Attachments: SOLR-13411.patch
>
>
> If a collection is created with the router.field parameter defining a field 
> other than uniqueKey as the route field, and a document update arrives with 
> the route field expressed in atomic-update syntax (for example set=123), the 
> hash for document routing is calculated from "set=123" and not from the real 
> value 123, which may route the document to the wrong shard.
>  
> This happens in CompositeIdRouter#sliceHash, where the field value is used 
> as-is for hash calculation.
>  
> I think there are two possible solutions to fix this:
> a) Allow atomic updates on route.field, but use the real value rather than 
> the atomic-update syntax to route the document to the right shard.
> b) Reject atomic updates on route.field and throw an exception.
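The failure mode is easy to reproduce in miniature. Solr's CompositeIdRouter actually hashes the route value with MurmurHash3; the sketch below substitutes a trivial stand-in hash (an assumption purely for illustration, as are the `hash`/`shardFor` names) to show that hashing the raw atomic-update rendering "set=123" selects a different shard than hashing the real value 123:

```java
public class RouteHashSketch {
    // Stand-in for the router's hash function; Solr really uses MurmurHash3.
    static int hash(String routeValue) {
        return routeValue.hashCode();
    }

    // Map a hash onto one of numShards hash-range buckets, like slicing
    // the hash ring across a collection's shards.
    static int shardFor(String routeValue, int numShards) {
        return Math.floorMod(hash(routeValue), numShards);
    }

    public static void main(String[] args) {
        // The bug: sliceHash sees the atomic-update wrapper text, not the value.
        System.out.println(shardFor("set=123", 9)); // wrapper text is hashed
        System.out.println(shardFor("123", 9));     // the value that should be hashed
    }
}
```

Any fix along the lines of option (a) amounts to unwrapping the atomic-update map before the `shardFor` step, so both code paths hash the same string.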






[jira] [Updated] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13411:

Status: Patch Available  (was: Open)

> CompositeIdRouter calculates wrong route hash if atomic update is used for 
> route.field
> --
>
> Key: SOLR-13411
> URL: https://issues.apache.org/jira/browse/SOLR-13411
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 7.5
>Reporter: Niko Himanen
>Priority: Minor
> Attachments: SOLR-13411.patch
>
>
> If a collection is created with the router.field parameter defining a field 
> other than uniqueKey as the route field, and a document update arrives with 
> the route field expressed in atomic-update syntax (for example set=123), the 
> hash for document routing is calculated from "set=123" and not from the real 
> value 123, which may route the document to the wrong shard.
>  
> This happens in CompositeIdRouter#sliceHash, where the field value is used 
> as-is for hash calculation.
>  
> I think there are two possible solutions to fix this:
> a) Allow atomic updates on route.field, but use the real value rather than 
> the atomic-update syntax to route the document to the right shard.
> b) Reject atomic updates on route.field and throw an exception.






[jira] [Commented] (LUCENE-9207) Don't build SpanQuery in QueryBuilder

2020-02-24 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043401#comment-17043401
 ] 

Alan Woodward commented on LUCENE-9207:
---

I think this should probably be a 9.0-only change, particularly given that the 
parent issue is not going to be backported.  Will commit to master presently.

> Don't build SpanQuery in QueryBuilder
> -
>
> Key: LUCENE-9207
> URL: https://issues.apache.org/jira/browse/LUCENE-9207
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Subtask of LUCENE-9204.  QueryBuilder currently has special logic for graph 
> phrase queries with no slop, constructing a SpanQuery that attempts to follow 
> all paths using a combination of OR and NEAR queries.  Given the known bugs 
> in this type of query (LUCENE-7398) and that we would like to move span 
> queries out of core in any case, we should remove this logic and just build a 
> disjunction of phrase queries, one phrase per path.
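The replacement strategy is straightforward to picture: enumerate every path through the token graph and emit one phrase per path, then OR the phrases together. A self-contained Java sketch of the path enumeration (the real change lives in Lucene's QueryBuilder; the graph representation and names here are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class GraphPaths {
    // Enumerate all token paths from `node` to `end` in a DAG whose edges
    // carry tokens, e.g. the graph produced when "wifi" is a synonym of "wi fi".
    static List<List<String>> paths(Map<Integer, List<Map.Entry<String, Integer>>> graph,
                                    int node, int end) {
        List<List<String>> out = new ArrayList<>();
        if (node == end) {
            out.add(new ArrayList<>()); // one empty path: we have arrived
            return out;
        }
        for (Map.Entry<String, Integer> edge : graph.getOrDefault(node, List.of())) {
            for (List<String> rest : paths(graph, edge.getValue(), end)) {
                List<String> path = new ArrayList<>();
                path.add(edge.getKey());
                path.addAll(rest);
                out.add(path);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Positions 0 -> 1 -> 2 for "wi fi"; the synonym "wifi" spans 0 -> 2.
        Map<Integer, List<Map.Entry<String, Integer>>> graph = Map.of(
            0, List.of(Map.entry("wi", 1), Map.entry("wifi", 2)),
            1, List.of(Map.entry("fi", 2)));
        // Each resulting path would become one PhraseQuery clause in a disjunction.
        System.out.println(paths(graph, 0, 2)); // [[wi, fi], [wifi]]
    }
}
```

Building one phrase per path sidesteps the OR/NEAR span machinery entirely, at the cost of a combinatorial number of clauses for very dense graphs.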






[jira] [Updated] (LUCENE-9171) Synonyms Boost by Payload

2020-02-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-9171:
--
Fix Version/s: 8.5

> Synonyms Boost by Payload
> -
>
> Key: LUCENE-9171
> URL: https://issues.apache.org/jira/browse/LUCENE-9171
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser
>Reporter: Alessandro Benedetti
>Priority: Major
> Fix For: 8.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have been working on the additional capability of boosting queries by term 
> payloads, enabled through a parameter in the Lucene QueryBuilder.
> This has been done targeting the synonyms query.
> It is parametric, so it is meant to make no difference unless the feature is 
> enabled.
> Solr has its own bits to comply through its SynonymsQueryStyles.






[jira] [Updated] (SOLR-12238) Synonym Query Style Boost By Payload

2020-02-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated SOLR-12238:
-
Fix Version/s: 8.5
 Assignee: Alan Woodward
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.5
>
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>
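The `term|weight` syntax above is simple to take apart; a hedged sketch of splitting one synonym token into its term and boost, defaulting to 1.0 when no delimiter is present (the real parsing is done inside Lucene/Solr's delimited payload and boost token filters, not by this hypothetical helper):

```java
public class DelimitedBoost {
    record TermBoost(String term, float boost) {}

    // Split "Big_Cat|0.8" into ("Big_Cat", 0.8f); a bare term gets boost 1.0.
    static TermBoost parse(String token, char delimiter) {
        int i = token.lastIndexOf(delimiter);
        if (i < 0) {
            return new TermBoost(token, 1.0f);
        }
        return new TermBoost(token.substring(0, i),
                             Float.parseFloat(token.substring(i + 1)));
    }

    public static void main(String[] args) {
        System.out.println(parse("Big_Cat|0.8", '|').term());  // Big_Cat
        System.out.println(parse("leopard", '|').boost());     // 1.0
    }
}
```

With weights recovered this way, PICK_BEST_BOOST_BY_PAYLOAD can keep only the highest-scoring clause while AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD keeps them all, each multiplied by its parsed boost.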






[jira] [Updated] (LUCENE-9171) Synonyms Boost by Payload

2020-02-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-9171:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Resolved by SOLR-12238

> Synonyms Boost by Payload
> -
>
> Key: LUCENE-9171
> URL: https://issues.apache.org/jira/browse/LUCENE-9171
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser
>Reporter: Alessandro Benedetti
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I have been working on the additional capability of boosting queries by term 
> payloads, enabled through a parameter in the Lucene QueryBuilder.
> This has been done targeting the synonyms query.
> It is parametric, so it is meant to make no difference unless the feature is 
> enabled.
> Solr has its own bits to comply through its SynonymsQueryStyles.






[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043390#comment-17043390
 ] 

ASF subversion and git services commented on SOLR-12238:


Commit 2752d50dd1dcf758a32dc573d02967612a2cf1ff in lucene-solr's branch 
refs/heads/branch_8x from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=2752d50 ]

SOLR-12238: Handle boosts in QueryBuilder

QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream.  This commit also adds a 
DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr


> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>






[GitHub] [lucene-solr] juanka588 commented on issue #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 commented on issue #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282#issuecomment-590257020
 
 
   Please review each commit separately.





[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 commented on a change in pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383186292
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80BinaryConsumer.java
 ##
 @@ -48,6 +53,16 @@ public Lucene80BinaryConsumer(SegmentWriteState state) {
 this.maxDoc = state.segmentInfo.maxDoc();
   }
 
+  @Override
+  public CompositeFieldMetadata addBinary(FieldInfo field, DocValuesProducer 
valuesProducer, IndexOutput indexOutput) throws IOException {
+ByteBuffersDataOutput delegate = 
ByteBuffersDataOutput.newResettableInstance();
 
 Review comment:
   this can be replaced with a BinaryEntry Object





[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043374#comment-17043374
 ] 

ASF subversion and git services commented on SOLR-12238:


Commit 663611c99c7d48dd31d53ea17644fcecd5e0fad7 in lucene-solr's branch 
refs/heads/master from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=663611c ]

[SOLR-12238] Synonym Queries boost (#357)

SOLR-12238: Handle boosts in QueryBuilder

QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream.  This commit also adds a 
DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>






[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 commented on a change in pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383185344
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80NumericProducer.java
 ##
 @@ -0,0 +1,541 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.lucene80;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.codecs.composite.CompositeDocValuesProducer;
+import org.apache.lucene.codecs.composite.CompositeFieldMetadata;
+import org.apache.lucene.index.CorruptIndexException;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.store.IndexInput;
+import org.apache.lucene.store.RandomAccessInput;
+import org.apache.lucene.util.Accountable;
+import org.apache.lucene.util.LongValues;
+import org.apache.lucene.util.RamUsageEstimator;
+import org.apache.lucene.util.packed.DirectMonotonicReader;
+import org.apache.lucene.util.packed.DirectReader;
+
+public class Lucene80NumericProducer{
+  private final int maxDoc;
+
+  public Lucene80NumericProducer(int maxDoc) {
+this.maxDoc = maxDoc;
+  }
+
+  static class NumericEntry implements Accountable {
+long[] table;
+int blockShift;
+byte bitsPerValue;
+long docsWithFieldOffset;
+long docsWithFieldLength;
+short jumpTableEntryCount;
+byte denseRankPower;
+long numValues;
+long minValue;
+long gcd;
+long valuesOffset;
+long valuesLength;
+long valueJumpTableOffset; // -1 if no jump-table
+
+@Override
+public long ramBytesUsed() {
+  return table == null ? 0L : RamUsageEstimator.sizeOf(table);
+}
+  }
+
+  static class SortedNumericEntry extends NumericEntry {
+int numDocsWithField;
+DirectMonotonicReader.Meta addressesMeta;
+long addressesOffset;
+long addressesLength;
+
+@Override
+public long ramBytesUsed() {
+  long ramBytesUsed = super.ramBytesUsed();
+  ramBytesUsed += addressesMeta == null ? 0L : 
addressesMeta.ramBytesUsed();
+  return ramBytesUsed;
+}
+  }
+
+  static NumericEntry readNumeric(IndexInput meta) throws IOException {
+NumericEntry entry = new NumericEntry();
+readNumeric(meta, entry);
+return entry;
+  }
+
+  static void readNumeric(IndexInput meta, NumericEntry entry) throws 
IOException {
+entry.docsWithFieldOffset = meta.readLong();
+entry.docsWithFieldLength = meta.readLong();
+entry.jumpTableEntryCount = meta.readShort();
+entry.denseRankPower = meta.readByte();
+entry.numValues = meta.readLong();
+int tableSize = meta.readInt();
+if (tableSize > 256) {
+  throw new CorruptIndexException("invalid table size: " + tableSize, 
meta);
+}
+if (tableSize >= 0) {
+  entry.table = new long[tableSize];
+  for (int i = 0; i < tableSize; ++i) {
+entry.table[i] = meta.readLong();
+  }
+}
+if (tableSize < -1) {
+  entry.blockShift = -2 - tableSize;
+} else {
+  entry.blockShift = -1;
+}
+entry.bitsPerValue = meta.readByte();
+entry.minValue = meta.readLong();
+entry.gcd = meta.readLong();
+entry.valuesOffset = meta.readLong();
+entry.valuesLength = meta.readLong();
+entry.valueJumpTableOffset = meta.readLong();
+  }
+
+  static SortedNumericEntry readSortedNumeric(IndexInput meta) throws 
IOException {
+SortedNumericEntry entry = new SortedNumericEntry();
+readNumeric(meta, entry);
+entry.numDocsWithField = meta.readInt();
+if (entry.numDocsWithField != entry.numValues) {
+  entry.addressesOffset = meta.readLong();
+  final int blockShift = meta.readVInt();
+  entry.addressesMeta = DirectMonotonicReader.loadMeta(meta, 
entry.numDocsWithField + 1, blockShift);
+  entry.addressesLength = meta.readLong();
+}
+return entry;
+  }
+
+  public SortedNumericDocValues getSortedNumeric(SortedNumericEntry entry, 
IndexInput data) throws IOException {
+if (entry.numValues == entry.numDocsWithField) {
+ 

[jira] [Commented] (SOLR-12238) Synonym Query Style Boost By Payload

2020-02-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043373#comment-17043373
 ] 

ASF subversion and git services commented on SOLR-12238:


Commit 663611c99c7d48dd31d53ea17644fcecd5e0fad7 in lucene-solr's branch 
refs/heads/master from Alessandro Benedetti
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=663611c ]

[SOLR-12238] Synonym Queries boost (#357)

SOLR-12238: Handle boosts in QueryBuilder

QueryBuilder now detects per-term boosts supplied by a BoostAttribute when
building queries using a TokenStream.  This commit also adds a 
DelimitedBoostTokenFilter
that parses boosts from tokens using a delimiter token, and exposes this in Solr

> Synonym Query Style Boost By Payload
> 
>
> Key: SOLR-12238
> URL: https://issues.apache.org/jira/browse/SOLR-12238
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Affects Versions: 7.2
>Reporter: Alessandro Benedetti
>Priority: Major
> Attachments: SOLR-12238.patch, SOLR-12238.patch, SOLR-12238.patch, 
> SOLR-12238.patch
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> This improvement is built on top of the Synonym Query Style feature and 
> brings the possibility of boosting synonym queries using the associated 
> payload.
> It introduces two new modalities for the Synonym Query Style:
> PICK_BEST_BOOST_BY_PAYLOAD -> build a disjunction query with the clauses 
> boosted by payload
> AS_DISTINCT_TERMS_BOOST_BY_PAYLOAD -> build a Boolean query with the clauses 
> boosted by payload
> These new synonym query styles assume payloads are available, so they must 
> be used in conjunction with a token filter able to produce payloads.
> A synonym.txt example could be:
> # Synonyms used by Payload Boost
> tiger => tiger|1.0, Big_Cat|0.8, Shere_Khan|0.9
> leopard => leopard, Big_Cat|0.8, Bagheera|0.9
> lion => lion|1.0, panthera leo|0.99, Simba|0.8
> snow_leopard => panthera uncia|0.99, snow leopard|1.0
> A simple token filter to populate the payloads from such a synonym.txt is:
> <filter class="solr.DelimitedPayloadTokenFilterFactory" delimiter="|"/>






[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 commented on a change in pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383184968
 
 

 ##
 File path: 
lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80NumericConsumer.java
 ##
 @@ -0,0 +1,319 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.codecs.lucene80;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.lucene.codecs.DocValuesProducer;
+import org.apache.lucene.index.FieldInfo;
+import org.apache.lucene.index.SegmentWriteState;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.store.ByteBuffersDataOutput;
+import org.apache.lucene.store.IndexOutput;
+import org.apache.lucene.util.ArrayUtil;
+import org.apache.lucene.util.MathUtil;
+import org.apache.lucene.util.packed.DirectMonotonicWriter;
+import org.apache.lucene.util.packed.DirectWriter;
+
+import static 
org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.DIRECT_MONOTONIC_BLOCK_SHIFT;
+import static 
org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.NUMERIC_BLOCK_SHIFT;
+import static 
org.apache.lucene.codecs.lucene80.Lucene80DocValuesFormat.NUMERIC_BLOCK_SIZE;
+
+public class Lucene80NumericConsumer{
+
+  private final int maxDoc;
+
+  public Lucene80NumericConsumer(SegmentWriteState state) {
+this.maxDoc = state.segmentInfo.maxDoc();
+  }
+
+  public void addSortedNumericField(FieldInfo field, DocValuesProducer valuesProducer, IndexOutput data, IndexOutput meta) throws IOException {
+    long[] stats = writeValues(field, valuesProducer, data, meta);
+    int numDocsWithField = Math.toIntExact(stats[0]);
+    long numValues = stats[1];
+    assert numValues >= numDocsWithField;
+
+    meta.writeInt(numDocsWithField);
+    if (numValues > numDocsWithField) {
+      long start = data.getFilePointer();
+      meta.writeLong(start);
+      meta.writeVInt(DIRECT_MONOTONIC_BLOCK_SHIFT);
+
+      final DirectMonotonicWriter addressesWriter = DirectMonotonicWriter.getInstance(meta, data, numDocsWithField + 1L, DIRECT_MONOTONIC_BLOCK_SHIFT);
+      long addr = 0;
+      addressesWriter.add(addr);
+      SortedNumericDocValues values = valuesProducer.getSortedNumeric(field);
+      for (int doc = values.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = values.nextDoc()) {
+        addr += values.docValueCount();
+        addressesWriter.add(addr);
+      }
+      addressesWriter.finish();
+      meta.writeLong(data.getFilePointer() - start);
+    }
+  }
+
+  private static class MinMaxTracker {
+    long min, max, numValues, spaceInBits;
+
+    MinMaxTracker() {
+      reset();
+      spaceInBits = 0;
+    }
+
+    private void reset() {
+      min = Long.MAX_VALUE;
+      max = Long.MIN_VALUE;
+      numValues = 0;
+    }
+
+    /** Accumulate a new value. */
+    void update(long v) {
+      min = Math.min(min, v);
+      max = Math.max(max, v);
+      ++numValues;
+    }
+
+    /** Update the required space. */
+    void finish() {
+      if (max > min) {
+        spaceInBits += DirectWriter.unsignedBitsRequired(max - min) * numValues;
+      }
+    }
+
+    /** Update space usage and get ready for accumulating values for the next block. */
+    void nextBlock() {
+      finish();
+      reset();
+    }
+  }
+
+  public long[] writeValues(FieldInfo field, DocValuesProducer valuesProducer, IndexOutput data, IndexOutput meta) throws IOException {
 
 Review comment:
   Added the data and meta IndexOutput parameters.
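As context for the review above: the address table written by addSortedNumericField stores, for each document, the running total of values seen so far, so that document d's values occupy addresses[d] .. addresses[d+1] - 1. A minimal sketch of that encoding, independent of Lucene's DirectMonotonicWriter (the function name is illustrative, not Lucene's API):

```python
def build_addresses(value_counts):
    # Cumulative value counts, mirroring the addr/addressesWriter loop in
    # addSortedNumericField: one leading 0, then a running total per document.
    addresses = [0]
    addr = 0
    for count in value_counts:
        addr += count
        addresses.append(addr)
    return addresses
```

With per-document counts [2, 1, 3], this yields [0, 2, 3, 6]: document 1's values occupy positions 2..2, for example. Lucene stores this table with DirectMonotonicWriter because it is monotonically increasing and compresses well.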


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene-solr] romseygeek merged pull request #357: [SOLR-12238] Synonym Queries boost

2020-02-24 Thread GitBox
romseygeek merged pull request #357: [SOLR-12238] Synonym Queries boost
URL: https://github.com/apache/lucene-solr/pull/357
 
 
   





[GitHub] [lucene-solr] juanka588 commented on a change in pull request #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 commented on a change in pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282#discussion_r383184671
 
 

 ##
 File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##
 @@ -121,1445 +116,80 @@ private void readFields(ChecksumIndexInput meta, FieldInfos infos) throws IOExce
   }
       byte type = meta.readByte();
       if (type == Lucene80DocValuesFormat.NUMERIC) {
-        numerics.put(info.name, readNumeric(meta));
+        numerics.put(info.name, Lucene80NumericProducer.readNumeric(meta));
       } else if (type == Lucene80DocValuesFormat.BINARY) {
-        binaries.put(info.name, readBinary(meta));
+        binaries.put(info.name, Lucene80BinaryProducer.readBinary(meta, version));
       } else if (type == Lucene80DocValuesFormat.SORTED) {
-        sorted.put(info.name, readSorted(meta));
+        sorted.put(info.name, Lucene80SortedSetProducer.readSorted(meta));
       } else if (type == Lucene80DocValuesFormat.SORTED_SET) {
-        sortedSets.put(info.name, readSortedSet(meta));
+        sortedSets.put(info.name, Lucene80SortedSetProducer.readSortedSet(meta));
       } else if (type == Lucene80DocValuesFormat.SORTED_NUMERIC) {
-        sortedNumerics.put(info.name, readSortedNumeric(meta));
+        sortedNumerics.put(info.name, Lucene80NumericProducer.readSortedNumeric(meta));
       } else {
         throw new CorruptIndexException("invalid type: " + type, meta);
       }
     }
   }
 
-  private NumericEntry readNumeric(ChecksumIndexInput meta) throws IOException {
-    NumericEntry entry = new NumericEntry();
-    readNumeric(meta, entry);
-    return entry;
-  }
-
-  private void readNumeric(ChecksumIndexInput meta, NumericEntry entry) throws IOException {
-    entry.docsWithFieldOffset = meta.readLong();
-    entry.docsWithFieldLength = meta.readLong();
-    entry.jumpTableEntryCount = meta.readShort();
-    entry.denseRankPower = meta.readByte();
-    entry.numValues = meta.readLong();
-    int tableSize = meta.readInt();
-    if (tableSize > 256) {
-      throw new CorruptIndexException("invalid table size: " + tableSize, meta);
-    }
-    if (tableSize >= 0) {
-      entry.table = new long[tableSize];
-      ramBytesUsed += RamUsageEstimator.sizeOf(entry.table);
-      for (int i = 0; i < tableSize; ++i) {
-        entry.table[i] = meta.readLong();
-      }
-    }
-    if (tableSize < -1) {
-      entry.blockShift = -2 - tableSize;
-    } else {
-      entry.blockShift = -1;
-    }
-    entry.bitsPerValue = meta.readByte();
-    entry.minValue = meta.readLong();
-    entry.gcd = meta.readLong();
-    entry.valuesOffset = meta.readLong();
-    entry.valuesLength = meta.readLong();
-    entry.valueJumpTableOffset = meta.readLong();
-  }
-
-  private BinaryEntry readBinary(ChecksumIndexInput meta) throws IOException {
-    BinaryEntry entry = new BinaryEntry();
-    entry.dataOffset = meta.readLong();
-    entry.dataLength = meta.readLong();
-    entry.docsWithFieldOffset = meta.readLong();
-    entry.docsWithFieldLength = meta.readLong();
-    entry.jumpTableEntryCount = meta.readShort();
-    entry.denseRankPower = meta.readByte();
-    entry.numDocsWithField = meta.readInt();
-    entry.minLength = meta.readInt();
-    entry.maxLength = meta.readInt();
-    if ((version >= Lucene80DocValuesFormat.VERSION_BIN_COMPRESSED && entry.numDocsWithField > 0) || entry.minLength < entry.maxLength) {
-      entry.addressesOffset = meta.readLong();
-
-      // Old count of uncompressed addresses
-      long numAddresses = entry.numDocsWithField + 1L;
-      // New count of compressed addresses - the number of compressed blocks
-      if (version >= Lucene80DocValuesFormat.VERSION_BIN_COMPRESSED) {
-        entry.numCompressedChunks = meta.readVInt();
-        entry.docsPerChunkShift = meta.readVInt();
-        entry.maxUncompressedChunkSize = meta.readVInt();
-        numAddresses = entry.numCompressedChunks;
-      }
-
-      final int blockShift = meta.readVInt();
-      entry.addressesMeta = DirectMonotonicReader.loadMeta(meta, numAddresses, blockShift);
-      ramBytesUsed += entry.addressesMeta.ramBytesUsed();
-      entry.addressesLength = meta.readLong();
-    }
-    return entry;
-  }
-
-  private SortedEntry readSorted(ChecksumIndexInput meta) throws IOException {
-    SortedEntry entry = new SortedEntry();
-    entry.docsWithFieldOffset = meta.readLong();
-    entry.docsWithFieldLength = meta.readLong();
-    entry.jumpTableEntryCount = meta.readShort();
-    entry.denseRankPower = meta.readByte();
-    entry.numDocsWithField = meta.readInt();
-    entry.bitsPerValue = meta.readByte();
-    entry.ordsOffset = meta.readLong();
-    entry.ordsLength = meta.readLong();
-    readTermDict(meta, entry);
-    return entry;
-  }
-
-  private SortedSetEntry readSortedSet(ChecksumIndexInput meta) throws IOException {
-    SortedSetEntry entry = new SortedSetEntry();
-    b

[jira] [Updated] (LUCENE-9236) Having a modular Doc Values format

2020-02-24 Thread juan camilo rodriguez duran (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

juan camilo rodriguez duran updated LUCENE-9236:

Description: 
 Today, DocValues Consumer/Producer implementations must override 5 different methods, even if you only want to use one, given that a field can only support one doc values type at a time.

 

In the attached PR I've implemented a new modular version of those classes (consumer/producer), each one having a single responsibility while writing to the same single file.

This is mainly a refactor of the existing format, opening the possibility to override or implement only the sub-format you need.

 

I'll do this in 3 steps:
 # Create a CompositeDocValuesFormat and move the code of Lucene80DocValuesFormat into separate classes, without modifying the inner code. At the same time I created a Lucene85CompositeDocValuesFormat based on these changes.
 # I'll introduce some basic components for writing doc values in general, such as:
 ## DocumentIdSetIterator Serializer: used for each type of field, based on an IndexedDISI.
 ## Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate values using a dictionary.
 ## Document Boundaries Serializer (optional, used only for multivalued fields: SortedNumeric and SortedSet)
 ## TermsEnum Serializer: useful to write and read the terms dictionary for Sorted and SortedSet doc values.
 # I'll create the new Sub-DocValues formats using the previous components.

 

PR: [https://github.com/apache/lucene-solr/pull/1282]


> Having a modular Doc Values format
> --
>
> Key: LUCENE-9236
> URL: https://issues.apache.org/jira/browse/LUCENE-9236
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: juan camilo rodriguez duran
>Priority: Minor
>  Labels: docValues
>
>  Today, DocValues Consumer/Producer implementations must override 5 different methods, even if you only want to use one, given that a field can only support one doc values type at a time.
>  
> In the attached PR I've implemented a new modular version of those classes (consumer/producer), each one having a single responsibility while writing to the same single file.
> This is mainly a refactor of the existing format, opening the possibility to override or implement only the sub-format you need.
>  
> I'll do this in 3 steps:
>  # Create a CompositeDocValuesFormat and move the code of Lucene80DocValuesFormat into separate classes, without modifying the inner code. At the same time I created a Lucene85CompositeDocValuesFormat based on these changes.
>  # I'll introduce some basic components for writing doc values in general, such as:
>  ## DocumentIdSetIterator Serializer: used for each type of field, based on an IndexedDISI.
>  ## Document Ordinals Serializer: used in Sorted and SortedSet to deduplicate values using a dictionary.
>  ## Document Boundaries Serializer (optional, used only for multivalued fields: SortedNumeric and SortedSet)
>  ## TermsEnum Serializer: useful to write and read the terms dictionary for Sorted and SortedSet doc values.
>  # I'll create the new Sub-DocValues formats using the previous components.
>  
> PR: [https://github.com/apache/lucene-solr/pull/1282]
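The dispatch pattern this refactor proposes (one single-responsibility reader per doc values type, selected by the type byte) can be sketched roughly as follows. This is illustrative pseudocode of the pattern only; the type constants and reader callables are stand-ins, not Lucene's real classes:

```python
# Hypothetical stand-ins for Lucene80DocValuesFormat's type constants.
NUMERIC, BINARY, SORTED, SORTED_SET, SORTED_NUMERIC = range(5)

# One reader per type, each with a single responsibility, mirroring the
# proposed Lucene80NumericProducer / Lucene80BinaryProducer / etc. split.
READERS = {
    NUMERIC: lambda meta: ("numeric-entry", meta),
    BINARY: lambda meta: ("binary-entry", meta),
    SORTED: lambda meta: ("sorted-entry", meta),
    SORTED_SET: lambda meta: ("sorted-set-entry", meta),
    SORTED_NUMERIC: lambda meta: ("sorted-numeric-entry", meta),
}

def read_field(field_type, meta):
    # Mirrors the if/else chain in readFields: known types are delegated
    # to a dedicated reader, unknown types are rejected (Lucene throws
    # CorruptIndexException at this point).
    reader = READERS.get(field_type)
    if reader is None:
        raise ValueError(f"invalid type: {field_type}")
    return reader(meta)
```

The benefit the issue describes follows directly: a custom format can replace one entry in the table without touching the other four readers.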



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [lucene-solr] juanka588 opened a new pull request #1282: Lucene 9236

2020-02-24 Thread GitBox
juanka588 opened a new pull request #1282: Lucene 9236
URL: https://github.com/apache/lucene-solr/pull/1282
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   





[jira] [Commented] (SOLR-13411) CompositeIdRouter calculates wrong route hash if atomic update is used for route.field

2020-02-24 Thread Dr Oleg Savrasov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043361#comment-17043361
 ] 

Dr Oleg Savrasov commented on SOLR-13411:
-

A patch for option b) (deny atomic update for route.field and throw an exception) is provided.

> CompositeIdRouter calculates wrong route hash if atomic update is used for 
> route.field
> --
>
> Key: SOLR-13411
> URL: https://issues.apache.org/jira/browse/SOLR-13411
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 7.5
>Reporter: Niko Himanen
>Priority: Minor
> Attachments: SOLR-13411.patch
>
>
> If a collection is created with the router.field parameter defining some field other than uniqueField as the route field, and a document update arrives with the route field set via atomic update syntax (for example set=123), the hash for document routing is calculated from "set=123" rather than from the real value 123, which may route the document to the wrong shard.
>  
> This happens in CompositeIdRouter#sliceHash, where the field value is used as-is for the hash calculation.
>  
> I think there are two possible solutions to fix this:
> a) Allow atomic updates for route.field, but use the real value instead of the atomic update syntax to route the document to the right shard.
> b) Deny atomic update for route.field and throw an exception.
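To make the failure mode concrete, here is a hedged sketch of the routing logic. Solr's CompositeIdRouter actually uses MurmurHash3 over a hash range; md5 and the `shard_for` name are stand-ins used only to keep this example self-contained and deterministic:

```python
import hashlib

def shard_for(route_value: str, num_shards: int = 3) -> int:
    # Stand-in for CompositeIdRouter#sliceHash: hash the route field's
    # value and map it onto a shard. Solr uses MurmurHash3 and hash
    # ranges; md5 mod num_shards is only an illustration of the idea.
    h = int.from_bytes(hashlib.md5(route_value.encode()).digest()[:4], "big")
    return h % num_shards

# The bug: the raw atomic-update payload is hashed...
buggy = shard_for("set=123")
# ...instead of the real field value, so the two shard choices can disagree,
# and the update may land on a different shard than a plain update would.
correct = shard_for("123")
```

Since the shard is a pure function of the hashed string, any difference between the payload string and the real value is enough to misroute the document.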





