Disable or rollback; I'm good either way.  I think you should un-bump the
FST version since the feature becomes entirely experimental.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Mon, Jul 15, 2019 at 12:34 PM Ishan Chattopadhyaya <
[email protected]> wrote:

> +1 to rollback and to having an 8.3 as soon as we nail this down (even if that
> is days or 1-2 weeks after 8.2).
>
> On Mon, 15 Jul, 2019, 9:22 PM Michael Sokolov, <[email protected]> wrote:
>
>> I guess whether we roll back depends on timing. I think we are close
>> to a release though, and these changes are complex and will require
>> further testing, so rollback seems reasonable to me. I think from a
>> code-management perspective it will be simplest to disable direct
>> addressing for now, rather than actually reverting the various commits
>> that are in place. I can post a patch doing that today.
>>
>> I like the ideas you have for compressing FSTs further. It was
>> bothering me that we store the labels needlessly. I do think that
>> before making more radical changes to Arc though, I would like to add
>> some encapsulation so that we can be a bit freer without being
>> concerned about the abstraction leaking (several classes depend on the
>> Arc internals today). E.g., I'd like to make its members private and add
>> getters. I know this is a performance-sensitive area, and maybe we had
>> a reason for not using them? Do we have some experience that suggests
>> that would be a performance issue? My assumption is that JIT
>> compilation would make that free, but I haven't tested.
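Michael's encapsulation idea can be illustrated with a minimal, hypothetical Arc whose state is private and read only through accessors. The class name, the field set (label, output, target, flags) and the `set` mutator are assumptions for illustration, not Lucene's actual `FST.Arc` API. Whether the JIT makes these accessors free is exactly the open question above and would need a benchmark to answer.

```java
// Hypothetical sketch, not Lucene's FST.Arc: the fields (label, output, target,
// flags) are assumptions chosen only to illustrate encapsulation.
public final class ArcSketch {

  /** An arc whose state is private and reachable only through accessors. */
  static final class Arc<T> {
    private int label;
    private T output;
    private long target;
    private byte flags;

    /** Package-private mutator, standing in for the code that decodes arcs from FST bytes. */
    Arc<T> set(int label, T output, long target, byte flags) {
      this.label = label;
      this.output = output;
      this.target = target;
      this.flags = flags;
      return this;
    }

    // Small accessors like these are usually inlined by HotSpot once the call
    // site is hot; whether that holds for FST traversal must be measured.
    int label() { return label; }
    T output() { return output; }
    long target() { return target; }
    boolean flag(int bit) { return (flags & bit) != 0; }
  }

  public static void main(String[] args) {
    Arc<Long> arc = new Arc<Long>().set('a', 42L, 1024L, (byte) 0b0010);
    // Callers depend only on accessors, so the internal layout can change freely.
    System.out.println(arc.label() + " " + arc.output() + " " + arc.target() + " " + arc.flag(0b0010));
  }
}
```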
>>
>> On Mon, Jul 15, 2019 at 11:36 AM Adrien Grand <[email protected]> wrote:
>> >
>> > That would be great. I wonder whether we could also make the encoding a
>> > bit more efficient. For instance, I noticed that arc metadata is pretty
>> > large in some cases (10-20 bytes), which makes gaps very costly.
>> > Associating each label with a dense id and having an intermediate
>> > lookup, i.e. looking up label -> id and then id -> arc offset instead of
>> > doing label -> arc directly, could save a lot of space in some cases?
>> > Also, it seems that we are repeating the label in the arc metadata when
>> > array-with-gaps is used, even though it shouldn't be necessary since
>> > the label is implicit from the address?
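One way to sketch Adrien's label -> dense id -> arc offset idea, assuming a node's arc labels are known up front in sorted order: a presence bitmap over the label range plus a popcount rank turns a label into a dense index into a gap-free arc array, and the label of the k-th arc is recovered from the bitmap alone, so it need not be stored per arc. `DenseLabelIndex`, `indexOf` and `labelAt` are hypothetical names; this is not the encoding Lucene 8.2 uses.

```java
// Hypothetical sketch: label -> dense index via a presence bitmap and popcount rank.
public final class DenseLabelIndex {
  private final int firstLabel;
  private final long[] bits;       // bit i set <=> label (firstLabel + i) has an arc
  private final int[] rankBefore;  // number of set bits in words [0, w)

  public DenseLabelIndex(int[] sortedLabels) {
    firstLabel = sortedLabels[0];
    int range = sortedLabels[sortedLabels.length - 1] - firstLabel + 1;
    bits = new long[(range + 63) >>> 6];
    for (int label : sortedLabels) {
      int i = label - firstLabel;
      bits[i >>> 6] |= 1L << i;  // Java masks a long shift count to 0..63
    }
    rankBefore = new int[bits.length];
    for (int w = 1; w < bits.length; w++) {
      rankBefore[w] = rankBefore[w - 1] + Long.bitCount(bits[w - 1]);
    }
  }

  /** Returns the dense arc index for label, or -1 if the node has no such arc. */
  public int indexOf(int label) {
    int i = label - firstLabel;
    if (i < 0 || (i >>> 6) >= bits.length) return -1;
    long word = bits[i >>> 6];
    if ((word & (1L << i)) == 0) return -1;
    // rank = set bits in earlier words + set bits below position i in this word
    return rankBefore[i >>> 6] + Long.bitCount(word & ((1L << i) - 1));
  }

  /** Recovers the label of the k-th arc from the bitmap, so no per-arc label is stored. */
  public int labelAt(int k) {
    for (int w = 0; w < bits.length; w++) {
      int count = Long.bitCount(bits[w]);
      if (k < count) {
        long word = bits[w];
        for (int j = 0; j < k; j++) word &= word - 1;  // clear the k lowest set bits
        return firstLabel + (w << 6) + Long.numberOfTrailingZeros(word);
      }
      k -= count;
    }
    throw new IndexOutOfBoundsException("no arc at dense index");
  }

  public static void main(String[] args) {
    DenseLabelIndex idx = new DenseLabelIndex(new int[] {'a', 'c', 'x'});
    // prints "0 1 2 -1": 'b' has no arc, the others map to consecutive slots
    System.out.println(idx.indexOf('a') + " " + idx.indexOf('c') + " " + idx.indexOf('x') + " " + idx.indexOf('b'));
  }
}
```

The bitmap costs one bit per possible label (8 bytes per 64 labels) rather than a full arc slot per gap, which is where the savings would come from when arc metadata is 10-20 bytes.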
>> >
>> > Do you think we can have a mitigation for worst-case scenarios in 8.2,
>> > or should we revert from branch_8_2 to keep the release process going
>> > and work on this for 8.3?
>> >
>> > On Mon, Jul 15, 2019 at 5:12 PM Michael Sokolov <[email protected]> wrote:
>> > >
>> > > Thanks for the nice test, Adrien. Yes, the tradeoff of direct
>> > > addressing is heavily data-dependent. I think we can improve the
>> > > situation here by tracking, per-FST instance, the size increase we're
>> > > seeing while building (or perhaps do a preliminary pass before
>> > > building) in order to decide whether to apply the encoding.
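A minimal sketch of Michael's per-FST tracking idea, under loud assumptions: `DirectAddressingBudget` and `chooseDirect` are hypothetical, both layouts are assumed to use the same fixed `bytesPerArc`, node headers are ignored, and the 0.25 budget is arbitrary. Each node is charged its gap cost, and direct addressing is refused once the running overhead would exceed the budget.

```java
// Hypothetical per-FST budget for direct addressing (not a Lucene API).
// Assumes both layouts use the same bytesPerArc and ignores node headers.
public final class DirectAddressingBudget {
  private final double maxOverheadRatio; // allowed extra bytes, as a fraction of fixed-array bytes
  private long fixedArrayBytes;          // bytes the binary-search layout would have used so far
  private long extraBytes;               // extra bytes actually spent on gaps so far

  public DirectAddressingBudget(double maxOverheadRatio) {
    this.maxOverheadRatio = maxOverheadRatio;
  }

  /** Chooses a layout for one node and records its cost in the running totals. */
  public boolean chooseDirect(int numArcs, int labelRange, int bytesPerArc) {
    long fixed = (long) numArcs * bytesPerArc;                 // one slot per arc
    long extra = (long) (labelRange - numArcs) * bytesPerArc;  // one slot per empty label
    fixedArrayBytes += fixed;
    boolean useDirect = extraBytes + extra <= maxOverheadRatio * fixedArrayBytes;
    if (useDirect) {
      extraBytes += extra;
    }
    return useDirect;
  }

  /** Extra bytes spent on direct addressing relative to the fixed-array baseline. */
  public double overheadRatio() {
    return fixedArrayBytes == 0 ? 0.0 : (double) extraBytes / fixedArrayBytes;
  }

  public static void main(String[] args) {
    DirectAddressingBudget budget = new DirectAddressingBudget(0.25);
    System.out.println(budget.chooseDirect(10, 11, 5));  // dense node, 5 extra bytes: accepted
    System.out.println(budget.chooseDirect(64, 253, 5)); // labels spaced 4 apart: rejected
    System.out.printf("overhead=%.3f%n", budget.overheadRatio());
  }
}
```

A preliminary pass, the other option mentioned, would compute the same totals before writing any node and make one decision per FST instead of one per node.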
>> > >
>> > > On Mon, Jul 15, 2019 at 9:02 AM Adrien Grand <[email protected]> wrote:
>> > > >
>> > > > I dug into this a bit and suspect that the issue is mostly with one field
>> > > > that is not part of the data but auto-generated: the ID field. It is a
>> > > > slight variant of Flake IDs, so it's not random: it includes a
>> > > > timestamp and a sequence number, and I suspect that its patterns,
>> > > > combined with an alphabet larger than ASCII, make this size increase
>> > > > more likely than with the data set you tested against.
>> > > >
>> > > > For instance, I ran the following code with direct array addressing on
>> > > > and off to simulate a worst-case scenario.
>> > > >
>> > > >   public static void main(String[] args) throws IOException {
>> > > >     Directory dir = FSDirectory.open(Paths.get("/tmp/a"));
>> > > >     IndexWriter w = new IndexWriter(dir,
>> > > >         new IndexWriterConfig().setOpenMode(OpenMode.CREATE));
>> > > >     byte[] b = new byte[5];
>> > > >     Random r = new Random(0);
>> > > >     for (int i = 0; i < 1000000; ++i) {
>> > > >       r.nextBytes(b);
>> > > >       for (int j = 0; j < b.length; ++j) {
>> > > >         b[j] &= 0xfc; // make this byte a multiple of 4
>> > > >       }
>> > > >       Document doc = new Document();
>> > > >       StringField field = new StringField("f", new BytesRef(b), Store.NO);
>> > > >       doc.add(field);
>> > > >       w.addDocument(doc);
>> > > >     }
>> > > >     w.forceMerge(1);
>> > > >     IndexReader reader = DirectoryReader.open(w);
>> > > >     w.close();
>> > > >     if (reader.leaves().size() != 1) {
>> > > >       throw new Error();
>> > > >     }
>> > > >     LeafReader leaf = reader.leaves().get(0).reader();
>> > > >     System.out.println(((SegmentReader) leaf).ramBytesUsed());
>> > > >     reader.close();
>> > > >     dir.close();
>> > > >   }
>> > > >
>> > > > When direct addressing is enabled (the default), I get 586079 bytes. If I
>> > > > disable direct addressing by applying the patch below, I get
>> > > > 156228 bytes, about 3.75x less.
>> > > >
>> > > > diff --git a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
>> > > > index f308f1a..ff99cc2 100644
>> > > > --- a/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
>> > > > +++ b/lucene/core/src/java/org/apache/lucene/util/fst/FST.java
>> > > > @@ -647,7 +647,7 @@ public final class FST<T> implements Accountable {
>> > > >        // array that may have holes in it so that we can address the arcs directly by label without
>> > > >        // binary search
>> > > >        int labelRange = nodeIn.arcs[nodeIn.numArcs - 1].label - nodeIn.arcs[0].label + 1;
>> > > > -      boolean writeDirectly = labelRange > 0 && labelRange < Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;
>> > > > +      boolean writeDirectly = false; // labelRange > 0 && labelRange < Builder.DIRECT_ARC_LOAD_FACTOR * nodeIn.numArcs;
>> > > >
>> > > >        //System.out.println("write int @pos=" + (fixedArrayStart-4) + " numArcs=" + nodeIn.numArcs);
>> > > >        // create the header
>> > > >
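The patched condition can be checked in isolation against the worst case above: after `b[j] &= 0xfc`, a byte read as an unsigned label is one of 64 values 0, 4, ..., 252. A node whose arcs cover all of them spans 253 labels, which still passes the test if `DIRECT_ARC_LOAD_FACTOR` is 4 (an assumed value here, check `Builder` in your Lucene version), so the direct array holds about four slots per arc. `DirectArcCheck` is a hypothetical stand-alone class; it illustrates the mechanism only and does not reconstruct the 586079 vs. 156228 measurement.

```java
// Hypothetical restatement of the condition from the patch, outside Lucene.
public final class DirectArcCheck {
  static final float DIRECT_ARC_LOAD_FACTOR = 4.0f; // assumed value, for illustration

  /** Direct addressing is chosen when the label range is "dense enough". */
  static boolean writeDirectly(int[] sortedLabels) {
    int numArcs = sortedLabels.length;
    int labelRange = sortedLabels[numArcs - 1] - sortedLabels[0] + 1;
    return labelRange > 0 && labelRange < DIRECT_ARC_LOAD_FACTOR * numArcs;
  }

  public static void main(String[] args) {
    // Every byte value that survives "b[j] &= 0xfc", read as an unsigned label: 0, 4, ..., 252.
    int[] labels = new int[64];
    for (int i = 0; i < labels.length; i++) {
      labels[i] = i * 4;
    }
    int labelRange = labels[labels.length - 1] - labels[0] + 1;
    System.out.println("writeDirectly=" + writeDirectly(labels));       // prints writeDirectly=true (253 < 4.0f * 64)
    System.out.println("slots=" + labelRange + ", arcs=" + labels.length); // prints slots=253, arcs=64
  }
}
```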
>> > > > On Mon, Jul 15, 2019 at 2:33 PM Michael Sokolov <[email protected]> wrote:
>> > > > >
>> > > > > OK, both LUCENE-8781 and LUCENE-8895 were introduced in 8.2.0. I see
>> > > > > that most of the other data sets report an increase in the 10-15%
>> > > > > range, which is expected. I'm curious what the makeup of that http
>> > > > > logs data set is -- I guess it's HTTP logs :) Is the data public?
>> > > > >
>> > > > >
>> > > > > On Mon, Jul 15, 2019 at 7:23 AM Ignacio Vera <[email protected]> wrote:
>> > > > > >
>> > > > > > The switch to the Lucene 8.2.0 snapshot was made on July 10th.
>> > > > > > Before that, the Lucene version was 8.1.0.
>> > > > > >
>> > > > > > On Mon, Jul 15, 2019 at 12:53 PM Michael Sokolov <[email protected]> wrote:
>> > > > > >>
>> > > > > >> Hmm, that's possible, although the jump is bigger than anything I
>> > > > > >> observed while testing. I assume these charts are built off of
>> > > > > >> apache/master, or something close to that? If so, then the timing is
>> > > > > >> a bit off. LUCENE-8781 was pushed quite a while before that, and then
>> > > > > >> https://issues.apache.org/jira/browse/LUCENE-8895, which extended the
>> > > > > >> encoding to be the default (not just for postings), was pushed on July
>> > > > > >> 2 or so, but the chart shows a jump on July 10?
>> > > > > >>
>> > > > > >> On Mon, Jul 15, 2019 at 4:03 AM Ignacio Vera <[email protected]> wrote:
>> > > > > >> >
>> > > > > >> > Hi,
>> > > > > >> >
>> > > > > >> > Using a snapshot of Lucene 8.2, we observed an increase of around 30%
>> > > > > >> > in the memory usage of IndexReaders for some of the test datasets, for
>> > > > > >> > example:
>> > > > > >> >
>> > > > > >> > https://elasticsearch-benchmarks.elastic.co/#tracks/http-logs/nightly/default/30d
>> > > > > >> >
>> > > > > >> > We suspect this is due to this change:
>> > > > > >> > https://issues.apache.org/jira/browse/LUCENE-8781
>> > > > > >> >
>> > > > > >> > On Sun, Jul 14, 2019 at 7:10 AM David Smiley <[email protected]> wrote:
>> > > > > >> >>
>> > > > > >> >> Since there won't be an 8.1.2, yet some issues were fixed for 8.1.2
>> > > > > >> >> and there is an 8.1.2 section in CHANGES.txt, those issues might not be
>> > > > > >> >> very noticeable to users who only look at the published HTML version
>> > > > > >> >> (e.g. https://lucene.apache.org/solr/8_1_1/changes/Changes.html).
>> > > > > >> >> Maybe 8.1.2 should be integrated into 8.2.0 in CHANGES.txt? Despite
>> > > > > >> >> this, I see at least one of those issues got into the curated release
>> > > > > >> >> notes / highlights anyway -- thanks Ignacio.
>> > > > > >> >>
>> > > > > >> >> ~ David Smiley
>> > > > > >> >> Apache Lucene/Solr Search Developer
>> > > > > >> >> http://www.linkedin.com/in/davidwsmiley
>> > > > > >> >>
>> > > > > >> >>
>> > > > > >> >> On Fri, Jul 12, 2019 at 9:40 AM Jan Høydahl <[email protected]> wrote:
>> > > > > >> >>>
>> > > > > >> >>> Please use HTTPS in the links to download pages.
>> > > > > >> >>>
>> > > > > >> >>> Jan Høydahl
>> > > > > >> >>>
>> > > > > >> >>> On 12 Jul 2019, at 09:04, Ignacio Vera <[email protected]> wrote:
>> > > > > >> >>>
>> > > > > >> >>> Ishan: I had a look at the issues and I have no objections as long as
>> > > > > >> >>> they get properly reviewed, if possible. It would be good to commit them
>> > > > > >> >>> shortly so they go through a few CI iterations in case something gets
>> > > > > >> >>> broken. I am planning to build the first RC early next week, as there
>> > > > > >> >>> are no blockers for the release.
>> > > > > >> >>>
>> > > > > >> >>> Steve: Thank you so much, I need to work on getting the right permissions.
>> > > > > >> >>>
>> > > > > >> >>> Finally, I wrote a draft of the release notes for Lucene and Solr. It
>> > > > > >> >>> would be good if someone with more experience in Solr could review/modify
>> > > > > >> >>> my attempt, as it is difficult for me to know which are the most
>> > > > > >> >>> important bits. Here are the links to the drafts (note they are in the
>> > > > > >> >>> wiki; let me know if you have problems accessing them):
>> > > > > >> >>>
>> > > > > >> >>> Lucene:
>> > > > > >> >>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=120732808&draftShareId=cb366dc4-c136-4505-9c37-60bde5db2550&src=shareui&src.shareui.timestamp=1562914476369
>> > > > > >> >>>
>> > > > > >> >>> Solr:
>> > > > > >> >>> https://cwiki.apache.org/confluence/pages/resumedraft.action?draftId=120732972&draftShareId=5cace703-b80b-49c4-a07f-55b891683f90&src=shareui&src.shareui.timestamp=1562914529931
>> > > > > >> >>>
>> > > > > >> >>> On Thu, Jul 11, 2019 at 6:36 PM Ishan Chattopadhyaya <[email protected]> wrote:
>> > > > > >> >>>>
>> > > > > >> >>>> Hi Ignacio,
>> > > > > >> >>>> I wish to include two security bug fixes (not vulnerabilities, but
>> > > > > >> >>>> feature regressions due to the Authorization plugin), SOLR-13472 and
>> > > > > >> >>>> SOLR-13619. I can commit both shortly; I'm attempting to write a unit
>> > > > > >> >>>> test for them (which is proving harder than reproducing, fixing and
>> > > > > >> >>>> testing manually). Please let me know if you have any concerns.
>> > > > > >> >>>> Regards,
>> > > > > >> >>>> Ishan
>> > > > > >> >>>>
>> > > > > >> >>>> On Thu, 11 Jul, 2019, 9:12 PM Tomoko Uchida, <[email protected]> wrote:
>> > > > > >> >>>>>
>> > > > > >> >>>>> Hi Ignacio,
>> > > > > >> >>>>>
>> > > > > >> >>>>> LUCENE-8907 was fixed. (I have reverted a series of commits that
>> > > > > >> >>>>> caused backwards incompatibility in Lucene 8.x.)
>> > > > > >> >>>>> Thank you for waiting for that!
>> > > > > >> >>>>>
>> > > > > >> >>>>> Tomoko
>> > > > > >> >>>>>
>> > > > > >> >>>>> On Thu, Jul 11, 2019 at 22:44 Uwe Schindler <[email protected]> wrote:
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Hi,
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > I enabled the Policeman Jenkins jobs for the 8.2 branch.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Uwe
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > -----
>> > > > > >> >>>>> > Uwe Schindler
>> > > > > >> >>>>> > Achterdiek 19, D-28357 Bremen
>> > > > > >> >>>>> > https://www.thetaphi.de
>> > > > > >> >>>>> > eMail: [email protected]
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > From: Ignacio Vera <[email protected]>
>> > > > > >> >>>>> > Sent: Thursday, July 11, 2019 1:05 PM
>> > > > > >> >>>>> > To: [email protected]
>> > > > > >> >>>>> > Subject: Re: Lucene/Solr 8.2.0
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Hi,
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > The branch has been created. As a reminder, this branch is in feature
>> > > > > >> >>>>> > freeze and only documentation or build patches should be committed. I
>> > > > > >> >>>>> > will be waiting for LUCENE-8907 before building the first release candidate.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Let me know if there are any other blockers before we can start the
>> > > > > >> >>>>> > release process.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > It seems I do not have the permissions to create the Jenkins jobs for
>> > > > > >> >>>>> > this branch; maybe Steve can help here?
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Thanks,
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Ignacio
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > On Thu, Jul 11, 2019 at 4:51 AM David Smiley <[email protected]> wrote:
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > BTW, for 8.2.0 I updated Solr's CHANGES.txt to split out issues that
>> > > > > >> >>>>> > seemed to be Improvements rather than genuinely New Features.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > ~ David Smiley
>> > > > > >> >>>>> > Apache Lucene/Solr Search Developer
>> > > > > >> >>>>> > http://www.linkedin.com/in/davidwsmiley
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > On Wed, Jul 10, 2019 at 10:38 AM Ignacio Vera <[email protected]> wrote:
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Thanks Tomoko for taking care of that.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > On Wed, Jul 10, 2019 at 4:03 PM Đạt Cao Mạnh <[email protected]> wrote:
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Hi Ignacio,
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > The 8.1.2 bugfix release will be cancelled. You can go ahead with the 8.2 release.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Thanks!
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > On Wed, 10 Jul 2019 at 20:38, Tomoko Uchida <[email protected]> wrote:
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Hi,
>> > > > > >> >>>>> > I opened a blocker issue a while ago for release 8.2:
>> > > > > >> >>>>> > https://issues.apache.org/jira/browse/LUCENE-8907
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Sorry about that; I only noticed today the backwards incompatibility we
>> > > > > >> >>>>> > have to deal with. If there are no objections, I will revert all the
>> > > > > >> >>>>> > related commits from branch_8x and branch_8_2 in a few days.
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Thanks,
>> > > > > >> >>>>> > Tomoko
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > On Wed, Jul 10, 2019 at 22:02 Ignacio Vera <[email protected]> wrote:
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > > Hi,
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > > All the issues listed above have already been committed and I see no
>> > > > > >> >>>>> > > blockers for release 8.2. I will cut the branch tomorrow around 10am
>> > > > > >> >>>>> > > CEST, and I will wait for the decision on the 8.1.2 bugfix release to
>> > > > > >> >>>>> > > schedule the build of the first release candidate. Please let us know
>> > > > > >> >>>>> > > if this is troublesome for you.
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > > Thanks,
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > > Ignacio
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > >
>> > > > > >> >>>>> > > On Tue, Jul 2, 2019 at 2:59 AM Joel Bernstein <[email protected]> wrote:
>> > > > > >> >>>>> > >>
>> > > > > >> >>>>> > >> I've got one issue that I'd like to get in
>> > > > > >> >>>>> > >> (https://issues.apache.org/jira/browse/SOLR-13589), which I should have
>> > > > > >> >>>>> > >> wrapped up in a day or two. +1 for around July 10th.
>> > > > > >> >>>>> > >>
>> > > > > >> >>>>> > >> On Mon, Jul 1, 2019 at 5:14 PM Nicholas Knize <[email protected]> wrote:
>> > > > > >> >>>>> > >>>
>> > > > > >> >>>>> > >>> +1 for starting the 8.2 release process. I think it would be good to get
>> > > > > >> >>>>> > >>> the LUCENE-8632 feature into 8.2, along with the BKD improvements and
>> > > > > >> >>>>> > >>> changes in LUCENE-8888 and LUCENE-8896.
>> > > > > >> >>>>> > >>>
>> > > > > >> >>>>> > >>> Nicholas Knize, Ph.D., GISP
>> > > > > >> >>>>> > >>> Geospatial Software Guy  |  Elasticsearch
>> > > > > >> >>>>> > >>> Apache Lucene PMC Member and Committer
>> > > > > >> >>>>> > >>> [email protected]
>> > > > > >> >>>>> > >>>
>> > > > > >> >>>>> > >>>
>> > > > > >> >>>>> > >>> On Wed, Jun 26, 2019 at 9:34 AM Ignacio Vera <[email protected]> wrote:
>> > > > > >> >>>>> > >>>>
>> > > > > >> >>>>> > >>>> Hi all,
>> > > > > >> >>>>> > >>>>
>> > > > > >> >>>>> > >>>> 8.1 was released on May 16th and we have new features, enhancements and
>> > > > > >> >>>>> > >>>> fixes that are not released yet, so I'd like to start thinking about
>> > > > > >> >>>>> > >>>> releasing Lucene/Solr 8.2.0.
>> > > > > >> >>>>> > >>>>
>> > > > > >> >>>>> > >>>> I can create the 8.2 branch in two weeks' time (around July 10th) and
>> > > > > >> >>>>> > >>>> build the first RC by the end of that week, if that works for everyone.
>> > > > > >> >>>>> > >>>> Please let me know if there are bug fixes that need to be in 8.2 and
>> > > > > >> >>>>> > >>>> might not be ready by then.
>> > > > > >> >>>>> > >>>>
>> > > > > >> >>>>> > >>>> Cheers,
>> > > > > >> >>>>> > >>>>
>> > > > > >> >>>>> > >>>> Ignacio
>> > > > > >> >>>>> >
>> > > > > >> >>>>> >
>> ---------------------------------------------------------------------
>> > > > > >> >>>>> > To unsubscribe, e-mail:
>> [email protected]
>> > > > > >> >>>>> > For additional commands, e-mail:
>> [email protected]
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > --
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Best regards,
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > Cao Mạnh Đạt
>> > > > > >> >>>>> >
>> > > > > >> >>>>> > E-mail: [email protected]
>> > > > > >> >>>>>
>> > > > > >> >>>>>
>> > > > > >>
>> > > > > >>
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Adrien
>> > > >
>> > > >
>> > >
>> > >
>> >
>> >
>> > --
>> > Adrien
>> >
>> >
>>
