I think the big changes in the o.a.l.search package are over... :-) I worked on it the whole day.
Merging branches with TortoiseSVN works really well; you can even edit the conflicts directly in the diff view. I used it when fixing the hellish IR/IW deprecations in the BW branch.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, October 13, 2009 5:01 PM
> To: java-dev@lucene.apache.org
> Subject: Re: [jira] Commented: (LUCENE-1458) Further steps towards
> flexible indexing
>
> Yes please!
>
> Mike
>
> On Tue, Oct 13, 2009 at 10:40 AM, Mark Miller <markrmil...@gmail.com> wrote:
> > I can trunk it once more if you'd like - it's already pretty out of date :)
> >
> > If you haven't started anyway ...
> >
> > Michael McCandless wrote:
> >> OK I will cut a branch & commit Mark's last patch onto it, unless
> >> anyone has objections soonish...
> >>
> >> I'll also branch (twig?) the back compat branch so we can commit the
> >> patch there as well.
> >>
> >> Mike
> >>
> >> On Mon, Oct 12, 2009 at 10:50 PM, Mark Miller <markrmil...@gmail.com> wrote:
> >>> SVN is about as good at merging branches as any of us are with a patch
> >>> and trunk unfortunately. But that can still be somewhat more convenient
> >>> than all these huge patches, with different people at different stages.
> >>>
> >>> Depends on how many people end up working on this though. Any more than
> >>> 2, and I think the branch has got to be worth it.
> >>>
> >>> From my perspective, it doesn't make any of the merging process any
> >>> easier - but it can be easier than juggling all these patches - you have
> >>> a central code base that can always be targeted for current merging.
> >>>
> >>> Michael Busch wrote:
> >>>> I think it's supposed to work pretty well - though I have no personal
> >>>> experience with merging branches with svn.
> >>>>
> >>>> I think we should try it - then we'll know!
> >>>> :)
> >>>>
> >>>> Michael
> >>>>
> >>>> On 10/12/09 12:32 PM, Michael McCandless (JIRA) wrote:
> >>>>> [
> >>>>> https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764799#action_12764799
> >>>>> ]
> >>>>>
> >>>>> Michael McCandless commented on LUCENE-1458:
> >>>>> --------------------------------------------
> >>>>>
> >>>>> bq. Shall we create a flexible-indexing branch and commit this?
> >>>>>
> >>>>> I think this is a good idea.
> >>>>>
> >>>>> But I haven't played heavily w/ svn & branching. EG if we branch
> >>>>> now, and trunk moves fast (which it still is w/ deprecation
> >>>>> removals), are we going to have conflicts? Or... is svn good about
> >>>>> merging branches?
> >>>>>
> >>>>>> Further steps towards flexible indexing
> >>>>>> ---------------------------------------
> >>>>>>
> >>>>>> Key: LUCENE-1458
> >>>>>> URL: https://issues.apache.org/jira/browse/LUCENE-1458
> >>>>>> Project: Lucene - Java
> >>>>>> Issue Type: New Feature
> >>>>>> Components: Index
> >>>>>> Affects Versions: 2.9
> >>>>>> Reporter: Michael McCandless
> >>>>>> Assignee: Michael McCandless
> >>>>>> Priority: Minor
> >>>>>> Attachments: LUCENE-1458-back-compat.patch,
> >>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
> >>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
> >>>>>> LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch,
> >>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
> >>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
> >>>>>> LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
> >>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
> >>>>>> LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
> >>>>>> LUCENE-1458.tar.bz2
> >>>>>>
> >>>>>>
> >>>>>> I attached a very rough checkpoint of my current patch, to get early feedback.
> >>>>>> All tests pass, though back compat tests don't pass due to
> >>>>>> changes to package-private APIs plus certain bugs in tests that
> >>>>>> happened to work (eg call TermPositions.nextPosition() too many times,
> >>>>>> which the new API asserts against).
> >>>>>> [Aside: I think, when we commit changes to package-private APIs such
> >>>>>> that back-compat tests don't pass, we could go back, make a branch on
> >>>>>> the back-compat tag, commit changes to the tests to use the new
> >>>>>> package private APIs on that branch, then fix nightly build to use the
> >>>>>> tip of that branch?]
> >>>>>> There's still plenty to do before this is committable! This is a
> >>>>>> rather large change:
> >>>>>> * Switches to a new more efficient terms dict format. This still
> >>>>>>   uses tii/tis files, but the tii only stores term & long offset
> >>>>>>   (not a TermInfo). At seek points, tis encodes term & freq/prox
> >>>>>>   offsets absolutely instead of with deltas. Also, tis/tii
> >>>>>>   are structured by field, so we don't have to record field number
> >>>>>>   in every term.
> >>>>>>   .
> >>>>>>   On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB
> >>>>>>   -> 0.64 MB) and tis file is 9% smaller (75.5 MB -> 68.5 MB).
> >>>>>>   .
> >>>>>>   RAM usage when loading terms dict index is significantly less
> >>>>>>   since we only load an array of offsets and an array of String (no
> >>>>>>   more TermInfo array). It should be faster to init too.
> >>>>>>   .
> >>>>>>   This part is basically done.
> >>>>>> * Introduces modular reader codec that strongly decouples terms dict
> >>>>>>   from docs/positions readers. EG there is no more TermInfo used
> >>>>>>   when reading the new format.
> >>>>>>   .
> >>>>>>   There's nice symmetry now between reading & writing in the codec
> >>>>>>   chain -- the current docs/prox format is captured in:
> >>>>>> {code}
> >>>>>> FormatPostingsTermsDictWriter/Reader
> >>>>>> FormatPostingsDocsWriter/Reader (.frq file) and
> >>>>>> FormatPostingsPositionsWriter/Reader (.prx file).
> >>>>>> {code}
> >>>>>>   This part is basically done.
> >>>>>> * Introduces a new "flex" API for iterating through the fields,
> >>>>>>   terms, docs and positions:
> >>>>>> {code}
> >>>>>> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
> >>>>>> {code}
> >>>>>>   This replaces TermEnum/Docs/Positions. SegmentReader emulates the
> >>>>>>   old API on top of the new API to keep back-compat.
> >>>>>>
> >>>>>> Next steps:
> >>>>>> * Plug in new codecs (pulsing, pfor) to exercise the modularity /
> >>>>>>   fix any hidden assumptions.
> >>>>>> * Expose new API out of IndexReader, deprecate old API but emulate
> >>>>>>   old API on top of new one, switch all core/contrib users to the
> >>>>>>   new API.
> >>>>>> * Maybe switch to AttributeSources as the base class for TermsEnum,
> >>>>>>   DocsEnum, PostingsEnum -- this would give readers API flexibility
> >>>>>>   (not just index-file-format flexibility). EG if someone wanted
> >>>>>>   to store payload at the term-doc level instead of
> >>>>>>   term-doc-position level, you could just add a new attribute.
> >>>>>> * Test performance & iterate.
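For anyone skimming the issue description quoted above: the "absolute at seek points, deltas elsewhere" idea can be sketched roughly as below. This is only an illustration under my own assumptions. The class name `SeekPointOffsets`, its methods and the fixed `interval` parameter are hypothetical and do not come from the patch, and the real tis layout stores more than a single offset per term.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Lucene's actual file format or API. */
public class SeekPointOffsets {

    /**
     * Encodes ascending file offsets. Every entry whose index is a multiple of
     * interval (a "seek point") is stored as the absolute offset; every other
     * entry is stored as the delta from the previous offset.
     */
    public static long[] encode(long[] offsets, int interval) {
        long[] out = new long[offsets.length];
        for (int i = 0; i < offsets.length; i++) {
            out[i] = (i % interval == 0) ? offsets[i] : offsets[i] - offsets[i - 1];
        }
        return out;
    }

    /**
     * Recovers the offset of entry target by jumping to the nearest preceding
     * seek point and summing the deltas from there. Nothing before the seek
     * point has to be read, which is the point of storing it absolutely.
     */
    public static long decodeAt(long[] encoded, int interval, int target) {
        int seek = (target / interval) * interval;
        long value = encoded[seek]; // absolute value, no earlier state needed
        for (int i = seek + 1; i <= target; i++) {
            value += encoded[i];    // apply deltas up to the target entry
        }
        return value;
    }

    public static void main(String[] args) {
        long[] offsets = {100, 130, 170, 250, 260, 400};
        long[] encoded = encode(offsets, 3);
        System.out.println(Arrays.toString(encoded)); // [100, 30, 40, 250, 10, 140]
        System.out.println(decodeAt(encoded, 3, 4));  // 260
    }
}
```

With pure delta coding a reader would have to replay every entry from the start of the file to reach entry 4; here it starts at entry 3, which is why a seek can land anywhere the index points to.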
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> >>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org
> >>>
> >>> --
> >>> - Mark
> >>>
> >>> http://www.lucidimagination.com