Well, it's the usual process... pull together a big patch, open an issue, etc.
Because it's probably a large amount of code (I think?), you'll need to submit a software grant (http://www.apache.org/licenses/software-grant.txt).

Mike

On Thu, Oct 8, 2009 at 2:58 PM, John Wang <john.w...@gmail.com> wrote:
> Awesome!
>
> Mike, can you let us know what the process is and the time line?
>
> Thanks
>
> -John
>
> On Thu, Oct 8, 2009 at 11:48 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>>
>> +1!
>>
>> Mike
>>
>> On Thu, Oct 8, 2009 at 2:41 PM, John Wang <john.w...@gmail.com> wrote:
>> > Hi guys:
>> >
>> > What are your thoughts about contributing Kamikaze as a Lucene contrib
>> > package? We just finished porting Kamikaze to Lucene 2.9. The new 2.9
>> > API allows us some more code tuning and optimization improvements.
>> >
>> > We will be releasing Kamikaze, so it might be a good time to add it to
>> > the Lucene contrib package if there is interest.
>> >
>> > Thanks
>> >
>> > -John
>> >
>> > On Thu, Sep 24, 2009 at 6:20 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>> >>
>> >> By the way: in the last RC of Lucene 2.9 we added a new method to
>> >> DocIdSet called isCacheable(). It is used by e.g. CachingWrapperFilter
>> >> to determine whether a DocIdSet is easily cacheable or must be copied
>> >> to an OpenBitSetDISI. The default is false, so all custom DocIdSets are
>> >> copied to OpenBitSetDISI by CachingWrapperFilter, even if that is not
>> >> needed. If a DocIdSet does no disk IO and has a fast iterator, like
>> >> e.g. the FieldCache ones in FieldCacheRangeFilter, it should return
>> >> true; see CHANGES.txt. Maybe this should also be added to Kamikaze,
>> >> which is a really nice project! Especially filter DocIdSets should
>> >> pass this method to their delegate (see FilterDocIdSet in Lucene).
>> >>
>> >> -----
>> >> Uwe Schindler
>> >> H.-H.-Meier-Allee 63, D-28213 Bremen
>> >> http://www.thetaphi.de
>> >> eMail: u...@thetaphi.de
>> >>
>> >>
>> >> > -----Original Message-----
>> >> > From: John Wang (JIRA) [mailto:j...@apache.org]
>> >> > Sent: Thursday, September 24, 2009 3:14 PM
>> >> > To: java-dev@lucene.apache.org
>> >> > Subject: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
>> >> >
>> >> >
>> >> > [ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759112#action_12759112 ]
>> >> >
>> >> > John Wang commented on LUCENE-1458:
>> >> > -----------------------------------
>> >> >
>> >> > Just an FYI: Kamikaze was originally started as our sandbox for Lucene
>> >> > contributions until 2.4 was ready. (We needed the DocIdSet/Iterator
>> >> > abstraction that was migrated from Solr.)
>> >> >
>> >> > It has three components:
>> >> >
>> >> > 1) P4Delta
>> >> > 2) Logical boolean operations on DocIdSet/Iterators (I created a JIRA
>> >> >    ticket and a patch for Lucene a while ago with performance numbers.
>> >> >    It is significantly faster than DisjunctionScorer.)
>> >> > 3) An algorithm to determine which DocIdSet implementation to use given
>> >> >    some parameters, e.g. min id, max id, id count etc. It learns and
>> >> >    adjusts from the application behavior if not all parameters are given.
>> >> >
>> >> > So please feel free to incorporate anything you see fit or move it to
>> >> > contrib.
>> >> >
>> >> >
>> >> > > Further steps towards flexible indexing
>> >> > > ---------------------------------------
>> >> > >
>> >> > > Key: LUCENE-1458
>> >> > > URL: https://issues.apache.org/jira/browse/LUCENE-1458
>> >> > > Project: Lucene - Java
>> >> > > Issue Type: New Feature
>> >> > > Components: Index
>> >> > > Affects Versions: 2.9
>> >> > > Reporter: Michael McCandless
>> >> > > Assignee: Michael McCandless
>> >> > > Priority: Minor
>> >> > > Attachments: LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>> >> > >   LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>> >> > >   LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>> >> > >   LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>> >> > >   LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2
>> >> > >
>> >> > >
>> >> > > I attached a very rough checkpoint of my current patch, to get early
>> >> > > feedback. All tests pass, though back compat tests don't pass due to
>> >> > > changes to package-private APIs plus certain bugs in tests that
>> >> > > happened to work (eg call TermPositions.nextPosition() too many times,
>> >> > > which the new API asserts against).
>> >> > > [Aside: I think, when we commit changes to package-private APIs such
>> >> > > that back-compat tests don't pass, we could go back, make a branch on
>> >> > > the back-compat tag, commit changes to the tests to use the new
>> >> > > package private APIs on that branch, then fix nightly build to use the
>> >> > > tip of that branch?]
>> >> > > There's still plenty to do before this is committable! This is a
>> >> > > rather large change:
>> >> > > * Switches to a new more efficient terms dict format. This still
>> >> > >   uses tii/tis files, but the tii only stores term & long offset
>> >> > >   (not a TermInfo). At seek points, tis encodes term & freq/prox
>> >> > >   offsets absolutely instead of with deltas.
>> >> > >   Also, tis/tii are structured by field, so we don't have to record
>> >> > >   field number in every term.
>> >> > >   .
>> >> > >   On first 1 M docs of Wikipedia, tii file is 36% smaller (0.99 MB
>> >> > >   -> 0.64 MB) and tis file is 9% smaller (75.5 MB -> 68.5 MB).
>> >> > >   .
>> >> > >   RAM usage when loading terms dict index is significantly less
>> >> > >   since we only load an array of offsets and an array of String (no
>> >> > >   more TermInfo array). It should be faster to init too.
>> >> > >   .
>> >> > >   This part is basically done.
>> >> > > * Introduces modular reader codec that strongly decouples terms dict
>> >> > >   from docs/positions readers. EG there is no more TermInfo used
>> >> > >   when reading the new format.
>> >> > >   .
>> >> > >   There's nice symmetry now between reading & writing in the codec
>> >> > >   chain -- the current docs/prox format is captured in:
>> >> > > {code}
>> >> > > FormatPostingsTermsDictWriter/Reader
>> >> > > FormatPostingsDocsWriter/Reader (.frq file) and
>> >> > > FormatPostingsPositionsWriter/Reader (.prx file).
>> >> > > {code}
>> >> > >   This part is basically done.
>> >> > > * Introduces a new "flex" API for iterating through the fields,
>> >> > >   terms, docs and positions:
>> >> > > {code}
>> >> > > FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
>> >> > > {code}
>> >> > >   This replaces TermEnum/Docs/Positions. SegmentReader emulates the
>> >> > >   old API on top of the new API to keep back-compat.
>> >> > >
>> >> > > Next steps:
>> >> > > * Plug in new codecs (pulsing, pfor) to exercise the modularity /
>> >> > >   fix any hidden assumptions.
>> >> > > * Expose new API out of IndexReader, deprecate old API but emulate
>> >> > >   old API on top of new one, switch all core/contrib users to the
>> >> > >   new API.
>> >> > > * Maybe switch to AttributeSources as the base class for TermsEnum,
>> >> > >   DocsEnum, PostingsEnum -- this would give readers API flexibility
>> >> > >   (not just index-file-format flexibility). EG if someone wanted
>> >> > >   to store payload at the term-doc level instead of
>> >> > >   term-doc-position level, you could just add a new attribute.
>> >> > > * Test performance & iterate.
>> >> >
>> >> > --
>> >> > This message is automatically generated by JIRA.
>> >> > -
>> >> > You can reply to this email to add a comment to the issue online.
>> >> >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> >> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
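
For anyone following the isCacheable() suggestion quoted in this thread, here is a rough sketch of the idea as I read it. It does not use Lucene's real classes: DocIdSetSketch, BitSetDocIdSet, CustomDocIdSet, DelegatingDocIdSet and toCacheable() are hypothetical stand-ins made up for illustration, and a plain java.util.BitSet stands in for OpenBitSetDISI. Only the behavior described in the thread is modeled: the default is false, an in-memory set returns true, a wrapper passes the call to its delegate, and the caching side copies a set only when it is not cacheable.

```java
import java.util.BitSet;

// Hypothetical, simplified stand-ins for illustration; these are NOT Lucene's classes.
class IsCacheableSketch {

    /** Stand-in for a doc id set. The real Lucene API exposes an iterator, not an int[]. */
    static abstract class DocIdSetSketch {
        abstract int[] docIds();

        /** Defaults to false, matching the default described in the thread. */
        boolean isCacheable() {
            return false;
        }
    }

    /** Held fully in memory: no disk IO and cheap iteration, so it reports true. */
    static final class BitSetDocIdSet extends DocIdSetSketch {
        private final BitSet bits;

        BitSetDocIdSet(BitSet bits) {
            this.bits = (BitSet) bits.clone();
        }

        @Override
        int[] docIds() {
            return bits.stream().toArray(); // set bits in ascending order
        }

        @Override
        boolean isCacheable() {
            return true;
        }
    }

    /** A custom set that does not override isCacheable(), so it keeps the default. */
    static final class CustomDocIdSet extends DocIdSetSketch {
        private final int[] ids;

        CustomDocIdSet(int... ids) {
            this.ids = ids.clone();
        }

        @Override
        int[] docIds() {
            return ids.clone();
        }
    }

    /** Wrapper that passes isCacheable() through to its delegate. */
    static class DelegatingDocIdSet extends DocIdSetSketch {
        private final DocIdSetSketch delegate;

        DelegatingDocIdSet(DocIdSetSketch delegate) {
            this.delegate = delegate;
        }

        @Override
        int[] docIds() {
            return delegate.docIds();
        }

        @Override
        boolean isCacheable() {
            return delegate.isCacheable();
        }
    }

    /** Returns the set unchanged if it is cacheable, otherwise a bit-set-backed copy. */
    static DocIdSetSketch toCacheable(DocIdSetSketch set) {
        if (set.isCacheable()) {
            return set;
        }
        BitSet bits = new BitSet();
        for (int id : set.docIds()) {
            bits.set(id);
        }
        return new BitSetDocIdSet(bits);
    }

    public static void main(String[] args) {
        DocIdSetSketch custom = new CustomDocIdSet(3, 7, 42);
        DocIdSetSketch cached = toCacheable(custom);
        System.out.println("custom cacheable:  " + custom.isCacheable());
        System.out.println("copied on caching: " + (cached != custom));
        System.out.println("wrapper cacheable: " + new DelegatingDocIdSet(cached).isCacheable());
    }
}
```

The point of forwarding in the wrapper is that a filter over an already cacheable set is not copied again, while a wrapper around a non-cacheable set still gets copied.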