Yup - you need one for anything developed outside of Apache.

Michael McCandless wrote:
> Well, it's the usual process... pull together a big patch, open an issue, etc.
>
> Probably because it's a large amount of code (I think?) you'll need to
> submit a software grant
> (http://www.apache.org/licenses/software-grant.txt).
>
> Mike
>
> On Thu, Oct 8, 2009 at 2:58 PM, John Wang <john.w...@gmail.com> wrote:
>
>> Awesome!
>>
>> Mike, can you let us know what the process and the timeline are?
>>
>> Thanks
>>
>> -John
>>
>> On Thu, Oct 8, 2009 at 11:48 AM, Michael McCandless
>> <luc...@mikemccandless.com> wrote:
>>
>>> +1!
>>>
>>> Mike
>>>
>>> On Thu, Oct 8, 2009 at 2:41 PM, John Wang <john.w...@gmail.com> wrote:
>>>
>>>> Hi guys:
>>>>
>>>> What are your thoughts about contributing Kamikaze as a Lucene contrib
>>>> package? We just finished porting Kamikaze to Lucene 2.9. The new 2.9
>>>> API allows us to do some more code tuning and optimization.
>>>>
>>>> We will be releasing Kamikaze, so it might be a good time to add it to
>>>> the Lucene contrib package if there is interest.
>>>>
>>>> Thanks
>>>>
>>>> -John
>>>>
>>>> On Thu, Sep 24, 2009 at 6:20 AM, Uwe Schindler <u...@thetaphi.de> wrote:
>>>>
>>>>> By the way: in the last RC of Lucene 2.9 we added a new method to
>>>>> DocIdSet called isCacheable(). It is used by, e.g., CachingWrapperFilter
>>>>> to determine whether a DocIdSet is easily cacheable or must be copied
>>>>> to an OpenBitSetDISI. The default is false, so CachingWrapperFilter
>>>>> copies all custom DocIdSets to OpenBitSetDISI, even when that is not
>>>>> needed. If a DocIdSet does no disk I/O and has a fast iterator (like
>>>>> the FieldCache ones in FieldCacheRangeFilter), it should return true;
>>>>> see CHANGES.txt. Maybe this should also be added to Kamikaze, which is
>>>>> a really nice project!
>>>>> In particular, filtering DocIdSets should pass this method through to
>>>>> their delegate (see FilteredDocIdSet in Lucene).
>>>>>
>>>>> -----
>>>>> Uwe Schindler
>>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>>> http://www.thetaphi.de
>>>>> eMail: u...@thetaphi.de
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: John Wang (JIRA) [mailto:j...@apache.org]
>>>>>> Sent: Thursday, September 24, 2009 3:14 PM
>>>>>> To: java-dev@lucene.apache.org
>>>>>> Subject: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
>>>>>>
>>>>>> [ https://issues.apache.org/jira/browse/LUCENE-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12759112#action_12759112 ]
>>>>>>
>>>>>> John Wang commented on LUCENE-1458:
>>>>>> -----------------------------------
>>>>>>
>>>>>> Just an FYI: Kamikaze was originally started as our sandbox for Lucene
>>>>>> contributions until 2.4 was ready (we needed the DocIdSet/Iterator
>>>>>> abstraction that was migrated from Solr).
>>>>>>
>>>>>> It has three components:
>>>>>>
>>>>>> 1) P4Delta
>>>>>> 2) Logical boolean operations on DocIdSets/Iterators (I created a JIRA
>>>>>>    ticket and a patch for Lucene a while ago, with performance numbers;
>>>>>>    it is significantly faster than DisjunctionScorer)
>>>>>> 3) An algorithm to determine which DocIdSet implementation to use given
>>>>>>    some parameters, e.g. minID, maxID, ID count, etc. It learns and
>>>>>>    adjusts from the application's behavior if not all parameters are
>>>>>>    given.
>>>>>>
>>>>>> So please feel free to incorporate anything you see fit, or move it to
>>>>>> contrib.
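Uwe's suggestion, that a DocIdSet with no disk I/O and a fast iterator should report isCacheable() as true, and that a filtering wrapper should forward the question to the set it wraps, can be sketched as follows. This is a minimal, self-contained illustration and not Lucene code: SketchDocIdSet, SketchIterator, IntArrayDocIdSet and FilteringDocIdSet are hypothetical stand-ins for Lucene 2.9's DocIdSet, DocIdSetIterator and its filtered wrapper, cut down to just enough to show the delegation.

```java
// Hypothetical stand-ins for Lucene 2.9's DocIdSet / DocIdSetIterator.
// They are NOT the real Lucene classes; they only model isCacheable().
abstract class SketchDocIdSet {
    abstract SketchIterator iterator();

    // Conservative default, as described for DocIdSet.isCacheable():
    // an unknown set is assumed to need copying before it is cached.
    boolean isCacheable() {
        return false;
    }
}

interface SketchIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    int nextDoc();
}

// In-memory sorted int[]: no disk I/O and a fast iterator, so it is cacheable.
final class IntArrayDocIdSet extends SketchDocIdSet {
    private final int[] docs;

    IntArrayDocIdSet(int... docs) {
        this.docs = docs.clone();
    }

    @Override
    SketchIterator iterator() {
        return new SketchIterator() {
            private int i = 0;

            @Override
            public int nextDoc() {
                return i < docs.length ? docs[i++] : NO_MORE_DOCS;
            }
        };
    }

    @Override
    boolean isCacheable() {
        return true;
    }
}

// Filtering wrapper: keeps only docs accepted by match() and forwards
// isCacheable() to the wrapped set instead of using the default.
abstract class FilteringDocIdSet extends SketchDocIdSet {
    private final SketchDocIdSet delegate;

    FilteringDocIdSet(SketchDocIdSet delegate) {
        this.delegate = delegate;
    }

    protected abstract boolean match(int doc);

    @Override
    SketchIterator iterator() {
        SketchIterator inner = delegate.iterator();
        return () -> {
            int doc;
            while ((doc = inner.nextDoc()) != SketchIterator.NO_MORE_DOCS) {
                if (match(doc)) {
                    return doc;
                }
            }
            return SketchIterator.NO_MORE_DOCS;
        };
    }

    @Override
    boolean isCacheable() {
        return delegate.isCacheable();
    }
}
```

Because the wrapper only forwards the flag, wrapping a set that is not cacheable keeps the result non-cacheable, so a caching filter would still copy it.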
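Component 1, P4Delta, presumably refers to PForDelta-style ("patched frame of reference") compression of sorted doc IDs. The sketch below shows the general idea under simplifying assumptions: doc IDs become d-gaps, a bit width b is chosen so that roughly a given fraction of the gaps fit in b bits, those gaps are bit-packed, and the few larger gaps are kept aside as exceptions and patched back in on decode. It is not Kamikaze's implementation; real PForDelta codecs work on fixed-size blocks and store exceptions differently, and the class name and the coverage parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified PForDelta-style codec for one list of sorted doc IDs.
final class PForDeltaSketch {
    final int size;          // number of doc IDs encoded
    final int bitWidth;      // b: bits per packed gap
    final long[] packed;     // all gaps, b bits each (exception slots hold 0)
    final int[] excIndex;    // positions of gaps that did not fit in b bits
    final int[] excValue;    // full values of those gaps (the "patches")

    private PForDeltaSketch(int size, int bitWidth, long[] packed, int[] excIndex, int[] excValue) {
        this.size = size;
        this.bitWidth = bitWidth;
        this.packed = packed;
        this.excIndex = excIndex;
        this.excValue = excValue;
    }

    // coverage: fraction of gaps (0 < coverage <= 1) that should fit in b bits.
    static PForDeltaSketch encode(int[] sortedDocs, double coverage) {
        int n = sortedDocs.length;
        int[] gaps = new int[n];
        for (int i = 0, prev = 0; i < n; i++) {
            gaps[i] = sortedDocs[i] - prev; // d-gap; the first gap is the first doc ID
            prev = sortedDocs[i];
        }
        // Pick b from the gap at the requested percentile.
        int[] ordered = gaps.clone();
        Arrays.sort(ordered);
        int k = Math.min(n - 1, Math.max(0, (int) Math.ceil(coverage * n) - 1));
        int cutoff = n == 0 ? 0 : ordered[k];
        int b = Math.max(1, 32 - Integer.numberOfLeadingZeros(cutoff));
        long mask = (1L << b) - 1;

        long[] packed = new long[(int) (((long) n * b + 63) / 64)];
        List<Integer> idx = new ArrayList<>();
        List<Integer> val = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            long v = gaps[i];
            if (v > mask) { // too wide for b bits: record an exception, pack 0
                idx.add(i);
                val.add(gaps[i]);
                v = 0;
            }
            long bitPos = (long) i * b;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            packed[word] |= v << shift;
            if (shift + b > 64) {
                packed[word + 1] |= v >>> (64 - shift); // slot straddles two words
            }
        }
        return new PForDeltaSketch(n, b, packed,
                idx.stream().mapToInt(Integer::intValue).toArray(),
                val.stream().mapToInt(Integer::intValue).toArray());
    }

    int[] decode() {
        long mask = (1L << bitWidth) - 1;
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            long bitPos = (long) i * bitWidth;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long bits = packed[word] >>> shift;
            if (shift + bitWidth > 64) {
                bits |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (bits & mask);
        }
        for (int j = 0; j < excIndex.length; j++) {
            out[excIndex[j]] = excValue[j]; // patch the exceptions back in
        }
        for (int i = 1; i < size; i++) {
            out[i] += out[i - 1]; // prefix sum turns gaps back into doc IDs
        }
        return out;
    }
}
```

The coverage knob trades packed width against the number of exceptions: a lower value shrinks b for every gap at the cost of storing a few outliers in full.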
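Component 2, logical boolean operations over DocIdSet iterators, comes down to merging sorted doc-ID streams. The sketch below shows two common ways to do that: a leapfrog intersection (AND) that advances each iterator to the current candidate, and a heap-based k-way union (OR), the same kind of problem DisjunctionScorer solves. SortedIntIterator is a hypothetical array-backed stand-in with Lucene-style nextDoc()/advance() semantics; this is not Kamikaze's code and says nothing about the performance numbers mentioned above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical array-backed iterator: docID() is -1 before the first nextDoc(),
// NO_MORE_DOCS once exhausted; advance(target) moves to the first doc >= target.
final class SortedIntIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted ascending, no duplicates
    private int pos = -1;

    SortedIntIterator(int... docs) {
        this.docs = docs;
    }

    int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() {
        if (pos < docs.length) pos++;
        return docID();
    }

    int advance(int target) {
        // Binary search for the first index after pos whose doc is >= target.
        int lo = Math.min(pos + 1, docs.length), hi = docs.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        pos = lo;
        return docID();
    }
}

final class DocIdSetOps {
    // AND: leapfrog intersection. Other iterators are advanced to the candidate;
    // any overshoot becomes the new target for the first iterator.
    static List<Integer> and(SortedIntIterator... its) {
        List<Integer> out = new ArrayList<>();
        int candidate = its[0].nextDoc();
        while (candidate != SortedIntIterator.NO_MORE_DOCS) {
            boolean allMatch = true;
            for (int i = 1; i < its.length; i++) {
                int doc = its[i].docID() < candidate ? its[i].advance(candidate) : its[i].docID();
                if (doc > candidate) {
                    candidate = its[0].advance(doc);
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                out.add(candidate);
                candidate = its[0].nextDoc();
            }
        }
        return out;
    }

    // OR: k-way merge with a min-heap ordered by each iterator's current doc;
    // a doc present in several iterators is emitted once.
    static List<Integer> or(SortedIntIterator... its) {
        List<Integer> out = new ArrayList<>();
        PriorityQueue<SortedIntIterator> heap =
                new PriorityQueue<>(Comparator.comparingInt(SortedIntIterator::docID));
        for (SortedIntIterator it : its) {
            if (it.nextDoc() != SortedIntIterator.NO_MORE_DOCS) heap.add(it);
        }
        while (!heap.isEmpty()) {
            SortedIntIterator top = heap.poll();
            int doc = top.docID();
            if (out.isEmpty() || out.get(out.size() - 1) != doc) out.add(doc);
            if (top.nextDoc() != SortedIntIterator.NO_MORE_DOCS) heap.add(top);
        }
        return out;
    }
}
```

A real implementation would more likely return a lazy iterator than materialize a List; the list just keeps the sketch short.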
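Component 3 picks a DocIdSet representation from parameters such as minID, maxID and the ID count. One plausible cost model is sketched below: estimate the size in bits of a bit set over [minID, maxID], a plain int array, and a fixed-width d-gap encoding, then return the smallest. This is an assumption for illustration only; the block constants (128 IDs per block, 64 header bits) are hypothetical, and Kamikaze's adaptive behavior when parameters are missing is not modeled.

```java
// Hypothetical cost model for choosing a doc-ID set representation.
enum SetKind { BIT_SET, INT_ARRAY, DELTA_COMPRESSED }

final class DocIdSetChooser {
    // Hypothetical layout constants for the compressed representation.
    static final int BLOCK_SIZE = 128;       // IDs per compressed block
    static final int BLOCK_HEADER_BITS = 64; // per-block bookkeeping

    // One bit per possible ID in [minId, maxId].
    static long bitSetBits(int minId, int maxId) {
        return (long) maxId - minId + 1;
    }

    // 32 bits per stored ID.
    static long intArrayBits(int count) {
        return 32L * count;
    }

    // Fixed-width d-gaps: about bitLength(average gap) bits per ID,
    // plus a header for every block.
    static long deltaBits(int minId, int maxId, int count) {
        long span = (long) maxId - minId + 1;
        long avgGap = Math.max(1, span / count);
        int bitsPerGap = 64 - Long.numberOfLeadingZeros(avgGap);
        long blocks = ((long) count + BLOCK_SIZE - 1) / BLOCK_SIZE;
        return (long) bitsPerGap * count + blocks * BLOCK_HEADER_BITS;
    }

    static SetKind choose(int minId, int maxId, int count) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");
        long bits = bitSetBits(minId, maxId);
        long ints = intArrayBits(count);
        long delta = deltaBits(minId, maxId, count);
        if (bits <= ints && bits <= delta) return SetKind.BIT_SET;
        return delta < ints ? SetKind.DELTA_COMPRESSED : SetKind.INT_ARRAY;
    }
}
```

With these numbers a dense set favors the bit set, a large sparse set favors compression, and a tiny sparse set favors the plain array because the block header dominates.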
>>>>>>
>>>>>>> Further steps towards flexible indexing
>>>>>>> ---------------------------------------
>>>>>>>
>>>>>>> Key: LUCENE-1458
>>>>>>> URL: https://issues.apache.org/jira/browse/LUCENE-1458
>>>>>>> Project: Lucene - Java
>>>>>>> Issue Type: New Feature
>>>>>>> Components: Index
>>>>>>> Affects Versions: 2.9
>>>>>>> Reporter: Michael McCandless
>>>>>>> Assignee: Michael McCandless
>>>>>>> Priority: Minor
>>>>>>> Attachments: LUCENE-1458-back-compat.patch, LUCENE-1458-back-compat.patch,
>>>>>>>   LUCENE-1458-back-compat.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>>>   LUCENE-1458.patch, LUCENE-1458.patch, LUCENE-1458.patch,
>>>>>>>   LUCENE-1458.patch, LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2,
>>>>>>>   LUCENE-1458.tar.bz2, LUCENE-1458.tar.bz2
>>>>>>>
>>>>>>> I attached a very rough checkpoint of my current patch, to get early
>>>>>>> feedback. All tests pass, though back-compat tests don't pass due to
>>>>>>> changes to package-private APIs plus certain bugs in tests that
>>>>>>> happened to work (e.g. calling TermPositions.nextPosition() too many
>>>>>>> times, which the new API asserts against).
>>>>>>>
>>>>>>> [Aside: I think, when we commit changes to package-private APIs such
>>>>>>> that back-compat tests don't pass, we could go back, make a branch on
>>>>>>> the back-compat tag, commit changes to the tests to use the new
>>>>>>> package-private APIs on that branch, then fix the nightly build to use
>>>>>>> the tip of that branch?]
>>>>>>>
>>>>>>> There's still plenty to do before this is committable! This is a
>>>>>>> rather large change:
>>>>>>>
>>>>>>>   * Switches to a new, more efficient terms dict format. This still
>>>>>>>     uses tii/tis files, but the tii only stores term & long offset
>>>>>>>     (not a TermInfo). At seek points, tis encodes term & freq/prox
>>>>>>>     offsets absolutely instead of with deltas.
>>>>>>>     Also, tis/tii are structured by field, so we don't have to record
>>>>>>>     the field number in every term.
>>>>>>>
>>>>>>>     On the first 1M docs of Wikipedia, the tii file is 36% smaller
>>>>>>>     (0.99 MB -> 0.64 MB) and the tis file is 9% smaller
>>>>>>>     (75.5 MB -> 68.5 MB).
>>>>>>>
>>>>>>>     RAM usage when loading the terms dict index is significantly lower,
>>>>>>>     since we only load an array of offsets and an array of String (no
>>>>>>>     more TermInfo array). It should be faster to init, too.
>>>>>>>
>>>>>>>     This part is basically done.
>>>>>>>
>>>>>>>   * Introduces a modular reader codec that strongly decouples the terms
>>>>>>>     dict from the docs/positions readers. E.g. there is no more
>>>>>>>     TermInfo used when reading the new format.
>>>>>>>
>>>>>>>     There's nice symmetry now between reading & writing in the codec
>>>>>>>     chain -- the current docs/prox format is captured in:
>>>>>>> {code}
>>>>>>> FormatPostingsTermsDictWriter/Reader
>>>>>>> FormatPostingsDocsWriter/Reader (.frq file) and
>>>>>>> FormatPostingsPositionsWriter/Reader (.prx file).
>>>>>>> {code}
>>>>>>>     This part is basically done.
>>>>>>>
>>>>>>>   * Introduces a new "flex" API for iterating through the fields,
>>>>>>>     terms, docs and positions:
>>>>>>> {code}
>>>>>>> FieldProducer -> TermsEnum -> DocsEnum -> PostingsEnum
>>>>>>> {code}
>>>>>>>     This replaces TermEnum/Docs/Positions. SegmentReader emulates the
>>>>>>>     old API on top of the new API to keep back-compat.
>>>>>>>
>>>>>>> Next steps:
>>>>>>>
>>>>>>>   * Plug in new codecs (pulsing, pfor) to exercise the modularity /
>>>>>>>     fix any hidden assumptions.
>>>>>>>
>>>>>>>   * Expose the new API out of IndexReader, deprecate the old API but
>>>>>>>     emulate it on top of the new one, and switch all core/contrib users
>>>>>>>     to the new API.
>>>>>>>
>>>>>>>   * Maybe switch to AttributeSources as the base class for TermsEnum,
>>>>>>>     DocsEnum, PostingsEnum -- this would give readers API flexibility
>>>>>>>     (not just index-file-format flexibility). E.g. if someone wanted
EG if someone wanted >>>>>>> to store payload at the term-doc level instead of >>>>>>> term-doc-position level, you could just add a new attribute. >>>>>>> * Test performance & iterate. >>>>>>> >>>>>> -- >>>>>> This message is automatically generated by JIRA. >>>>>> - >>>>>> You can reply to this email to add a comment to the issue online. >>>>>> >>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >>>>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org >>>>>> >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: java-dev-h...@lucene.apache.org >>>>> >>>>> >>>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-dev-h...@lucene.apache.org >>> >>> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-dev-h...@lucene.apache.org > >
--
- Mark

http://www.lucidimagination.com