Hi Ghazal,

I am sorry this one is a bit confusing. I think it is because a lot of people are working on it (which is great) and a lot of ideas are going back and forth, so lots of files end up being uploaded, etc.

Can you tell us more about your interest in working with NFA/DFA in Lucene? I am very curious to hear any use cases you might have, or why you are interested!

In general, for contributing to Lucene, this link is helpful: http://wiki.apache.org/lucene-java/HowToContribute
It explains how the patch submission process works, how to get the latest code from Subversion, etc.

On Sat, Dec 5, 2009 at 4:58 PM, Ghazal Gharooni <ghazal.gharo...@gmail.com> wrote:

> Hello,
>
> I am new in the community and I've completely been confused. Please anybody
> help me out to know which part of codes you are working with. How should I
> participate in work? Thank you!
>
> On Sat, Dec 5, 2009 at 1:02 PM, Uwe Schindler (JIRA) <j...@apache.org> wrote:
>
>> [ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>>
>> Uwe Schindler updated LUCENE-1606:
>> ----------------------------------
>>
>>     Attachment:     (was: LUCENE-1606-flex.patch)
>>
>> > Automaton Query/Filter (scalable regex)
>> > ---------------------------------------
>> >
>> >                 Key: LUCENE-1606
>> >                 URL: https://issues.apache.org/jira/browse/LUCENE-1606
>> >             Project: Lucene - Java
>> >          Issue Type: New Feature
>> >          Components: Search
>> >            Reporter: Robert Muir
>> >            Assignee: Robert Muir
>> >            Priority: Minor
>> >             Fix For: 3.1
>> >
>> >         Attachments: automaton.patch, automatonMultiQuery.patch,
>> >             automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch,
>> >             automatonWithWildCard.patch, automatonWithWildCard2.patch,
>> >             BenchWildcard.java, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>> >             LUCENE-1606-flex.patch, LUCENE-1606-flex.patch, LUCENE-1606-flex.patch,
>> >             LUCENE-1606-flex.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> >             LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> >             LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> >             LUCENE-1606.patch, LUCENE-1606.patch, LUCENE-1606.patch,
>> >             LUCENE-1606_nodep.patch
>> >
>> >
>> > Attached is a patch for an AutomatonQuery/Filter (name can change if it's
>> > not suitable).
>> > Whereas the out-of-box contrib RegexQuery is nice, I have some very
>> > large indexes (100M+ unique tokens) where queries are quite slow, 2 minutes,
>> > etc. Additionally all of the existing RegexQuery implementations in Lucene
>> > are really slow if there is no constant prefix. This implementation does not
>> > depend upon constant prefix, and runs the same query in 640ms.
>> > Some use cases I envision:
>> >  1. lexicography/etc on large text corpora
>> >  2. looking for things such as urls where the prefix is not constant
>> >     (http:// or ftp://)
>> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to
>> > convert regular expressions into a DFA. Then, the filter "enumerates" terms
>> > in a special way, by using the underlying state machine. Here is my short
>> > description from the comments:
>> >      The algorithm here is pretty basic. Enumerate terms but instead of
>> >      a binary accept/reject do:
>> >
>> >      1. Look at the portion that is OK (did not enter a reject state in
>> >         the DFA)
>> >      2. Generate the next possible String and seek to that.
>> > The Query simply wraps the filter with ConstantScoreQuery.
>> > I did not include the automaton.jar inside the patch but it can be
>> > downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.

-- 
Robert Muir
rcm...@gmail.com
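
P.S. For anyone new to the thread, the two-step enumeration described in the quoted issue can be sketched roughly as below. This is only an illustration in Python, not the code from any of the patches: the sorted list stands in for the term dictionary, `bisect_left` stands in for a seek, and the hand-written transition table (a DFA for the regex `[bc]at`) plus the names `run_dfa`, `next_seek_target` and `matching_terms` are all made up for the example.

```python
from bisect import bisect_left


def run_dfa(transitions, start, term):
    """Run the DFA over term.

    Returns (states, i): the states visited so far and the index of the
    first character with no transition (len(term) if all were consumed).
    """
    states = [start]
    for i, ch in enumerate(term):
        nxt = transitions.get(states[-1], {}).get(ch)
        if nxt is None:
            return states, i
        states.append(nxt)
    return states, len(term)


def next_seek_target(transitions, term, states, reject_pos):
    """Smallest string greater than term that the DFA might still accept.

    Walk back from the rejected position and look for a state that has a
    transition on a character larger than the one the term used there.
    """
    for j in range(reject_pos, -1, -1):
        larger = [c for c in transitions.get(states[j], {}) if c > term[j]]
        if larger:
            return term[:j] + min(larger)
    return None  # nothing after this term can match


def matching_terms(sorted_terms, transitions, start, accepting):
    """Enumerate matching terms, seeking past ranges that cannot match."""
    matches = []
    pos = 0
    while pos < len(sorted_terms):
        term = sorted_terms[pos]
        states, i = run_dfa(transitions, start, term)
        if i == len(term):
            # Whole term consumed: accept or not, then just step forward,
            # because a longer term with this prefix could still match.
            if states[-1] in accepting:
                matches.append(term)
            pos += 1
            continue
        # Step 1: term[:i] is the portion that is OK.
        # Step 2: generate the next possible string and seek to it.
        target = next_seek_target(transitions, term, states, i)
        if target is None:
            break
        pos = bisect_left(sorted_terms, target, pos + 1)
    return matches


# Hypothetical DFA for the regex [bc]at; state 3 is the only accept state.
transitions = {0: {"b": 1, "c": 1}, 1: {"a": 2}, 2: {"t": 3}}
terms = sorted(["apple", "bat", "bath", "bet", "cab", "cat", "cot", "dog", "zebra"])
print(matching_terms(terms, transitions, 0, {3}))
```

In this toy run "bet", "dog" and "zebra" are never examined at all, which is the point of seeking instead of testing every term: the work depends on the automaton rather than on a constant prefix.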