By the way: I compiled core and the corresponding tests with an old JDK 1.4 installation I found locally on my machine. It works fine!
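The javac error in the quoted JIRA comment below ("class file has wrong version 49.0, should be 48.0") means that automaton.jar was compiled for Java 5 (class-file major version 49), while the JDK 1.4 compiler only reads class files up to major version 48. To check which release a jar's classes target, here is a minimal sketch that reads the standard class-file header (magic number 0xCAFEBABE, then minor_version and major_version). The class name `ClassVersionCheck` and its methods are hypothetical helpers, not part of Lucene or the BRICS library, and the sketch itself needs a modern JDK (7+) because it uses try-with-resources.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

/** Hypothetical helper: prints the class-file version of one entry in a jar. */
public class ClassVersionCheck {

    /** Reads the class-file header: u4 magic, u2 minor_version, u2 major_version. */
    static int majorVersion(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        if (data.readInt() != 0xCAFEBABE) {
            throw new IOException("not a class file (bad magic number)");
        }
        data.readUnsignedShort();        // minor_version, ignored here
        return data.readUnsignedShort(); // major_version
    }

    /** Maps a class-file major version to the Java release that introduced it. */
    static String javaRelease(int major) {
        if (major == 45) return "1.0/1.1";
        if (major >= 46 && major <= 48) return "1." + (major - 44); // 46 -> 1.2 ... 48 -> 1.4
        if (major >= 49) return String.valueOf(major - 44);           // 49 -> 5, 50 -> 6, ...
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassVersionCheck <jar> <entry>, e.g.
        //   java ClassVersionCheck automaton.jar dk/brics/automaton/Automaton.class
        try (JarFile jar = new JarFile(args[0])) {
            ZipEntry entry = jar.getEntry(args[1]);
            if (entry == null) {
                System.err.println("no such entry: " + args[1]);
                return;
            }
            try (InputStream in = jar.getInputStream(entry)) {
                int major = majorVersion(in);
                System.out.println(args[1] + ": major version " + major
                        + " (Java " + javaRelease(major) + ")");
            }
        }
    }
}
```

Going by the error message, running it against the automaton.jar mentioned below should report major version 49 (Java 5); a jar usable from JDK 1.4 would need to report 48 or lower.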
Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Uwe Schindler (JIRA) [mailto:j...@apache.org]
> Sent: Monday, June 15, 2009 5:48 PM
> To: java-dev@lucene.apache.org
> Subject: [jira] Commented: (LUCENE-1606) Automaton Query/Filter (scalable regex)
>
>
> [ https://issues.apache.org/jira/browse/LUCENE-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12719606#action_12719606 ]
>
> Uwe Schindler commented on LUCENE-1606:
> ---------------------------------------
>
> Doesn't seem to work, I will check the sources:
>
> {code}
> compile-core:
> [javac] Compiling 12 source files to C:\Projects\lucene\trunk\build\contrib\regex\classes\java
> [javac] C:\Projects\lucene\trunk\contrib\regex\src\java\org\apache\lucene\search\regex\AutomatonFuzzyQuery.java:11: cannot access dk.brics.automaton.Automaton
> [javac] bad class file: C:\Projects\lucene\trunk\contrib\regex\lib\automaton.jar(dk/brics/automaton/Automaton.class)
> [javac] class file has wrong version 49.0, should be 48.0
> [javac] Please remove or make sure it appears in the correct subdirectory of the classpath.
> [javac] import dk.brics.automaton.Automaton;
> [javac] ^
> [javac] 1 error
> {code}
>
>
> > Automaton Query/Filter (scalable regex)
> > ---------------------------------------
> >
> > Key: LUCENE-1606
> > URL: https://issues.apache.org/jira/browse/LUCENE-1606
> > Project: Lucene - Java
> > Issue Type: New Feature
> > Components: contrib/*
> > Reporter: Robert Muir
> > Assignee: Uwe Schindler
> > Priority: Minor
> > Fix For: 2.9
> >
> > Attachments: automaton.patch, automatonMultiQuery.patch, automatonmultiqueryfuzzy.patch, automatonMultiQuerySmart.patch, automatonWithWildCard.patch, automatonWithWildCard2.patch, LUCENE-1606.patch
> >
> >
> > Attached is a patch for an AutomatonQuery/Filter (name can change if it's not suitable).
> >
> > Whereas the out-of-box contrib RegexQuery is nice, I have some very large indexes (100M+ unique tokens) where queries are quite slow (2 minutes, etc.). Additionally, all of the existing RegexQuery implementations in Lucene are really slow if there is no constant prefix. This implementation does not depend upon a constant prefix, and runs the same query in 640ms.
> >
> > Some use cases I envision:
> > 1. lexicography, etc. on large text corpora
> > 2. looking for things such as URLs where the prefix is not constant (http:// or ftp://)
> >
> > The Filter uses the BRICS package (http://www.brics.dk/automaton/) to convert regular expressions into a DFA. Then, the filter "enumerates" terms in a special way, by using the underlying state machine. Here is my short description from the comments:
> >
> > The algorithm here is pretty basic. Enumerate terms, but instead of a binary accept/reject do:
> >
> > 1. Look at the portion that is OK (did not enter a reject state in the DFA).
> > 2. Generate the next possible String and seek to that.
> >
> > The Query simply wraps the filter with ConstantScoreQuery.
> >
> > I did not include the automaton.jar inside the patch, but it can be downloaded from http://www.brics.dk/automaton/ and is BSD-licensed.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-dev-h...@lucene.apache.org
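The two-step enumeration described in the quoted issue (run the DFA over each term; on a reject, seek ahead to the next string the DFA could still accept) can be sketched without Lucene or BRICS. This is a minimal illustration under stated assumptions, not the patch's code and not the dk.brics.automaton API: a `TreeSet<String>` stands in for Lucene's sorted term dictionary (its `ceiling` plays the role of a seek), `SimpleDfa` is a hypothetical toy DFA in which a missing transition means reject, and the names `DfaSeekSketch`, `matchingTerms`, `nextCandidate` and `exampleDfa` are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch only: DFA-driven term enumeration with seeking (hypothetical names). */
public class DfaSeekSketch {

    /** Toy DFA with start state 0; a missing transition means "reject". */
    static final class SimpleDfa {
        final List<TreeMap<Character, Integer>> trans = new ArrayList<>();
        final Set<Integer> accept = new HashSet<>();

        int addState(boolean accepting) {
            trans.add(new TreeMap<>());
            int id = trans.size() - 1;
            if (accepting) accept.add(id);
            return id;
        }

        void addTransition(int from, char c, int to) { trans.get(from).put(c, to); }

        /** Target state, or -1 if there is no transition (reject). */
        int step(int state, char c) {
            Integer to = trans.get(state).get(c);
            return to == null ? -1 : to;
        }
    }

    /** Returns all terms accepted by the DFA, seeking past rejected prefixes. */
    static List<String> matchingTerms(TreeSet<String> terms, SimpleDfa dfa) {
        List<String> out = new ArrayList<>();
        String term = terms.isEmpty() ? null : terms.first();
        while (term != null) {
            // 1. Find the portion that is OK: run the DFA until it rejects.
            int[] states = new int[term.length() + 1]; // states[k] = state after k chars
            int i = 0;
            while (i < term.length()) {
                int next = dfa.step(states[i], term.charAt(i));
                if (next < 0) break;
                states[++i] = next;
            }
            if (i == term.length()) {
                // Never rejected: keep the term if it ends in an accept state.
                if (dfa.accept.contains(states[i])) out.add(term);
                term = terms.higher(term);
            } else {
                // 2. Rejected at char i: seek to the next string that could still match.
                String target = nextCandidate(term, i, states, dfa);
                term = (target == null) ? null : terms.ceiling(target);
            }
        }
        return out;
    }

    /**
     * Smallest string greater than term whose DFA path is not already ruled out:
     * at the deepest position pos <= i, replace term[pos] with the smallest larger
     * character that has a transition from states[pos]; backtrack if there is none.
     */
    static String nextCandidate(String term, int i, int[] states, SimpleDfa dfa) {
        for (int pos = i; pos >= 0; pos--) {
            Character d = dfa.trans.get(states[pos]).higherKey(term.charAt(pos));
            if (d != null) return term.substring(0, pos) + d;
        }
        return null; // no string greater than term can be accepted
    }

    /** Example DFA for the regular expression a[bc]*d. */
    static SimpleDfa exampleDfa() {
        SimpleDfa dfa = new SimpleDfa();
        int s0 = dfa.addState(false), s1 = dfa.addState(false), s2 = dfa.addState(true);
        dfa.addTransition(s0, 'a', s1);
        dfa.addTransition(s1, 'b', s1);
        dfa.addTransition(s1, 'c', s1);
        dfa.addTransition(s1, 'd', s2);
        return dfa;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(Arrays.asList(
                "aa", "abd", "abx", "acd", "ad", "b", "bad", "zzz"));
        System.out.println(matchingTerms(terms, exampleDfa())); // prints [abd, acd, ad]
    }
}
```

With the example DFA for `a[bc]*d`, the term `aa` is rejected at its second character, so the sketch seeks straight to `ab`; the term `b` is rejected at its first character and no larger first character has a transition, so enumeration stops without visiting `bad` or `zzz`. This skipping is what lets the approach avoid testing every term, with or without a constant prefix.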