The Jakarta regexp package is a separate download. http://jakarta.apache.org/regexp/index.html.
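
If it helps, here is a minimal sketch for confirming whether the Jakarta Regexp jar is actually visible to your program before you run any RegexQuery. The class name `org.apache.regexp.RE` is taken from your stack trace; the `RegexpClasspathCheck` class and its `isOnClasspath` helper are names I made up for illustration and are not part of Lucene or Jakarta Regexp.

```java
public class RegexpClasspathCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, RegexpClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name copied from the NoClassDefFoundError in the quoted stack trace.
        String jakartaRe = "org.apache.regexp.RE";
        if (isOnClasspath(jakartaRe)) {
            System.out.println(jakartaRe + " found");
        } else {
            System.out.println(jakartaRe + " missing: add the jakarta-regexp jar to the classpath");
        }
    }
}
```

If this reports the class as missing, adding the jakarta-regexp jar to the classpath you use for both compiling and running should make the NoClassDefFoundError go away.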
--
Ian.

On Tue, May 12, 2009 at 3:21 PM, Seid Mohammed <seidy...@gmail.com> wrote:
> I need similar functionality, but while running the above code it
> breaks after outputting the following
> ========================================================================
> Added Knowing yourself
> Added Old clinic
> Added INSIDE
> Added Not INSIDE
>
> Default
> regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>
> org.apache.lucene.search.regex.javautilregexcapabilit...@0
> 0 hits for text:.in
> 2 hits for text:.*in
> 0 hits for text:.IN
> 2 hits for text:.*IN
> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
>     at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
>     at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
>     at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
>     at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
>     at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
>     at org.apache.lucene.search.Query.weight(Query.java:94)
>     at org.apache.lucene.search.Hits.<init>(Hits.java:76)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:50)
>     at org.apache.lucene.search.Searcher.search(Searcher.java:40)
>     at Regex2.main(Regex2.java:43)
> Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>     at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
>     ... 10 more
> ===================================================================
>
> thanks a lot
>
> On 5/11/09, Huntsman84 <tpgarci...@gmail.com> wrote:
>>
>> That's it!!!
>>
>> The problem was with the regular expression, the one I need is ".*IN"!!
>>
>> Thank you so much, I was turning mad... =)
>>
>>
>> Ian Lea wrote:
>>>
>>> The little self-contained program below runs regex queries for a few
>>> regexps against a few phrases for both the java.util and jakarta
>>> regexp packages.
>>>
>>> Output when run with lucene 2.4.1 and jakarta-regexp 1.5 is
>>>
>>> Added Knowing yourself
>>> Added Old clinic
>>> Added INSIDE
>>> Added Not INSIDE
>>>
>>> Default
>>> regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>>
>>> org.apache.lucene.search.regex.javautilregexcapabilit...@0
>>> 0 hits for text:.in
>>> 2 hits for text:.*in
>>> 0 hits for text:.IN
>>> 2 hits for text:.*IN
>>> org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
>>> 2 hits for text:.in
>>> 2 hits for text:.*in
>>> 1 hits for text:.IN
>>> 2 hits for text:.*IN
>>>
>>> Hope that helps.
>>>
>>> --
>>> Ian.
>>>
>>>
>>> import org.apache.lucene.index.*;
>>> import org.apache.lucene.store.*;
>>> import org.apache.lucene.document.*;
>>> import org.apache.lucene.analysis.*;
>>> import org.apache.lucene.analysis.standard.*;
>>> import org.apache.lucene.search.*;
>>> import org.apache.lucene.search.regex.*;
>>>
>>> public class luctest {
>>>
>>>     public static void main(String[] _args) throws Exception {
>>>         RAMDirectory rdir = new RAMDirectory();
>>>         IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(),
>>>                                              true);
>>>         String[] docterms = { "Knowing yourself",
>>>                               "Old clinic",
>>>                               "INSIDE",
>>>                               "Not INSIDE" };
>>>
>>>         for (String s : docterms) {
>>>             Document d = new Document();
>>>             d.add(new Field("text",
>>>                             s,
>>>                             Field.Store.YES,
>>>                             Field.Index.NOT_ANALYZED));
>>>             writer.addDocument(d);
>>>             System.out.printf("Added %s\n", s);
>>>         }
>>>         writer.close();
>>>
>>>         IndexSearcher searcher = new IndexSearcher(rdir);
>>>         String[] queries = { ".in", ".*in", ".IN", ".*IN" };
>>>         RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
>>>                                       new JakartaRegexpCapabilities() };
>>>         RegexQuery qx = new RegexQuery(new Term("x", "x"));
>>>         System.out.printf("\nDefault RegexCapabilities=%s\n\n",
>>>                           qx.getRegexImplementation());
>>>         for (RegexCapabilities rcap : rcaps) {
>>>             System.out.println(rcap);
>>>             for (String s : queries) {
>>>                 Term t = new Term("text", s);
>>>                 RegexQuery q = new RegexQuery(t);
>>>                 q.setRegexImplementation(rcap);
>>>                 Hits h = searcher.search(q);
>>>                 System.out.printf("%s hits for %s\n",
>>>                                   h.length(),
>>>                                   q.toString());
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>>
>>> On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
>>>>
>>>> The RegexQuery class uses that package, and for that reason the
>>>> expression matches.
>>>>
>>>> If my records contained only one word each, this code would work, but I
>>>> need to apply that regular expression to a phrase...
>>>>
>>>>
>>>> Ian Lea wrote:
>>>>>
>>>>> The default regex package is java.util.regex and I can't see anywhere
>>>>> that you tell it to use the Jakarta regexp package. So I don't think
>>>>> that ".in" will match. Also, you are storing your contents field as
>>>>> NOT_ANALYZED so you will need to be wary of case sensitivity. Maybe
>>>>> this is what you want, but maybe not.
>>>>>
>>>>>
>>>>> --
>>>>> Ian.
>>>>>
>>>>>
>>>>> On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarci...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> This is the code for searching:
>>>>>>
>>>>>> String index = "index";
>>>>>> String field = "contents";
>>>>>> IndexReader reader = IndexReader.open(index);
>>>>>> Searcher searcher = new IndexSearcher(reader);
>>>>>>
>>>>>> System.out.println("Enter query: ");
>>>>>> String line = ".IN.";//in jakarta regexp this is like * IN *
>>>>>> RegexQuery rxquery = new RegexQuery(new Term(field,line));
>>>>>> Hits hits = searcher.search(rxquery);
>>>>>>
>>>>>> if(hits!=null){
>>>>>>     for(int k = 0; k<100 && k<hits.length(); k++){
>>>>>>         if(hits.doc(k)!=null)
>>>>>>             System.out.println(hits.doc(k).getField("contents").stringValue());
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> And this is the part that creates the index:
>>>>>>
>>>>>>
>>>>>> File directory = new File("index");
>>>>>> IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(),
>>>>>>         true,
>>>>>>         IndexWriter.MaxFieldLength.LIMITED);
>>>>>> List<String> records = getRecords();//returns a list of record values
>>>>>>                                     //from database, all of them are phrases
>>>>>> Iterator<String> i = records.iterator();
>>>>>> while(i.hasNext()){
>>>>>>     Document doc = new Document();
>>>>>>     doc.add(new Field(field, i.next(), Field.Store.YES,
>>>>>>             Field.Index.NOT_ANALYZED));
>>>>>>     writer.addDocument(doc);
>>>>>> }
>>>>>> writer.optimize();
>>>>>> writer.close();
>>>>>>
>>>>>>
>>>>>>
>>>>>> This code works as I want, but it only matches the first word of the
>>>>>> phrase.
>>>>>> I think the problem is the index building, but I don't know how to
>>>>>> fix it...
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Thank you so much!!
>>>>>>
>>>>>>
>>>>>>
>>>>>> Steven A Rowe wrote:
>>>>>>>
>>>>>>> On 5/8/2009 at 9:13 AM, Ian Lea wrote:
>>>>>>>> I'm surprised that it matches either - don't you need ".*in" where .*
>>>>>>>> means match any character zero or more times? See the javadoc for
>>>>>>>> java.util.regex.Pattern, or for Jakarta Regexp if you are using that
>>>>>>>> package.
>>>>>>>>
>>>>>>>> Unless you're an expert in regexps it is probably worth playing with
>>>>>>>> them outside your lucene code to start with e.g. with simple
>>>>>>>> String.matches(regexp) calls. They can take some getting used to.
>>>>>>>> And try to avoid anything with backslashes if you can!
>>>>>>>
>>>>>>> The java.util.regex.Pattern implementation (the default RegexQuery
>>>>>>> implementation) actually uses Matcher.lookingAt(), which is equivalent
>>>>>>> to prepending a "^" anchor to the beginning of the pattern, so if
>>>>>>> Huntsman84 is using the default implementation, then I agree with Ian:
>>>>>>> I'm surprised it matches either.
>>>>>>>
>>>>>>> However, the Jakarta Regexp implementation uses RE.match(), which does
>>>>>>> *not* require a beginning-of-string match.
>>>>>>>
>>>>>>> Huntsman84, are you using the Jakarta Regexp implementation? If so,
>>>>>>> then like you, I'm surprised it's not matching both :).
>>>>>>>
>>>>>>> It would be useful to see some real code, including how you index your
>>>>>>> records.
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>>> On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hi,
>>>>>>>> >
>>>>>>>> > I am using RegexQuery for searching in a set of records which are
>>>>>>>> > phrases of several words each. My aim is to find any phrase that
>>>>>>>> > contains the given group of letters (e.g. "in").
>>>>>>>> > For that case, I am building the query with the regular
>>>>>>>> > expression ".in.", so it should return all phrases that contain
>>>>>>>> > "in", but the search only matches with the first word of the
>>>>>>>> > phrase.
>>>>>>>> >
>>>>>>>> > For example, if my records are "Knowing yourself" and "Old
>>>>>>>> > clinic", the correct search would return 2 matches, but it only
>>>>>>>> > matches with "Knowing yourself".
>>>>>>>> >
>>>>>>>> > How could I fix this?
>>>>>>>
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
>>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>>
>
> --
> "RABI ZIDNI ILMA"
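
For anyone landing on this thread later, the different hit counts quoted above can be reproduced without Lucene at all. The sketch below uses only java.util.regex: `lookingAt()` is the call Steve says the default implementation uses, and `find()` is used here purely as a stand-in for the unanchored matching he describes for Jakarta's RE.match(). It does not call Jakarta Regexp or Lucene itself, and the class and method names are made up for illustration. Because the field is indexed NOT_ANALYZED, each whole phrase is a single term, so the regexp is tested against the full phrase.

```java
import java.util.regex.Pattern;

public class AnchoredVsUnanchored {

    // Same phrases as the test program quoted in the thread.
    static final String[] DOCS = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };

    /** Counts phrases where the pattern matches starting at the first character. */
    static int countLookingAt(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String doc : DOCS) {
            if (p.matcher(doc).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    /** Counts phrases where the pattern matches anywhere in the phrase. */
    static int countFind(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String doc : DOCS) {
            if (p.matcher(doc).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String q : new String[] { ".in", ".*in", ".IN", ".*IN" }) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                              q, countLookingAt(q), countFind(q));
        }
        // Prints:
        // .in   lookingAt=0 find=2
        // .*in  lookingAt=2 find=2
        // .IN   lookingAt=0 find=1
        // .*IN  lookingAt=2 find=2
    }
}
```

The lookingAt column agrees with the java.util hit counts quoted in the thread and the find column agrees with the Jakarta ones, which is consistent with Steve's explanation that the only difference is the implicit anchor at the start of the term.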