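The NoClassDefFoundError means the Jakarta Regexp classes (org.apache.regexp.RE) are not on your classpath. Either use JavaUtilRegexCapabilities, or put the Jakarta Regexp jar on your classpath: http://jakarta.apache.org/regexp/index.html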
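If you want to choose the implementation at runtime instead of failing, one option is to check first whether org.apache.regexp.RE (the class the stack trace below fails to load) can be loaded. This is a minimal sketch; the helper name isOnClasspath is hypothetical and not part of Lucene.

```java
public class RegexpClasspathCheck {

    /**
     * Hypothetical helper (not part of Lucene): returns true if the named
     * class can be loaded, without initializing it.
     */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, RegexpClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // NoClassDefFoundError is a LinkageError, so it is caught here too.
            return false;
        }
    }

    public static void main(String[] args) {
        // org.apache.regexp.RE is the Jakarta Regexp class missing in the trace.
        if (isOnClasspath("org.apache.regexp.RE")) {
            System.out.println("Jakarta Regexp found: JakartaRegexpCapabilities can be used");
        } else {
            System.out.println("Jakarta Regexp missing: use JavaUtilRegexCapabilities");
        }
    }
}
```

In the Lucene code you would then pass either `new JakartaRegexpCapabilities()` or `new JavaUtilRegexCapabilities()` to `RegexQuery.setRegexImplementation(...)`, as Ian's program does. Keep in mind that the two implementations do not return the same hits for the same pattern.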

--
- Mark

http://www.lucidimagination.com



Seid Mohammed wrote:
I need similar functionality, but when I run the above code it
breaks after outputting the following:
========================================================================
Added Knowing yourself
Added Old clinic
Added INSIDE
Added Not INSIDE

Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0

org.apache.lucene.search.regex.javautilregexcapabilit...@0
0 hits for text:.in
2 hits for text:.*in
0 hits for text:.IN
2 hits for text:.*IN
org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/regexp/RE
        at org.apache.lucene.search.regex.JakartaRegexpCapabilities.compile(JakartaRegexpCapabilities.java:32)
        at org.apache.lucene.search.regex.RegexTermEnum.<init>(RegexTermEnum.java:47)
        at org.apache.lucene.search.regex.RegexQuery.getEnum(RegexQuery.java:59)
        at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:55)
        at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:162)
        at org.apache.lucene.search.Query.weight(Query.java:94)
        at org.apache.lucene.search.Hits.<init>(Hits.java:76)
        at org.apache.lucene.search.Searcher.search(Searcher.java:50)
        at org.apache.lucene.search.Searcher.search(Searcher.java:40)
        at Regex2.main(Regex2.java:43)
Caused by: java.lang.ClassNotFoundException: org.apache.regexp.RE
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        ... 10 more
===================================================================

thanks a lot

On 5/11/09, Huntsman84 <tpgarci...@gmail.com> wrote:
That's it!!!

The problem was with the regular expression, the one I need is ".*IN"!!

Thank you so much, I was going mad... =)


Ian Lea wrote:
The little self-contained program below runs regex queries for a few
regexps against a few phrases for both the java.util and jakarta
regexp packages.

Output when run with Lucene 2.4.1 and jakarta-regexp 1.5 is:

Added Knowing yourself
Added Old clinic
Added INSIDE
Added Not INSIDE

Default regexcapabilities=org.apache.lucene.search.regex.javautilregexcapabilit...@0

org.apache.lucene.search.regex.javautilregexcapabilit...@0
0 hits for text:.in
2 hits for text:.*in
0 hits for text:.IN
2 hits for text:.*IN
org.apache.lucene.search.regex.jakartaregexpcapabilit...@0
2 hits for text:.in
2 hits for text:.*in
1 hits for text:.IN
2 hits for text:.*IN

Hope that helps.

--
Ian.


import org.apache.lucene.index.*;
import org.apache.lucene.store.*;
import org.apache.lucene.document.*;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;
import org.apache.lucene.search.*;
import org.apache.lucene.search.regex.*;

public class luctest {

    public static void main(String[] _args) throws Exception {
        RAMDirectory rdir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(rdir, new StandardAnalyzer(), true);
        String[] docterms = { "Knowing yourself",
                              "Old clinic",
                              "INSIDE",
                              "Not INSIDE" };

        for (String s : docterms) {
            Document d = new Document();
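            // NOT_ANALYZED indexes each phrase as one single term, so the
            // regex is applied to the whole phrase, not to individual words.
            d.add(new Field("text",
                            s,
                            Field.Store.YES,
                            Field.Index.NOT_ANALYZED));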
            writer.addDocument(d);
            System.out.printf("Added %s\n", s);
        }
        writer.close();

        IndexSearcher searcher = new IndexSearcher(rdir);
        String[] queries = { ".in", ".*in", ".IN", ".*IN" };
        RegexCapabilities[] rcaps = { new JavaUtilRegexCapabilities(),
                                      new JakartaRegexpCapabilities() };
        // Throwaway query, used only to print the default RegexCapabilities.
        RegexQuery qx = new RegexQuery(new Term("x", "x"));
        System.out.printf("\nDefault RegexCapabilities=%s\n\n",
                          qx.getRegexImplementation());
        for (RegexCapabilities rcap : rcaps) {
            System.out.println(rcap);
            for (String s : queries) {
                Term t = new Term("text", s);
                RegexQuery q = new RegexQuery(t);
                q.setRegexImplementation(rcap);
                Hits h = searcher.search(q);
                System.out.printf("%s hits for %s\n",
                                  h.length(),
                                  q.toString());
            }
        }
    }
}
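The two result columns above can be reproduced without Lucene. This sketch assumes, following Steven's explanation further down the thread, that the java.util implementation behaves like Matcher.lookingAt() and that Jakarta's RE.match() behaves like an unanchored search, approximated here with Matcher.find(). Jakarta Regexp syntax is not identical to java.util.regex in general, but for these four patterns the counts agree with Ian's output.

```java
import java.util.regex.Pattern;

public class RegexHitCounts {

    static final String[] PHRASES = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };

    /** Counts phrases matched from the first character (Matcher.lookingAt). */
    static int countAnchored(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : PHRASES) {
            if (p.matcher(s).lookingAt()) {
                hits++;
            }
        }
        return hits;
    }

    /** Counts phrases matched anywhere (Matcher.find), a stand-in for RE.match(). */
    static int countUnanchored(String regex) {
        Pattern p = Pattern.compile(regex);
        int hits = 0;
        for (String s : PHRASES) {
            if (p.matcher(s).find()) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        for (String q : new String[] { ".in", ".*in", ".IN", ".*IN" }) {
            System.out.printf("%-5s lookingAt=%d find=%d%n",
                    q, countAnchored(q), countUnanchored(q));
        }
    }
}
```

The lookingAt counts come out as 0, 2, 0, 2 and the find counts as 2, 2, 1, 2, matching the java.util and Jakarta columns respectively.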


On Mon, May 11, 2009 at 1:39 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
The RegexQuery class uses that package, and for that reason the
expression matches.

If my records contained only one word each, this code would work, but I
need to apply that regular expression to a phrase...


Ian Lea wrote:
The default regex package is java.util.regex and I can't see anywhere
that you tell it to use the Jakarta regexp package.  So I don't think
that ".in" will match.  Also, you are storing your contents field as
NOT_ANALYZED so you will need to be wary of case sensitivity.  Maybe
this is what you want, but maybe not.
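The case-sensitivity point can be tried in plain Java. A minimal sketch: java.util.regex accepts the embedded flag (?i) for case-insensitive matching. Whether RegexQuery hands such a flag unchanged to Pattern.compile is an assumption here, not something the thread confirms; lowercasing both the indexed text and the query is the other common route.

```java
import java.util.regex.Pattern;

public class CaseInsensitiveDemo {

    /** True if the pattern matches from the first character (lookingAt). */
    static boolean matchesAtStart(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String[] phrases = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };
        for (String s : phrases) {
            // ".*in" is case-sensitive; "(?i).*in" ignores case.
            System.out.printf("%-18s .*in=%b (?i).*in=%b%n",
                    s, matchesAtStart(".*in", s), matchesAtStart("(?i).*in", s));
        }
    }
}
```

With the flag, all four phrases match; without it, only "Knowing yourself" and "Old clinic" do.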


--
Ian.


On Mon, May 11, 2009 at 9:00 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
This is the code for searching:

String index = "index";
String field = "contents";
IndexReader reader = IndexReader.open(index);
Searcher searcher = new IndexSearcher(reader);

System.out.println("Enter query: ");
String line = ".IN.";//in jakarta regexp this is like * IN *
RegexQuery rxquery = new RegexQuery(new Term(field,line));
Hits hits = searcher.search(rxquery);

if(hits!=null){
   for(int k = 0; k<100 && k<hits.length(); k++){
       if(hits.doc(k)!=null)
           System.out.println(hits.doc(k).getField("contents").stringValue());
   }
}



And this is the part of creating the index:


File directory = new File("index");
IndexWriter writer = new IndexWriter(directory, new StandardAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);
List<String> records = getRecords(); // returns a list of record values from the database; all of them are phrases
Iterator<String> i = records.iterator();
while(i.hasNext()){
    Document doc = new Document();
    doc.add(new Field(field, i.next(), Field.Store.YES, Field.Index.NOT_ANALYZED));
    writer.addDocument(doc);
}
writer.optimize();
writer.close();



This code works as I want, but it only matches the first word of the
phrase. I think the problem is in how the index is built, but I don't
know how to fix it...
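One way to see why only the start of the phrase matters: with NOT_ANALYZED the whole phrase is indexed as a single term, and the default java.util implementation matches the pattern from the start of that term. With an analyzed field, each word becomes its own term, so a start-anchored pattern is tried against every word. This sketch is illustrative only: tokenize is a hypothetical stand-in (lowercase plus whitespace split), not what StandardAnalyzer really does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

public class TermLevelMatching {

    /** Hypothetical stand-in for an analyzer: lowercase, then split on whitespace. */
    static List<String> tokenize(String phrase) {
        List<String> terms = new ArrayList<>();
        for (String t : phrase.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** NOT_ANALYZED: the whole phrase is one term, matched from its first character. */
    static boolean matchesWholePhrase(String regex, String phrase) {
        return Pattern.compile(regex).matcher(phrase).lookingAt();
    }

    /** Analyzed: the document matches if any single term matches from its first character. */
    static boolean matchesAnyTerm(String regex, String phrase) {
        Pattern p = Pattern.compile(regex);
        for (String term : tokenize(phrase)) {
            if (p.matcher(term).lookingAt()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String regex = "cli.*";
        for (String s : new String[] { "Knowing yourself", "Old clinic" }) {
            System.out.printf("%-18s whole=%b perTerm=%b%n",
                    s, matchesWholePhrase(regex, s), matchesAnyTerm(regex, s));
        }
    }
}
```

Here "cli.*" misses "Old clinic" as a whole phrase but hits the term "clinic". Note that with an analyzer that lowercases, the query pattern would also have to be lowercase.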

Any ideas?

Thank you so much!!



Steven A Rowe wrote:
On 5/8/2009 at 9:13 AM, Ian Lea wrote:
I'm surprised that it matches either - don't you need ".*in" where .*
means match any character zero or more times?  See the javadoc for
java.util.regex.Pattern, or for Jakarta Regexp if you are using that
package.

Unless you're an expert in regexps it is probably worth playing with
them outside your lucene code to start with e.g. with simple
String.matches(regexp) calls.  They can take some getting used to.
And try to avoid anything with backslashes if you can!
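When testing patterns this way, note one caveat: String.matches() requires the pattern to match the entire string, which is stricter than Matcher.lookingAt() (the behaviour Steven describes for the default RegexQuery implementation). So a pattern that fails String.matches() may still match in Lucene. A minimal sketch:

```java
import java.util.regex.Pattern;

public class MatchesVsLookingAt {

    /** String.matches(): the pattern must consume the entire string. */
    static boolean fullMatch(String regex, String text) {
        return text.matches(regex);
    }

    /** Matcher.lookingAt(): the pattern only has to match a prefix of the string. */
    static boolean prefixMatch(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    public static void main(String[] args) {
        String text = "Old clinic";
        System.out.println(fullMatch(".*in", text));   // false: "ic" follows the last "in"
        System.out.println(fullMatch(".*in.*", text)); // true
        System.out.println(prefixMatch(".*in", text)); // true
    }
}
```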
The java.util.regex.Pattern implementation (the default RegexQuery
implementation) actually uses Matcher.lookingAt(), which is equivalent
to prepending a "^" anchor to the beginning of the pattern, so if
Huntsman84 is using the default implementation, then I agree with Ian:
I'm surprised it matches either.

However, the Jakarta Regexp implementation uses RE.match(), which does
*not* require a beginning-of-string match.
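The "^" equivalence can be checked directly: for a pattern p, Matcher.lookingAt() gives the same answer as find() on "^(?:p)". A minimal sketch over the thread's four phrases and four patterns; the (?:...) group is added so the anchor applies to the whole pattern even if it contained an alternation.

```java
import java.util.regex.Pattern;

public class AnchorEquivalence {

    /** Match from the first character, as the default implementation does. */
    static boolean lookingAt(String regex, String text) {
        return Pattern.compile(regex).matcher(text).lookingAt();
    }

    /** The same pattern wrapped in a leading "^" anchor, searched with find(). */
    static boolean findWithCaret(String regex, String text) {
        return Pattern.compile("^(?:" + regex + ")").matcher(text).find();
    }

    public static void main(String[] args) {
        String[] phrases = { "Knowing yourself", "Old clinic", "INSIDE", "Not INSIDE" };
        String[] regexes = { ".in", ".*in", ".IN", ".*IN" };
        int disagreements = 0;
        for (String r : regexes) {
            for (String s : phrases) {
                if (lookingAt(r, s) != findWithCaret(r, s)) {
                    disagreements++;
                }
            }
        }
        System.out.println("disagreements: " + disagreements); // prints "disagreements: 0"
    }
}
```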

Huntsman84, are you using the Jakarta Regexp implementation?  If so,
then like you, I'm surprised it's not matching both :).

It would be useful to see some real code, including how you index your
records.

Steve

On Fri, May 8, 2009 at 1:42 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
Hi,

I am using RegexQuery for searching in a set of records which are
phrases of several words each. My aim is to find any phrase that
contains the given group of letters (e.g. "in"). For that case,
I am building the query with the regular expression ".in.", so it
should return all phrases that contain "in", but the search only
matches the first word of the phrase.

For example, if my records are "Knowing yourself" and "Old
clinic", the correct search would return 2 matches, but it only
matches with "Knowing yourself".

How could I fix this?
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



--
View this message in context:
http://www.nabble.com/RegexQuery-Incomplete-Results-tp23445235p23478720.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
