wildcard search not working on file paths

2013-10-14 Thread nischal reddy
Hi,

I have a problem doing wildcard searches on file path fields.

I have a field "filePath" where I store the complete path of each file.

I have used StringField for the field (I assume StringField is not
tokenized by default).

doc.add(new StringField(FIELD_FILE_PATH, resourcePath, Store.YES));

I am using StandardAnalyzer for the IndexWriter, but since I am using a
StringField the field is not analyzed.

After the files are indexed I checked the index with Luke and the paths
look fine. When I do wildcard searches in Luke I get the desired results.

But when I do the same search in my code with IndexSearcher I get
zero docs.

My search code looks something like this:

indexSearcher.search(new WildcardQuery(new Term("filePath", "*SuperClass.cls")), 100);

This returns zero documents.

But when I use just "*" as the query it returns all the documents:

indexSearcher.search(new WildcardQuery(new Term("filePath", "*")), 100);

It fails only when I use other patterns, such as a prefix wildcard.

What could be going wrong?

Thanks,
Nischal Y


Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
Do some googling on leading wildcards and read things like
http://www.gossamer-threads.com/lists/lucene/java-user/175732 and pick
an option you like.


--
Ian.


On Mon, Oct 14, 2013 at 9:12 AM, nischal reddy
 wrote:

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: wildcard search not working on file paths

2013-10-14 Thread nischal reddy
Hi Ian,

Actually I am able to do wildcard searches on every field except the
"filePath" field. Both leading and trailing wildcard searches work on all
the other fields, but on the filePath field they somehow do not. An
example file path looks something like "\Samples\F1.cls"; I think the
search is failing because of the "\" present in the field value. A
wildcard search with the query "filePath : *" does return all the docs in
the index, but any other wildcard search (leading or trailing) returns
nothing. Any clues why this works on other fields but not on the
"filePath" field?

TIA,
Nischal Y


On Mon, Oct 14, 2013 at 4:55 PM, Ian Lea  wrote:



Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
Seems to me that it should work.  I suggest you show us a complete
self-contained example program that demonstrates the problem.


--
Ian.


On Mon, Oct 14, 2013 at 12:42 PM, nischal reddy
 wrote:



Re: wildcard search not working on file paths

2013-10-14 Thread nischal reddy
Hi Ian,

Please find a sample program below which better illustrates the scenario


public class TestWriter {

    public static void main(String[] args) throws IOException {
        createIndex();
        searchIndex();
    }

    public static void createIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("C:\\temp"));

        IndexWriterConfig config = new IndexWriterConfig(
                Version.LUCENE_44, new StandardAnalyzer(Version.LUCENE_44));
        IndexWriter iWriter = new IndexWriter(directory, config);

        // StringField values are indexed as single, un-analyzed terms
        Document document1 = new Document();
        document1.add(new StringField("FILE_PATH",
                "\\Samples\\Batching\\runner.p", Store.YES));
        document1.add(new StringField("contents", "runnerfile", Store.YES));
        iWriter.addDocument(document1);

        Document document2 = new Document();
        document2.add(new StringField("FILE_PATH",
                "\\Samples\\Business\\stopper.p", Store.YES));
        document2.add(new StringField("contents", "stopperfile", Store.YES));
        iWriter.addDocument(document2);

        iWriter.commit();
        iWriter.close();
    }

    public static void searchIndex() throws IOException {
        Directory directory = FSDirectory.open(new File("C:\\temp"));
        IndexReader indexReader = DirectoryReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);

        // Create a wildcard query to get all file paths.
        // This query works fine and returns all the docs in the index.
        Query query1 = new WildcardQuery(new Term("FILE_PATH", "*"));
        TopDocs topDocs = indexSearcher.search(query1, 100);
        System.out.println("total no of docs " + topDocs.totalHits);

        // Create a wildcard query to search for paths starting with /Samples.
        // This query doesn't work and returns zero docs.
        // It doesn't work with "*Samples//*" either,
        // but it works with "*Samples*".
        Query query2 = new WildcardQuery(new Term("FILE_PATH", "*Samples/*"));
        TopDocs topDocs2 = indexSearcher.search(query2, 100);
        System.out.println("total no of docs " + topDocs2.totalHits);

        // Create a wildcard query to search for paths ending with runner.p.
        // This query works and returns one doc.
        Query query3 = new WildcardQuery(new Term("FILE_PATH", "*runner.p"));
        TopDocs topDocs3 = indexSearcher.search(query3, 100);
        System.out.println("total no of docs " + topDocs3.totalHits);

        // Queries to search in the "contents" field

        // Create a wildcard query to search for contents starting with runner.
        // This query works and returns one doc.
        Query query4 = new WildcardQuery(new Term("contents", "runner*"));
        TopDocs topDocs4 = indexSearcher.search(query4, 100);
        System.out.println("total no of docs " + topDocs4.totalHits);

        // Create a wildcard query to search for contents ending with file.
        // This query works and returns two docs.
        Query query5 = new WildcardQuery(new Term("contents", "*file"));
        TopDocs topDocs5 = indexSearcher.search(query5, 100);
        System.out.println("total no of docs " + topDocs5.totalHits);

        indexReader.close();
    }

}


I observed that the file path separator I am using in the field and the
Lucene escape character seem to be the same, so whenever I use an escape
character in the query the search fails; if I don't use the escape
sequence it returns the results properly.

Though I am escaping "\" by giving two "\\", the query still fails.
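
For what it is worth, here is a rough sketch of why a single escaped
backslash may not be enough. It assumes that WildcardQuery treats "\" as its
own escape character (WildcardQuery.WILDCARD_ESCAPE), independently of the
query parser. The matcher below is a simplified stand-in written only for
illustration; WildcardSketch is a hypothetical class, not Lucene's
automaton-based implementation.

```java
final class WildcardSketch {

    /** Returns true if the whole of text matches the wildcard pattern. */
    static boolean matches(String pattern, String text) {
        return matches(pattern, 0, text, 0);
    }

    private static boolean matches(String p, int pi, String t, int ti) {
        if (pi == p.length()) {
            return ti == t.length();
        }
        char c = p.charAt(pi);
        if (c == '*') {
            // '*' matches any run of characters, including the empty run
            for (int k = ti; k <= t.length(); k++) {
                if (matches(p, pi + 1, t, k)) {
                    return true;
                }
            }
            return false;
        }
        if (c == '?') {
            // '?' matches exactly one character
            return ti < t.length() && matches(p, pi + 1, t, ti + 1);
        }
        if (c == '\\' && pi + 1 < p.length()) {
            // Assumed escape rule: the next pattern character is a literal
            return ti < t.length()
                    && t.charAt(ti) == p.charAt(pi + 1)
                    && matches(p, pi + 2, t, ti + 1);
        }
        // Ordinary literal (a lone trailing backslash is also a literal)
        return ti < t.length() && t.charAt(ti) == c
                && matches(p, pi + 1, t, ti + 1);
    }

    public static void main(String[] args) {
        String path = "\\Samples\\Batching\\runner.p"; // \Samples\Batching\runner.p

        // Pattern characters *Samples/* : the indexed term has no '/'
        System.out.println(matches("*Samples/*", path));
        // Pattern characters *Samples\* : "\*" is read as a literal '*'
        System.out.println(matches("*Samples\\*", path));
        // Pattern characters *Samples\\* : "\\" is read as a literal '\'
        System.out.println(matches("*Samples\\\\*", path));
    }
}
```

Under that assumption, "\\" in Java source is one backslash in the pattern,
which then escapes the "*" that follows it. Matching a literal backslash
would need two backslashes in the pattern, i.e. "\\\\" in Java source.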

One way to solve this problem is to replace all "\" with "/" while
indexing, and then use "/" as the file path separator while searching.

But I would prefer not to meddle with the file path. So is there any
alternative that solves this problem without replacing the separator in
the file path?
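
One possible alternative that leaves the indexed path untouched is to escape
the backslashes in the query pattern instead. The following is only a
sketch, again assuming WildcardQuery treats "\" as an escape character for
the character that follows it; WildcardEscaper and escapeLiteral are
hypothetical names, not part of Lucene.

```java
final class WildcardEscaper {

    /**
     * Escapes the characters assumed to be special in a wildcard pattern
     * ('\', '*' and '?') so that a path fragment is matched literally.
     */
    static String escapeLiteral(String literal) {
        StringBuilder sb = new StringBuilder(literal.length() + 8);
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (c == '\\' || c == '*' || c == '?') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The wildcards are added around the escaped literal fragment
        String pattern = "*" + escapeLiteral("Samples\\") + "*";
        System.out.println(pattern);
    }
}
```

The resulting pattern could then be passed to
new WildcardQuery(new Term("FILE_PATH", pattern)); if the assumption about
the escape character holds, it should match paths containing "Samples\".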

TIA,
Nischal Y



On Mon, Oct 14, 2013 at 10:31 PM, Ian Lea  wrote:


Re: wildcard search not working on file paths

2013-10-14 Thread Ian Lea
You seem to be indexing paths delimited by backslash then saying a
search for Samples/* doesn't match anything.  No surprises there, if
I've read your code correctly.  Since you are creating wildcard
queries directly from Terms I don't think that Lucene escaping is
relevant here.  But the presence of all the backslashes in paths and
Java code doesn't help.  I'd convert them all to standard unix /a/b/c
format, for searching anyway: you can always store the original if you
want to use that in results.
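
The conversion could be as small as the sketch below. PathNormalizer and
toSearchForm are hypothetical names used only for illustration; the idea is
to index the converted value in the field that gets searched and keep the
original path in a separate stored field for display.

```java
final class PathNormalizer {

    /** Converts Windows-style separators to unix-style for searching. */
    static String toSearchForm(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        String original = "\\Samples\\Batching\\runner.p";
        // Index this value in the searchable field; keep 'original' for results
        System.out.println(toSearchForm(original));
    }
}
```

The same conversion would need to be applied to any path fragment a user
types before it is turned into a wildcard pattern, so that the indexed
terms and the queries agree on the separator.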

One further small tip: your sample program is good, with no external
dependencies, but would be even better if you used RAMDirectory.  That
way I could run it on my non-Windows system if I wanted to, with the
addition of some imports.


--
Ian.


On Mon, Oct 14, 2013 at 7:55 PM, nischal reddy
 wrote: