Wildcard Terms and total word or phrase count

2015-11-27 Thread Kunzman, Douglas *
Hi -

This is my first Lucene project; my other search projects have used Solr.
I would like to find the total number of matches for a wildcard term across a 
set of documents, with 0-N matches per document.
I would prefer not to have to open each document where a match is found.  I need 
to be able to support wildcards, but my requirements are somewhat flexible 
about phrase search support.
Whatever is easier.

This is what I have so far.

    public static void main(String[] args) throws IOException, ParseException {
        Directory idx = FSDirectory.open(path);
        index("C:\\Users\\Douglas.Kunzman\\Desktop\\test_index");

        Term term = new Term("Doc", "quar*");

        WildcardQuery wc = new WildcardQuery(term);

        SpanQuery spanTerm = new SpanMultiTermQueryWrapper<>(wc);
        IndexReader indexReader = DirectoryReader.open(idx);

        System.out.println("Term freq=" + indexReader.totalTermFreq(term));
        System.out.println("Term freq=" + indexReader.getSumTotalTermFreq("Doc"));

        IndexSearcher isearcher = new IndexSearcher(indexReader);

        IndexReaderContext indexReaderContext = isearcher.getTopReaderContext();
        TermContext context = TermContext.build(indexReaderContext, term);
        TermStatistics termStatistics = isearcher.termStatistics(term, context);
        System.out.println("termStatics=" + termStatistics.totalTermFreq());
    }

Does anyone have any suggestions?  totalTermFreq is zero, but when I search 
using quartz I find matches.
I'm searching the Quartz user's guide as an example.

Thanks,
Doug







Re: Wildcard Terms and total word or phrase count

2015-11-29 Thread Michael Wilkowski
It is because your index does not contain the term quar*, and this statistics
function is not a query (you have to pass the exact form of the term). To count
the terms that meet the search criteria, you can run a search query with a
custom collector and count the results. Or use a normal search query returning
TopDocs and just check the total hit count (the totalHits field of TopDocs);
however, the first option is faster because no results are gathered and sorted.

MW
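
As a rough illustration of the point above (a statistics lookup takes one
literal term, so the pattern quar* has to be expanded into the indexed terms it
matches before anything can be counted), here is a minimal sketch in plain
Java. It does not use the Lucene API: the term dictionary is modeled as a
sorted map from term to total occurrences, and the class name, method names,
and sample counts are all hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one field's term dictionary: term -> total occurrences in all documents.
// This is NOT the Lucene API; every name and number here is made up for illustration.
final class PrefixTermFreq {

    // Exact lookup: only a term stored literally in the dictionary has a count.
    static long exactFreq(TreeMap<String, Long> dict, String term) {
        return dict.getOrDefault(term, 0L);
    }

    // Expand the prefix into the terms it matches and sum their counts.
    static long prefixFreq(TreeMap<String, Long> dict, String prefix) {
        long total = 0;
        // tailMap starts at the first term >= prefix; in sorted order all terms
        // sharing the prefix are contiguous, so we can stop at the first mismatch.
        for (Map.Entry<String, Long> entry : dict.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            total += entry.getValue();
        }
        return total;
    }

    // Hypothetical sample data standing in for an indexed user's guide.
    static TreeMap<String, Long> sampleDictionary() {
        TreeMap<String, Long> dict = new TreeMap<>();
        dict.put("job", 12L);
        dict.put("quart", 1L);
        dict.put("quarter", 2L);
        dict.put("quartz", 40L);
        dict.put("queue", 3L);
        dict.put("trigger", 9L);
        return dict;
    }

    public static void main(String[] args) {
        TreeMap<String, Long> dict = sampleDictionary();
        // The literal string "quar*" is not a term in the dictionary.
        System.out.println("exact quar* = " + exactFreq(dict, "quar*"));
        // Summing over quart, quarter and quartz gives the wildcard total.
        System.out.println("prefix quar = " + prefixFreq(dict, "quar"));
    }
}
```

The sketch only handles a trailing wildcard; a pattern with a wildcard in the
middle would need every term in the dictionary tested against the pattern.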


RE: Wildcard Terms and total word or phrase count

2015-11-29 Thread Kunzman, Douglas *
Everyone -

Thanks for getting back to me. Unfortunately, in the sample code, even when I 
pass a term with no wildcards that occurs multiple times in my document, the 
total hit count is never more than one.  Does anyone have any ideas about what 
I could be doing wrong?

Thanks, 
Doug
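
One possible reading of this symptom, offered as an assumption because the
indexing code was not posted: a hit count is the number of matching documents,
not the number of occurrences, so an index that holds the whole guide as a
single document can never report more than one hit. The plain-Java sketch
below models that difference; it does not use the Lucene API, and the class
name, method names, and sample text are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model: each document is a list of tokens. NOT the Lucene API.
final class HitsVersusOccurrences {

    // What a hit count measures: how many documents contain the term at least once.
    static int matchingDocs(List<List<String>> docs, String term) {
        int hits = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                hits++;
            }
        }
        return hits;
    }

    // What a total term frequency measures: every occurrence in every document.
    static long totalOccurrences(List<List<String>> docs, String term) {
        long total = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Hypothetical index holding the whole guide as one document.
    static List<List<String>> sampleDocs() {
        List<String> guide = Arrays.asList(
                "quartz", "is", "a", "scheduler", "quartz", "runs", "jobs", "quartz");
        return Collections.singletonList(guide);
    }

    public static void main(String[] args) {
        List<List<String>> docs = sampleDocs();
        System.out.println("matching documents = " + matchingDocs(docs, "quartz"));
        System.out.println("total occurrences  = " + totalOccurrences(docs, "quartz"));
    }
}
```

If this is the cause, the per-term frequency statistic, looked up with the
exact indexed form of the term, is the number to check rather than the hit
count.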



Re: Wildcard Terms and total word or phrase count

2015-11-29 Thread Jack Krupansky
You didn't post your code that creates the index. Make sure you are using a
tokenized TextField rather than a single-token StringField.

-- Jack Krupansky
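
To make the distinction above concrete, here is a minimal plain-Java sketch of
why the field type matters: a tokenized field contributes one term per word,
while a single-token field contributes the entire value as one term, so a
lookup for an individual word finds nothing. It does not use the Lucene API;
the class and method names are hypothetical, and the lowercasing step assumes
an analyzer that lowercases its tokens.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy model of how a field value turns into index terms. NOT the Lucene API.
final class TokenizedVersusSingleToken {

    // Models a tokenized text field: split on anything that is not a letter
    // or digit, then lowercase each piece (an assumed analyzer behavior).
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<>();
        for (String piece : value.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Models a single-token field: the whole value is indexed verbatim as one term.
    static List<String> singleToken(String value) {
        return Collections.singletonList(value);
    }

    public static void main(String[] args) {
        String text = "Quartz is a job scheduler. Quartz runs jobs.";
        System.out.println("tokenized terms:    " + tokenized(text));
        System.out.println("single-token terms: " + singleToken(text));
        System.out.println("tokenized has quartz:    " + tokenized(text).contains("quartz"));
        System.out.println("single-token has quartz: " + singleToken(text).contains("quartz"));
    }
}
```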



RE: Wildcard Terms and total word or phrase count

2015-11-29 Thread Kunzman, Douglas *

Jack -

Thanks a lot for taking the time to try and answer my question.   

From using Solr I knew that it needed to be a TextField.   

I'm including the entire unit tester as an attachment.

Thanks, 
Doug


Re: Wildcard Terms and total word or phrase count

2015-11-29 Thread Michael Wilkowski
Hi Doug,
Your attachment is not available (likely due to security settings). Please put
it on GitHub or somewhere else and provide a link to download it.

MW
