Number of terms in a SOLR field

2009-09-30 Thread Fergus McMenemie
Hi all,

I am attempting to test some changes I made to my DIH-based
indexing process. The changes only affect the way I
describe my fields in data-config.xml; there should be no
changes to the way the data is indexed or stored.

As a QA check I want to compare the results from
indexing the same data before and after the change. I am looking
for a way of getting counts of terms in each field. I
guess Luke etc. must allow this, but how?

Regards Fergus.


Re: Number of terms in a SOLR field

2009-09-30 Thread Andrzej Bialecki

Luke uses a brute-force approach - it traverses all terms and counts
terms per field. This is easy to implement yourself - just get the
IndexReader.terms() enumeration and traverse it.



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
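
The traversal described above can be sketched in a few lines. This is only a minimal illustration, not Luke's actual code: it makes no Lucene calls, and the list of (field, text) pairs is a hypothetical stand-in for what the IndexReader.terms() enumeration would yield, one entry per unique term. The function name count_terms_per_field is likewise made up for this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_terms_per_field(terms: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Walk every term once and count how many belong to each field."""
    counts: Counter = Counter()
    for field, _text in terms:
        # Each enumerated entry is one unique term of that field.
        counts[field] += 1
    return dict(counts)


# Hypothetical stand-in for the term enumeration of a small index.
sample_terms = [
    ("title", "lucene"),
    ("title", "solr"),
    ("body", "index"),
    ("body", "luke"),
    ("body", "term"),
]

print(count_terms_per_field(sample_terms))  # {'title': 2, 'body': 3}
```

The cost is one pass over every term in the index, which is why this is called brute force; for a one-off check that is usually acceptable.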



Re: Number of terms in a SOLR field

2009-09-30 Thread Fergus McMenemie

Thanks Andrzej,

This is just a one-off QA check. How do I get Luke to display
terms and counts?


Fergus.


Re: Number of terms in a SOLR field

2009-09-30 Thread Andrzej Bialecki

1. Get Luke 0.9.9.
2. Open the index with Luke.
3. Look at the Overview panel; you will see the list titled "Available
fields and term counts per field".



--
Best regards,
Andrzej Bialecki 



Re: Number of terms in a SOLR field

2009-09-30 Thread Fergus McMenemie

Thanks,

That got me going, and I felt a little stupid after stumbling
across http://wiki.apache.org/solr/LukeRequestHandler

Regards Fergus
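
For the before/after QA check itself, a rough sketch of comparing per-field term counts taken from two LukeRequestHandler responses might look like this. It assumes the handler's JSON output has a "fields" map whose entries carry a "distinct" count; that layout is an assumption here, so check it against your own Solr output. The sample dictionaries and the function names are invented for illustration, and nothing is fetched over the network.

```python
from typing import Any, Dict, Optional, Tuple


def distinct_terms_by_field(luke_response: Dict[str, Any]) -> Dict[str, int]:
    """Pull the per-field distinct-term count out of a parsed response.

    Assumes a {"fields": {name: {"distinct": N, ...}}} layout.
    """
    fields = luke_response.get("fields", {})
    return {name: info["distinct"] for name, info in fields.items()
            if "distinct" in info}


def diff_counts(before: Dict[str, int],
                after: Dict[str, int]) -> Dict[str, Tuple[Optional[int], Optional[int]]]:
    """Return only the fields whose counts differ, as (before, after)."""
    changed = {}
    for name in sorted(set(before) | set(after)):
        b, a = before.get(name), after.get(name)
        if b != a:
            changed[name] = (b, a)  # None marks a field missing on one side
    return changed


# Invented sample data standing in for two parsed responses.
before = {"fields": {"title": {"type": "text", "distinct": 120},
                     "body": {"type": "text", "distinct": 4510}}}
after = {"fields": {"title": {"type": "text", "distinct": 120},
                    "body": {"type": "text", "distinct": 4498},
                    "tag": {"type": "string", "distinct": 7}}}

print(diff_counts(distinct_terms_by_field(before),
                  distinct_terms_by_field(after)))
# {'body': (4510, 4498), 'tag': (None, 7)}
```

An empty result means the two indexing runs produced the same number of distinct terms in every field, which is the outcome expected when only the field descriptions in data-config.xml changed.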