Re: Access indexed terms

2010-05-14 Thread Andrzej Bialecki
On 2010-05-14 11:35, manjula wijewickrema wrote:
 Hi,
 
 Is it possible to put the indexed terms into an array in Lucene? For
 example, imagine I have indexed a single document in Lucene and now I want
 to access those terms in the index. Is it possible to retrieve (call) those
 terms as array elements? If it is possible, then how?

In short: unless you stored term vectors (TermFreqVector) when adding the
document, the answer is "with great difficulty".

For working code that does this, see:

http://code.google.com/p/luke/source/browse/trunk/src/org/getopt/luke/DocReconstructor.java

If you really need this kind of access in your application, then add your
documents with term vectors, including offsets and positions. Even then,
depending on the Analyzer you used, the process is lossy: any input
that the Analyzer discarded is simply no longer available.
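
To illustrate what a term vector gives you and why reconstruction is lossy,
here is a plain-Java model. It is not Lucene code: the class name, the
stop-word list and the tokenizing rule are assumptions made up for this
sketch. In Lucene 2.9 the corresponding data would come from
IndexReader.getTermFreqVector(docId, field), whose getTerms() and
getTermFrequencies() methods return parallel arrays like the two built below.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Plain-Java model of a per-document term vector; this is not Lucene code. */
class TermVectorModel {
    // Hypothetical stop-word list, standing in for whatever an Analyzer drops.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    /** Counts the terms that survive analysis, sorted by term text. */
    static TreeMap<String, Integer> analyze(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        // Lowercasing and splitting on non-letter/digit characters loses
        // the original case and punctuation.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue; // stop words are lost as well
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** The distinct terms as an array, in the map's iteration order. */
    static String[] terms(Map<String, Integer> counts) {
        return counts.keySet().toArray(new String[0]);
    }

    /** The frequencies as an array, parallel to terms(). */
    static int[] frequencies(Map<String, Integer> counts) {
        int[] freqs = new int[counts.size()];
        int i = 0;
        for (int f : counts.values()) {
            freqs[i++] = f;
        }
        return freqs;
    }

    public static void main(String[] args) {
        TreeMap<String, Integer> counts = analyze("The index is an Index of the terms.");
        System.out.println(Arrays.toString(terms(counts)));       // [index, terms]
        System.out.println(Arrays.toString(frequencies(counts))); // [2, 1]
    }
}
```

The two arrays are all that remains of the sentence: the stop words, the
capital letters and the full stop cannot be recovered from them.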

-- 
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
Hi Andrzej

Thanks for the reply. As you mentioned, creating arrays of the indexed
terms seems to be rather difficult. My actual intention is to find the term
frequencies of an indexed document. I can find the frequency of a particular
term (given as a query) if I specify the term in the code. But what I really
want is to get the term frequency (or even just the number of times each
appears in the document) of all indexed terms (or of the high-frequency
terms) without naming them in the code. Is there an alternative way to do
that?

Thanks
Manjula






Re: Access indexed terms

2010-05-14 Thread Andrzej Bialecki
On 2010-05-14 14:24, manjula wijewickrema wrote:
 Hi Andrzej
 
 Thanks for the reply. As you mentioned, creating arrays of the indexed
 terms seems to be rather difficult. My actual intention is to find the term
 frequencies of an indexed document. I can find the frequency of a particular
 term (given as a query) if I specify the term in the code. But what I really
 want is to get the term frequency (or even just the number of times each
 appears in the document) of all indexed terms (or of the high-frequency
 terms) without naming them in the code. Is there an alternative way to do
 that?

Yes, see the discussion here:

https://issues.apache.org/jira/browse/LUCENE-2393
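
The core idea of a high-frequency-terms listing is to walk all terms and keep
only the N with the largest counts. The plain-Java sketch below shows just that
selection step; it is not the contrib class, and the class and method names are
made up for illustration. It assumes the term/count pairs have already been
collected. In Lucene 2.9 they could be gathered by iterating
IndexReader.terms(); note that TermEnum.docFreq() is the number of documents
containing a term, while the count within one document comes from
TermDocs.freq().

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Plain-Java sketch of top-N selection by frequency; this is not Lucene code. */
class TopTerms {
    /** Returns up to n entries with the highest counts, highest first. */
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> freqs, int n) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Min-heap ordered by count: the head is the weakest of the current best n.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>(n, (a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : freqs.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the entry with the smallest count
            }
        }
        result.addAll(heap);
        // Heap iteration order is unspecified, so sort explicitly, highest count first.
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freqs = new LinkedHashMap<>();
        freqs.put("lucene", 5);
        freqs.put("index", 3);
        freqs.put("term", 7);
        freqs.put("array", 1);
        for (Map.Entry<String, Integer> e : top(freqs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Keeping a heap bounded at n entries means the terms never need to be named in
the code or fully sorted, which matters when the index holds many distinct
terms.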


-- 
Best regards,
Andrzej Bialecki 



Re: Access indexed terms

2010-05-14 Thread manjula wijewickrema
Dear Andrzej,

Thanks for your valuable help. I had also noticed this HighFreqTerms approach
in the Lucene email archive and am trying to use it. To do that, I have
downloaded lucene-misc-2.9.1.jar and added the org.apache.lucene.misc package
to my project. Now I think I have to call the HighFreqTerms class from my
code, but I was unable to find any guidance on how to do it. Could you please
tell me how I can use this class in my code?

Thanx
Manjula

