A codec like that would be welcome :)
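
In the meantime, the export Sahin asks for can be approximated by hand. The
sketch below is plain Java with no Lucene dependency and only shows the
formatting step: it takes terms with their postings and prints one line per
term as term, document frequency, then <docID, weight> pairs. The class
InvertedIndexDump, its Posting type, and the tab-separated layout are
hypothetical choices made for illustration, and the sample data is copied from
the table in the original question. With the Lucene 3.x API the postings would
instead be collected the way Uwe describes (IndexReader.terms() for the
TermEnum, then IndexReader.termDocs(term) for the doc IDs). How the rank value
is computed is not specified in the thread, so here it is just a number
supplied by the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: formats an in-memory inverted
// index in the text layout requested in the original question.
public class InvertedIndexDump {

    /** One entry of an inverted list: a document ID and a caller-supplied weight. */
    public static final class Posting {
        final int docId;
        final double weight;

        public Posting(int docId, double weight) {
            this.docId = docId;
            this.weight = weight;
        }
    }

    /** Formats one line: term, document frequency, then the <docId, weight> pairs. */
    public static String formatLine(String term, List<Posting> postings) {
        List<String> pairs = new ArrayList<String>();
        for (Posting p : postings) {
            // Locale.ROOT keeps the decimal separator a '.' on every machine.
            pairs.add(String.format(Locale.ROOT, "<%d, %.3f>", p.docId, p.weight));
        }
        // Document frequency is the number of documents containing the term.
        return term + "\t" + postings.size() + "\t" + String.join(" ", pairs);
    }

    /** Formats the whole index, one line per term, in sorted term order. */
    public static String dump(Map<String, List<Posting>> index) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<Posting>> e
                : new TreeMap<String, List<Posting>>(index).entrySet()) {
            sb.append(formatLine(e.getKey(), e.getValue())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hard-coded sample data from the question; a real exporter would
        // fill this map from the index (e.g. TermEnum/TermDocs in Lucene 3.x).
        Map<String, List<Posting>> index = new TreeMap<String, List<Posting>>();
        index.put("and", Arrays.asList(new Posting(6, 0.159)));
        index.put("big", Arrays.asList(new Posting(2, 0.148), new Posting(3, 0.088)));
        index.put("dark", Arrays.asList(new Posting(6, 0.079)));
        System.out.print(dump(index));
    }
}
```

Running main prints the three sample terms, one per line. For a large index
it would be better to write each line to a file as it is produced rather than
building the whole map in memory first.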

On Wed, Sep 22, 2010 at 5:31 AM, Michael McCandless
<[email protected]> wrote:
> Saving the index in text format would also be a fun codec (in 4.0) to create 
> :)
>
> I.e., the codec would be read/write.  The performance wouldn't be great,
> but it'd be neat for debugging, teaching, and transparency purposes...
>
> Mike
>
> On Tue, Sep 21, 2010 at 9:26 PM, Lance Norskog <[email protected]> wrote:
>> The Lucene CheckIndex program opens an index and walks all of the data
>> structures. It is a good start for you.
>>
>> Sahin Buyrukbilen wrote:
>>>
>>> Thank you Uwe, I will read the docs and try to do it. However, do you have
>>> any example code? I need it because I am not very familiar with Java.
>>>
>>> Thank you.
>>>
>>> Sahin
>>>
>>> On Tue, Sep 21, 2010 at 12:29 PM, Uwe Schindler<[email protected]>  wrote:
>>>
>>>
>>>>
>>>> Hi,
>>>>
>>>> Retrieve a TermEnum and iterate it. That gives you all terms, and for each
>>>> one you can retrieve the docFreq, which is the second column in your table.
>>>> Finally, for each term you position the TermDocs enum on this term to get
>>>> all document ids. Read the docs of IndexReader/TermEnum/TermDocs about this.
>>>>
>>>> Uwe
>>>>
>>>> -----
>>>> Uwe Schindler
>>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>>> http://www.thetaphi.de
>>>> eMail: [email protected]
>>>>
>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Sahin Buyrukbilen [mailto:[email protected]]
>>>>> Sent: Tuesday, September 21, 2010 9:12 AM
>>>>> To: [email protected]
>>>>> Subject: How to export lucene index to a simple text file?
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am currently working on a project about private information retrieval
>>>>> and I need to have an inverted index file in txt format as follows:
>>>>>
>>>>> Term t    freq t    Inverted list for t
>>>>> -------------------------------------------------------------------------
>>>>> and       1         <6, 0.159>
>>>>> big       2         <2, 0.148>  <3, 0.088>
>>>>> dark      1         <6, 0.079>
>>>>> .
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>> Here the <number1, number2> pairs indicate: number1 is the doc ID where
>>>>> term t exists, with a rank of number2.
>>>>>
>>>>> I have created an index from 5492 txt files, however the index is composed
>>>>> of different files and most of the data is not in the text format.
>>>>>
>>>>> Could somebody guide me to achieve this?
>>>>>
>>>>> Thank you
>>>>>
>>>>> Sahin.
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>
>>>>
>>>>
>>>
>>>
>>
>
