On Tue, Sep 23, 2008 at 5:28 PM, Grant Ingersoll <[EMAIL PROTECTED]>wrote:

> So, the piece I'm missing is how do you know what field for which terms.
>  In other words how do you know xyz goes against organization and abc
> against name.  Your wording implies that you don't know this before hand,


          I guess this would be the case. The free flowing text search leads
to this issue.


> yet you are somehow suggesting that Lucene should be able to do it.
>  Correct me if I'm wrong.

      I am not sure if Lucene will be able to directly able to do it.
However Indexed Terms in Lucene can certainly be used in learning the field
of a particular word/token.
      One way, would be Lucene Index can be traversed to generated a
Learning System which will be later used to learn the field name of a
particular system. I suggest traversing the termDocs and extracting out the
words and field information which can be stored in a separate DB/Index
(Learning System). This system can then be queried 1st to determine the
field type of word. The additional time that the Learning System will
require should be compensated by having a smaller Index Size.



Thanks
Umesh




>
> -Grant
>
>
>
> On Sep 23, 2008, at 6:51 AM, Anshul jain wrote:
>
>  Here is what I'm trying to do:
>>
>> say a lucene document:
>> name: abc ^10
>> organization: xyz ^3
>>
>> ^10 and ^3 are boosts in the document.
>>
>> now if I query name: abc ^5 AND organization: xyz this will work.
>>
>> but if I query (default_field): abc^5 AND xyz this won't work.
>>
>> Now what I want is that a text can be associated with more than one field.
>> i.e.
>>
>> (field1,field2,field3):value
>> name,(default_field),title: abc^10
>> organization,(default_field),institute: xyz^3
>>
>> then both of my queries will work.
>>
>> Is it possible to do so in lucene without changing the source?
>> If no then can anyone please explain the indexing and searching
>> mechanism for lucene, so that I can start working on it.
>>
>> The solution given by the java-users won't work for me as I do not
>> want to add all the contents of the document in a single field and
>> then search for that field, as this would increase the index size and
>> I've to index more than 10 million documents. Also
>> multifieldqueryparser will make it query execution inefficient, as
>> there will be thousands of fields.
>>
>> If I start storing just a single field as: (default_field): "name abc
>> organization xyz", then it is possible that some other documents might
>> get selected that are not relevant. Also i want to boost individual
>> fields in a document.
>>
>> Anshul
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

Reply via email to