I have a couple of questions regarding indexing and searching a document that has repeated values for the same field (specifically, the authors of a document, in this case):

Firstly, I'm adding the repeated field with this code:

for creator in creators:
    doc.add(Field('creator', creator, Field.Store.YES, Field.Index.UN_TOKENIZED))

but can't find a way to read those fields back out from the index. If I use

for author in hits[i]["creator"]:
        print author

then only the first "creator" entry is returned for that document, and the loop splits it into individual letters. In other words, hits[i]["creator"] is a string and not a list.
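To make the symptom concrete, here is a minimal sketch in plain Python with no PyLucene involved. The string "Doe J" is only a stand-in for whatever hits[i]["creator"] returns, and the variable names are made up for illustration:

```python
# Stand-in for the single string returned for the 'creator' field.
value = "Doe J"

# Looping over a string visits one character at a time,
# which is why each "author" printed is a single letter.
letters = [ch for ch in value]

# Wrapping the string in a list gives the loop one whole author name,
# but it still cannot recover the other 'creator' values of the document.
authors = [value]
```

So the loop itself behaves as Python defines it; the real question is how to get all of the stored values rather than just the first one.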


Secondly, it doesn't seem to be possible (in PyLucene 1.9.1) to search an untokenized field using a term that contains spaces. For a document that has a creator "Doe J", the query
creator:"Doe J"
doesn't return any results, and
creator:Doe J
doesn't match the intended document either.


Has anyone found solutions to these problems already? For the second I could just replace spaces with underscores during indexing, but that wouldn't be the ideal solution.
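The underscore workaround could look roughly like the sketch below. It is plain Python only: normalize_term and denormalize_term are hypothetical helper names, not part of PyLucene, and the same mapping would have to be applied both when adding the field and when building the query string.

```python
def normalize_term(value):
    """Hypothetical helper: turn a value into a single space-free term."""
    # Split on any run of whitespace and join the pieces with underscores.
    return "_".join(value.split())


def denormalize_term(term):
    """Hypothetical helper: map a stored term back to a display string.

    This is lossy if an original name already contained an underscore.
    """
    return term.replace("_", " ")


# Value that would be passed to Field('creator', ...) at indexing time.
indexed = normalize_term("Doe J")

# Query string built with the same mapping at search time.
query = "creator:" + normalize_term("Doe  J")
```

The drawback is that the stored value no longer matches the original name exactly, so it has to be converted back for display, which is why I'd rather avoid it.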

alf.

_______________________________________________
pylucene-dev mailing list
[email protected]
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev
