If you know the language at index time, could you index language-specific fields, e.g. text_en, text_de, title_en, title_de? Perhaps you could also have a catch-all field that contains everything.
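To make the naming idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only shows how language-tagged terms could be grouped into fields such as text_en and text_de plus a catch-all. The class LangFields, its methods, and the "_all" suffix are hypothetical names chosen for illustration; they are not part of Lucene or of the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; nothing here is Lucene API.
final class LangFields {
    // Assumed suffix for the catch-all field (e.g. "text_all").
    static final String CATCH_ALL_SUFFIX = "all";

    // Builds a per-language field name, e.g. ("text", "EN") -> "text_en".
    static String fieldName(String base, String lang) {
        return base + "_" + lang.toLowerCase(Locale.ROOT);
    }

    // Picks the field to search: the language-specific one when a language
    // is given, otherwise the catch-all field.
    static String queryField(String base, String lang) {
        boolean noLang = (lang == null || lang.isEmpty());
        return fieldName(base, noLang ? CATCH_ALL_SUFFIX : lang);
    }

    // Groups {lang, term} pairs into per-language fields and also copies
    // every term into the catch-all field.
    static Map<String, List<String>> toFields(String base, List<String[]> taggedTerms) {
        Map<String, List<String>> fields = new LinkedHashMap<>();
        for (String[] tagged : taggedTerms) {
            String lang = tagged[0];
            String term = tagged[1];
            fields.computeIfAbsent(fieldName(base, lang), k -> new ArrayList<>()).add(term);
            fields.computeIfAbsent(fieldName(base, CATCH_ALL_SUFFIX), k -> new ArrayList<>()).add(term);
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String[]> terms = Arrays.asList(
            new String[] {"en", "house"},
            new String[] {"de", "Haus"});
        System.out.println(toFields("text", terms));
        System.out.println(queryField("text", "de"));
        System.out.println(queryField("text", null));
    }
}
```

In a real index, each map entry would become one field on the document, and the per-language field names are what a per-field analyzer mapping would be keyed on.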
Then your searches would be on a per field_lang basis. PerFieldAnalyzerWrapper would automatically use the proper language-specific Analyzer for each field. This may turn out to be too clumsy if you have many fields times many languages...

Actually, this looks a lot like what Solr could provide, perhaps with dynamic fields and the dismax query parser.

Best
Erick

On Thu, Jul 8, 2010 at 4:47 AM, Bernhard Haslhofer <bernhard.haslho...@univie.ac.at> wrote:

> Hi,
>
> in my application I have documents that may contain terms and term
> translations in multiple languages. The language tag of each term is
> explicitly given and should be available in the index in order to enable
> queries for documents that contain a certain term (optionally in a given
> language).
>
> I could split the documents in a set of sub-documents each containing terms
> in one specific language and a dedicated field indicating the language. But
> then I need multiple queries to retrieve stored term translations from the
> subdocuments.
>
> The IMO better alternative is not to split the document and to assign the
> language tags as payloads to the terms. But then I need
>
> (i) a search filter that eliminates docs based on a given language tag and
>
> (ii) a way to access the term payloads from the documents returned by the
> searcher
>
> For both I haven't found a solution yet. Can I write a custom PayloadFilter
> or is there already some implementation available? Is it possible to access
> the term payloads from the search results?
>
> Thanks.
> Bernhard