On 07.08.2012 10:20, Danil ŢORIN wrote:

Hi Danil,

> If you do intersection (not join), maybe it makes sense to put
> everything into 1 index?

Just a note on that: my application performs intersections and joins (unions) on the results, depending on the query. So the index structure has to be ready for both, but intersections are clearly more complicated.

> Just transform your input like "brown fox" into "ADJ:brown|<your
> payload> NOUN:fox|<other payload>"

I understand this to mean that "ADJ" and "NOUN" are interpreted as the actual tokens and "brown" and "fox" as payloads (followed by <other payload>), right? This is a very neat approach, and I have vaguely considered it. One problem is that I aim for a very high level of flexibility: it must be possible to add further annotations at any point, and different tokenizations apply. However, I will reconsider your suggestion, possibly applying one of multiple tokenizations as a default in this sense.

> Of course I'm not aware of all the details, so my solution might not
> be applicable to your project.
> Maybe you could share more details, so this won't turn into an "XY problem".
>
> Keep in mind: always optimize your index for the query use case,
> instead of blindly processing the input data.

Thanks for that reminder; this becomes quite difficult in my scenario, though, since we want to allow for flexible changes in the index types, representing different annotations, tokenization logics, etc.

Best,
Carsten

--
Institut für Deutsche Sprache | http://www.ids-mannheim.de
Projekt KorAP | http://korap.ids-mannheim.de
Tel. +49-(0)621-43740789 | schno...@ids-mannheim.de
Next Generation Corpus Analysis Platform
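
P.S. To make the "ADJ:brown|<payload>" encoding discussed above a bit more concrete, here is a minimal sketch of how such a string could be split into its parts. It is plain Python and does not use any Lucene API. The names `AnnotatedToken` and `parse_annotated` are hypothetical, and the split into annotation label, word, and payload is only one possible reading of the format; which part ends up as the indexed term and which as the payload is deliberately left open.

```python
from typing import List, NamedTuple, Optional


class AnnotatedToken(NamedTuple):
    # Hypothetical container; field names are illustrative only.
    position: int            # 0-based position of the item in the input
    label: Optional[str]     # annotation before ':', e.g. "ADJ" (None if absent)
    word: str                # surface form after ':', e.g. "brown"
    payload: Optional[str]   # text after the delimiter (None if absent)


def parse_annotated(text: str, delimiter: str = "|") -> List[AnnotatedToken]:
    """Split whitespace-separated 'LABEL:word|payload' items into fields."""
    tokens = []
    for position, item in enumerate(text.split()):
        # Everything after the first delimiter is treated as the payload.
        head, sep, payload = item.partition(delimiter)
        # Everything before the first ':' is treated as the annotation label.
        label, colon, word = head.partition(":")
        if not colon:
            # No ':' present, so the whole head is the word and there is no label.
            label, word = None, head
        tokens.append(
            AnnotatedToken(position, label, word, payload if sep else None)
        )
    return tokens


for token in parse_annotated("ADJ:brown|p1 NOUN:fox|p2"):
    print(token)
```

Keeping the position alongside each item means that several annotation layers for the same word could, under this assumption, be aligned by position later on, which is the kind of flexibility I am after.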