I mean "ADJ:brown" as the token and only the <payload> as the payload, since you probably use the payload only for scoring or post-processing, not for the actual matching.
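To make the split concrete, here is a minimal sketch in plain Java (no Lucene classes) of how an input like "ADJ:brown|p1 NOUN:fox|p2" would separate into indexed terms and payloads. The class name, the Token holder, and the placeholder payloads "p1" and "p2" are made up for illustration, and the '|' separator is simply the one from my example. In a real analyzer, Lucene's DelimitedPayloadTokenFilter does this kind of delimiter-based splitting for you.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: plain Java, not tied to any Lucene API. */
final class PayloadSplitSketch {

    /** Hypothetical holder for one indexed term and its payload. */
    static final class Token {
        final String term;     // what the query matches, e.g. "ADJ:brown"
        final String payload;  // opaque data for scoring/post-processing, may be empty

        Token(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits whitespace-separated "TERM|PAYLOAD" items at the first '|'. */
    static List<Token> parse(String input) {
        List<Token> tokens = new ArrayList<>();
        for (String item : input.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input yields no tokens
            }
            int bar = item.indexOf('|');
            if (bar < 0) {
                tokens.add(new Token(item, "")); // item carries no payload
            } else {
                tokens.add(new Token(item.substring(0, bar), item.substring(bar + 1)));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints each term next to the payload that travels with it.
        for (Token t : parse("ADJ:brown|p1 NOUN:fox|p2")) {
            System.out.println(t.term + " -> " + t.payload);
        }
    }
}
```

The point is that a query matches on the whole term ("ADJ:brown"), while the payload just rides along with the posting and is read back only at scoring or post-processing time.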
You can even write a filter that emits both tokens, "ADJ" and "ADJ:brown", at the same position (so you'll still be able to do phrase queries) while keeping the join capability.

On Tue, Aug 7, 2012 at 12:13 PM, Carsten Schnober <schno...@ids-mannheim.de> wrote:
> On 07.08.2012 10:20, Danil ŢORIN wrote:
>
> Hi Danil,
>
>> If you do intersection (not join), maybe it makes sense to put
>> everything into 1 index?
>
> Just a note on that: my application performs intersections and joins
> (unions) on the results, depending on the query. So the index structure
> has to be ready for both, but intersections are clearly more complicated.
>
>> Just transform your input like "brown fox" into "ADJ:brown|<your
>> payload> NOUN:fox|<other payload>"
>
> I understand that this denotes "ADJ" and "NOUN" to be interpreted as the
> actual token and "brown" and "fox" as payloads (followed by <other
> payload>), right?
>
> This is a very neat approach and I have vaguely considered that. One
> problem is that I aim for a very high level of flexibility, meaning that
> additional annotations have to be addable at any point and different
> tokenizations apply. However, I will re-consider your suggestion,
> possibly applying one of multiple tokenizations as a default in this sense.
>
>> Of course I'm not aware of all the details, so my solution might not
>> be applicable to your project.
>> Maybe you could share more details, so this won't turn into an "XY problem".
>>
>> Keep in mind: always optimize your index for the query use case,
>> instead of blindly processing the input data.
>
> Thanks for that reminder; this becomes quite difficult in my scenario
> though since we want to allow for flexible changes in the index types,
> representing different annotations, tokenization logics etc.
> Best,
> Carsten
>
> --
> Institut für Deutsche Sprache | http://www.ids-mannheim.de
> Projekt KorAP                 | http://korap.ids-mannheim.de
> Tel. +49-(0)621-43740789      | schno...@ids-mannheim.de
> Korpusanalyseplattform der nächsten Generation
> Next Generation Corpus Analysis Platform
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org