I am not sure highlighting will work, as I suspect it will hit the same obstacle; see https://github.com/elasticsearch/elasticsearch/issues/5245
As for suggestion #2, this would break our current schema and require a significant model change (we store the data in MongoDB as well), so I am not sure whether we are better off waiting until #3022 is solved.

In the meantime, any workaround will be appreciated. Can we do some in-memory searching again (using native Lucene somehow)?

On Friday, June 20, 2014 1:13:42 AM UTC+3, Itamar Syn-Hershko wrote:
>
> It is very hard to give you concrete advice without knowing more about
> your domain and use cases, but here are 2 points that came to mind:
>
> 1. You can make use of the highlighting features to show the content that
> matched. Highlighters can return whole blocks of text, and by using
> positionIncrements correctly you can get this right.
>
> 2. Yes, Elasticsearch is a document-oriented storage, but is it really
> necessary for you to index entire books as one document? I'd most certainly
> look at indexing sections or chapters, maybe even pages, as single documents
> and use string references to the book ID. Unless you use data from the book
> level along with full-text searches on the texts, which even then in some
> scenarios I would consider denormalization.
>
> --
>
> Itamar Syn-Hershko
> http://code972.com | @synhershko <https://twitter.com/synhershko>
> Freelance Developer & Consultant
> Author of RavenDB in Action <http://manning.com/synhershko/>
>
>
> On Thu, Jun 19, 2014 at 10:13 PM, liorg <lior...@gmail.com> wrote:
>
>> Well, assuming we have a book type.
>> The book holds a lot of metadata, let's say something like the following:
>> {
>>   "author": {
>>     "name": "Jose",
>>     "lastName": "Martin"
>>   },
>>   "sections": [{
>>     "chapters": [{
>>       "pages": [{
>>         "pageNum": 1,
>>         "numOfChars": 1000,
>>         "text": "let my people...",
>>         "numofWords": 125
>>       },
>>       {
>>         "pageNum": 2,
>>         "numOfChars": 1005,
>>         "text": "let my people go...",
>>         "numofWords": 150
>>       }],
>>       "chapterName": "the start"
>>     },
>>     {
>>       "pages": [{
>>         "pageNum": 3,
>>         "numOfChars": 1000,
>>         "text": "will do...",
>>         "numofWords": 125
>>       },
>>       {
>>         "pageNum": 4,
>>         "numOfChars": 1005,
>>         "text": "will do later on...",
>>         "numofWords": 150
>>       }],
>>       "chapterName": "the end"
>>     }],
>>     "sectionName": "prologue"
>>   }]
>> }
>>
>> We want to search for all the pages that have "let my people" in their
>> text and more than 100 words.
>> So, when we use ES we can use nested objects and query on the nested page
>> object, but the actual returned values are the books (parents) that have
>> those matching pages.
>> Now, if we want to show the user the pages he was looking for, we cannot
>> do that, as we get the whole book type returned with all its metadata and
>> not just the nested objects that matched the criteria. We need to
>> search again (maybe in memory?) for the pages that matched the criteria in
>> order to display the search results to the user (the whole type is returned
>> because ES does not yet support returning only the nested objects that
>> matched the criteria).
>>
>> I hope it is better understood now.
>>
>> On Thursday, June 19, 2014 7:22:13 PM UTC+3, Itamar Syn-Hershko wrote:
>>
>>> This is usually something that's being solved using parent-child, but
>>> the question here really is what do you mean by needing to retrieve both
>>> books & pages.
>>>
>>> Can you describe the actual scenario and what you are trying to achieve?
>>>
>>> --
>>>
>>> Itamar Syn-Hershko
>>> http://code972.com | @synhershko <https://twitter.com/synhershko>
>>> Freelance Developer & Consultant
>>> Author of RavenDB in Action <http://manning.com/synhershko/>
>>>
>>>
>>> On Thu, Jun 19, 2014 at 7:12 PM, liorg <lior...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> We have a somewhat complex type holding some nested docs with arrays
>>>> (let's assume a hierarchy of books, and for each book we have an array of
>>>> pages containing its metadata).
>>>>
>>>> We want to search on the nested doc (search for all the books that
>>>> have the term "XYZ" in one of their pages), but we want to get back not
>>>> only the book, but the pages themselves.
>>>>
>>>> We've understood that this is problematic to achieve with ES (see
>>>> https://github.com/elasticsearch/elasticsearch/issues/3022).
>>>>
>>>> We have a problem achieving it with a parent-child model, as the data
>>>> model comes from our already existing MongoDB model (and besides, we are
>>>> not sure if a parent-child model fits here).
>>>>
>>>> So...
>>>>
>>>> 1. Is there any workaround we can use to get the results of the nested
>>>> doc (the actual pages)?
>>>> 2. If not, is there a recommended way to search the data again
>>>> in memory after it was narrowed down by the ES server?
>>>> 3. Any advice will be appreciated, as this is quite a big obstacle in
>>>> our way to implementing a solution using ES.
>>>>
>>>> Thanks,
>>>>
>>>> Lior
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "elasticsearch" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to elasticsearc...@googlegroups.com.
>>>>
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/elasticsearch/7602d608-5730-472e-8259-763ff29614ea%40googlegroups.com
>>>> For more options, visit https://groups.google.com/d/optout.
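
On the in-memory question raised in this thread: a minimal sketch, in Python, of re-filtering the book documents that ES returns so that only the matching pages remain. It assumes the book source has the shape posted in the thread. The helper names `iter_pages` and `matching_pages` are hypothetical, and the case-insensitive substring test is only a stand-in for whatever analysis the index applies, so it may not agree with ES on every hit.

```python
from typing import Any, Dict, Iterator, List


def iter_pages(book: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every page of a book, tagged with its section and chapter names."""
    for section in book.get("sections", []):
        for chapter in section.get("chapters", []):
            for page in chapter.get("pages", []):
                yield {
                    "sectionName": section.get("sectionName"),
                    "chapterName": chapter.get("chapterName"),
                    **page,
                }


def matching_pages(book: Dict[str, Any], phrase: str, min_words: int) -> List[Dict[str, Any]]:
    """Return pages whose text contains `phrase` and that have more than `min_words` words.

    Assumption: a plain lowercase substring match approximates the phrase query;
    the real index analyzer may tokenize or normalize text differently.
    """
    needle = phrase.lower()
    return [
        page
        for page in iter_pages(book)
        if needle in page.get("text", "").lower()
        and page.get("numofWords", 0) > min_words
    ]


# Sample document copied from the thread (field names kept exactly as posted).
book = {
    "author": {"name": "Jose", "lastName": "Martin"},
    "sections": [{
        "chapters": [
            {
                "pages": [
                    {"pageNum": 1, "numOfChars": 1000, "text": "let my people...", "numofWords": 125},
                    {"pageNum": 2, "numOfChars": 1005, "text": "let my people go...", "numofWords": 150},
                ],
                "chapterName": "the start",
            },
            {
                "pages": [
                    {"pageNum": 3, "numOfChars": 1000, "text": "will do...", "numofWords": 125},
                    {"pageNum": 4, "numOfChars": 1005, "text": "will do later on...", "numofWords": 150},
                ],
                "chapterName": "the end",
            },
        ],
        "sectionName": "prologue",
    }],
}

hits = matching_pages(book, "let my people", 100)
print([page["pageNum"] for page in hits])  # prints [1, 2]
```

Running the nested query to narrow down the candidate books and then applying a filter like this to each returned book leaves the current schema untouched, at the cost of re-checking every page of each hit on the client.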