Maybe with TopDocs.merge you can run the same query on multiple indexes; with a MultiReader you can also combine several indexes and query them as a single one.
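Roughly, the two approaches look like this (an untested sketch assuming Lucene 6.x-era APIs, which were current at the time of this thread; the class and method names are invented just for illustration):

import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class MultiIndexSearchSketch {

  /** One searcher per index: run the same query everywhere, then merge the top hits. */
  static TopDocs searchAndMerge(Directory[] indexes, Query query, int topN) throws IOException {
    TopDocs[] perIndex = new TopDocs[indexes.length];
    for (int i = 0; i < indexes.length; i++) {
      // The readers are closed right away here; keep them open if you still
      // need to load stored fields for the hits afterwards.
      try (IndexReader reader = DirectoryReader.open(indexes[i])) {
        perIndex[i] = new IndexSearcher(reader).search(query, topN);
      }
    }
    // Merges the per-index results by score into a single top-N.
    return TopDocs.merge(topN, perIndex);
  }

  /** One logical reader over all indexes: the searcher sees them as a single index. */
  static TopDocs searchWithMultiReader(Directory[] indexes, Query query, int topN)
      throws IOException {
    IndexReader[] readers = new IndexReader[indexes.length];
    for (int i = 0; i < indexes.length; i++) {
      readers[i] = DirectoryReader.open(indexes[i]);
    }
    // The MultiReader closes its sub-readers when it is closed.
    try (MultiReader multi = new MultiReader(readers)) {
      return new IndexSearcher(multi).search(query, topN);
    }
  }
}

With the first approach every index keeps its own local doc IDs and only the top hits are merged afterwards; with the MultiReader the sub-readers are stitched together behind one composite reader, so the doc IDs you get back are offsets into the combined doc-ID space.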
2016-08-21 19:31 GMT+02:00 Cristian Lorenzetto <cristian.lorenze...@gmail.com>:

> I am looking over TopDocs.merge.
>
> What is the difference between using multiple IndexSearchers and then
> TopDocs.merge, versus using a MultiReader?
>
> 2016-08-21 2:28 GMT+02:00 Cristian Lorenzetto <cristian.lorenze...@gmail.com>:
>
>> In my opinion this study doesn't tell us anything new. Obviously, if
>> you try to retrieve everything in the store with a single query,
>> performance will not be good. Lucene is fantastic, but it is not magic;
>> the laws of physics still apply. Queries are designed to retrieve a
>> small part of a big store, not the whole store. I also think the time
>> would be bad even without sorting the documents; with a persisted
>> sorted linked list I don't see significant delays. Honestly, I also
>> don't understand the GC memory-limit argument against Lucene's
>> algorithms: the memory used is not proportional to the datastore size,
>> otherwise Lucene would not be scalable. The question to analyse, in my
>> view, is a different one: considering how big data has grown in recent
>> years, the typical maximum size of the databases we know, and whether
>> or not Lucene can scale out by sharding into dynamically defined
>> arrays, we can evaluate whether this refactoring makes sense or not.
>>
>> Sent from my iPad
>>
>> > On 19 Aug 2016, at 05:50, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > OK, I'm a little out of my league here, but I'll plow on anyway....
>> >
>> > bq: There are use cases out there where >2^31 does make sense in a
>> > single index
>> >
>> > OK, let's put some definition to this and define the use case
>> > specifically rather than being vague. For instance, I've just run an
>> > experiment with 200M docs (very small docs) in a single shard and
>> > tried to sort all of them by a date; performance was on the order of
>> > 5 seconds. 3B would be what, 75 seconds? Does the use case involve
>> > sorting? Faceting? If so, performance will probably be poor.
>> >
>> > This would be huge surgery, I believe, and there hasn't been a
>> > compelling use case for it in the search world. Unless and until that
>> > case is made, I suspect this idea will meet with a lot of resistance.
>> >
>> > That said, I do understand that this is somewhat akin to "nobody will
>> > ever need more than 64K of RAM", meaning that some limits are assumed
>> > and eventually become outmoded. But given Java's issues with memory
>> > and GC, I suspect it will be really hard to justify the work this
>> > would take.
>> >
>> > FWIW,
>> > Erick
>> >
>> >
>> >> On Thu, Aug 18, 2016 at 6:31 PM, Trejkaz <trej...@trypticon.org> wrote:
>> >>> On Thu, Aug 18, 2016 at 11:55 PM, Adrien Grand <jpou...@gmail.com> wrote:
>> >>> No, IndexWriter enforces that the number of documents cannot go over
>> >>> IndexWriter.MAX_DOCS (which is a bit less than 2^31), and
>> >>> BaseCompositeReader computes the number of documents in a long
>> >>> variable and ensures it is less than 2^31, so you cannot have indexes
>> >>> that contain more than 2^31 documents.
>> >>>
>> >>> Larger collections should be written to multiple shards and use
>> >>> TopDocs.merge to merge results.
>> >>
>> >> But hang on:
>> >> * TopDocs#merge still returns a TopDocs.
>> >> * TopDocs still uses an array of ScoreDoc.
>> >> * ScoreDoc still uses an int doc ID.
>> >>
>> >> Looks like you're still screwed.
>> >>
>> >> I wish IndexReader would use long IDs too, because one IndexReader
>> >> can span multiple shards too - it doesn't make much sense to me that
>> >> this is restricted, although "it's hard to fix in a
>> >> backwards-compatible way" is certainly a good reason. :D
>> >>
>> >> TX
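For reference, the "multiple shards + TopDocs.merge" pattern Adrien suggests in the quoted thread might look roughly like the sketch below. It is only a sketch assuming Lucene 6.x-era APIs; the class and method names are invented for illustration. As Trejkaz observes, each hit's doc ID is still an int, but it is local to a single shard, so a merged hit is resolved through the pair (ScoreDoc.shardIndex, ScoreDoc.doc).

import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopDocs;

public class ShardedSearchSketch {

  /** Search every shard with the same query and sort, merge, then resolve hits per shard. */
  static void searchShards(IndexSearcher[] shards, Query query, Sort sort, int topN)
      throws IOException {
    TopDocs[] perShard = new TopDocs[shards.length];
    for (int i = 0; i < shards.length; i++) {
      // Each shard stays below IndexWriter.MAX_DOCS (a bit less than 2^31),
      // so its local doc IDs fit in an int.
      perShard[i] = shards[i].search(query, topN, sort);
    }

    // Merge the per-shard top hits into one global top-N using the same Sort.
    // In Lucene 6.x this also fills in ScoreDoc.shardIndex for each merged hit.
    TopDocs merged = TopDocs.merge(sort, topN, perShard);

    for (ScoreDoc hit : merged.scoreDocs) {
      // hit.doc is still an int, but it is only meaningful inside one shard;
      // the global "address" of the hit is the pair (shardIndex, doc).
      Document doc = shards[hit.shardIndex].doc(hit.doc);
      System.out.println(doc);
    }
  }
}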