Definitely b). I would also suggest using groups: put group IDs (not only individual customer IDs) in the access field, and expand each user's groups at sign-in time.
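To make the group idea concrete, here is a rough sketch in plain Python (no Lucene calls). Everything in it is an assumption for illustration: the group names, the `acl` field name, and the helper functions are hypothetical. The idea is that each document's access field holds customer IDs and/or group IDs, and at sign-in the user's memberships are expanded once into a flat set of tokens that is reused for every search in that session.

```python
from typing import Dict, Iterable, Set

# Hypothetical membership data: group -> parent groups it belongs to.
GROUP_PARENTS: Dict[str, Set[str]] = {
    "g_sales_emea": {"g_sales"},
    "g_sales": {"g_all_staff"},
}

# Hypothetical direct memberships: customer id -> groups.
USER_GROUPS: Dict[str, Set[str]] = {
    "abc": {"g_sales_emea"},
    "abd": set(),
}


def expand_principals(user_id: str,
                      user_groups: Dict[str, Set[str]],
                      group_parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the user id plus every group reachable from its memberships.

    Intended to run once at sign-in; the result is cached with the session.
    """
    seen = {user_id}
    stack = list(user_groups.get(user_id, ()))
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # already visited; also guards against cycles
        seen.add(group)
        stack.extend(group_parents.get(group, ()))
    return seen


def acl_filter_clause(principals: Iterable[str], field: str = "acl") -> str:
    """Build an OR clause over the access field (field name is an assumption)."""
    return f"{field}:({' OR '.join(sorted(principals))})"


def is_visible(doc_acl_tokens: Iterable[str], principals: Set[str]) -> bool:
    """A document is visible if it shares at least one token with the user."""
    return not principals.isdisjoint(doc_acl_tokens)


if __name__ == "__main__":
    principals = expand_principals("abc", USER_GROUPS, GROUP_PARENTS)
    print(acl_filter_clause(principals))
    print(is_visible({"g_sales", "cfx"}, principals))
```

With groups, a document shared with thousands of customers carries a handful of group tokens rather than thousands of customer IDs, which should help with the indexing time and index size mentioned for option b). In Lucene itself the clause would more likely be built programmatically, for example as a `BooleanQuery` of `TermQuery` clauses added with `BooleanClause.Occur.FILTER`, rather than as a query string.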
MW

On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea <ian....@gmail.com> wrote:

> I'd definitely go for b). The index will of course be larger for every
> extra bit of data you store but it doesn't sound like this would make much
> difference. Likewise for speed of indexing.
>
> --
> Ian.
>
> On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder <g.b.co...@gmail.com> wrote:
>
> > Hi there,
> > I would like to use Lucene to solve the following problem:
> >
> > 1. We have about 100k customers and 25 million documents.
> >
> > 2. When a customer performs a text search on the document space, we want
> > to return only documents that the customer has access to.
> >
> > 3. The number of documents a customer owns varies a lot. Some have close
> > to 23 million, some have close to 10k, some own a third of the documents,
> > etc.
> >
> > What is an efficient way to use Lucene in this scenario in terms of
> > performance and indexing?
> > We have tried a number of solutions such as
> >
> > a) 100k boolean fields per document that indicate whether a customer has
> > access to the document.
> > b) A single text field that has a list of customers who own the document,
> > e.g. (customers field: "abc abd cfx...")
> > c) the above option with shards by customer
> >
> > The search and index performance for a) was bad. b) and c) performed
> > better for search but lengthened the time needed for indexing and
> > increased the index size.
> > We are also thinking about using a custom filter but we are concerned
> > about the memory requirements.
> >
> > Any ideas/suggestions would be really appreciated.