Definitely b). I would also suggest using groups, and expanding each user's
groups at sign-in time.
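
To make the idea concrete, here is a minimal sketch in Python. It is a toy
in-memory model, not Lucene's API: the ToyIndex class, the GROUPS table, and
the "u:" / "g:" token prefixes are all made up for illustration. Each
document carries one multi-valued ACL field (option b), the user's groups are
expanded once at sign-in, and every search is intersected with the documents
that any of those tokens can see.

```python
from collections import defaultdict

# Hypothetical group membership table; in practice this would come from
# whatever system manages customers and groups.
GROUPS = {
    "alice": {"emea", "finance"},
    "bob": {"emea"},
}


def expand_principals(user):
    """Resolve a user to ACL tokens once, at sign-in time."""
    return {f"u:{user}"} | {f"g:{g}" for g in GROUPS.get(user, ())}


class ToyIndex:
    """Toy inverted index with one text field and one ACL field."""

    def __init__(self):
        self.text = defaultdict(set)  # term -> doc ids
        self.acl = defaultdict(set)   # ACL token -> doc ids

    def add(self, doc_id, body, acl_tokens):
        for term in body.lower().split():
            self.text[term].add(doc_id)
        for token in acl_tokens:
            self.acl[token].add(doc_id)

    def search(self, term, principals):
        # Union of docs visible to any principal, then intersect with hits.
        allowed = set()
        for p in principals:
            allowed |= self.acl.get(p, set())
        return sorted(self.text.get(term.lower(), set()) & allowed)


index = ToyIndex()
index.add(1, "quarterly invoice", {"g:finance"})
index.add(2, "invoice draft", {"u:bob"})
index.add(3, "travel policy", {"g:emea"})

alice = expand_principals("alice")  # computed once per session
bob = expand_principals("bob")

alice_hits = index.search("invoice", alice)
bob_hits = index.search("invoice", bob)
```

The point of the groups is that a document shared with many customers needs
only one group token in its ACL field instead of one token per customer,
which keeps the field (and the index) small.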

MW

On Thu, Jun 16, 2016 at 12:36 PM, Ian Lea <ian....@gmail.com> wrote:

> I'd definitely go for b).  The index will of course be larger for every
> extra bit of data you store but it doesn't sound like this would make much
> difference.  Likewise for speed of indexing.
>
>
> --
> Ian.
>
>
> On Wed, Jun 15, 2016 at 2:25 PM, Geebee Coder <g.b.co...@gmail.com> wrote:
>
> > Hi there,
> > I would like to use Lucene to solve the following problem:
> >
> > 1. We have about 100k customers and 25 million documents.
> >
> > 2. When a customer performs a text search on the document space, we want
> > to return only documents that the customer has access to.
> >
> > 3. The number of documents a customer owns varies a lot: some have close
> > to 23 million, some have close to 10k, and some own a third of the
> > documents, etc.
> >
> > What is an efficient way to use Lucene in this scenario in terms of
> > performance and indexing?
> > We have tried a number of solutions such as
> >
> >  a) 100k boolean fields per document that indicate whether a customer has
> >     access to the document.
> >  b) A single text field that lists the customers who own the document,
> >     e.g. (customers field: "abc abd cfx...")
> >  c) The above option with shards by customer
> >
> > The search and indexing performance for a) was bad. b) and c) performed
> > better for search but increased indexing time and index size.
> > We are also thinking about using a custom filter, but we are concerned
> > about the memory requirements.
> >
> > Any ideas/suggestions would be really appreciated.
> >
>
