On Mon, 2014-12-15 at 11:33 +0100, Michael McCandless wrote:
> On Mon, Dec 15, 2014 at 4:53 AM, Toke Eskildsen <t...@statsbiblioteket.dk> 
> wrote:

[Toke: Limit on faceting with many references]

> Hmm that's probably the DocTermOrds 16 MB internal addressing limit?

Yes, we've hit that one before. If we did not have DocValues, I would
consider it a serious deficiency of Solr.

For one of the fields in the shard I tested, we had 675M references from
256M documents to 3M unique values, with the most popular value having
18M references.

(all of which works perfectly fine & fast with DocValues, yay!)
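
As a rough sanity check on those numbers, here is a small Python sketch
that uses only the figures quoted above. The variable names are
illustrative, and nothing here is measured beyond what is already
stated in this mail.

```python
# Figures quoted in the message (M = million).
references = 675_000_000      # total document-to-value references
documents = 256_000_000       # documents in the shard
unique_values = 3_000_000     # distinct values in the field
top_value_refs = 18_000_000   # references to the most popular value

refs_per_doc = references / documents        # average values per document
refs_per_value = references / unique_values  # average references per value
top_share = top_value_refs / references      # share held by the top value

print(f"{refs_per_doc:.2f} references per document on average")
print(f"{refs_per_value:.0f} references per unique value on average")
print(f"{top_share:.1%} of all references point at the most popular value")
```

So the field averages between two and three values per document, while
the single most popular value is referenced far more often than the
average value.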

[2 days for conversion of 900GB index]

> That's awful.  Profile it?  But, how long did it take to index in the
> first place?

Full index takes 8 days with 24 CPUs going full tilt ~=192 CPU days.
Conversion is (sadly) single threaded, so measured in total CPU time, it
is just the 2 days. Still, we can't scale parallel conversions of
multiple shards very high due to limited local storage space.
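
The CPU-time comparison can be written out as a short Python sketch.
The 8 days, 24 CPUs and 2 days come from the text above; the shard
count and the number of parallel conversions in the usage example are
hypothetical, chosen only to show how the storage limit affects
wall-clock time.

```python
import math

full_index_days = 8    # wall-clock days for a full index
cpus = 24              # CPUs busy during indexing
conversion_days = 2    # wall-clock days for one single-threaded conversion

indexing_cpu_days = full_index_days * cpus   # 8 * 24 = 192 CPU days
conversion_cpu_days = conversion_days * 1    # one thread, so 2 CPU days
ratio = conversion_cpu_days / indexing_cpu_days

print(f"indexing:   {indexing_cpu_days} CPU days")
print(f"conversion: {conversion_cpu_days} CPU days ({ratio:.1%} of indexing)")


def wall_clock_days(shards, parallel, days_per_shard=conversion_days):
    """Wall-clock days to convert `shards` shards, `parallel` at a time.

    Assumes every conversion takes the same time and that batches run
    back to back; both are simplifications.
    """
    return math.ceil(shards / parallel) * days_per_shard


# Hypothetical example: 10 shards, storage allows 3 conversions at once.
print(wall_clock_days(shards=10, parallel=3), "days wall-clock")
```

In CPU time the conversion is about 1% of the indexing cost; the
problem is the wall-clock time when only a few conversions fit on local
storage at once.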

I'll put a lot more timing debug logging into the code to investigate
where the time is spent.

[TestDemoParallelLeafReader]

> The DVs can be arbitrary (not just long); it's only that the test
> cases focuses on long.

My point was that the code does not seem to auto-guess the field type
(especially the NumericType for numeric values). Since guessing would
not guarantee correct results anyway, it is probably better to require
the user to state explicitly what should happen.
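
To illustrate why guessing is fragile, here is a minimal Python sketch.
It is not Lucene code: the function names, the "INT"/"LONG" labels and
the "timestamp" field are all hypothetical, and the guesser is a
deliberately naive stand-in for whatever heuristic one might use.

```python
INT_MIN, INT_MAX = -2**31, 2**31 - 1


def guess_numeric_type(sample_values):
    """Naive guess: pick the narrowest type that fits the sampled values."""
    if all(INT_MIN <= v <= INT_MAX for v in sample_values):
        return "INT"
    return "LONG"


# Hypothetical explicit mapping supplied by the user.
EXPLICIT_TYPES = {"timestamp": "LONG"}


def resolve_type(field):
    """Return the user-declared type, or fail loudly instead of guessing."""
    try:
        return EXPLICIT_TYPES[field]
    except KeyError:
        raise ValueError(f"no explicit type declared for field {field!r}")


sample = [1, 42, 2014]   # values seen while sampling
later = 2**40            # a value that only shows up later

guessed = guess_numeric_type(sample)
print(guessed, "fits later value:", INT_MIN <= later <= INT_MAX)
print(resolve_type("timestamp"))
```

The guess looks fine for the sampled values but is wrong for the later
one, whereas the explicit mapping either gives the declared type or
refuses to continue.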

> Have a look @ the LUCENE-6005 branch: I broke this test out as a
> separate ReindexingReader + test.  I think we could do a better
> integration between that and the schema...

Down to practicalities: we need Lucene 4.8, as our DocValues are
disk-based and that support was removed in 4.9. I hope to find the time
to look at your better solution in January.

Regards,
Toke Eskildsen, State and University Library, Denmark



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
