[
https://issues.apache.org/jira/browse/SOLR-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson updated SOLR-3094:
---------------------------------
Attachment: SOLR-3094.patch
OK, anyone with good javascript skills, this would be a good time to chime in...
This is a variant of SOLR-1931. The new UI calls Luke at the top level in such
a way that it enumerates all the terms in all the fields to gather the
histogram data, which takes a long time. Note that this is what the old admin
UI/Luke handler did when you clicked the "schema browser" link.
Once that data is accumulated, then clicking on the individual fields and
showing that data is very fast since the data is local. But this data is
accumulated *before* any field is selected from the "schema browser" drop-down
and stored away.
I think this design is too costly, especially the "get all the data for all the
fields up-front" bit. The users pay a penalty (many minutes demonstrated) even
when they may only care about one field. So here's what I propose.
1> Tweak the LukeRequestHandler so it *requires* the fieldName parameter to
gather the histogram data. That fixes the slow initial display of the stats
that sparked this JIRA. I can do that in a few minutes, patch attached (do not
commit yet, though). The problem is that the UI then has no way at all to get
the stats data.
2> Tweak the javascript to call the luke request handler to collect the data
for individual fields only when the user selects them from the drop-down,
stowing them away at that point so they can be revisited if desired. Here's
where I could use some help, as my javascript skills are rudimentary at best.
If anyone could work on the javascript I'd be happy to field test. Or even just
put some comments in the code pointing me to the right places. Any trunk code
from after 6-Jan will have the right Luke handler in it (see SOLR-1931).
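
The fetch-on-selection idea in item 2 could look roughly like the sketch
below. It is only an illustration, not the admin UI's actual code:
createFieldStatsCache and the injected fetchFieldStats function are
hypothetical names, and how the request to the Luke handler is actually made
is left to the caller.

```javascript
// Hypothetical helper: fetch per-field histogram data lazily and remember it.
// `fetchFieldStats(fieldName)` is ASSUMED to return a Promise for that
// field's Luke data; it is injected so this sketch makes no claim about
// the UI's real AJAX layer.
function createFieldStatsCache(fetchFieldStats) {
  const cache = new Map(); // field name -> Promise of that field's stats

  return function getFieldStats(fieldName) {
    if (!cache.has(fieldName)) {
      const pending = fetchFieldStats(fieldName);
      // Forget failed requests so selecting the field again can retry.
      pending.catch(() => cache.delete(fieldName));
      cache.set(fieldName, pending);
    }
    // Revisiting a field returns the stored Promise; no second request.
    return cache.get(fieldName);
  };
}
```

The drop-down's change handler would then call something like
getFieldStats(selectedField).then(render), so only fields the user actually
selects are ever requested.
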
There's also something wrong with the display of the histogram: the "bucket"
and the count in each bucket are mashed together on the bottom. With non-trivial
indexes, this is largely unreadable since they're side-by-side...
Anyway, the attached patch makes it so you can get into the admin page without
paying the above penalties, but you *never* get histogram data when you go into
"schema browser". If someone applies this to work on the admin UI bit,
attaching "&fl=field1 field2" to the luke URL will cause the histogram data to
be returned for the field(s) specified.
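
For anyone working on the UI side, attaching the fl parameter could be done
along these lines. This is a sketch under assumptions: buildLukeUrl is a
hypothetical helper, and the "/solr/admin/luke" path in the example comment is
only a placeholder for whatever luke URL the UI already uses.

```javascript
// Hypothetical helper: build a Luke URL that asks for specific fields only.
// `lukeBaseUrl` is whatever URL the UI already uses to reach admin/luke.
function buildLukeUrl(lukeBaseUrl, fieldNames) {
  // Append to an existing query string if there is one.
  const separator = lukeBaseUrl.includes("?") ? "&" : "?";
  // Field names are space-separated, as in "&fl=field1 field2";
  // the space is URL-encoded as %20.
  return lukeBaseUrl + separator + "fl=" +
    encodeURIComponent(fieldNames.join(" "));
}

// buildLukeUrl("/solr/admin/luke", ["field1", "field2"])
//   -> "/solr/admin/luke?fl=field1%20field2"
```
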
If anyone has some spare cycles to help out here it would be outstanding.
I think something similar could be done for the old admin UI as well in terms
of only getting the fields when requested, otherwise the histogram data won't
be returned either...
> The statistics entry on the new admin UI is very slow
> -----------------------------------------------------
>
> Key: SOLR-3094
> URL: https://issues.apache.org/jira/browse/SOLR-3094
> Project: Solr
> Issue Type: Bug
> Components: Schema and Analysis
> Affects Versions: 4.0
> Environment: trunk only, all environments
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Attachments: SOLR-3094.patch
>
>
> Prompted by Robert Reynolds (SOLR-2667), the entry point in the new Admin UI
> core drill down (e.g. clicking "singlecore") takes a long time: 28-46
> *minutes* on a 13M-23M doc set.
> On an example Wikipedia index (11M docs), I see 21 seconds, compared to less
> than 2 seconds in the old admin UI (I'm using the old admin UI linked to from
> the new UI page on trunk). I have a very simple index layout compared to a
> commercial site. Clearly something is not right. I suspect that all the terms
> are being walked.
> This is particularly an issue because this behavior happens when I click
> "singlecore", so getting to the really neat parts of the new UI is hard.
> Robert reports on a separate thread that the same behavior happens just
> hitting admin/luke in the URL which is also slow in the 3.x world, which
> hints at where the problem lies.
> I'm going to guess that the terms are being walked and we can use the tricks
> used in SOLR-1931 to deal with the fact that admin/luke takes a long time,
> and just change the call to the entry ("singlecore") for this issue.
> Robert: Thanks for pointing this out!