[ https://issues.apache.org/jira/browse/SOLR-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-3094:
---------------------------------

    Attachment: SOLR-3094.patch

OK, anyone with good JavaScript skills: this would be a good time to chime in...

This is a variant of SOLR-1931. The new UI calls Luke at the top level in such 
a way that it enumerates all the terms in all the fields to gather the 
histogram data, which takes a long time. Note that this is also what the old 
admin UI/Luke handler did when you clicked the "schema browser" link.

Once that data has been accumulated, clicking on the individual fields and 
showing their data is very fast, since the data is local. But this data is 
accumulated and stored *before* any field is selected from the "schema 
browser" drop-down.

I think this design is too costly, especially the "get all the data for all the 
fields up-front" part. Users pay a penalty (many minutes, in the cases reported 
so far) even when they only care about one field. So here's what I propose.

1> Tweak the LukeRequestHandler so it *requires* field names (via the fl 
parameter) to gather the histogram data. That fixes the slow initial display 
of the stats, which is the issue that sparked this JIRA. I can do that in a few 
minutes; patch attached (do not commit it yet, though). The problem is that 
the UI then has no way at all to get the stats data.

2> Tweak the JavaScript to call the Luke request handler to collect the data 
for individual fields only when the user selects them from the drop-down, 
caching them at that point so they can be revisited if desired. This is where 
I could use some help; my JavaScript skills are rudimentary at best. If anyone 
could work on the JavaScript, I'd be happy to field-test it. Or even just put 
some comments in the code pointing me to the relevant places. Any trunk code 
from after 6-Jan will have the right Luke handler in it (see SOLR-1931).
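A rough TypeScript sketch of the lazy-loading idea in 2>, assuming the UI can be handed a function that fetches one field's data from the Luke handler. The `FieldStats` type, the `fetchField` callback and the `LazyFieldStats` class are hypothetical names for illustration, not anything in the current UI code:

```typescript
// Hypothetical shape of one field's Luke data; the real JSON has its own keys.
type FieldStats = Record<string, unknown>;

// Hypothetical callback that asks the Luke handler for a single field's data.
type FieldFetcher = (field: string) => Promise<FieldStats>;

// Fetches per-field data only when a field is first selected, then remembers it.
class LazyFieldStats {
  private readonly cache = new Map<string, Promise<FieldStats>>();
  private readonly fetchField: FieldFetcher;

  constructor(fetchField: FieldFetcher) {
    this.fetchField = fetchField;
  }

  // Called when the user picks a field from the "schema browser" drop-down.
  get(field: string): Promise<FieldStats> {
    const cached = this.cache.get(field);
    if (cached !== undefined) {
      return cached; // revisiting a field does not hit Luke again
    }
    const pending = this.fetchField(field);
    // Store the promise itself so two quick clicks share one request.
    this.cache.set(field, pending);
    // If the request fails, forget it so a later selection can retry.
    pending.catch(() => {
      if (this.cache.get(field) === pending) {
        this.cache.delete(field);
      }
    });
    return pending;
  }

  // True once a request for this field has been started or completed.
  isLoaded(field: string): boolean {
    return this.cache.has(field);
  }
}
```

In the UI, `fetchField` would request the Luke handler with fl set to the selected field, and the drop-down's change handler would call `get` with the chosen field name. Caching the promise rather than the finished result means a field clicked twice while its request is still running only triggers one Luke call.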

There's also something wrong with the display of the histogram: the "bucket" 
and the count in each bucket are mashed together at the bottom. With 
non-trivial indexes this is largely unreadable, since they're side-by-side...
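One way to keep buckets and counts apart is to pair them up and render one bucket per row. A small TypeScript sketch, assuming the histogram reaches the browser as a flat [bucket, count, bucket, count, ...] array (I believe that is Solr's default flat JSON rendering of a NamedList, but check the actual response). `histogramRows` is a hypothetical helper, not existing UI code:

```typescript
// Assumption: the histogram is a flat array of alternating bucket labels and counts.
type FlatHistogram = Array<string | number>;

// Pairs each bucket with its count and returns one display row per bucket,
// so the two values are never run together in a single line of text.
function histogramRows(flat: FlatHistogram): string[] {
  if (flat.length % 2 !== 0) {
    throw new Error("histogram must contain bucket/count pairs");
  }
  const rows: string[] = [];
  for (let i = 0; i < flat.length; i += 2) {
    rows.push(`${flat[i]}: ${flat[i + 1]}`);
  }
  return rows;
}
```

For example, `histogramRows(["1", 500, "2", 40])` returns `["1: 500", "2: 40"]`, which the UI could then render as separate rows of a list or table.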

Anyway, the attached patch lets you get into the admin page without paying the 
above penalties, but you *never* get histogram data when you go into "schema 
browser". If someone applies this to work on the admin UI part, appending 
"&fl=field1 field2" to the Luke URL will cause the histogram data to be 
returned for the field(s) specified.
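For whoever picks up the UI side, a minimal TypeScript sketch of building that Luke URL for just the selected field(s). The core base URL shape, the host in the example and the extra wt=json parameter are assumptions for illustration; only the space-separated fl parameter comes from the patch described above. `lukeUrl` is a hypothetical helper:

```typescript
// Builds a Luke request that asks for histogram data only for the named fields.
// coreBase is assumed to look like "http://host:port/solr/<core>".
function lukeUrl(coreBase: string, fields: string[]): string {
  if (fields.length === 0) {
    // With the patch, no fields means no histogram data, so refuse to build it.
    throw new Error("at least one field name is required");
  }
  const base = coreBase.replace(/\/+$/, ""); // drop any trailing slashes
  // Field names are space-separated, as in "&fl=field1 field2".
  const fl = encodeURIComponent(fields.join(" "));
  return `${base}/admin/luke?fl=${fl}&wt=json`;
}
```

For example, `lukeUrl("http://localhost:8983/solr/singlecore/", ["field1", "field2"])` produces `http://localhost:8983/solr/singlecore/admin/luke?fl=field1%20field2&wt=json`. Encoding the space keeps the URL valid when it is passed to whatever request function the UI uses.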

If anyone has some spare cycles to help out here it would be outstanding.

I think something similar could be done for the old admin UI as well, fetching 
data for a field only when it is requested; otherwise the old UI won't get 
histogram data either...
                
> The statistics entry on the new admin UI is very slow
> -----------------------------------------------------
>
>                 Key: SOLR-3094
>                 URL: https://issues.apache.org/jira/browse/SOLR-3094
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 4.0
>         Environment: trunk only, all environments
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-3094.patch
>
>
> Prompted by Robert Reynolds (SOLR-2667): the entry point in the new Admin UI 
> core drill-down (e.g. clicking "singlecore") takes a long time: 28-46 
> *minutes* on a 13M-23M doc set.
> On an example Wikipedia index (11M docs), I see 21 seconds, compared to less 
> than 2 seconds in the old admin UI (I'm using the old admin UI linked to from 
> the new UI page on trunk). I have a very simple index layout compared to a 
> commercial site. Clearly something is not right. I suspect that all the terms 
> are being walked.
> This is particularly an issue because this behavior happens when I click 
> "singlecore", so getting to the really neat parts of the new UI is hard.
> Robert reports on a separate thread that the same behavior happens when just 
> hitting admin/luke in the URL, which is also slow in the 3.x world; this 
> hints at where the problem lies.
> I'm going to guess that the terms are being walked, and that we can use the 
> tricks from SOLR-1931 to deal with admin/luke taking a long time and just 
> change the call made by the entry point ("singlecore") for this issue.
> Robert: Thanks for pointing this out!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
