Hi helix84:

Thanks for the prompt response! It's very much appreciated :).

Thanks also for the explanation on search and browse. That really helped clear 
things up!

I just have a few remaining follow-up questions.

1) Is it necessary to run "index-update" and/or "update-discovery-index" as 
cronjobs? The indexes and browse tables appear to be updated automatically 
(notwithstanding that cocoon caching issue), so it doesn't seem necessary. 

I imagine "index-update" should only need to be run after changes to the 
indexing structure have occurred? 

Also, "update-discovery-index" might only need to be run regularly (e.g. daily) 
with the optimize option?
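
To make the question concrete, this is roughly the nightly job I have in mind. 
It is only a sketch: "-o" is my assumption for the optimize flag, and 
DSPACE_CMD is a hypothetical variable standing in for "[dspace]/bin/dspace".

```shell
#!/bin/sh
# Nightly Discovery maintenance (a sketch, not a tested recipe).
# DSPACE_CMD is a hypothetical variable; point it at [dspace]/bin/dspace.
DSPACE_CMD="${DSPACE_CMD:-/dspace/bin/dspace}"

nightly_discovery() {
    # Assumption: "-o" is the optimize option of update-discovery-index.
    "$DSPACE_CMD" update-discovery-index -o
}

# Only run when the launcher actually exists at the configured path.
if [ -x "$DSPACE_CMD" ]; then
    nightly_discovery
fi
```

The script would be called once a day from cron. Whether "index-update" 
belongs in the same job is exactly what I'm unsure about.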


2) With regard to statistics (e.g. /xmlui/statistics/ and /jspui/statistics/), 
I've figured this one out! One needs to run "[dspace]/bin/dspace stat-initial" 
and then "[dspace]/bin/dspace stat-report-initial" to generate the traditional 
pre-1.6 statistics, and then run the "stat-monthly", "stat-report-monthly", 
"stat-general", and "stat-report-general" commands as cronjobs for regular 
updates.

In each pair, the first command scans the logs and the second (the 
"stat-report-*" one) creates the static HTML pages. (Interestingly, the HTML 
pages are only needed for the JSPUI. I guess there must be a mechanism where 
the XMLUI pulls directly from the generated .dat files. I assume there are no 
database tables for this.)
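
For anyone who wants to reproduce this, the sequence can be sketched as 
follows. The command names are the ones mentioned above; DSPACE_CMD is a 
hypothetical variable standing in for "[dspace]/bin/dspace", and the function 
names are mine.

```shell
#!/bin/sh
# Legacy (pre-1.6 style) statistics: a sketch of the sequence, not a recipe.
# DSPACE_CMD is a hypothetical variable; point it at [dspace]/bin/dspace.
DSPACE_CMD="${DSPACE_CMD:-/dspace/bin/dspace}"

# One-off: scan the existing logs, then build the reports from the result.
stats_initial() {
    "$DSPACE_CMD" stat-initial &&
    "$DSPACE_CMD" stat-report-initial
}

# Recurring: each log-analysis step is followed by its report step.
stats_recurring() {
    "$DSPACE_CMD" stat-monthly &&
    "$DSPACE_CMD" stat-report-monthly &&
    "$DSPACE_CMD" stat-general &&
    "$DSPACE_CMD" stat-report-general
}
```

The idea is to call stats_initial once by hand and stats_recurring from cron. 
Each report step only runs if the analysis step before it succeeded.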

That said, some statistics are missing. The sections "Words Searched", "Items 
Viewed", and "User Logins" are empty, and anything related to searches or views 
is absent from the other sections as well. I figured that it might be because 
these stats are tracked in Solr, but I imagine the events are still logged the 
same as before?

I've seen these statistics present in AgriOcean 1.7.1 installs (which use 
DSpace 1.7), so perhaps the pre-1.6 stat scripts are looking for events that 
DSpace 3.x describes differently in the logs? That's my best guess. 
Alternatively, perhaps AgriOcean doesn't use Solr for stats?

I would love to hear feedback on this one from people using DSpace 1.7.x/1.8.x 
and/or DSpace 3.x.

--
PS: Withdrawn items still appear in Solr usage statistics in the DSpace UI but 
they show a number (perhaps their internal UID?) instead of their title. Is 
this intentional or should withdrawn items not appear in the usage statistics?
--

3) As for the cocoon cache, I've followed your suggestion of using 
"index-update", but it hasn't changed anything.

Any time I change the title of an item, the old title is presented when 
browsing (in the XMLUI, not the JSPUI) but is updated everywhere else.

This behaviour is identical even when using the SolrBrowseDAO.

So far, the only way I've gotten the correct title to show in the browse is to 
clear the cache or to add a new item (which must trigger a new page to be 
created and cached).

--

Thanks once again for your help. I greatly appreciate the insight :)

-David

-----Original Message-----
From: ivan.ma...@gmail.com [mailto:ivan.ma...@gmail.com] On Behalf Of helix84
Sent: Wednesday, 27 November 2013 8:45 PM
To: David Cook
Cc: dspace-tech
Subject: Re: [Dspace-tech] Difference between Lucene and Solr Indexing Commands

Hi David,

all your questions are very valid and as we're in a transition phase between 
the old search and Discovery, there are some minor caveats.

On Wed, Nov 27, 2013 at 8:11 AM, David Cook <dc...@prosentient.com.au> wrote:
> 1)      By default (i.e. Discovery is disabled and there is no OAI-PMH
> cronjob), Solr is only used for statistics, yes? Instead, 
> indexing/searching/retrieving metadata is handled by another entity 
> (Lucene and/or the database)?

That's correct as of DSpace 3.x. Search is handled by Lucene; browse and 
itemcounter (collection strengths, disabled by default) are handled by helper 
tables in the DB. DSpace 4.x will have Discovery by default and the indexing 
commands will be renamed for clarity.

> How do the following commands factor in? Are these commands that 
> manipulate the Lucene indexes based on the database?

Yes, in 3.x index-init and index-update work on the Lucene index and the browse 
DB tables. itemcounter works on the item counter DB table.

> (When Discovery is enabled, these
> commands no longer have any effect, correct? Instead, one uses 
> “[dspace]/bin/dspace update-discovery-index” which is the only command 
> that Solr API uses to manipulate the Lucene indexes?)

That's almost correct. When you enable Discovery, search is handled by Solr, 
but browse is still handled by the DB-backed tables, so you should use 
index-init/index-update to keep those up to date.
Alternatively, you can enable the SolrBrowseDAO (new in 3.0) class to also 
handle browse using Solr - see browseDAO.class and browseCreateDAO.class in 
dspace.cfg.

This also answers your question 3).

>                 When I tried using these index-* commands, I didn’t 
> receive any errors at the CLI, but nothing seemed to happen in DSpace 
> either. That is, there never seemed to be a need to run index-update, 
> as the metadata was always up to date.

With Discovery enabled, you would only notice it on the browse pages.

>                 Index-init didn’t seem to remove indexes either. Or if 
> it did, search then defaulted to directly querying the database?

No, search has never queried the database directly. index-init has a couple of 
options which include clearing the Lucene index or recreating the DB browse 
tables. See documentation [1].

>                 (Related question: How do you add search indexes when 
> Discovery is enabled? Or do you need to?)

Do you mean how to add a new index on a new field? [2]

> 2)      While the usage/search/workflow statistics are being populated
> correctly, the “general” statistics seem to be missing. Every time I 
> visit http://localhost:8080/xmlui/statistics or 
> http://localhost:8080/jspui/statistics, it says, “There are currently 
> no reports available for this service. Please check back later”.
>
> I’ve run “[dspace]/bin/dspace stats-util -b -r” as a desperate attempt 
> to get something, but it doesn’t produce errors or results. I get the 
> typical “Created new kernel…Loading from classloader…Using dspace 
> provided log configuration…Loading…” messages in the CLI, but that’s it.
>
> Any tips on how to get these statistics?

I think these are the old, pre-Solr statistics from before DSpace 1.6.
stats-util should be the correct command to generate them. I never used them 
myself, so I'll leave this question for someone else to answer.

> 3)      Finally, is there a time/size limit on the Cocoon cache? Surely,
> manually clearing the cache (either via the CLI or the XMLUI Control 
> Panel) isn’t the only way that it can be refreshed?

Surely there is, but I never needed to know what it is. You only ever need to 
clear the Cocoon cache if you change XSL files, Java code or some configuration 
options. When there are changes only in content, there should be no need to 
clear it.

> If I change the title of an item, it changes everywhere except when 
> doing a “browse”. The browse seems to be stuck on the old title until 
> I clear the Cocoon cache.

This sounds weird. Try index-update, which is the normal procedure and let us 
know if this happens nonetheless.

> Any ideas on this one? I suppose a cronjob might be able to do it, but 
> is there a way to do it from the CLI which doesn’t involve shutting 
> down Tomcat and starting it again?

I answered this in 1).



[1] 
https://wiki.duraspace.org/pages/viewpage.action?pageId=32474035#ReIndexingContent(forBrowseorSearch)-CreatingtheBrowse&SearchIndexes
[2] 
https://wiki.duraspace.org/display/DSDOC3x/Discovery#Discovery-ConfiguringlistsofsidebarFacetsandsearchFilters


Regards,
~~helix84

Compulsory reading: DSpace Mailing List Etiquette 
https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette


