Hello Andrea,

Thank you very much for this information. It's extremely helpful. I must say 
I'm very surprised that DSpace doesn't provide full-text searching by default 
-- only by setting up a cron job or by running the indexing manually. Searching 
the full text of text-based file formats should be the default behavior out of 
the box, not a specialized or atypical use case.

As for running the dspace script with sudo: now that I've done it once, is 
there a way for me to determine whether I've messed up the file system 
permissions?

Many thanks,
Greg


On Jun 22, 2015, at 12:57 AM, Andrea Schweer wrote:

Hi,

On 20/06/15 06:03, Murray, Gregory wrote:
>From the DSpace bin directory I ran "sudo ./dspace index-discovery" but it had 
>no effect on the problem. I don't see anything in the documentation to 
>indicate that full-text indexing has to be enabled with a config change. 
>Surely it's enabled by default, right?

Full-text extraction needs to be scheduled via cron; see 
https://wiki.duraspace.org/display/DSDOC5x/Scheduled+Tasks+via+Cron -- you need 
the media filter task. This task extracts the full text and also generates 
thumbnails. That's what Hilton was referring to. Unfortunately, full-text 
extraction doesn't happen automatically; you do need to run the media filter. 
The media filter will trigger a re-index of the item once the .pdf.txt file has 
been generated, and from then on you can do full-text searches on that item.
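
As a rough sketch, a crontab entry for the user DSpace runs as could look 
like the one below. The install path /dspace, the user name "dspace" and the 
03:00 schedule are all assumptions, so substitute your own values; the wiki 
page above lists the full set of recommended tasks.

```
# Edit with: sudo crontab -u dspace -e   (assumes the DSpace user is "dspace")
# Run the media filter every night at 03:00 (path and time are assumptions)
0 3 * * * /dspace/bin/dspace filter-media
```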

You may be aware that we're currently collecting use cases for DSpace. 
Automatically running the media filter when an item is ingested is mentioned in 
Tim's comment on this use case: 
https://wiki.duraspace.org/display/DSPACE/Admin+UI+-+Run+media+filters -- if 
you think this would be useful, you may wish to "like" Tim's comment and/or 
leave a comment of your own on the use case.

Finally, unless you're running DSpace as root, you should never ever run a 
dspace command with just sudo. This can thoroughly mess up the file system 
permissions. All dspace commands need to be run as the same user that Tomcat 
runs under (the user is "dspace" if you followed the DSpace installation 
instructions, but it could be "tomcat" or something else depending on your 
OS/distribution).
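
For example, something along these lines should run the indexer as the right 
user and show whether an earlier plain-sudo run left files behind that belong 
to someone else. This is only a sketch: the /dspace path and the "dspace" 
user are assumptions, and list_foreign_owned is a helper name made up for 
illustration.

```shell
# Assumed defaults; override them to match your installation.
DSPACE_DIR="${DSPACE_DIR:-/dspace}"
DSPACE_USER="${DSPACE_USER:-dspace}"

# Print every file or directory under $1 that is NOT owned by user $2.
# No output means the ownership under that directory is consistent.
list_foreign_owned() {
    find "$1" ! -user "$2" -print
}

# Typical use (left commented out so nothing runs by accident):
#   list_foreign_owned "$DSPACE_DIR" "$DSPACE_USER"
#   sudo -u "$DSPACE_USER" "$DSPACE_DIR/bin/dspace" index-discovery
```

If the check prints anything, those paths were most likely created or 
rewritten by the sudo run, and changing their owner back to the DSpace user 
with chown is the usual remedy.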

cheers,
Andrea


--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand

