Hello Andrea,
Thank you very much for this information. It's extremely helpful. I must say
I'm very surprised that DSpace doesn't provide full-text searching by default
-- only by setting up a cron job or by running the indexing manually. Searching
the full text of text-based file formats should be the default behavior out of
the box, not a specialized or atypical use case.
As regards running the dspace script with sudo, now that I've done it once, is
there a way for me to determine whether I've messed up the file system
permissions?
Many thanks,
Greg
On Jun 22, 2015, at 12:57 AM, Andrea Schweer wrote:
Hi,
On 20/06/15 06:03, Murray, Gregory wrote:
>From the DSpace bin directory I ran "sudo ./dspace index-discovery" but it had
>no effect on the problem. I don't see anything in the documentation to
>indicate that full-text indexing has to be enabled with a config change.
>Surely it's enabled by default, right?
Full-text extraction needs to be scheduled via cron; see
https://wiki.duraspace.org/display/DSDOC5x/Scheduled+Tasks+via+Cron -- you need
the media filter task. This task extracts the full text and also generates
thumbnails. That's what Hilton was referring to. Unfortunately, full-text
extraction doesn't happen automatically; you do need to run the media filter.
The media filter will trigger a re-index of the item once the .pdf.txt file has
been generated, and from then on you can do full-text searches on that item.
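As a rough sketch of the scheduling step: the helper below only prints a
crontab line for the media filter. The install path, the 03:00 schedule, and
the function name media_filter_cron_line are assumptions made for illustration;
adjust them to your installation.

```shell
#!/bin/sh
# Hypothetical helper: print a crontab line that runs the DSpace media filter
# every night at 03:00. The first argument is the DSpace install directory
# (assumed to be /dspace if omitted).
media_filter_cron_line() {
    dspace_dir="${1:-/dspace}"
    printf '0 3 * * * %s/bin/dspace filter-media\n' "$dspace_dir"
}

# Example (not run here):
#   media_filter_cron_line /dspace
# Paste the printed line into the crontab of the user that Tomcat runs as,
# for instance via: sudo -u dspace crontab -e
```

To extract text once by hand instead of waiting for cron, the same command can
be run directly as that user, e.g. "sudo -u dspace /dspace/bin/dspace
filter-media" (again assuming the user is "dspace" and the path is /dspace).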
You may be aware that we're currently collecting use cases for DSpace.
Automatically running the media filter when an item is ingested is mentioned in
Tim's comment on this use case:
https://wiki.duraspace.org/display/DSPACE/Admin+UI+-+Run+media+filters -- if
you think this would be useful, you may wish to "like" Tim's comment and/or
leave a comment of your own on the use case.
Finally, unless you're running DSpace as root, you should never ever run a
dspace command with just sudo. This can thoroughly mess up the file system
permissions. All dspace commands need to be run as the same user that Tomcat
runs under (the user is "dspace" if you followed the DSpace installation
instructions, but it could be "tomcat" or something else depending on your
OS/distribution).
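
One way to check whether an earlier sudo run left files owned by the wrong
user is to list everything under the installation directory that does not
belong to the expected account. This is a minimal sketch: the function name
list_foreign_owned is made up for illustration, and /dspace and "dspace" are
assumptions about your install path and Tomcat user.

```shell
#!/bin/sh
# Hypothetical helper: print every file or directory under "$1" that is NOT
# owned by user "$2". No output means ownership looks consistent.
list_foreign_owned() {
    dir="$1"
    user="$2"
    find "$dir" ! -user "$user" -print
}

# Example (not run here; path and user are assumptions):
#   list_foreign_owned /dspace dspace
# Review the list first. If root-owned files turn up, ownership can be
# restored with something like:
#   sudo chown -R dspace /dspace
# From then on, run dspace commands as that user, e.g.:
#   sudo -u dspace ./dspace index-discovery
```

Files typically affected after a root run are logs and the search index, so
those are worth looking at first in the output.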
cheers,
Andrea
--
Dr Andrea Schweer
IRR Technical Specialist, ITS Information Systems
The University of Waikato, Hamilton, New Zealand
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
List Etiquette: https://wiki.duraspace.org/display/DSPACE/Mailing+List+Etiquette