Hi Nick,

In outline, the flow is:

- After ingest, Fedora sends a modification message to the update listeners

- GSearch receives it in 
dk.defxws.fedoragsearch.server.UpdateListener.onMessage()

- calls dk.defxws.fedoragsearch.server.GenericOperationsImpl.updateIndex()

- gets the foxml from org.fcrepo.server.management.FedoraAPIM.export() -- a 
SOAP call

- gets the Solr index document from GTransformer.transform(xslt, foxml) 

- wherein the xslt transformer calls 
GenericOperationsImpl.getDatastreamFromTika()

- which gets the datastream from 
org.fcrepo.server.access.FedoraAPIA.getDatastreamDissemination() -- a SOAP call

- and gets the index field contents from 
TransformerToText.getFromTika(datastream)

If you have used the default foxmlToSolr.xslt, you will also call 
getDatastreamDissemination() on your video streams, a waste of processing time, 
since you get no index text out of a video stream.

Therefore, you should tailor your foxmlToSolr.xslt to avoid the datastreams 
containing video streams, e.g.

<!-- Replace YOUR_VIDEO_DATASTREAM_ID with the ID of the datastream holding the video -->
<xsl:for-each select="foxml:datastream[@ID != 'YOUR_VIDEO_DATASTREAM_ID' and
        (@CONTROL_GROUP='M' or @CONTROL_GROUP='E' or @CONTROL_GROUP='R')]">
    <xsl:value-of disable-output-escaping="yes"
        select="exts:getDatastreamFromTika($PID, $REPOSITORYNAME, @ID, 'field',
                concat('ds.', @ID), concat('dsmd_', @ID, '.'), '', $FEDORASOAP, $FEDORAUSER,
                $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
</xsl:for-each>
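
If the video datastream IDs vary between objects, filtering on MIME type instead of on @ID may be easier to maintain. The following is only a sketch under assumptions: that the video datastreams carry a MIMETYPE starting with 'video/', and that the last foxml:datastreamVersion element in the exported FOXML is the current version. Neither is taken from the default foxmlToSolr.xslt, so check both against your own objects. The call to getDatastreamFromTika() is unchanged from the snippet above.

```xml
<!-- Sketch: skip any datastream whose last version has a video/* MIME type.
     Uses the same stylesheet context (namespaces, parameters) as the snippet above.
     Datastreams without a MIMETYPE attribute are still indexed. -->
<xsl:for-each select="foxml:datastream[
        (@CONTROL_GROUP='M' or @CONTROL_GROUP='E' or @CONTROL_GROUP='R')
        and not(starts-with(foxml:datastreamVersion[last()]/@MIMETYPE, 'video/'))]">
    <xsl:value-of disable-output-escaping="yes"
        select="exts:getDatastreamFromTika($PID, $REPOSITORYNAME, @ID, 'field',
                concat('ds.', @ID), concat('dsmd_', @ID, '.'), '', $FEDORASOAP, $FEDORAUSER,
                $FEDORAPASS, $TRUSTSTOREPATH, $TRUSTSTOREPASS)"/>
</xsl:for-each>
```

Either way, the point is the same: the predicate has to reject the video datastream before getDatastreamFromTika() is reached, so that getDatastreamDissemination() is never called on it.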
                        
Cheers,
Gert



On 16/10/2013, at 08.26, Nick Ruest wrote:

> Hi Gert (& maybe all),
> 
> After a bunch of investigating and experimenting, I think I've made 
> some headway. I'm now running Java 7, and the large file processing 
> seems to be running a lot smoother. But I did notice a couple of things 
> while tailing the fedoragsearch.daily.log & fedora.log[1]. The datastream in 
> question is a 6.97GB video file. The server this is all running on has 24G, 
> and this (-Xms18432M -Xmx18432M -XX:MaxPermSize=1024M) is my memory 
> allocation setup for firing up the entire stack.
> 
> I'm not sure if this is a GSearch problem or fcrepo problem, or both. Is 
> GSearch making the SOAP calls? The initial ingest of the file in 
> question took place via REST. So, I'm a little confused here.
> 
> Any insight/guidance would be very much appreciated!
> 
> cheers!
> 
> -nruest
> 
> [1] https://gist.github.com/ruebot/7003346
> 
> 
> On 13-10-07 05:20 AM, Gert Schmeltz Pedersen wrote:
>> Hi Nick,
>> 
>> You may see the time consumption for Fedora and for GSearch separately from 
>> the fedora.log and fedoragsearch.log, and for GSearch you may see the time 
>> for each datastream.
>> 
>> Concerning GSearch, you may see from foxmlToSolr.xslt how Tika is called for 
>> the video stream. You may index datastreams on the metadata or on the 
>> contents or both, and if your foxmlToSolr.xslt by default tries to index the 
>> video contents, then you should tailor your foxmlToSolr.xslt; see the 
>> GSearch documentation page about how to call Tika on datastreams.
>> 
>> Gert
>> 
>> 
>> On 07/10/2013, at 04.03, Nick Ruest wrote:
>> 
>>> Hi folks,
>>> 
>>> Late last week I decided to test out ingesting some large files (5GB
>>> video file) with Plupload[1][2], and while I was able to ingest just
>>> fine through Islandora interface, I've noticed fcrepo has become
>>> basically worthless since. I wanted to try and wait it out and see if
>>> this is just its thing with large files I noticed a while back -- taking
>>> forever to decide how to handle it -- but, about 4 days later, we still
>>> have massive processes rocking[3].
>>> 
>>> Is this expected behaviour? Is this a faux pas (dude never let fcrepo
>>> manage a large file!)? Gsearch/Tika chugging away at the file forever?
>>> Or, something else?
>>> 
>>> I'm running fcrepo 3.6.2 on an Islandora stack (gsearch + solr), and
>>> here[4] is my install.properties. Let me know if you need any more
>>> config info, or anything else.
>>> 
>>> cheers!
>>> 
>>> -nruest
>>> 
>>> [1] https://drupal.org/project/plupload
>>> [2] https://github.com/discoverygarden/islandora_plupload
>>> [3] http://i.imgur.com/3ewAeSD.jpga
>>> [4] https://gist.github.com/ruebot/01fbbec034b7331dcc94
>>> 
>>> ------------------------------------------------------------------------------
>>> October Webinars: Code for Performance
>>> Free Intel webinars can help you accelerate application performance.
>>> Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most 
>>> from
>>> the latest Intel processors and coprocessors. See abstracts and register >
>>> http://pubads.g.doubleclick.net/gampad/clk?id=60134791&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Fedora-commons-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
>> 
>> 
>> 
> 
