-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I was one of the people who urged Gert to add that functionality. The 
motivation is to be able to extract technical assertions about binary 
datastreams and use them in indexing. It's not extracting content from images, 
although it could extract content from PDF files or other text-containing 
formats.

On perhaps a more useful note, you should definitely expect to alter the 
default indexing stylesheets, or even better, to create your own that are 
suited to your particular purposes.
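
As an illustration only, and not something GSearch, Tika or Solr ships: the 
"Illegal character (NULL, unicode 0)" error quoted below arises because 
extracted text can contain characters that XML 1.0 does not allow. If you do 
want to keep extracted text in the index, one option is to filter it before 
it reaches Solr. The sketch below assumes you have some step of your own 
where the extracted text is available as a string; the function name 
strip_invalid_xml_chars is hypothetical.

```python
import re

# Characters that XML 1.0 forbids: C0 control characters other than
# tab, line feed and carriage return, plus surrogates and U+FFFE/U+FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\ud800-\udfff\ufffe\uffff]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Return text with every character that is illegal in XML 1.0 removed.

    Hypothetical helper: where it would be called depends entirely on
    your own indexing pipeline.
    """
    return _INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    # Text resembling what might come out of a misidentified binary file.
    extracted = "JFIF\x00\x10 ok\n"
    print(repr(strip_invalid_xml_chars(extracted)))
```

This only removes the offending characters; it does not make text extracted 
from a binary image any more meaningful, so dropping the extraction for such 
datastreams, as described below, may still be the better choice.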

- ---
A. Soroka
The University of Virginia Library

On Jul 24, 2013, at 8:32 AM, Alistair Young wrote:

> sorted it by removing the Apache Tika extraction from:
> 
> WEB-INF/classes/fgsconfigFinal/index/FgsIndex/foxmlToSolrGenerated.xslt
> 
> it seems it extracts the content and tries to index it. Not sure why it would 
> want to extract the content of an image but when it does it causes Solr to 
> fail to index the resource:
> 
> SEVERE: org.apache.solr.common.SolrException: Illegal character (NULL, 
> unicode 0) encountered: not valid in any content
> 
> Seems to only think some jpg files are not jpg files.
> 
> Alistair
> 
> -- 
> mov eax,1
> mov ebx,0
> int 80h
> 
> From: Alistair Young <[email protected]>
> Reply-To: "Support and info exchange list for Fedora users." 
> <[email protected]>
> Date: Wednesday, 24 July 2013 11:03
> To: "Support and info exchange list for Fedora users." 
> <[email protected]>
> Subject: Re: [fcrepo-user] Does gsearch index content with solr?
> 
> sorry should have mentioned, it's the content datastream, i.e. image/jpeg
> 
> Alistair
> 
> -- 
> mov eax,1
> mov ebx,0
> int 80h
> 
> From: Alistair Young <[email protected]>
> Reply-To: "Support and info exchange list for Fedora users." 
> <[email protected]>
> Date: Wednesday, 24 July 2013 10:59
> To: "Support and info exchange list for Fedora users." 
> <[email protected]>
> Subject: [fcrepo-user] Does gsearch index content with solr?
> 
> I have a weird problem. I dropped a foxml file into 
> FgsConfig/indexingXsltGenerator/foxml and configured etc but certain files, 
> when uploaded cause solr to crash:
> 
> SEVERE: org.apache.solr.common.SolrException: Illegal character (NULL, 
> unicode 0) encountered: not valid in any content
> 
> If I don't include datastream in the foxml it doesn't cause the crash, i.e. 
> remove this:
> 
> <foxml:datastream ID="AUDIT" STATE="A" CONTROL_GROUP="X" VERSIONABLE="false">
> 
> Should the foxml used to configure gsearch only contain 'metadata', i.e. DC, 
> RDF etc and not datastreams?
> 
> thanks,
> 
> Alistair
> 
> ------------------------------------------------------------------------------
> See everything from the browser to the database with AppDynamics
> Get end-to-end visibility with application monitoring from AppDynamics
> Isolate bottlenecks and diagnose root cause in seconds.
> Start your free trial of AppDynamics Pro today!
> http://pubads.g.doubleclick.net/gampad/clk?id=48808831&iu=/4140/ostg.clktrk
> _______________________________________________
> Fedora-commons-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/fedora-commons-users

-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.19 (Darwin)
Comment: GPGTools - http://gpgtools.org

iQEcBAEBAgAGBQJR78vHAAoJEATpPYSyaoIk8dsIALihgJB0b4OABcOcOnk2qthk
79JqHouayvOFwTNMHsHZMIPXQ9KlD7h/zrHVYPPOqXV8fvNb3+EeQEal5WJxs4Z3
mMevFpEpBlOWUOBAiEqayNNfnxNCGQ3ARCRXNzeiaheM43ouFCluOGkX9p3fjqSV
qq6QG862vDFvYF69rMH1NiFIUIA/QP8w/K/QzyI8qoblrzWCX2LmQ8NaH5b0oN1j
Nb0NXIQv+XOVJZeHFvbHNEzGMGMEWHKs2QsZ1auirOKaO3ccV74+gVTuvDkmmuXL
VjQQoxNBTqbkhSpoDsWPCkHE+fVGuWyFS/ffJQ/0heX1rWOkiOFgJhhGuwJOl2Y=
=s4aM
-----END PGP SIGNATURE-----
