This message started in [email protected], but appears to be a more 
general problem with gsearch, so I'm also copying this to fedora-users.

On Mar 18, 2013, at 8:06 PM, Peter Murray <[email protected]> wrote:
> Does default configuration of GSearch for Islandora-7.x index the FULL_TEXT 
> datastream of objects created by the PDF Solution Pack?  The search engine 
> appears to index the metadata without fail.  I've even gone into the GSearch 
> updateIndex web screen and updated all of the FOXML files.  I'm using the 
> GSearch 2.5 (the version previous to the one released today) 
> 'fgsconfig-basic-for-islandora.properties' updated with the passwords and 
> locations specific to my setup.


I've dug a little deeper on this, and am still stymied.  It looks like 
objects with PDFs are not getting indexed.  GSearch is showing this error:

DEBUG 2013-03-18 21:34:09,583 (Config) insertSystemProperties 
propertyValue=http://localhost:8080/solr
DEBUG 2013-03-18 21:34:09,594 (OperationsImpl) closeIndexSearcher 
indexName=FgsIndex
DEBUG 2013-03-18 21:34:09,595 (OperationsImpl) closeIndexReader 
indexName=FgsIndex docCount=45
ERROR 2013-03-18 21:34:09,597 (UpdateListener) Unable to perform index update 
due to Exception: Mon Mar 18 21:34:09 EDT 2013 Connection error (is Solr 
running at http://localhost:8080/solr/update ?): java.io.IOException: Server 
returned HTTP response code: 500 for URL: http://localhost:8080/solr/update
dk.defxws.fedoragsearch.server.errors.GenericSearchException: Mon Mar 18 
21:34:09 EDT 2013 Connection error (is Solr running at 
http://localhost:8080/solr/update ?): java.io.IOException: Server returned HTTP 
response code: 500 for URL: http://localhost:8080/solr/update
        at dk.defxws.fgssolr.OperationsImpl.postData(OperationsImpl.java:653)
        at dk.defxws.fgssolr.OperationsImpl.indexDoc(OperationsImpl.java:473)
        at dk.defxws.fgssolr.OperationsImpl.fromPid(OperationsImpl.java:413)


This correlates with the following Solr error in catalina.out:

Mar 18, 2013 9:34:09 PM org.apache.solr.common.SolrException log
SEVERE: [com.ctc.wstx.exc.WstxLazyException] 
com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion 
character (code 0xc) not a valid XML character
 at [row,col {unknown-source}]: [1668,5]
        at 
com.ctc.wstx.exc.WstxLazyException.throwLazily(WstxLazyException.java:45)
        at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:729)
        at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3659)
        at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
        at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:315)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:156)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:79)


The discussions I'm seeing on Stack Exchange about "…not a valid XML 
character" point to XML that is being generated with characters that are 
invalid in XML (in this case 0xC, the "form feed" character).
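
To illustrate what "invalid in XML" means here, the following is a minimal 
Python sketch, not GSearch or Solr code. It assumes the XML 1.0 definition of 
a legal character (tab, line feed, carriage return, and the ranges 
0x20-0xD7FF, 0xE000-0xFFFD, 0x10000-0x10FFFF). The function names 
find_invalid_xml_chars and strip_invalid_xml_chars are hypothetical and exist 
only for this example.

```python
import re

# Matches any character outside the XML 1.0 "Char" production, i.e. anything
# other than #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text, replacement=" "):
    """Replace each forbidden character (hypothetical cleanup step)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


# Extracted PDF text often separates pages with a form feed (0x0C).
sample = "page one\x0cpage two"
print(find_invalid_xml_chars(sample))
print(strip_invalid_xml_chars(sample))
```

Replacing the character with a space rather than deleting it keeps the words 
on either side of a page break from running together in the index. Whether 
such a filter belongs in the text extraction step or in the indexing 
transform is a separate question.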

Before I start tracing around the guts of GSearch, does this sound familiar to 
anyone?


Peter
--
Peter Murray
Assistant Director, Technology Services Development
LYRASIS
[email protected]
+1 678-235-2955
800.999.8558 x2955



_______________________________________________
Fedora-commons-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/fedora-commons-users
