Hi, I am working on indexing Freebase data within EntityHub and observed the following issue:
01:06:01,547 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or attribute do not match QName production: QName::=(NCName':')?NCName.

I would appreciate any help pertaining to this issue.

Thanks,
Rajan

*Steps followed:*

*1. Initialization:*
java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init

*2. Download the data:*
Download data and copy it to https://developers.google.com/freebase/data

*3. Performed execution of fbrankings-uri.sh*
It generated incoming_links.txt under resources directory as follows
10888430 m.0kpv11
3741261 m.019h
2667858 m.0775xx5
2667804 m.0775xvm
1875352 m.01xryvm
1739262 m.05zppz
1369590 m.01xrzlb

*4. Performed execution of fixit script*
gunzip -c ${FB_DUMP} | fixit | gzip > ${FB_DUMP_fixed}

*5. Rename the fixed file to freebase.rdf.gz and copy it* to indexing/resources/rdfdata

*6. config/iditer.properties file has following setting*
#id-namespace=http://freebase.com/
ns-prefix-state=false

*7. Performed run of following command:*
java -jar -Xmx32g org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar index

The error dump on stdout is as follows:
01:37:32,884 [Thread-0] INFO solryard.SolrYardIndexingDestination - ... copy Solr Configuration form /private/tmp/freebase/indexing/config/freebase to /private/tmp/freebase/indexing/destination/indexes/default/freebase
01:37:32,895 [Thread-3] INFO jenatdb.RdfResourceImporter - - bulk loading File freebase.rdf.gz using Format Lang:RDF/XML
01:37:32,896 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Start triples data phase
01:37:32,896 [Thread-3] INFO jenatdb.RdfResourceImporter - ** Load empty triples table
*01:37:32,948 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or attribute do not match QName production: QName::=(NCName':')?NCName.*
01:37:32,948 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Finish triples data phase
01:37:32,948 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Finish triples load
01:37:32,960 [Thread-3] INFO source.ResourceLoader - Ignore Error for File /private/tmp/freebase/indexing/resources/rdfdata/freebase.rdf.gz and continue

Additional Reference Point:
*Original Freebase dump size:* 31025015397 May 14 18:10 freebase-rdf-latest.gz
*Fixed Freebase dump size:* 31026818367 May 15 12:45 freebase-rdf-latest-fixed.gz
*Incoming Links size:* 1206745360 May 17 00:42 incoming_links.txt
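
Since the importer log reports Format Lang:RDF/XML for freebase.rdf.gz and the parser fails at line 1, looking at the first line of the file is a cheap way to see whether its serialization matches the .rdf.gz name. Below is a minimal sketch, assuming a POSIX shell with gunzip available. peek_dump and guess_format are hypothetical helper names (not part of Stanbol or Jena), and the format guess is only a rough heuristic based on how the first non-empty line starts and ends.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting a gzip-compressed RDF dump.

# Print the first N lines (default 3) without unpacking the whole file.
peek_dump() {
    file="$1"
    count="${2:-3}"
    gunzip -c "$file" | head -n "$count"
}

# Rough guess at the serialization from the first non-empty line.
# This is a heuristic only; it does not validate the file.
guess_format() {
    first_line=$(gunzip -c "$1" | grep -v '^[[:space:]]*$' | head -n 1)
    case "$first_line" in
        '<?xml'*|'<rdf:RDF'*) echo "rdf-xml" ;;   # XML declaration or RDF root element
        '@prefix'*|'@base'*)  echo "turtle" ;;    # Turtle directives
        '<'*'.')              echo "ntriples" ;;  # starts with an IRI, ends with a dot
        *)                    echo "unknown" ;;
    esac
}
```

For example, running guess_format indexing/resources/rdfdata/freebase.rdf.gz and then peek_dump on the same path with a count of 1 would show what the parser encountered at line 1. If the guess is not rdf-xml, the file name and the format selected by the importer may not agree.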