Hi,

I am working on indexing Freebase data within EntityHub and observed the
following issue:

01:06:01,547 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
attribute do not match QName production: QName::=(NCName':')?NCName.

I would appreciate any help pertaining to this issue.

Thanks,
Rajan

*Steps followed:*

*1. Initialization: *
java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar
 init

*2. Download the data:*
Downloaded the data from https://developers.google.com/freebase/data

*3. Ran fbrankings-uri.sh*
It generated incoming_links.txt under the resources directory, as follows:

10888430 m.0kpv11
3741261 m.019h
2667858 m.0775xx5
2667804 m.0775xvm
1875352 m.01xryvm
1739262 m.05zppz
1369590 m.01xrzlb
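
For reference, a minimal sketch of a sanity check on this file, assuming
every line has the form "<count> <id>" separated by whitespace, as in the
sample above. The name check_links is my own and is not part of the Stanbol
tooling.

```shell
# Hypothetical helper (not part of Stanbol): list lines of a ranking file
# that are not "<integer count> <id>". Assumes whitespace-separated columns.
check_links() {
  awk '
    NF != 2 || $1 !~ /^[0-9]+$/ { bad++; print "line " NR ": " $0 }
    END {
      printf "%d malformed line(s)\n", bad
      exit (bad > 0 ? 1 : 0)   # non-zero status if anything looked wrong
    }
  ' "$1"
}
```

It could be called as: check_links incoming_links.txt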

*4. Ran the fixit script*

gunzip -c ${FB_DUMP} | fixit | gzip > ${FB_DUMP_fixed}

*5. Renamed the fixed file to freebase.rdf.gz and copied it*
to indexing/resources/rdfdata

*6. The config/iditer.properties file has the following settings*
#id-namespace=http://freebase.com/
ns-prefix-state=false

*7. Ran the following command:*
java -jar -Xmx32g
org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar index

The error dump on stdout is as follows:

01:37:32,884 [Thread-0] INFO  solryard.SolrYardIndexingDestination -  ...
copy Solr Configuration form /private/tmp/freebase/indexing/config/freebase
to /private/tmp/freebase/indexing/destination/indexes/default/freebase
01:37:32,895 [Thread-3] INFO  jenatdb.RdfResourceImporter -     - bulk
loading File freebase.rdf.gz using Format Lang:RDF/XML
01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Start
triples data phase
01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - ** Load empty
triples table
*01:37:32,948 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
attribute do not match QName production: QName::=(NCName':')?NCName.*
01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish
triples data phase
01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish
triples load
01:37:32,960 [Thread-3] INFO  source.ResourceLoader - Ignore Error for File
/private/tmp/freebase/indexing/resources/rdfdata/freebase.rdf.gz and
continue
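
Since the log shows the importer reading freebase.rdf.gz with Format
Lang:RDF/XML, one check that may help narrow this down is to look at how the
file actually begins. Below is a minimal sketch; the function name
guess_rdf_syntax and its patterns are my own assumptions and only a rough
heuristic, not a real parser and not part of Stanbol or Jena.

```shell
# Hypothetical helper (not part of Stanbol or Jena): guess the RDF syntax of
# a gzipped dump from its first non-empty line. Rough heuristic only.
guess_rdf_syntax() {
  # Read just the first non-empty line; tolerate gunzip being cut off early.
  first=$(gunzip -c "$1" 2>/dev/null | awk 'NF { print; exit }') || true
  case "$first" in
    '<?xml'*|'<rdf:RDF'*) echo "RDF/XML" ;;
    '@prefix'*|'@base'*)  echo "Turtle" ;;
    '<'*'> <'*)           echo "N-Triples-like" ;;
    *)                    echo "unknown" ;;
  esac
}
```

It could be called as:
guess_rdf_syntax indexing/resources/rdfdata/freebase.rdf.gz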

Additional reference points:

*Original Freebase dump size:*  31025015397 May 14 18:10
freebase-rdf-latest.gz
*Fixed Freebase dump size:* 31026818367 May 15 12:45
freebase-rdf-latest-fixed.gz
*Incoming Links size: *1206745360 May 17 00:42 incoming_links.txt
