Hi Rajan,

I think this is because you named your file "freebase-rdf-latest-fixed.gz". Jena assumes RDF/XML if the RDF format is not indicated by the file extension. Renaming the file to "freebase-rdf-latest-fixed.nt.gz" should fix this issue.
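
To illustrate why the rename should help, here is a minimal Python sketch of extension-based format guessing with an RDF/XML fallback. It is not Jena's actual code: the function name `guess_rdf_format` and the lookup tables are made up for this example, and only a few common extensions are listed. Note that the log in your mail shows the file as "freebase.rdf.gz"; as far as I know ".rdf" is itself the usual RDF/XML extension, so that name would select RDF/XML as well.

```python
import os

# Illustrative only: a simplified model of guessing the RDF format from a
# file name. NOT Jena's implementation; all names here are hypothetical.
COMPRESSION_SUFFIXES = (".gz", ".bz2")
EXTENSION_TO_FORMAT = {
    ".rdf": "RDF/XML",
    ".nt": "N-Triples",
    ".ttl": "Turtle",
    ".nq": "N-Quads",
}
DEFAULT_FORMAT = "RDF/XML"  # assumed fallback when no extension matches


def guess_rdf_format(filename: str) -> str:
    """Return the format implied by the extension, ignoring .gz/.bz2."""
    name = os.path.basename(filename).lower()
    # Drop one compression suffix so "x.nt.gz" is treated like "x.nt".
    for suffix in COMPRESSION_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Take the last remaining extension, if there is one.
    dot = name.rfind(".")
    extension = name[dot:] if dot != -1 else ""
    return EXTENSION_TO_FORMAT.get(extension, DEFAULT_FORMAT)


if __name__ == "__main__":
    for candidate in (
        "freebase-rdf-latest-fixed.gz",
        "freebase.rdf.gz",
        "freebase-rdf-latest-fixed.nt.gz",
    ):
        print(candidate, "->", guess_rdf_format(candidate))
```

Under these assumptions only the ".nt.gz" name maps to N-Triples; the other two names end up as RDF/XML, which matches the "Format Lang:RDF/XML" line in your log.
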
The suggestion of Antonio to use BaseKB is also a valid option.

best
Rupert

On Tue, May 19, 2015 at 8:32 AM, Antonio David Perez Morales <ape...@zaizi.com> wrote:
> Hi Rajan
>
> Freebase dump contains some things that does not fit very well with the
> indexer.
> I advise you to use the dump provided by BaseKB (http://basekb.com) which
> is a curated Freebase dump.
> I did not have any problem indexing it using that dump.
>
> Regards
>
> On Mon, May 18, 2015 at 8:48 PM, Rajan Shah <raja...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working on indexing Freebase data within EntityHub and observed
>> following issue:
>>
>> 01:06:01,547 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
>> attribute do not match QName production: QName::=(NCName':')?NCName.
>>
>> I would appreciate any help pertaining to this issue.
>>
>> Thanks,
>> Rajan
>>
>> *Steps followed:*
>>
>> *1. Initialization: *
>> java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar
>> init
>>
>> *2. Download the data:*
>> Download data and copy it to https://developers.google.com/freebase/data
>>
>> *3. Performed execution of fbrankings-uri.sh*
>> It generated incoming_links.txt under resources directory as follows
>>
>> 10888430 m.0kpv11
>> 3741261 m.019h
>> 2667858 m.0775xx5
>> 2667804 m.0775xvm
>> 1875352 m.01xryvm
>> 1739262 m.05zppz
>> 1369590 m.01xrzlb
>>
>> *4. Performed execution of fixit script*
>>
>> gunzip -c ${FB_DUMP} | fixit | gzip > ${FB_DUMP_fixed}
>>
>> *5. Rename the fixed file to freebase.rdf.gz and copy it *
>> to indexing/resources/rdfdata
>>
>> *6. config/iditer.properties file has following setting*
>> #id-namespace=http://freebase.com/
>> ns-prefix-state=false
>>
>> *7. Performed run of following command:*
>> java -jar -Xmx32g
>> org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar index
>>
>> The error dump on stdout is as follows:
>>
>> 01:37:32,884 [Thread-0] INFO solryard.SolrYardIndexingDestination - ...
>> copy Solr Configuration form /private/tmp/freebase/indexing/config/freebase
>> to /private/tmp/freebase/indexing/destination/indexes/default/freebase
>> 01:37:32,895 [Thread-3] INFO jenatdb.RdfResourceImporter - - bulk
>> loading File freebase.rdf.gz using Format Lang:RDF/XML
>> 01:37:32,896 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Start
>> triples data phase
>> 01:37:32,896 [Thread-3] INFO jenatdb.RdfResourceImporter - ** Load empty
>> triples table
>> *01:37:32,948 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
>> attribute do not match QName production: QName::=(NCName':')?NCName.*
>> 01:37:32,948 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Finish
>> triples data phase
>> 01:37:32,948 [Thread-3] INFO jenatdb.RdfResourceImporter - -- Finish
>> triples load
>> 01:37:32,960 [Thread-3] INFO source.ResourceLoader - Ignore Error for File
>> /private/tmp/freebase/indexing/resources/rdfdata/freebase.rdf.gz and
>> continue
>>
>> Additional Reference Point:
>>
>> *Original Freebase dump size:* 31025015397 May 14 18:10
>> freebase-rdf-latest.gz
>> *Fixed Freebase dump size:* 31026818367 May 15 12:45
>> freebase-rdf-latest-fixed.gz
>> *Incoming Links size: *1206745360 May 17 00:42 incoming_links.txt
>>

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/