Hi Rajan,

I think this is because you named your file
"freebase-rdf-latest-fixed.gz". Jena assumes RDF/XML if the RDF format
is not provided by the file extension. Renaming the file to
"freebase-rdf-latest-fixed.nt.gz" should fix this issue.
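
A minimal sketch of the rename, in case it helps. The helper name
rename_to_nt is hypothetical (it is not part of Stanbol or Jena), and the
example path is only assumed from your log output, where the file appears
as freebase.rdf.gz. The helper drops a trailing ".rdf" before appending
".nt.gz", so that the remaining extension points to N-Triples:

```shell
# Hypothetical helper: give a gzipped RDF dump an N-Triples extension.
# Usage: rename_to_nt <path ending in .gz>
rename_to_nt() {
  src="$1"
  base="${src%.gz}"     # strip the trailing ".gz"
  base="${base%.rdf}"   # strip a trailing ".rdf", if there is one
  dst="${base}.nt.gz"
  # rename the file and print the new name on success
  mv -- "$src" "$dst" && printf '%s\n' "$dst"
}

# Example (path assumed from the log output, adjust as needed):
# rename_to_nt indexing/resources/rdfdata/freebase.rdf.gz
```

After the rename, "gunzip -c <file> | head -n 1" is a quick check that the
first line is a triple and not XML.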

Antonio's suggestion to use BaseKB is also a valid option.

best
Rupert

On Tue, May 19, 2015 at 8:32 AM, Antonio David Perez Morales
<ape...@zaizi.com> wrote:
> Hi Rajan
>
> The Freebase dump contains some things that do not fit very well with the
> indexer.
> I advise you to use the dump provided by BaseKB (http://basekb.com), which
> is a curated Freebase dump.
> I did not have any problems indexing with that dump.
>
> Regards
>
> On Mon, May 18, 2015 at 8:48 PM, Rajan Shah <raja...@gmail.com> wrote:
>
>> Hi,
>>
>> I am working on indexing Freebase data within EntityHub and observed the
>> following issue:
>>
>> 01:06:01,547 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
>> attribute do not match QName production: QName::=(NCName':')?NCName.
>>
>> I would appreciate any help pertaining to this issue.
>>
>> Thanks,
>> Rajan
>>
>> *Steps followed:*
>>
>> *1. Initialization: *
>> java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar
>>  init
>>
>> *2. Download the data:*
>> Download the data from https://developers.google.com/freebase/data
>>
>> *3. Performed execution of fbrankings-uri.sh*
>> It generated incoming_links.txt under resources directory as follows
>>
>> 10888430 m.0kpv11
>> 3741261 m.019h
>> 2667858 m.0775xx5
>> 2667804 m.0775xvm
>> 1875352 m.01xryvm
>> 1739262 m.05zppz
>> 1369590 m.01xrzlb
>>
>> *4. Performed execution of fixit script*
>>
>> gunzip -c ${FB_DUMP} | fixit | gzip > ${FB_DUMP_fixed}
>>
>> *5. Rename the fixed file to freebase.rdf.gz and copy it *
>> to indexing/resources/rdfdata
>>
>> *6. config/iditer.properties file has the following settings*
>> #id-namespace=http://freebase.com/
>> ns-prefix-state=false
>>
>> *7. Ran the following command:*
>> java -jar -Xmx32g
>> org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar index
>>
>> The error dump on stdout is as follows:
>>
>> 01:37:32,884 [Thread-0] INFO  solryard.SolrYardIndexingDestination -  ...
>> copy Solr Configuration form /private/tmp/freebase/indexing/config/freebase
>> to /private/tmp/freebase/indexing/destination/indexes/default/freebase
>> 01:37:32,895 [Thread-3] INFO  jenatdb.RdfResourceImporter -     - bulk
>> loading File freebase.rdf.gz using Format Lang:RDF/XML
>> 01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Start
>> triples data phase
>> 01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - ** Load empty
>> triples table
>> *01:37:32,948 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or
>> attribute do not match QName production: QName::=(NCName':')?NCName.*
>> 01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish
>> triples data phase
>> 01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish
>> triples load
>> 01:37:32,960 [Thread-3] INFO  source.ResourceLoader - Ignore Error for File
>> /private/tmp/freebase/indexing/resources/rdfdata/freebase.rdf.gz and
>> continue
>>
>> Additional Reference Point:
>>
>> *Original Freebase dump size:*  31025015397 May 14 18:10
>> freebase-rdf-latest.gz
>> *Fixed Freebase dump size:* 31026818367 May 15 12:45
>> freebase-rdf-latest-fixed.gz
>> *Incoming Links size: *1206745360 May 17 00:42 incoming_links.txt
>>
>



-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
