Hi Rupert and Antonio,

Thanks a lot for the reply.

I followed Rupert's suggestion; however, it failed again at:

10:56:34,152 [Thread-3] ERROR jena.riot - [line: 8722294, col: 88] illegal escape sequence value: $ (0x24)

Is there any way this can be resolved for the entire file?
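
In case it helps narrow this down, here is a minimal sketch (plain Python, not part of Stanbol or Jena) that lists lines containing a backslash escape outside the set N-Triples allows, as far as I understand it: \t \b \n \r \f \" \' \\ plus \uXXXX and \UXXXXXXXX. The function names and the report format are my own, and the column it reports is the position of the backslash, which may not match how Jena counts columns. It only detects offending lines; it does not repair them.

```python
import gzip
import string
import sys

# Single-character escapes allowed after a backslash in N-Triples.
LEGAL_SINGLE = frozenset('tbnrf"\'\\')
HEX_DIGITS = frozenset(string.hexdigits)


def _is_hex(text, width):
    """True if text is exactly `width` hexadecimal digits."""
    return len(text) == width and all(c in HEX_DIGITS for c in text)


def find_illegal_escapes(line):
    """Return (column, char) for each backslash escape that is not legal.

    column is the 1-based position of the backslash; char is the character
    following it ('' when the line ends with a lone backslash).
    """
    bad = []
    i, n = 0, len(line)
    while i < n:
        if line[i] != "\\":
            i += 1
            continue
        nxt = line[i + 1] if i + 1 < n else ""
        if nxt in LEGAL_SINGLE:
            i += 2
        elif nxt == "u" and _is_hex(line[i + 2:i + 6], 4):
            i += 6
        elif nxt == "U" and _is_hex(line[i + 2:i + 10], 8):
            i += 10
        else:
            bad.append((i + 1, nxt))
            i += 2
    return bad


def scan(path, limit=20):
    """Print up to `limit` illegal escapes found in a gzipped dump."""
    found = 0
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as dump:
        for lineno, line in enumerate(dump, start=1):
            for col, ch in find_illegal_escapes(line.rstrip("\n")):
                print(f"line {lineno}, col {col}: illegal escape \\{ch}")
                found += 1
                if found >= limit:
                    return found
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    scan(sys.argv[1])
```

Saved under a hypothetical name such as find_bad_escapes.py, it would be run as `python find_bad_escapes.py freebase-rdf-latest-fixed.nt.gz`. If the number of hits is small, dropping or patching those lines before indexing might be an option.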

I requested access to the latest BaseKB bucket, as it doesn't seem to be
open.

s3cmd ls s3://basekb-now/2015-04-15-18-54/ --add-header="x-amz-request-payer: requester"
ERROR: Access to bucket 'basekb-now' was denied


*A couple of additional questions:*

*1. indexing enhancements:*
What settings/properties can one tweak to get the most out of the indexing?

a. domain-specific indexing, e.g. Pharmaceutical, Law, etc., within Freebase
b. potential optimizations to speed up the overall indexing

*2. demo:*
I see that in recent GitHub commit(s) the eHealth and other demos have
been commented out. How can I get the source code and other components for
these demos? I would prefer to build them myself to see the power of Stanbol.

*3. custom vocabulary:*
Suppose I have a custom vocabulary in CSV format. Is there a preferred way
to upload it to Stanbol and have it recognize my entities?

Thanks in advance,
Rajan

On Tue, May 19, 2015 at 3:01 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi Rajan,
>
> I think this is because you named your file
> "freebase-rdf-latest-fixed.gz". Jena assumes RDF/XML if the RDF format
> is not provided by the file extension. Renaming the file to
> "freebase-rdf-latest-fixed.nt.gz" should fix this issue.
>
> The suggestion of Antonio to use BaseKB is also a valid option.
>
> best
> Rupert
>
> On Tue, May 19, 2015 at 8:32 AM, Antonio David Perez Morales
> <ape...@zaizi.com> wrote:
> > Hi Rajan
> >
> > The Freebase dump contains some things that do not fit very well with the
> > indexer.
> > I advise you to use the dump provided by BaseKB (http://basekb.com), which
> > is a curated Freebase dump.
> > I did not have any problem indexing it using that dump.
> >
> > Regards
> >
> > On Mon, May 18, 2015 at 8:48 PM, Rajan Shah <raja...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am working on indexing Freebase data within EntityHub and observed the
> >> following issue:
> >>
> >> 01:06:01,547 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or attribute do not match QName production: QName::=(NCName':')?NCName.
> >>
> >> I would appreciate any help pertaining to this issue.
> >>
> >> Thanks,
> >> Rajan
> >>
> >> *Steps followed:*
> >>
> >> *1. Initialization: *
> >> java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init
> >>
> >> *2. Download the data:*
> >> Download the data from https://developers.google.com/freebase/data
> >>
> >> *3. Ran fbrankings-uri.sh*
> >> It generated incoming_links.txt under the resources directory, as follows:
> >>
> >> 10888430 m.0kpv11
> >> 3741261 m.019h
> >> 2667858 m.0775xx5
> >> 2667804 m.0775xvm
> >> 1875352 m.01xryvm
> >> 1739262 m.05zppz
> >> 1369590 m.01xrzlb
> >>
> >> *4. Ran the fixit script*
> >>
> >> gunzip -c ${FB_DUMP} | fixit | gzip > ${FB_DUMP_fixed}
> >>
> >> *5. Rename the fixed file to freebase.rdf.gz and copy it to indexing/resources/rdfdata*
> >>
> >> *6. The config/iditer.properties file has the following settings:*
> >> #id-namespace=http://freebase.com/
> >> ns-prefix-state=false
> >>
> >> *7. Ran the following command:*
> >> java -jar -Xmx32g org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar index
> >>
> >> The error dump on stdout is as follows:
> >>
> >> 01:37:32,884 [Thread-0] INFO  solryard.SolrYardIndexingDestination - ... copy Solr Configuration form /private/tmp/freebase/indexing/config/freebase to /private/tmp/freebase/indexing/destination/indexes/default/freebase
> >> 01:37:32,895 [Thread-3] INFO  jenatdb.RdfResourceImporter -     - bulk loading File freebase.rdf.gz using Format Lang:RDF/XML
> >> 01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Start triples data phase
> >> 01:37:32,896 [Thread-3] INFO  jenatdb.RdfResourceImporter - ** Load empty triples table
> >> *01:37:32,948 [Thread-3] ERROR jena.riot - [line: 1, col: 7 ] Element or attribute do not match QName production: QName::=(NCName':')?NCName.*
> >> 01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish triples data phase
> >> 01:37:32,948 [Thread-3] INFO  jenatdb.RdfResourceImporter - -- Finish triples load
> >> 01:37:32,960 [Thread-3] INFO  source.ResourceLoader - Ignore Error for File /private/tmp/freebase/indexing/resources/rdfdata/freebase.rdf.gz and continue
> >>
> >> Additional Reference Point:
> >>
> >> *Original Freebase dump size:* 31025015397 May 14 18:10 freebase-rdf-latest.gz
> >> *Fixed Freebase dump size:* 31026818367 May 15 12:45 freebase-rdf-latest-fixed.gz
> >> *Incoming Links size:* 1206745360 May 17 00:42 incoming_links.txt
> >>
> >
>
>
>
> --
> | Rupert Westenthaler             rupert.westentha...@gmail.com
> | Bodenlehenstraße 11                              ++43-699-11108907
> | A-5500 Bischofshofen
> | REDLINK.CO
> ..........................................................................
> | http://redlink.co/
>
