> Hi Rupert,
> I have the following under felix configurations :
> EntityHub Referenced Site Configuration
> ID ................................................. ITdbpedia
> EntityHub Cache configuration
> ID ................................................. ITdbpediaIndex
> Cache mappings ........................... empty
> Sol Yard Configuration
> ID .................................................. ITdbpediaIndex
> Solr Index/Core .............................. ITdbpedia
> Use default SolrCore configuration .... unflagged
> but honestly I can't find any solr/core where should I look for ?

The SolrCores are registered only as OSGi services, not as Components.
Because of that you can only see them in the Services tab of the Felix
Webconsole. You will need to search for
"org.apache.solr.core.SolrCore" and then inspect the metadata. The
value of the "org.apache.solr.core.SolrCore.name" property needs to
match the value configured for "Solr Index/Core" in your SolrYard.

> I already produced my index, as you pointed out, modifying
> indexing/config/indexing.properties
> but unfortunately I didn't know I had to change the indexingDestination
> maybe this is the problem ?
> IndexingDestination=org.apache.stanbol.entityhub.indexing.destination.solryard.SolrYardIndexingDestination,solrConf,boosts:fieldboosts

If you have not changed this, then the default SolrCore schema was
used for indexing. I do not think that this will have a major impact
on the resulting index, as the dbpedia configuration differs from the
default only in some minor things.

> Moreover once I produced the correct zip file Itdbpedia.solr.zip (I changed

The name should be "Itdbpedia.solrindex.zip". In addition, note that
names are case sensitive. So if you use ITdbpedia, then you should
also name the file with upper case IT.
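
A minimal sketch for checking the file name, assuming the archive has to
be called "{Solr Index/Core name}.solrindex.zip" as described above. The
helper name has_index_archive is made up for illustration. It compares
against the directory listing, so a file with the wrong case is reported
as missing even on a case-insensitive file system.

```shell
# Hypothetical helper: succeeds only if <dir> contains an archive whose
# name matches "<core>.solrindex.zip" exactly, including case.
has_index_archive() {
    dir=$1
    core=$2
    # -x: whole line must match, -F: fixed string, -q: no output
    ls "$dir" | grep -qxF "$core.solrindex.zip"
}

# Usage (the directory is the one mentioned in this thread):
#   has_index_archive stanbol/datafiles ITdbpedia \
#       || echo "archive missing or named with a different case"
```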

> the indexing properties, so I don't have to change the folder manually as
> you said) I have to save it to stanbol/datafiles and then restart stanbol,
> right ?

No, restarting the server will not work. Replace the file and then
stop/start the bundle via the Bundles tab of the Felix Webconsole. As
soon as you stop the bundle, the current index should be deleted (you
can check this by looking at the folder
"{stanbol-working-dir}/stanbol/indexes/default"). When you start the
bundle again the index should be re-initialised based on the current

OK, then I have to replace it with something more complex [1]

[1] http://en.wikiquote.org/wiki/The_Hitchhiker%27s_Guide_to_the_Galaxy#Preface

> Is it possibile to create a single index with both languages (it, en)
> version of dbpedia ?
> I think this is very difficult to manage isn't it?

The simple answer is YES: just add the RDF files with the Italian
labels, short/long abstracts

e.g. curl http://downloads.dbpedia.org/3.8/it/labels_en_uris_it.nt.bz2 | bzcat | head

<http://www.w3.org/2000/01/rdf-schema#label> "Armonium"@it .
<http://www.w3.org/2000/01/rdf-schema#label> "Antropologia"@it .
<http://www.w3.org/2000/01/rdf-schema#label> "Agricoltura"@it .

The complex answer is also YES: while there are Italian labels,
comments ... available for http://dbpedia.org/resources, these only
include entities for which an English counterpart is available.
Entities of the Italian Wikipedia that do not have an English version
are not included. If you want to have all Italian entities you will
need to use the Italian dbpedia (http://it.dbpedia.org/resources).

e.g. curl http://downloads.dbpedia.org/3.8/it/labels_it.nt.bz2 | bzcat | head

<http://www.w3.org/2000/01/rdf-schema#label> "Armonium"@it .
<http://www.w3.org/2000/01/rdf-schema#label> "Antropologia"@it .
<http://www.w3.org/2000/01/rdf-schema#label> "Agricoltura"@it .

By comparing the file sizes of labels_it.nt.bz2 (18M) and
labels_en_uris_it.nt.bz2 (7.9M) you can easily see that with the
English dbpedia you will not have all the Italian entities available.
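
File size is only a rough indicator. If you want actual numbers, a
minimal sketch like the following counts the distinct subjects in an
N-Triples stream. The function name count_subjects is made up for
illustration, and the sketch assumes one triple per line with the
subject as the first whitespace-separated token.

```shell
# Count the distinct subjects of an N-Triples stream read from stdin.
count_subjects() {
    awk '{ print $1 }' | sort -u | awk 'END { print NR }'
}

# Usage with the two files compared above (download them first):
#   bzcat labels_it.nt.bz2 | count_subjects
#   bzcat labels_en_uris_it.nt.bz2 | count_subjects
```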

To integrate two languages you need the
"interlanguage_links_it.nt.bz2". This defines links from the Italian
entities to all other languages.

<http://dbpedia.org/resource/Harmonium> .

For indexing you need to do the following:

1. Calculate the incoming_links.txt file for the Italian page links

2. Download all the RDF files you need

    * basically the same you currently use from
http://downloads.dbpedia.org/3.8/en/ but now from
    * language specific labels from other languages you are interested in.
         IMPORTANT: use the
         files and NOT the
    * include http://downloads.dbpedia.org/3.8/en/instance_types_en.nq.bz2

3. You will need to add the LdpathSourceProcessor to the list of
entityProcessor in the indexing.properties file. The configuration
should look like


4. Create an LDPath [2] program that merges all the data you need with
the Italian dbpedia resource.

[2] http://code.google.com/p/ldpath/

The configuration in (3) refers to the LDPath file "dbpedia.ldpath".
This is a text file that is expected to be located within the
"indexing/config" directory. I will not give an LDPath introduction,
but what you need is something like

1: rdfs:label = (rdfs:label | dbp-ont:wikiPageInterLanguageLink/rdfs:label);
2: skos:altLabel = (^dbp-ont:wikiPageRedirects/rdfs:label |
3: rdfs:comment = (rdfs:comment | dbp-ont:wikiPageInterLanguageLink/rdfs:comment);
4: dbp-ont:abstract = (dbp-ont:abstract |
5: rdf:type = (rdf:type | dbp-ont:wikiPageInterLanguageLink/rdf:type);

NOTE: you will need to remove the '{line-number}: ' before using this ldpath

(1) merges the rdfs:labels of the current Entity (the Italian label)
with labels of entities referenced by inter language links. So this
will ensure that you have labels for all languages for the Italian
(2) merges labels of redirected pages into the skos:altLabel field. For
this to work you will need to include the
"redirects_{language}.nt.bz2" files for the languages you are interested in.
(3) same as for rdfs:labels but for short abstracts
(4) the same but for long abstracts
(5) rdf:type statements might be missing for Italian, so I merge those
as well with types from other languages. I would recommend including
only the types from the English dbpedia.

5. Add surfaceForms mapping to the mappings.txt file

# add rdfs:labels and rdfs:labels of redirected sites to dbp-ont:surfaceForm
rdfs:label > dbp-ont:surfaceForm
skos:altLabel > dbp-ont:surfaceForm

Those two mappings ensure that both the rdfs:label and skos:altLabel
values are also stored in the dbp-ont:surfaceForm field. This lets
the Stanbol Enhancer (or more precisely the
NamedEntityLinkingEngine or KeywordLinkingEngine) match against
labels of redirected pages once you change the name field from the
default rdfs:label to dbp-ont:surfaceForm.
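
For step (1), the incoming link counts could be computed roughly like
this. It is only a sketch under assumptions: that the objects of the
Italian page links use the http://it.dbpedia.org/resource/ prefix, that
the indexer accepts one "count name" pair per line as produced by
uniq -c, and that the input file is called page_links_it.nt.bz2 (I have
not checked the exact file name). The function name incoming_links is
made up for illustration.

```shell
# Read page-link triples from stdin and print, for every linked Italian
# resource, the number of incoming links, highest count first.
incoming_links() {
    # keep only the local name of the object (the link target);
    # lines whose object has a different prefix are dropped
    sed -n 's|.*<http://it\.dbpedia\.org/resource/\([^>]*\)> \.$|\1|p' \
        | sort \
        | uniq -c \
        | sort -nr
}

# Usage (assumed file name):
#   bzcat page_links_it.nt.bz2 | incoming_links > incoming_links.txt
```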

Let me conclude that I have never tried this exact use case myself,
but I have already created several dbpedia indexes with very similar
configurations. When using LDPath during indexing you should expect
longer indexing times and you might also need to assign more memory to
the indexing tool.

Please also note http://markmail.org/message/67ivlyoxfqad6xoe as you
will most likely need to process the dbpedia files for some languages
using
    bzcat ${filename}.bz2 \
        | sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g' \
        | gzip -c > ${filename}.gz
    rm -f ${filename}.bz2
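
To see what the sed expression above does without touching a real dump,
it can be run on a single line. This is only an illustration: the
function name escape_backslashes and the sample triple are made up. A
doubled backslash becomes two \u005c escapes, and any other backslash
that is not followed by 'u' or a double quote is replaced by \u005c.

```shell
# Same sed expression as in the pipeline above, wrapped in a function
# so it can be tried on individual lines.
escape_backslashes() {
    sed 's/\\\\/\\u005c\\u005c/g;s/\\\([^u"]\)/\\u005c\1/g'
}

# Made-up sample line containing a doubled backslash in the literal
printf '%s\n' '<s> <p> "C:\\temp" .' | escape_backslashes
```

Existing \uXXXX escapes and escaped double quotes pass through
unchanged, because the second expression skips a backslash that is
followed by 'u' or '"'.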


