[ https://issues.apache.org/jira/browse/STANBOL-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895824#comment-15895824 ]

Rafa Haro commented on STANBOL-1458:
------------------------------------

The problem arose after upgrading the Clerezza parsers. The latest Clerezza
parsers use <http://www.w3.org/1999/02/22-rdf-syntax-ns#langString> as the
datatype for literals that carry a language tag, instead of xsd:string. The
Stanbol Resource adapters did not expect this. To fix it, the
Resource2ValueAdapter class now checks for the rdf:langString datatype whenever
XSD_DATATYPE_VALUE_MAPPING provides no match for the inspected resource. If the
datatype is rdf:langString and the language is not null, it directly creates an
RdfText value using valueFactory.createText(literal.getLexicalForm(),
literal.getLanguage().toString());
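A minimal, self-contained sketch of that fallback follows. It deliberately does not use the real Clerezza or Entityhub APIs: Literal, Value, DATATYPE_MAPPING and toValue are hypothetical stand-ins for the Clerezza literal, the Entityhub value, XSD_DATATYPE_VALUE_MAPPING and the conversion in Resource2ValueAdapter, and a Value of kind "text" stands in for what valueFactory.createText(...) returns.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins; NOT the real Clerezza/Stanbol types.
public class LangStringFallbackSketch {

    static final String XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";
    static final String RDF_LANG_STRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    /** Minimal literal: lexical form, datatype IRI and an optional language tag. */
    record Literal(String lexicalForm, String datatype, String language) {}

    /** Minimal value: a kind ("string" or "text"), the lexical value and an optional language. */
    record Value(String kind, String value, String language) {}

    // Hypothetical analogue of XSD_DATATYPE_VALUE_MAPPING: it knows xsd:string
    // but has no entry for rdf:langString.
    static final Map<String, String> DATATYPE_MAPPING = Map.of(XSD_STRING, "string");

    static Optional<Value> toValue(Literal literal) {
        String kind = DATATYPE_MAPPING.get(literal.datatype());
        if (kind != null) {
            // Mapped datatype: create a plain value (this sketch ignores any language here).
            return Optional.of(new Value(kind, literal.lexicalForm(), null));
        }
        // Fallback: no mapping matched, so check for rdf:langString with a language tag
        // and create a language-tagged text value (the createText(...) analogue).
        if (RDF_LANG_STRING.equals(literal.datatype()) && literal.language() != null) {
            return Optional.of(new Value("text", literal.lexicalForm(), literal.language()));
        }
        // Anything else is unsupported in this sketch.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Literal label = new Literal("Paris", RDF_LANG_STRING, "fr");
        System.out.println(toValue(label));
    }
}
```

Checking rdf:langString only after the mapping lookup fails keeps the existing datatype handling unchanged and limits the fix to literals that the newer parsers tag differently.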

> Fields Language is being filtered while creating entities into Solr Yard 
> based Managed Sites
> --------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-1458
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1458
>             Project: Stanbol
>          Issue Type: Bug
>          Components: Entityhub
>    Affects Versions: 1.0.0
>            Reporter: Rafa Haro
>            Assignee: Rafa Haro
>              Labels: managed_site
>
> When entities are created through the Managed Sites REST API, fields
> containing xml:lang annotations are stored in Solr (Yard) with only the field
> value and without the language. Among other things, this prevents the Entity
> Linking engine from finding the entities when the language is detected first.
> Even if the Entity Linking engine is configured without any predefined
> language, the entities are not found.
> Looking at the code, the StringConverter within the IndexValueFactory
> intentionally ignores the language for xsd:string based datatypes. The
> TextConverter (which is bound to the entityhub:text type) indexes the
> language along with the value. The problem is that, when entities are
> uploaded through the API, the Clerezza Serializer cannot understand the
> entityhub:text datatype, so it always parses the text fields as xsd:string.
> The proposed solution is to include the language, if present, for String
> datatypes as well, as the Text-based converter already does.
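The failure mode described in the issue can be illustrated with a toy in-memory index. This is a hypothetical sketch, not the Entityhub IndexValueFactory or the Solr Yard: LanguageAwareIndexSketch, IndexValue, indexString and contains are illustrative names, and only a language-specific lookup (as after language detection) is modelled. With keepLanguageForStrings set to false it mimics a StringConverter that drops the language; set to true it follows the proposed fix.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy index; NOT the Entityhub/Solr Yard implementation.
public class LanguageAwareIndexSketch {

    /** Hypothetical index entry: a field value plus an optional language tag. */
    record IndexValue(String value, String language) {}

    private final Set<IndexValue> entries = new HashSet<>();
    private final boolean keepLanguageForStrings;

    LanguageAwareIndexSketch(boolean keepLanguageForStrings) {
        this.keepLanguageForStrings = keepLanguageForStrings;
    }

    /** Index a string-typed value; the old behaviour drops the language tag. */
    void indexString(String value, String language) {
        entries.add(new IndexValue(value, keepLanguageForStrings ? language : null));
    }

    /** Language-specific lookup, e.g. after the language of the content was detected. */
    boolean contains(String value, String language) {
        return entries.contains(new IndexValue(value, language));
    }

    public static void main(String[] args) {
        LanguageAwareIndexSketch before = new LanguageAwareIndexSketch(false);
        LanguageAwareIndexSketch after = new LanguageAwareIndexSketch(true);
        before.indexString("Paris", "fr");
        after.indexString("Paris", "fr");
        System.out.println(before.contains("Paris", "fr")); // false: language was dropped
        System.out.println(after.contains("Paris", "fr"));  // true: language was kept
    }
}
```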



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
