[
https://issues.apache.org/jira/browse/STANBOL-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15895824#comment-15895824
]
Rafa Haro commented on STANBOL-1458:
------------------------------------
The problem arose after upgrading the Clerezza parsers. The latest Clerezza
parsers use <http://www.w3.org/1999/02/22-rdf-syntax-ns#langString> as the
datatype for Literals that carry a language annotation, and no longer
xsd:string. The Stanbol Resource adapters did not expect that. To fix it, I
check for the langString datatype directly in the Resource2ValueAdapter class
whenever XSD_DATATYPE_VALUE_MAPPING does not provide any match for the
inspected resource. If the type is rdf:langString and the language is not
null, I create an RdfText Value directly using
valueFactory.createText(literal.getLexicalForm(),
literal.getLanguage().toString());
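The fallback described above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the committed patch: the Literal
class, the DATATYPE_MAPPING table and the adapt method are simplified,
hypothetical stand-ins for the Clerezza literal type, the
XSD_DATATYPE_VALUE_MAPPING and the Resource2ValueAdapter logic, and the
returned strings merely mark which branch was taken.

```java
import java.util.HashMap;
import java.util.Map;

public class LangStringFallbackSketch {
    public static final String XSD_STRING =
            "http://www.w3.org/2001/XMLSchema#string";
    public static final String RDF_LANG_STRING =
            "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString";

    // Hypothetical stand-in for a parsed RDF literal (not the Clerezza type).
    public static final class Literal {
        final String lexicalForm;
        final String datatype;
        final String language; // null when the literal has no language tag

        public Literal(String lexicalForm, String datatype, String language) {
            this.lexicalForm = lexicalForm;
            this.datatype = datatype;
            this.language = language;
        }
    }

    // Hypothetical stand-in for XSD_DATATYPE_VALUE_MAPPING: it only knows
    // XSD datatypes, so rdf:langString never matches here.
    static final Map<String, String> DATATYPE_MAPPING = new HashMap<>();
    static {
        DATATYPE_MAPPING.put(XSD_STRING, "string");
    }

    // Returns a tagged string showing which kind of value would be created.
    public static String adapt(Literal literal) {
        String mapped = DATATYPE_MAPPING.get(literal.datatype);
        if (mapped != null) {
            // Regular path: the datatype is known to the mapping.
            return mapped + ":" + literal.lexicalForm;
        }
        // Fallback: no mapping matched, so check rdf:langString explicitly
        // and keep the language when it is present.
        if (RDF_LANG_STRING.equals(literal.datatype) && literal.language != null) {
            return "text:" + literal.lexicalForm + "@" + literal.language;
        }
        return "unmapped:" + literal.lexicalForm;
    }

    public static void main(String[] args) {
        System.out.println(adapt(new Literal("Paris", RDF_LANG_STRING, "fr")));
        System.out.println(adapt(new Literal("Paris", XSD_STRING, null)));
    }
}
```

The point of the ordering is that the existing mapping is still consulted
first, so already supported datatypes behave as before and only the
previously unmatched langString literals take the new branch.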
> Fields Language is being filtered while creating entities into Solr Yard
> based Managed Sites
> --------------------------------------------------------------------------------------------
>
> Key: STANBOL-1458
> URL: https://issues.apache.org/jira/browse/STANBOL-1458
> Project: Stanbol
> Issue Type: Bug
> Components: Entityhub
> Affects Versions: 1.0.0
> Reporter: Rafa Haro
> Assignee: Rafa Haro
> Labels: managed_site
>
> When entities are created through the Managed Sites REST API, fields
> containing xml:lang annotations are stored in Solr (Yard) with only the
> field value and without the language. Among other things, this prevents the
> Entity Linking engine from finding the entities when the language is
> detected first. Even if the Entity Linking engine is configured without any
> predefined language, the entities are not found.
> Looking at the code, the StringConverter within the IndexValueFactory
> deliberately ignores the language for xsd:string based DataTypes.
> TextConverter (which is bound to the entityhub:text type) indexes the
> language along with the value. The problem is that, when uploading
> entities through the API, the Clerezza Serializer is of course not able to
> understand the entityhub:text data type, so it always parses the text fields
> as xsd:string.
> The proposed solution is to include the language, if present, for String
> DataTypes as well, as the Text based ones already do.