Rafa Haro created STANBOL-1458:
----------------------------------

             Summary: Fields Language is being filtered while creating entities 
into Solr Yard based Managed Sites
                 Key: STANBOL-1458
                 URL: https://issues.apache.org/jira/browse/STANBOL-1458
             Project: Stanbol
          Issue Type: Bug
          Components: Entityhub
    Affects Versions: 1.0.0
            Reporter: Rafa Haro
            Assignee: Rafa Haro


When entities are created through the Managed Sites REST API, fields carrying
xml:lang annotations are stored in Solr (Yard) with only the field value; the
language is dropped. Among other things, this prevents the Entity Linking
engine from finding the entities when the language is detected first. Even
if the Entity Linking engine is configured without any predefined language, the
entities are not found.

Looking at the code, the StringConverter within the IndexValueFactory
deliberately ignores the language for xsd:string based data types, whereas the
TextConverter (which is bound to the entityhub:text type) indexes the language
along with the value. The problem is that, when entities are uploaded through
the API, the Clerezza Serializer is of course not able to understand the
entityhub:text data type, so it always parses text fields as xsd:string.

The proposed solution is to include the language, when present, for String
data types as well, as the Text-based ones already do.
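
As a rough illustration of the proposed change, the sketch below contrasts the
behaviour described above with the proposed one. It does not use the real
Stanbol classes: LangIndexSketch, IndexValue, convertStringCurrent and
convertStringProposed are hypothetical stand-ins, and the actual
IndexValueFactory API may look quite different.

```java
import java.util.Objects;

// Hypothetical stand-ins for illustration only, NOT the real Stanbol classes.
final class LangIndexSketch {

    /** Simplified index value: text, data type name and an optional language. */
    static final class IndexValue {
        final String value;
        final String type;
        final String language; // null when no language is indexed

        IndexValue(String value, String type, String language) {
            this.value = Objects.requireNonNull(value);
            this.type = Objects.requireNonNull(type);
            this.language = language;
        }
    }

    /** Behaviour described in the issue: xsd:string values lose their language. */
    static IndexValue convertStringCurrent(String value, String language) {
        return new IndexValue(value, "xsd:string", null);
    }

    /** Proposed behaviour: keep the language whenever one is present. */
    static IndexValue convertStringProposed(String value, String language) {
        String lang = (language == null || language.isEmpty()) ? null : language;
        return new IndexValue(value, "xsd:string", lang);
    }

    public static void main(String[] args) {
        IndexValue before = convertStringCurrent("Madrid", "es");
        IndexValue after = convertStringProposed("Madrid", "es");
        System.out.println("current:  lang=" + before.language);
        System.out.println("proposed: lang=" + after.language);
    }
}
```

With the proposed variant, a value parsed as xsd:string would still carry its
language into the index, so a language-aware lookup could match it. Values
without a language are indexed exactly as before.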



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
