For any type of search application, you want not only case and accent
folding but also Unicode normalization <http://unicode.org/reports/tr15/>
(you could have both precomposed and combining-accent versions of the è in
Isère).  Typically a search engine can be configured to normalize both the
text before indexing and the query.  If DBpedia doesn't support this, you
could look at using something like Apache Jena's Solr-based text search
support <http://jena.apache.org/documentation/query/text-query.html>.
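
To make the normalization point concrete, here is a minimal sketch using
Python's standard unicodedata module. The function names are mine and purely
illustrative; a real search engine would normally do this inside its own
analysis chain rather than in application code.

```python
import unicodedata


def normalize(text):
    """Return the NFC (precomposed) form so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def fold_for_search(text):
    """Illustrative case and accent folding for building a search key."""
    # Decompose so each accent becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()


precomposed = "Is\u00e8re"   # è as a single code point
combining = "Ise\u0300re"    # e followed by a combining grave accent

print(precomposed == combining)                        # raw strings differ
print(normalize(precomposed) == normalize(combining))  # equal after NFC
print(fold_for_search(precomposed), fold_for_search(combining))
```

The same key function has to be applied to both the indexed text and the
query, otherwise the two sides will not meet. Note that letters without a
canonical decomposition (ø, for example) pass through the accent-stripping
step unchanged, so this is only a partial substitute for a proper
ASCII-folding filter.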

Tom


On Fri, Jun 27, 2014 at 8:00 AM, Andrea Di Menna <ninn...@gmail.com> wrote:

> Hi,
> there is no magic in that.
> It just happens that Wikipedia has a page Isere (
> http://en.wikipedia.org/wiki/Isere) which is merely a redirect to
> Isère (http://en.wikipedia.org/wiki/Is%C3%A8re).
> Hence the framework links the two DBpedia entities together in a triple:
>
>    dbpedia:Isere dbpedia-owl:wikiPageRedirects dbpedia:Isère
>
> However, I think this is not always true for all the pages which
> contain non-ASCII chars; that is, Wikipedia is not filled with redirects
> from ASCII-folded page names.
>
> This is why, in my opinion, you should enrich the data with additional
> triples which link ASCII-folded and other-language labels to the original
> entity, e.g.
>
>    dbpedia:Italy rdfs:label "Italy"@en
>    dbpedia:Italy rdfs:label "Italia"@it
>
> and
>
>    dbpedia:Isère rdfs:label "Isère"@en
>    dbpedia:Isère rdfs:label "Isere"@en
>
> (this is just an example; I would not use rdfs:label for the ASCII-folded
> label but another property).
>
> Hope this helps.
>
> Cheers
> Andrea
>
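
The enrichment suggested above could be sketched roughly as follows. This
is only an illustration under stated assumptions: the property IRI
http://example.org/foldedLabel is hypothetical (made up here, not a DBpedia
or standard property), the input is assumed to be (resource IRI, label)
pairs already extracted from the data, and the output is one N-Triples line
per label that actually changes when folded.

```python
import unicodedata

# Hypothetical property for the folded label; not a real DBpedia property.
FOLDED_LABEL = "http://example.org/foldedLabel"


def strip_accents(text):
    """Remove combining marks after canonical decomposition (case is kept)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def folded_label_triples(labels):
    """Yield one N-Triples line for each label that differs from its folded form."""
    for subject, label in labels:
        folded = strip_accents(label)
        if folded != label:
            yield '<%s> <%s> "%s" .' % (subject, FOLDED_LABEL, escape_literal(folded))


labels = [
    ("http://dbpedia.org/resource/Is\u00e8re", "Is\u00e8re"),
    ("http://dbpedia.org/resource/Italy", "Italy"),
]
for line in folded_label_triples(labels):
    print(line)
```

Only Isère produces an extra triple here, since "Italy" is unchanged by
folding. Keeping the folded form under a separate property, rather than
rdfs:label, leaves the display labels clean while still giving queries
something to match against.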
>
>
> 2014-06-27 13:46 GMT+02:00 Mohammad Ghufran <emghuf...@gmail.com>:
>
>> Hello,
>>
>> Thank you for your reply. Yes, I tried doing that. If I remove the
>> accents, I normally get a redirect page in the search results. I can
>> then take the resource URI from this result and get the actual resource
>> page. However, this only happens sometimes. For example, a region in France
>> called Isère has the following page: http://dbpedia.org/page/Is%C3%A8re .
>> If I access the page without the accent, I am still redirected to the
>> correct page. However, if I search for the plain string in the label, I
>> don't get any results. Here is the query I am using:
>>
>> PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
>> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>> SELECT DISTINCT ?place
>> WHERE
>> {
>>     ?place a dbpedia-owl:PopulatedPlace .
>>     ?place rdfs:label ?label .
>>     FILTER (str(?label)= "Isere") .
>> }
>>
>> The language is not known a priori, as I said in my earlier message. I am
>> trying to make my code language independent, so I cannot filter on the
>> language.
>>
>> What is interesting is the fact that DBpedia itself redirects the URL
>> http://dbpedia.org/page/Isere to http://dbpedia.org/page/Is%C3%A8re . I
>> am wondering how this "magic" is done.
>>
>> Mohammad Ghufran
>>
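
One way to make use of the redirect triples from a query is sketched
below. It builds a variant of the query above that matches the label on any
resource and then follows at most one dbpedia-owl:wikiPageRedirects link
before applying the type check. The assumptions here are that redirect
resources such as dbpedia:Isere carry an rdfs:label of their own and that
the endpoint accepts SPARQL 1.1 property paths; neither has been verified
against the live endpoint. The helper names are illustrative only.

```python
import unicodedata


def sparql_literal(text):
    """Quote text as a SPARQL string literal (escapes backslash and double quote only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_query(user_input):
    """Build a label lookup that also follows at most one redirect."""
    # Normalize to NFC so precomposed and combining spellings give the same query.
    label = unicodedata.normalize("NFC", user_input)
    lines = [
        "PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>",
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
        "SELECT DISTINCT ?place",
        "WHERE",
        "{",
        "    ?hit rdfs:label ?label .",
        "    FILTER (str(?label) = " + sparql_literal(label) + ") .",
        "    # zero-or-one path: ?place is ?hit itself or its redirect target",
        "    ?hit dbpedia-owl:wikiPageRedirects? ?place .",
        "    ?place a dbpedia-owl:PopulatedPlace .",
        "}",
    ]
    return "\n".join(lines)


print(build_query("Isere"))
```

This only helps where Wikipedia actually has a redirect from the
unaccented title, so it complements rather than replaces folded labels or a
normalizing text index.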
>>
>> On Fri, Jun 27, 2014 at 1:08 PM, Romain Beaumont <romain.r...@gmail.com>
>> wrote:
>>
>>> Hello,
>>> I think you are going to have to do some preprocessing. For example, to
>>> handle accents, you can just remove them (in your program/script/...)
>>> before turning the input into a SPARQL query.
>>> Some labels are present in different languages in DBpedia; maybe you
>>> could use that?
>>>
>>>
>>>
>>> 2014-06-27 10:57 GMT+02:00 Mohammad Ghufran <emghuf...@gmail.com>:
>>>
>>>> Hello,
>>>>
>>>> I am using DBpedia to work with locations in order to compare them and
>>>> determine whether two locations are the same or similar, and to what
>>>> extent. Since my data source can be user input, the data normally does
>>>> not match the exact resource / label defined in DBpedia.
>>>>
>>>> I am using the SPARQL endpoint for this (right now I am using the
>>>> DBpedia endpoint, but I intend to use a local mirror at a later stage).
>>>>
>>>> I am looking to address this but still haven't found a good way to do
>>>> so. Here is an example to elaborate. Take the region Rhône-Alpes in
>>>> France. If I search for Rhone-Alpes in the label, I don't see any
>>>> results, neither in the disambiguation pages nor through the keyword
>>>> search (Lookup) API.
>>>>
>>>> Is there a way to address this issue? I want to query such that I get
>>>> the page Rhône-Alpes as one of the results when I search for Rhone-Alpes,
>>>> for example. This also extends to labels in different languages. My input
>>>> does not specify the language, so it might be in different languages.
>>>> For instance, Italia, Italy, and Italie all refer to the country Italy in
>>>> different languages.
>>>>
>>>> Thank you for any suggestions / help in advance.
>>>>
>>>> Best Regards,
>>>> Ghufran
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Open source business process management suite built on Java and Eclipse
>>>> Turn processes into business applications with Bonita BPM Community
>>>> Edition
>>>> Quickly connect people, data, and systems into organized workflows
>>>> Winner of BOSSIE, CODIE, OW2 and Gartner awards
>>>> http://p.sf.net/sfu/Bonitasoft
>>>> _______________________________________________
>>>> Dbpedia-discussion mailing list
>>>> Dbpedia-discussion@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>>
>>>>
>>>
>>
>>
>
>