RE: How-to get results of comparison between documents

Moshe Recanati Sun, 29 Jun 2014 23:44:36 -0700

Hi Rupert,
Thanks a lot, I'll check your suggestions and see if I can implement it.

Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile  + 972-52-6194481
Skype    :  recanati

more at  www.kmslh.com

-----Original Message-----
From: Rupert Westenthaler [mailto:rupert.westentha...@gmail.com] 
Sent: Friday, June 27, 2014 4:41 PM
To: dev@stanbol.apache.org
Subject: Re: How-to get results of comparison between documents

Hi,

It's a bit hard to answer your very generic question. But as it looks like an 
interesting (and demanding) use case, I will try to provide some useful 
information ...

The query for the latest phone made by Samsung can easily be answered by Solr 
if you have an index with all the data (including the release
date) of mobile phones. But as you are writing to this mailing list, I assume 
that you do not have structured data with such information but instead intend 
to extract that information from unstructured text.
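
For the structured case, the question maps to a filtered and sorted Solr query: 
filter on the vendor, sort by release date in descending order and keep only the 
top hit. The minimal sketch below only builds the request URL. It assumes a 
hypothetical core named `phones` with the fields `vendor` and `release_date` 
(the latter indexed as a date, so that sorting is chronological); adapt the 
names to your own schema.

```python
from urllib.parse import urlencode

def latest_phone_query(core_url: str, vendor: str) -> str:
    """Build a Solr select URL for the most recently released phone of a vendor."""
    params = {
        "q": "*:*",                      # match all documents ...
        "fq": f'vendor:"{vendor}"',      # ... filtered to one vendor (hypothetical field)
        "sort": "release_date desc",     # newest first (hypothetical date field)
        "rows": 1,                       # only the top hit is needed
        "wt": "json",                    # ask for a JSON response
    }
    return f"{core_url.rstrip('/')}/select?{urlencode(params)}"

url = latest_phone_query("http://localhost:8983/solr/phones", "Samsung")
print(url)
```

Using `fq` for the vendor keeps the restriction out of relevance scoring, which 
does not matter here because the ranking comes entirely from `sort`.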

In the following I will try to summarize some possible approaches that might be 
interesting to you:

* If you want to detect new Entities (e.g. a new Smart Phone you do not yet 
have in your database), you will need Named Entity Recognition (NER). NER 
models need to be trained for specific languages, specific types of writing 
(news vs. forum slang) and also for the type of entities. Stanbol is integrated 
with OpenNLP and Stanford NLP, so models trained for these frameworks can also 
be used with Stanbol.
* If you already have a vocabulary with the Entities you are interested in 
(e.g. all Smart Phones, Vendors, ...), you can use Entity Linking to detect 
mentions of those Entities in unstructured texts. This is also supported by 
Apache Stanbol.
* If you have documents describing an Entity (e.g. a fact sheet for a new smart 
phone), you need an engine that extracts facts. Such an engine first needs 
to detect a feature (e.g. "release date") in the unstructured text and then 
extract the value and assign it to that feature. I am currently working on such 
an engine, but it is not yet available in Stanbol.
* If you have a vocabulary with Entities (e.g. all Smart Phones) with some 
basic information, but you want to enrich your database with more facts parsed 
from unstructured texts such as news articles, forum posts ..., you need an 
engine that can detect settings. A setting is defined as a union over multiple 
participants, activities and parameters. To add new information to an entity, 
you will need to extract a Setting in which this entity participates and has 
an assigned parameter. The sentence "The Samsung Galaxy S10 will be released in 
Oct. 2019" is an example of such a Setting. News articles also contain 
sentences such as "iPhone 4 weighs 137 grams" or "dimensions of the Galaxy 
Grand 2 are 146.8×75.3×8.9mm". Such an engine is currently not available in 
Stanbol. However, Cristian Petroaca has been working on extracting settings 
like these for some time.
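
NER and Entity Linking both run inside the Stanbol Enhancer, which accepts 
plain text over HTTP and returns its enhancements as RDF. The minimal sketch 
below (Python standard library only) prepares such a request. It assumes a 
default Stanbol launcher listening on `http://localhost:8080` whose default 
enhancement chain includes the NER and Entity Linking engines you configured; a 
specific chain should be reachable under `/enhancer/chain/{name}` instead. The 
request is only built here, since sending it needs a running server.

```python
import urllib.request

# Assumption: a locally running Stanbol launcher with the default port.
ENHANCER_URL = "http://localhost:8080/enhancer"

def build_enhancement_request(text: str, url: str = ENHANCER_URL) -> urllib.request.Request:
    """Prepare a POST of plain text to the Stanbol Enhancer."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "text/plain; charset=UTF-8",
            "Accept": "text/turtle",  # ask for the enhancement results as Turtle
        },
    )

req = build_enhancement_request("The Samsung Galaxy S10 will be released in Oct. 2019")

# Against a live server, the RDF result could be read like this:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

The returned RDF describes the detected text mentions and, when Entity Linking 
is part of the chain, the vocabulary Entities they were linked to.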
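
To make the notion of a Setting more concrete, the following toy sketch models 
a Setting as participants, an activity and parameters, extracts "release" 
Settings with a single regular expression, and then answers the original 
question by picking the most recent release of a vendor. This is neither the 
engine Cristian Petroaca is working on nor anything shipped with Stanbol: the 
`Setting` class, the `VENDORS` list (a stand-in for Entity Linking) and the 
release pattern are hypothetical, and the example sentences are made up. A real 
engine would work on NLP annotations (NER, parsing) rather than on one regex.

```python
import re
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class Setting:
    """Toy model of a setting: participants, an activity and its parameters."""
    participants: list[str]
    activity: str
    parameters: dict[str, object] = field(default_factory=dict)

# Hypothetical vendor vocabulary, standing in for an Entity Linking step.
VENDORS = ("Samsung", "Apple", "Nokia")
MONTHS = {name: number for number, name in enumerate(
    ("jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"), start=1)}

# Hypothetical surface pattern: "<vendor> <model words> will be released in <Month> <year>".
RELEASE = re.compile(
    r"(?P<phone>(?P<vendor>" + "|".join(VENDORS) + r")(?: \w+)*?)"
    r" will be released in (?P<month>[A-Z][a-z]{2})[a-z]*\.? (?P<year>\d{4})")

def extract_release(sentence: str) -> Optional[Setting]:
    """Return a 'release' setting if the sentence announces a release date."""
    m = RELEASE.search(sentence)
    if m is None or m["month"].lower() not in MONTHS:
        return None
    released = date(int(m["year"]), MONTHS[m["month"].lower()], 1)
    return Setting(participants=[m["phone"], m["vendor"]],
                   activity="release", parameters={"date": released})

def latest_phone(settings: list[Setting], vendor: str) -> Optional[str]:
    """Answer 'What's the latest phone made by <vendor>?' from extracted settings."""
    releases = [s for s in settings
                if s.activity == "release" and vendor in s.participants]
    if not releases:
        return None
    return max(releases, key=lambda s: s.parameters["date"]).participants[0]

sentences = [
    "The Samsung Galaxy S10 will be released in Oct. 2019",
    "Rumours say the Samsung Galaxy S5 will be released in April 2014",
    "The Apple iPhone 6 will be released in Sep. 2014",
]
settings = [s for s in map(extract_release, sentences) if s is not None]
print(latest_phone(settings, "Samsung"))  # Samsung Galaxy S10
```

Other kinds of settings, such as weights or dimensions, would need their own 
patterns (or, realistically, a proper NLP-based engine); this sketch only 
recognizes release announcements of the form shown above.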

I hope this information answers your question and helps to make your use 
case clearer.

best
Rupert

On Thu, Jun 26, 2014 at 8:21 AM, Moshe Recanati <mos...@kmslh.com> wrote:
>
> Hi,
>
> I'm new to Apache Stanbol.
>
> Until now we used Solr as our search engine.
>
> We would like to extend it with semantic capabilities, and this is the 
> reason we're trying Stanbol.
>
> Let's assume I have several documents that describe mobile phone 
> specifications, indexed on release date and vendor.
>
> I want to query / ask these documents 'What's the latest phone made by 
> Samsung?' and get the latest document based on release date.
>
> Please describe how I can do it (if at all possible).
>
> Regards,
> Moshe Recanati
> SVP Engineering
> Office + 972-73-2617564
> Mobile  + 972-52-6194481
> Skype    :  recanati
>
> more at  www.kmslh.com
>

-- 
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                              ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO 
..........................................................................
| http://redlink.co/
