Hi Rupert,

Thanks a lot, I'll check your suggestions and see if I can implement them.
Regards,
Moshe Recanati
SVP Engineering
Office + 972-73-2617564
Mobile + 972-52-6194481
Skype : recanati
more at www.kmslh.com

-----Original Message-----
From: Rupert Westenthaler [mailto:rupert.westentha...@gmail.com]
Sent: Friday, June 27, 2014 4:41 PM
To: dev@stanbol.apache.org
Subject: Re: How-to get results of comparison between documents

Hi,

It's a bit hard to answer your very generic question. But as it looks like an interesting (and demanding) use case, I will try to provide some useful information ...

The query for the latest phone made by Samsung can easily be answered by Solr if you have an index with all the data (including the release date) of mobile phones. But as you write this on this mailing list, I assume that you do not have structured data with such information but instead intend to extract that information from unstructured text.

In the following I will try to summarize some things that might be interesting to you:

* If you want to detect new Entities - e.g. a new Smart Phone you do not yet have in your database - you will need Named Entity Recognition (NER). NER models need to be trained for specific languages, specific types of writing (news vs. forum slang) and also for the type of entities. Stanbol is integrated with OpenNLP and Stanford NLP, so models trained for those frameworks can also be used with Stanbol.

* If you already have a vocabulary with the Entities you are interested in (e.g. all Smart Phones, Vendors, ...) you can use Entity Linking to detect mentions of those in unstructured texts. This is also supported by Apache Stanbol.

* If you have documents describing an Entity (e.g. a fact sheet for a new smart phone) you need an engine that extracts facts. Such an engine first needs to detect a feature (e.g. "release date") in the unstructured text and then extract the value and assign it to that feature. I am currently working on such an engine, but it is not yet available in Stanbol.

* If you have a vocabulary with Entities (e.g. all Smart Phones) with some basic information, but you want to enrich your database with more facts parsed from unstructured texts such as news articles, forum posts ..., you need an engine that can detect settings. A setting is defined as a union over multiple participants, activities and parameters. To add new information to an entity you need to extract a Setting in which this entity participates and has an assigned parameter. The sentence "The Samsung Galaxy S10 will be released in Oct. 2019" is an example of such a Setting. News articles also contain sentences such as "iPhone 4 weighs 137 grams" or "dimensions of the Galaxy Grand 2 are 146.8×75.3×8.9mm". Such an engine is currently not available in Stanbol. However, Cristian Petroaca has been working for some time on extracting settings like that.

I hope this information answers your question and helps to make your use case clearer.

best
Rupert

On Thu, Jun 26, 2014 at 8:21 AM, Moshe Recanati <mos...@kmslh.com> wrote:
>
> Hi,
>
> I'm new to apache stanbol.
>
> Until now we used Solr as our search engine.
>
> We would like to enhance the capabilities and be able to enhance it with
> semantic capabilities and this is the reason we're trying stanbol.
>
>
>
> Let's assume I've several documents that describe mobile phone specification
> with index on release date and vendor.
>
> I want to query \ ask these documents 'What's the latest phone made by
> Samsung?' and get the latest document based on release date.
>
>
>
> Please describe how can I do it (if at all).
>
>
>
> Regards,
>
> Moshe Recanati
>
> SVP Engineering
>
> Office + 972-73-2617564
>
> Mobile + 972-52-6194481
>
> Skype : recanati
>
> more at www.kmslh.com
>
>

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/
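
For the structured-data case mentioned at the top of the reply (a Solr index that already holds vendor and release date), the "latest phone made by Samsung" question comes down to a filter plus a descending sort. The following is only a minimal sketch: the field names `vendor` and `release_date`, the core name `phones` and the host are assumptions for illustration and do not come from the thread. It only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode


def latest_phone_query_url(solr_base, vendor):
    """Build a Solr select URL that asks for the newest document of one vendor.

    `vendor` and `release_date` are hypothetical field names; replace them
    with whatever the real schema uses.
    """
    params = {
        "q": "*:*",                   # match every document ...
        "fq": f'vendor:"{vendor}"',   # ... then keep only the given vendor
        "sort": "release_date desc",  # newest release first
        "rows": 1,                    # only the top hit is needed
        "wt": "json",
    }
    return f"{solr_base}/select?{urlencode(params)}"


# Hypothetical local core named "phones".
url = latest_phone_query_url("http://localhost:8983/solr/phones", "Samsung")
print(url)
```

The sketch assumes `release_date` is indexed as a sortable date field and that the vendor name contains no double quotes; a real client would need to escape such values.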