Thank you, Antonio and Rupert, for your clarifications. So we need to build a date/time extraction engine from scratch (without using any of the third-party libraries mentioned) as the baseline implementation.
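For illustration only, a regex-based baseline along the lines discussed in this thread might look like the sketch below. It is written in Python for brevity (an actual Stanbol engine would be written in Java), and the three patterns as well as the names `extract_temporal` and `_iso` are hypothetical, not part of any Stanbol API. It finds a few common date and time forms, reports their character offsets, and normalizes each match to an ISO 8601 value, which is the parsing step a purely statistical tagger would leave open.

```python
import re
from datetime import date, time

# Hypothetical pattern set for illustration; a real engine would need many
# more formats, locales, and configuration options.
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_DATE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+"
    r"(\d{1,2}),\s*(\d{4})\b")
CLOCK_TIME = re.compile(
    r"\b(1[0-2]|0?[1-9]):([0-5]\d)\s*(AM|PM)\b", re.IGNORECASE)


def _iso(cls, *parts):
    """Build a date or time and return its ISO string, or None if invalid."""
    try:
        return cls(*parts).isoformat()
    except ValueError:
        return None


def extract_temporal(text):
    """Return (start, end, surface_text, iso_value) tuples sorted by offset."""
    found = []
    for m in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        found.append((m.start(), m.end(), m.group(),
                      _iso(date, year, month, day)))
    for m in LONG_DATE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        found.append((m.start(), m.end(), m.group(),
                      _iso(date, int(m.group(3)), month, int(m.group(2)))))
    for m in CLOCK_TIME.finditer(text):
        # Convert the 12-hour clock to 24-hour: 12 AM -> 0, 12 PM -> 12.
        hour = int(m.group(1)) % 12
        if m.group(3).upper() == "PM":
            hour += 12
        found.append((m.start(), m.end(), m.group(),
                      _iso(time, hour, int(m.group(2)))))
    # Drop matches that look like dates but are not valid calendar values.
    return sorted(item for item in found if item[3] is not None)
```

Applied to a header line such as "On Mon, Jan 20, 2014 at 10:34 AM, Rupert wrote", this sketch yields the values "2014-01-20" and "10:34:00" together with their offsets. Matches such as "2014-13-45" are discarded because they do not form a valid calendar date.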
We will implement other possible approaches as advanced features later.
Please correct me if I'm wrong. I'm working on this and will keep you
posted on the progress.

On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <
rupert.westentha...@gmail.com> wrote:

> Hi Jayani, Antonio,
>
> With "base-line" I mean that it is IMHO important to have this
> functionality also present in the default distribution of Stanbol.
> With a Regex based solution this is possible. With implementations
> based on GPL licensed projects it is not.
>
> Having a "base-line" implementation would allow users to start with
> the Regex based DateExtractionEngine. If this one does not fit their
> requirements, they would look for alternatives and find advanced
> options that would require them to manually download and install
> additional GPL licensed software.
>
> best
> Rupert
>
> On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales
> <ape...@zaizi.com> wrote:
> > Hi Jayani
> >
> > What Rupert means is that it would be good to have a "RegEx"
> > Enhancement Engine which extracts/creates TextAnnotations based on
> > regular expressions configured in the engine.
> > This way you can configure one engine of this type and provide a
> > regular expression to extract dates and times.
> >
> > After that, we can take a look at the projects pointed out by Rupert
> > in order to integrate them in Stanbol.
> >
> > Regards
> >
> > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <
> > jayaniwithanawa...@gmail.com> wrote:
> >
> >> Thank you Rupert and Anuj for your suggestions. I'm going through
> >> the links you have provided.
> >>
> >> Rupert,
> >>
> >> What did you mean by a base-line engine that is directly integrated
> >> in Stanbol with a Regex based approach?
> >>
> >> I would appreciate it if you could elaborate on this.
> >>
> >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <
> >> rupert.westentha...@gmail.com> wrote:
> >>
> >> > Hi Anuj
> >> >
> >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com>
> >> > wrote:
> >> > > I second that. Regex will work better w.r.t. the default
> >> > > trained model of OpenNLP.
> >> >
> >> > Both such projects do look interesting:
> >> >
> >> > > Also, take a look at this extractor-
> >> > > https://code.google.com/p/heideltime/ and
> >> >
> >> > As this is GPLv3 you can not directly use it to implement an
> >> > EnhancementEngine that is part of the Stanbol Codebase.
> >> > Integrating it via a RESTful service would be an option.
> >> >
> >> > > Stanford's tagger-
> >> > > http://nlp.stanford.edu/downloads/sutime.shtml#!
> >> >
> >> > The same is true for SuTime as all Stanford NLP components are
> >> > under GPL.
> >> >
> >> > If we want to integrate those projects I suggest to extend the
> >> > Stanbol RESTful NLP protocol [1] and service [2] so that it can
> >> > represent date/time points and ranges. SuTime support could be
> >> > added to the already existing Stanbol-Stanford integration [3].
> >> > For HeidelTime one would need to implement a similar component.
> >> >
> >> > But before integrating those I would prefer to have a base-line
> >> > engine that is directly integrated in Stanbol. Looks like a Regex
> >> > based approach could be sufficient for that. WDYT Jayani?
> >> >
> >> > best
> >> > Rupert
> >> >
> >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
> >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
> >> > [3] https://github.com/westei/stanbol-stanfordnlp
> >> >
> >> > > It will be useful to have a similar temporal expression
> >> > > enhancement engine in Stanbol.
> >> > >
> >> > > Regards,
> >> > > Anuj
> >> > >
> >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <
> >> > > rupert.westentha...@gmail.com> wrote:
> >> > >
> >> > >> Hi Jayani,
> >> > >>
> >> > >> I was not even aware that there exists a Time model for
> >> > >> OpenNLP. Documentation shows that this uses a purely
> >> > >> statistical model so I am wondering about the quality. Note
> >> > >> also that OpenNLP only provides a prebuilt model for
> >> > >> English [1].
> >> > >>
> >> > >> AFAIK OpenNLP will only provide you with the information that
> >> > >> some tokens do represent a date. It will not provide you the
> >> > >> parsed xsd:dateTime. So if you use this Engine you will still
> >> > >> need to implement this part on your own. So most likely you
> >> > >> will end up using regex patterns to parse the actual time from
> >> > >> the Tokens marked by OpenNLP as time.
> >> > >>
> >> > >> So I am wondering if it is not better to start with Regex from
> >> > >> the beginning. If you search for "Regex Date Time extraction"
> >> > >> you can find a huge set of examples you could start from.
> >> > >>
> >> > >> best
> >> > >> Rupert
> >> > >>
> >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
> >> > >>
> >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam
> >> > >> <jayaniwithanawa...@gmail.com> wrote:
> >> > >> > Hi Dileepa,
> >> > >> >
> >> > >> > Thank you so much for your valuable feedback. I'm working on
> >> > >> > this.
> >> > >> >
> >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <
> >> > >> > dileepajayak...@gmail.com> wrote:
> >> > >> >
> >> > >> >> Hi Jayani,
> >> > >> >>
> >> > >> >> There are several enhancement engines in Stanbol developed
> >> > >> >> based on OpenNLP (opennlp-ner, opennlp-sentence,
> >> > >> >> opennlp-pos... See [1]). Each of these engines focuses on a
> >> > >> >> particular enhancement aspect using OpenNLP.
> >> > >> >> Therefore I think it's better to write a new engine for
> >> > >> >> temporal extractions rather than extending the OpenNLP-NER
> >> > >> >> engine.
> >> > >> >>
> >> > >> >> Thanks,
> >> > >> >> Dileepa
> >> > >> >>
> >> > >> >> [1]
> >> > >> >> https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
> >> > >> >>
> >> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <
> >> > >> >> jayaniwithanawa...@gmail.com> wrote:
> >> > >> >>
> >> > >> >> > Hi,
> >> > >> >> >
> >> > >> >> > I'm researching adding a new enhancement engine for
> >> > >> >> > extracting date and time (temporal extraction) to Stanbol,
> >> > >> >> > as suggested by Rupert.
> >> > >> >> >
> >> > >> >> > I found that OpenNLP has an entity extraction unit for
> >> > >> >> > date and time.
> >> > >> >> > Also, I noticed that OpenNLP is already integrated into
> >> > >> >> > Stanbol in the NER engine.
> >> > >> >> >
> >> > >> >> > So, as per my understanding, there are two options to
> >> > >> >> > extract date and time.
> >> > >> >> >
> >> > >> >> > One is to have a separate enhancement engine for date and
> >> > >> >> > time information extraction. Another one is to add date
> >> > >> >> > time extraction as a code enhancement to the existing
> >> > >> >> > OpenNLP NER engine.
> >> > >> >> >
> >> > >> >> > What is your opinion on this? Is there any other approach
> >> > >> >> > which you think would be better?
> >> > >> >> >
> >> > >> >> > Thank you
> >> > >> >> > Jayani
> >> > >>
> >> > >> --
> >> > >> | Rupert Westenthaler    rupert.westentha...@gmail.com
> >> > >> | Bodenlehenstraße 11    ++43-699-11108907
> >> > >> | A-5500 Bischofshofen