Hi Jayani,

Perfect. If you want, I can help you with the implementation of this engine, or with questions about the classes used in the Enhancement Engine or about OSGi.
Feel free to ask.

Regards

On Mon, Jan 27, 2014 at 8:13 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> Thank you Antonio and Rupert for your clarifications.
>
> So, we need to work on a date time extraction engine from scratch (without using any of the mentioned third-party libraries) as the base-line implementation.
>
> We will implement other possible approaches as advanced features later.
> Correct me if I'm wrong. I'm working on this and will keep you posted on the progress.
>
> On Mon, Jan 20, 2014 at 10:34 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
>
> > Hi Jayani, Antonio,
> >
> > By "base-line" I mean that it is IMHO important to have this functionality also present in the default distribution of Stanbol.
> > With a Regex-based solution this is possible. With implementations based on GPL-licensed projects it is not.
> >
> > Having a "base-line" implementation would allow users to start with the Regex-based DateExtractionEngine. If this one does not fit their requirements, they would look for alternatives and find advanced options that require them to manually download and install additional GPL-licensed software.
> >
> > best
> > Rupert
> >
> > On Fri, Jan 17, 2014 at 9:46 AM, Antonio David Perez Morales <ape...@zaizi.com> wrote:
> > > Hi Jayani
> > >
> > > What Rupert means is that it would be good to have a "RegEx" Enhancement Engine which extracts/creates TextAnnotations based on regular expressions configured in the engine.
> > > This way you can configure one engine of this type and provide a regular expression for extracting dates and times.
> > >
> > > After that, we can take a look at the projects pointed out by Rupert in order to integrate them into Stanbol.
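A minimal sketch of the configurable "RegEx" engine idea described above, written in Java because Stanbol is a Java/OSGi code base. The class `RegexSpanExtractor`, its `Span` record and the two sample patterns are hypothetical illustrations, not part of the Stanbol Enhancer API; a real engine would implement Stanbol's `EnhancementEngine` interface and write TextAnnotations into the ContentItem metadata, which this sketch leaves out. It uses a Java record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch, not a Stanbol API: applies named, configurable
 * regular expressions to plain text and returns one span per match,
 * roughly the information a TextAnnotation would carry.
 */
final class RegexSpanExtractor {

    /** One match: pattern name, character offsets and the matched text. */
    record Span(String type, int start, int end, String selectedText) {}

    // LinkedHashMap keeps registration order, so results are grouped per pattern.
    private final Map<String, Pattern> patterns = new LinkedHashMap<>();

    /** Registers a pattern; in a real engine this would come from its configuration. */
    RegexSpanExtractor addPattern(String type, String regex) {
        patterns.put(type, Pattern.compile(regex));
        return this;
    }

    /** Runs every pattern over the text; spans are grouped per pattern, not sorted by offset. */
    List<Span> extract(String text) {
        List<Span> spans = new ArrayList<>();
        for (Map.Entry<String, Pattern> entry : patterns.entrySet()) {
            Matcher m = entry.getValue().matcher(text);
            while (m.find()) {
                spans.add(new Span(entry.getKey(), m.start(), m.end(), m.group()));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        RegexSpanExtractor extractor = new RegexSpanExtractor()
                // ISO-style dates such as 2014-01-27 (hypothetical config value)
                .addPattern("date", "\\b\\d{4}-\\d{2}-\\d{2}\\b")
                // clock times such as 8:13 or 22:05 (hypothetical config value)
                .addPattern("time", "\\b(?:[01]?\\d|2[0-3]):[0-5]\\d\\b");
        // Prints one line per match.
        for (Span span : extractor.extract("On 2014-01-27 at 8:13 Jayani replied.")) {
            System.out.println(span);
        }
    }
}
```

Keeping the patterns as configuration rather than code is what would let one engine type be instantiated several times with different expressions, as suggested in the thread.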
> > >
> > > Regards
> > >
> > > On Fri, Jan 17, 2014 at 9:39 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> > >
> > >> Thank you Rupert and Anuj for your suggestions. I'm going through the links you have provided.
> > >>
> > >> Rupert,
> > >>
> > >> What did you mean by a base-line engine that is directly integrated in Stanbol with a Regex-based approach?
> > >>
> > >> I'd appreciate it if you could elaborate on this further.
> > >>
> > >> On Fri, Nov 29, 2013 at 11:35 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> > >>
> > >> > Hi Anuj
> > >> >
> > >> > On Thu, Nov 28, 2013 at 1:51 PM, Anuj Kumar <anujs...@gmail.com> wrote:
> > >> > > I second that. Regex will work better w.r.t. the default trained model of OpenNLP.
> > >> >
> > >> > Both projects do look interesting:
> > >> >
> > >> > > Also, take a look at this extractor: https://code.google.com/p/heideltime/ and
> > >> >
> > >> > As this is GPLv3, you cannot directly use it to implement an EnhancementEngine that is part of the Stanbol codebase. Integrating it via a RESTful service would be an option.
> > >> >
> > >> > > Stanford's tagger: http://nlp.stanford.edu/downloads/sutime.shtml#!
> > >> >
> > >> > The same is true for SUTime, as all Stanford NLP components are under the GPL.
> > >> >
> > >> > If we want to integrate those projects, I suggest extending the Stanbol RESTful NLP protocol [1] and service [2] so that they can represent date/time points and ranges. SUTime support could be added to the already existing Stanbol-Stanford integration [3]. For HeidelTime one would need to implement a similar component.
> > >> >
> > >> > But before integrating those, I would prefer to have a base-line engine that is directly integrated in Stanbol. It looks like a Regex-based approach could be sufficient for that. WDYT Jayani?
> > >> >
> > >> > best
> > >> > Rupert
> > >> >
> > >> > [1] https://issues.apache.org/jira/browse/STANBOL-878
> > >> > [2] https://issues.apache.org/jira/browse/STANBOL-892
> > >> > [3] https://github.com/westei/stanbol-stanfordnlp
> > >> >
> > >> > >
> > >> > > It will be useful to have a similar temporal expression enhancement engine in Stanbol.
> > >> > >
> > >> > > Regards,
> > >> > > Anuj
> > >> > >
> > >> > > On Thu, Nov 28, 2013 at 11:05 AM, Rupert Westenthaler <rupert.westentha...@gmail.com> wrote:
> > >> > >
> > >> > >> Hi Jayani,
> > >> > >>
> > >> > >> I was not even aware that a Time model exists for OpenNLP.
> > >> > >> The documentation shows that it uses a purely statistical model, so I am wondering about the quality. Note also that OpenNLP only provides a prebuilt model for English [1].
> > >> > >>
> > >> > >> AFAIK OpenNLP will only tell you that some tokens represent a date. It will not provide the parsed xsd:dateTime. So if you use this engine, you will still need to implement that part on your own, and most likely you will end up using regex patterns to parse the actual time from the tokens marked by OpenNLP as time.
> > >> > >>
> > >> > >> So I am wondering if it is not better to start with Regex from the beginning. If you search for "Regex Date Time extraction" you can find a huge set of examples you could start from.
> > >> > >>
> > >> > >> best
> > >> > >> Rupert
> > >> > >>
> > >> > >> [1] http://opennlp.sourceforge.net/models-1.5/
> > >> > >>
> > >> > >> On Thu, Nov 28, 2013 at 5:15 AM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> > >> > >> > Hi Dileepa,
> > >> > >> >
> > >> > >> > Thank you so much for your valuable feedback. I'm working on this.
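Rupert's point that detecting a date is only half the job (the value still has to be parsed into an xsd:dateTime) can be illustrated with a small Java sketch. It assumes a single hypothetical surface form, the "Jan 27, 2014 at 8:13 AM" style used in this thread's own reply headers, captures each component with named regex groups, and emits the xsd:dateTime lexical form without a timezone, since the text gives none. `DateTimeNormalizer` and its pattern are illustrative only; a real engine would need many such patterns plus locale handling.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: turns text such as "Jan 27, 2014 at 8:13 AM"
 * into the lexical form of an xsd:dateTime (no timezone).
 */
final class DateTimeNormalizer {

    private static final List<String> MONTHS = List.of(
            "Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec");

    // Named groups capture each date/time component of the match.
    private static final Pattern US_STYLE = Pattern.compile(
            "\\b(?<mon>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) "
            + "(?<day>\\d{1,2}), (?<year>\\d{4}) at "
            + "(?<hour>\\d{1,2}):(?<min>\\d{2}) (?<ampm>AM|PM)\\b");

    // xsd:dateTime lexical form without a timezone offset.
    private static final DateTimeFormatter XSD =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss");

    /** Returns the first matching date/time as xsd:dateTime, if any. */
    static Optional<String> toXsdDateTime(String text) {
        Matcher m = US_STYLE.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        int month = MONTHS.indexOf(m.group("mon")) + 1;
        int hour = Integer.parseInt(m.group("hour")) % 12; // 12 AM -> 0
        if (m.group("ampm").equals("PM")) {
            hour += 12; // 12 PM -> 12, 4 PM -> 16
        }
        // LocalDateTime.of throws DateTimeException for impossible values
        // such as Feb 30 or minute 75.
        LocalDateTime dt = LocalDateTime.of(
                Integer.parseInt(m.group("year")), month,
                Integer.parseInt(m.group("day")),
                hour, Integer.parseInt(m.group("min")));
        return Optional.of(dt.format(XSD));
    }

    public static void main(String[] args) {
        // prints Optional[2014-01-27T08:13:00]
        System.out.println(toXsdDateTime("On Mon, Jan 27, 2014 at 8:13 AM, Jayani wrote"));
    }
}
```

Splitting detection (the regex match) from normalization (building a `LocalDateTime`) mirrors the two steps Rupert describes, so the same normalization code could also be fed token spans from another detector.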
> > >> > >> >
> > >> > >> > On Mon, Nov 25, 2013 at 9:00 PM, Dileepa Jayakody <dileepajayak...@gmail.com> wrote:
> > >> > >> >
> > >> > >> >> Hi Jayani,
> > >> > >> >>
> > >> > >> >> There are several enhancement engines in Stanbol developed based on OpenNLP (opennlp-ner, opennlp-sentence, opennlp-pos, ... see [1]). Each of these engines focuses on a particular enhancement aspect using OpenNLP.
> > >> > >> >> Therefore I think it's better to write a new engine for temporal extraction rather than extending the OpenNLP-NER engine.
> > >> > >> >>
> > >> > >> >> Thanks,
> > >> > >> >> Dileepa
> > >> > >> >>
> > >> > >> >> [1] https://svn.apache.org/repos/asf/stanbol/trunk/enhancement-engines/opennlp
> > >> > >> >>
> > >> > >> >> On Mon, Nov 25, 2013 at 4:30 PM, Jayani Withanawasam <jayaniwithanawa...@gmail.com> wrote:
> > >> > >> >>
> > >> > >> >> > Hi,
> > >> > >> >> >
> > >> > >> >> > I'm researching adding a new enhancement engine for extracting date and time (temporal extraction) to Stanbol, as suggested by Rupert.
> > >> > >> >> >
> > >> > >> >> > In doing so, I found that OpenNLP has an entity extraction unit for date and time.
> > >> > >> >> > I also noticed that OpenNLP is already integrated into Stanbol in the NER engine.
> > >> > >> >> >
> > >> > >> >> > So, as per my understanding, there are two options to extract date and time.
> > >> > >> >> >
> > >> > >> >> > One is to have a separate enhancement engine for date and time information extraction. Another is to add date time extraction as a code enhancement to the existing OpenNLP NER engine.
> > >> > >> >> >
> > >> > >> >> > What is your opinion on this? Is there any other approach that you think would be better?
> > >> > >> >> >
> > >> > >> >> > Thank you
> > >> > >> >> > Jayani
> > >> > >>
> > >> > >> --
> > >> > >> | Rupert Westenthaler rupert.westentha...@gmail.com
> > >> > >> | Bodenlehenstraße 11 ++43-699-11108907
> > >> > >> | A-5500 Bischofshofen
> > >
> > > --
> > > ------------------------------
> > > This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory.
> > >
> > > Zaizi Ltd is registered in England and Wales with the registration number 6440931. The Registered Office is Brook House, 229 Shepherds Bush Road, London W6 7AN.