Hi Tharindu,

I had totally forgotten about STANBOL-1278. I fixed it this morning. Thanks for the reminder.

Regarding the architecture: I like the general architecture consisting of a

* TopicClassifier
* TrainingSet

as this allows different implementations for managing the training set (e.g. in Solr, an RDF triple store, a database or simple files in a file system) and for TopicClassifiers (Solr, OpenNLP, Mahout, ...).

Note also that the TrainingSet part is optional and only required for TopicClassifiers that can dynamically update their classification models.

The interfaces themselves will need to be adapted/improved as more implementations are added. The current API looks a bit tailored to the Solr-based implementation.

If possible, I would like Cross Validation to be implemented in an implementation-independent way.

Please also note STANBOL-1294 [1].

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-1294

On Tue, Mar 11, 2014 at 2:04 PM, Tharindu Rusira <tharindurus...@gmail.com> wrote:

> On Tue, Mar 4, 2014 at 8:06 PM, Rupert Westenthaler <
> rupert.westentha...@gmail.com> wrote:
>
>> Hi Tharindu
>>
>> Thanks for your interest!
>>
>> > As mentioned in STANBOL-197[1], a topic classification engine for Stanbol
>> > is being developed. In [2], it is discussed that this engine uses a Solr
>> > based classifier and the possibility of introducing more rigorous
>> > classifiers based on OpenNLP and Mahout.
>> > I am interested in this idea and would like to contribute to future
>> > developments. So I would like to know the current status of the project.
>>
>> The most recent release of the Topic Classification Engine mentioned
>> in [1] and [2] consists of three modules [3].
>>
>> NOTE that the trunk version of this engine (1.0.0-SNAPSHOT) still has
>> an open issue [4] that breaks the RESTful API used for training.
>>
> Hi Rupert,
> Thanks for your suggestions and I would like to clarify further questions
> regarding the existing topic classifier.
>
> The said issue (STANBOL-1278) occurs at run time and apparently it is
> due to an OSGi bundle dependency resolution issue. So what I was thinking
> is, if we are integrating tools as I've suggested in my previous mail, do
> we have to adhere to the existing architecture or is it desirable to go for
> a new topic classification design?
> The reason I'm asking is that, if we decide to stick to the existing architecture,
> resolving STANBOL-1278 will also be a part of the required workflow.
> Further, the current design was done to integrate a Solr instance. So we
> need to consider the possibility/feasibility of using the existing design
> to integrate Mahout and OpenNLP.
>
> Your opinion on this matter is highly appreciated.
>
> Thanks,
>
>
>> AFAIK no work regarding OpenNLP or Mahout was done. If you want to
>> implement another topic classification engine you can start from the
>> Enhancement-Engine archetype [5]. Topic engines are expected to
>> contribute fise:TopicAnnotation [6] to the metadata of the content
>> item.
>>
>> Having other topic classification engines would be really great. If you
>> have questions or need some help feel free to ask here or directly in
>> the #stanbol channel on freenode.
>>
>> best
>> Rupert
>>
>>
>> > [1] https://issues.apache.org/jira/browse/STANBOL-197
>> > [2] https://vimeo.com/45633053
>> [3] http://search.maven.org/#search|ga|1|stanbol%20topic
>> [4] https://issues.apache.org/jira/browse/STANBOL-1278
>> [5] http://svn.apache.org/repos/asf/stanbol/trunk/development/archetypes/enhancement-engine/
>> [6] http://stanbol.apache.org/docs/trunk/components/enhancer/enhancementstructure.html#fisetopicannotation
>>
>> On Tue, Mar 4, 2014 at 1:20 PM, Tharindu Rusira
>> <tharindurus...@gmail.com> wrote:
>> > Hi all,
>> > As mentioned in STANBOL-197[1], a topic classification engine for Stanbol
>> > is being developed.
>> > In [2], it is discussed that this engine uses a Solr
>> > based classifier and the possibility of introducing more rigorous
>> > classifiers based on OpenNLP and Mahout.
>> > I am interested in this idea and would like to contribute to future
>> > developments. So I would like to know the current status of the project.
>> >
>> > [1] https://issues.apache.org/jira/browse/STANBOL-197
>> > [2] https://vimeo.com/45633053
>> >
>> > Thanks,
>> >
>> > --
>> > M.P. Tharindu Rusira Kumara
>> >
>> > Department of Computer Science and Engineering,
>> > University of Moratuwa,
>> > Sri Lanka.
>> > +94757033733
>> > www.tharindu-rusira.blogspot.com
>>
>>
>>
>> --
>> | Rupert Westenthaler             rupert.westentha...@gmail.com
>> | Bodenlehenstraße 11             ++43-699-11108907
>> | A-5500 Bischofshofen
>>
>
>
>
> --
> M.P. Tharindu Rusira Kumara
>
> Department of Computer Science and Engineering,
> University of Moratuwa,
> Sri Lanka.
> +94757033733
> www.tharindu-rusira.blogspot.com

--
| Rupert Westenthaler             rupert.westentha...@gmail.com
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
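
For illustration only, here is a minimal sketch of the TopicClassifier / TrainingSet split and of cross validation written purely against those two abstractions, as discussed at the top of this mail. Everything in it is an assumption: the interface methods, the `Example` class, `InMemoryTrainingSet`, `KeywordClassifier` and `crossValidate` are hypothetical names and are not the actual Stanbol API. The toy classifier only stands in for a real backend such as Solr, OpenNLP or Mahout.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

public class CrossValidationSketch {

    /** One labelled training document (hypothetical type). */
    static final class Example {
        final String text;
        final String topic;
        Example(String text, String topic) { this.text = text; this.topic = topic; }
    }

    /** Hypothetical storage abstraction; could be backed by Solr, a triple store, files, ... */
    interface TrainingSet {
        void add(Example example);
        List<Example> examples();
    }

    /** Hypothetical classifier abstraction; could wrap Solr, OpenNLP, Mahout, ... */
    interface TopicClassifier {
        void train(List<Example> examples);
        String classify(String text);
    }

    /** Simplest possible training set: a list held in memory. */
    static final class InMemoryTrainingSet implements TrainingSet {
        private final List<Example> examples = new ArrayList<>();
        @Override public void add(Example example) { examples.add(example); }
        @Override public List<Example> examples() { return new ArrayList<>(examples); }
    }

    /** Toy classifier: each known word votes for the topic it was last seen with. */
    static final class KeywordClassifier implements TopicClassifier {
        private final Map<String, String> wordToTopic = new HashMap<>();

        @Override public void train(List<Example> examples) {
            wordToTopic.clear();
            for (Example e : examples) {
                for (String word : e.text.toLowerCase().split("\\s+")) {
                    wordToTopic.put(word, e.topic);
                }
            }
        }

        @Override public String classify(String text) {
            Map<String, Integer> votes = new HashMap<>();
            for (String word : text.toLowerCase().split("\\s+")) {
                String topic = wordToTopic.get(word);
                if (topic != null) votes.merge(topic, 1, Integer::sum);
            }
            String best = null;
            int bestVotes = 0;
            for (Map.Entry<String, Integer> v : votes.entrySet()) {
                if (v.getValue() > bestVotes) { best = v.getKey(); bestVotes = v.getValue(); }
            }
            return best; // null when no word is known
        }
    }

    /** k-fold cross validation that only touches the two interfaces. Returns accuracy in [0, 1]. */
    static double crossValidate(TrainingSet set, Supplier<TopicClassifier> factory, int folds) {
        List<Example> all = set.examples();
        int correct = 0;
        for (int fold = 0; fold < folds; fold++) {
            List<Example> train = new ArrayList<>();
            List<Example> test = new ArrayList<>();
            for (int i = 0; i < all.size(); i++) {
                if (i % folds == fold) test.add(all.get(i)); else train.add(all.get(i));
            }
            TopicClassifier classifier = factory.get(); // fresh model per fold
            classifier.train(train);
            for (Example e : test) {
                if (e.topic.equals(classifier.classify(e.text))) correct++;
            }
        }
        return all.isEmpty() ? 0.0 : (double) correct / all.size();
    }

    public static void main(String[] args) {
        TrainingSet set = new InMemoryTrainingSet();
        set.add(new Example("football match goal", "sports"));
        set.add(new Example("football league goal", "sports"));
        set.add(new Example("election vote parliament", "politics"));
        set.add(new Example("election vote minister", "politics"));
        System.out.println("accuracy = " + crossValidate(set, KeywordClassifier::new, 2));
    }
}
```

Because `crossValidate` only depends on `TrainingSet` and a factory for `TopicClassifier`, the same routine could in principle be reused for any backend that implements these (hypothetical) interfaces, which is the point of keeping cross validation implementation-independent.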