Hello Stanbol Developers,

Background

I’m a data engineering manager with an avid interest in NLP implementation, and my partner, Kit Blake (cc), is a serial entrepreneur who has done extensive work building and implementing CMS systems (he is also more of a quasi-tech product manager). I’m based in Hong Kong and he’s in Rotterdam.
I’ve been using Stanbol for the last few years. I’m also on the developer mailing list but haven’t contributed code, as I’m not a developer.

Overview

Recently we came across a challenge sponsored by the UN <https://uniteideas.spigit.com/unga-resolutions/Page/Home> on extracting information from General Assembly resolutions based on certain ontologies.

Objective

The objective of the challenge is to carry out automatic entity extraction and content analysis to identify the following elements in UN General Assembly resolutions:

- Structures: title, proponent authority, identification numbers, date of approval; preamble (one or more paragraphs stating the purpose, aims, and justification of a resolution); operative paragraphs (one or more paragraphs detailing the resolution); closing formula; annexes.
- Entities: e.g. persons, roles, countries, places, deadlines, and references to concepts relevant to the “United Nations Bibliographic Information System” (UNBIS) or the “Sustainable Development Goals Interface Ontology” (SDGIO) of UN Environment.
- Content analysis: for preambular paragraphs, references, citations, mentions etc.; for operative paragraphs, identifying who invites/asks/requires/demands what (actions, requests, recommendations, etc.) and organizing this into machine-understandable data structures.

I think Stanbol would be the perfect tool for this purpose. The ‘Structures’ and ‘Content analysis’ parts can be done by indexing their main UNDO Ontology <https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current>, and the ‘Entities’ can be extracted using DBpedia as well as the other ontologies that they’ve mentioned.

Development Needs

We’ve entered the challenge to submit a Stanbol-based solution but are now realising that we need help developing it, primarily for two tasks.

1. Adding their ontology (undo.owl from here <https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current>) into Stanbol, to be used alongside DBpedia.
   I’ve managed to follow the instructions on these two pages, https://blog.zagwozdka.com/stanbol-getting-started-c047558856ec and https://stanbol.apache.org/docs/trunk/customvocabulary.html, and create an index, but I am unable to initialise it. Once I achieve this, I’ll probably also try to add the other two ontologies.

2. Using the REST interface to present all their documents to our instance of Stanbol, receiving back the results, and displaying them. I’m guessing this might have been easier with CMS Adapter and ContentHub, but since those components are not part of the latest Stanbol version, I understand that we need to use the REST interface.

Request

We’d love to hear from anyone who might be interested in contributing. As you can see, there is no monetary benefit, but we sure get bragging rights. The GATE team is also submitting an entry, so it could be kind of a face-off between GATE and Stanbol - I’m not trying to instigate any skirmishes, just hinting at friendly and healthy competition. :)

Alternatively, if someone can point me to a more lucid explanation of how to solve the two problems above (especially the first one), I’ll do the implementation on my own. Of course, I’ll be forever grateful for this help, and we’ll mention the contribution in our submission.

The deadline for submissions is April 12th, so we’d highly appreciate responses sooner rather than later. Also, please feel free to let me know if anything above is unclear.

Thank you,
-Abhi

PS: On a separate note, if any of you have suggestions on how quasi-tech folks like me can contribute to the development, I’ll be more than happy to help. I’m very comfortable with SQL, can code a bit in Python, and am fairly conversant with OO concepts.
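
PPS: To make task 2 a bit more concrete, here is a minimal Python sketch of the kind of REST client I have in mind. It rests on several assumptions: a Stanbol launcher running locally with the enhancer at http://localhost:8080/enhancer, the resolutions available as plain-text files in one folder, and the enhancer answering an "Accept: application/json" request with a JSON document. The function names and folder layout are placeholders of mine, and the sketch should be treated as untested against a live instance.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: default local launcher address; adjust host, port, or chain path.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL):
    """Build a POST request that sends plain text to the enhancer endpoint."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            # Assumption: the server returns a JSON serialization for this type.
            "Accept": "application/json",
        },
        method="POST",
    )


def enhance(text, url=STANBOL_ENHANCER_URL):
    """Send one document to the enhancer and return the parsed JSON response."""
    with urllib.request.urlopen(build_enhancer_request(text, url)) as response:
        return json.loads(response.read().decode("utf-8"))


def enhance_folder(folder, url=STANBOL_ENHANCER_URL):
    """Enhance every *.txt file in a folder; returns {file name: parsed result}."""
    results = {}
    for path in sorted(Path(folder).glob("*.txt")):
        results[path.name] = enhance(path.read_text(encoding="utf-8"), url)
    return results
```

Calling enhance_folder("resolutions") (a hypothetical folder name) would give one result per document, which could then be stored or rendered for display. Keeping the request construction separate from the network call makes it possible to check headers and payload without a running server.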