Hello Stanbol Developers,

Background
I’m a data engineering manager with an avid interest in NLP implementation. My 
partner, Kit Blake (cc), is a serial entrepreneur who has done extensive work 
building and implementing content management systems (he is also more of a 
quasi-tech product manager). I’m based in Hong Kong and he’s in Rotterdam.

I’ve been using Stanbol for the last few years. I’m also subscribed to the 
developer mailing list but haven’t contributed code as I’m not a developer.

Overview
Recently we came across a challenge sponsored by the UN 
<https://uniteideas.spigit.com/unga-resolutions/Page/Home> for extracting 
information from General Assembly Resolutions based on certain ontologies.

Objective
The objective of the challenge is to carry out automatic entity extraction and 
content analysis to identify the following elements in UN General Assembly 
resolutions:

Structures:
Title, proponent authority, identification numbers, date of approval;
Preamble (one or more paragraphs stating purpose, aims, and justification of a 
resolution);
Operative paragraphs (one or more paragraphs detailing the resolution);
Closing formula;
Annexes.

Entities: e.g. persons, roles, countries, places, deadlines, references to 
concepts relevant to the “United Nations Bibliographic Information System” 
(UNBIS) or “Sustainable Development Goals Interface Ontology” (SDGIO) of UN 
Environment.

Content analysis:
Preambular paragraphs: references, citations, mentions etc.
Operative paragraphs: identify who invites/asks/requires/demands what 
(actions, requests, recommendations, etc.) and organize the results into 
machine-understandable data structures.
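
To make the last item concrete, here is a rough Python sketch of one possible "machine-understandable" shape for an operative paragraph. Everything in it is an assumption for illustration only: the OperativeClause class, its field names, the short verb list, and the default actor are not taken from the challenge description or from any of the ontologies, and the regular expression merely stands in for real NLP.

```python
import json
import re
from dataclasses import asdict, dataclass
from typing import Optional

# Hypothetical verb list; a real system would derive these from the
# resolutions or from an ontology instead of hard-coding them.
OPERATIVE_VERBS = ["calls upon", "invites", "requests", "urges", "demands", "decides"]

# Matches e.g. "3. Requests ..." or "5. Also invites ...".
_PATTERN = re.compile(
    r"^\s*(?P<number>\d+)\.\s+"
    r"(?:(?:also|further)\s+)?"
    r"(?P<action>" + "|".join(re.escape(v) for v in OPERATIVE_VERBS) + r")\b\s*"
    r"(?P<content>.*)$",
    re.IGNORECASE | re.DOTALL,
)


@dataclass
class OperativeClause:
    number: int    # paragraph number within the resolution
    actor: str     # who is acting (assumed to be the adopting body)
    action: str    # the operative verb, lower-cased
    content: str   # the remainder of the paragraph, unparsed


def parse_operative(paragraph: str, actor: str = "General Assembly") -> Optional[OperativeClause]:
    """Return a structured clause, or None if the paragraph does not match."""
    match = _PATTERN.match(paragraph)
    if match is None:
        return None
    return OperativeClause(
        number=int(match.group("number")),
        actor=actor,
        action=match.group("action").lower(),
        content=match.group("content").strip(),
    )


if __name__ == "__main__":
    # Invented example sentence, not a quotation from a real resolution.
    clause = parse_operative("3. Requests the Secretary-General to report on progress;")
    print(json.dumps(asdict(clause), indent=2))
```

The remaining text in the content field is where the entity extraction would come in, for example to pick out the addressee and any deadline.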

I think Stanbol would be the perfect tool for this purpose. The ‘Structures’ 
and ‘Content analysis’ parts can be done by indexing their main UNDO ontology 
<https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current> and the 
‘Entities’ can be extracted using DBpedia as well as the other ontologies that 
they’ve mentioned.

Development Needs
We’ve entered the challenge intending to submit a Stanbol-based solution, but 
we’re now realising that we need development help, primarily with two tasks.

1. Adding their ontology (undo.owl from here 
<https://github.com/UNSCEB-HLCM/undo/tree/master/ontology/current>) into 
Stanbol, to be used alongside DBpedia. I’ve followed the instructions on these 
two pages:
https://blog.zagwozdka.com/stanbol-getting-started-c047558856ec
https://stanbol.apache.org/docs/trunk/customvocabulary.html
I managed to create an index but am unable to initialise it. Once I get this 
working, I’ll probably also try to add the other two ontologies.

2. Using the REST interface to send all their documents to our instance of 
Stanbol, receive the results back, and display them. I’m guessing this 
might’ve been easier with CMS Adapter and ContentHub, but since those 
components are not part of the latest Stanbol version, I understand that we 
need to use the REST interface.
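
For task 2, what I have in mind is roughly the following Python sketch, which posts plain text to the Stanbol Enhancer endpoint and returns the RDF response. It assumes a launcher running locally on the default port (http://localhost:8080); the chain name "undo-chain" in the usage comment is hypothetical and would have to be a chain that actually exists in our instance. I haven’t run this against our setup yet, so please treat it as a starting point rather than a working solution.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: a Stanbol launcher running locally on its default port.
STANBOL_BASE = "http://localhost:8080"


def build_enhancer_request(
    text: str,
    base_url: str = STANBOL_BASE,
    chain: Optional[str] = None,
    accept: str = "text/turtle",
) -> urllib.request.Request:
    """Build a POST request for the Enhancer; nothing is sent yet."""
    url = base_url.rstrip("/") + "/enhancer"
    if chain:
        # A named enhancement chain is addressed as /enhancer/chain/{name}.
        url += "/chain/" + urllib.parse.quote(chain)
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # RDF serialisation wanted in the response
        },
        method="POST",
    )


def enhance(text: str, **kwargs) -> str:
    """Send the text to Stanbol and return the raw response body."""
    request = build_enhancer_request(text, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


# Usage (needs a running Stanbol instance; "undo-chain" is a made-up name):
# print(enhance("The General Assembly invites Member States ...", chain="undo-chain"))
```

Looping this over all resolution documents and parsing the returned RDF for display would be the next step.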

Request
We’d love to hear from anyone who might be interested in contributing. As you 
can see, there is no monetary benefit but we sure get bragging rights. And the 
GATE team is also submitting an entry so it could be kind of a face-off between 
GATE and Stanbol - I’m not trying to instigate any skirmishes - just hinting at 
friendly and healthy competition. :)

Alternatively, if someone can point me to a clearer explanation of how to 
solve the two problems above (especially the first one), I’ll do the 
implementation on my own. Of course, I’ll be forever grateful for this help 
and we’ll mention the contribution in our submission.

The deadline for submissions is April 12th, so we’d highly appreciate responses 
sooner rather than later. Also, please feel free to let me know if anything 
above is unclear.

Thank you,
-Abhi

PS: On a separate note, if any of you have suggestions on how quasi-tech folks 
like me can contribute to the development, I’ll be more than happy to help. I’m 
very comfortable with SQL, can code a bit in Python, and am fairly conversant 
with OO concepts.
