Re: current version of source for syntactic match / relevance component

Jörn Kottmann Wed, 17 Aug 2011 16:28:40 -0700

On 8/18/11 1:17 AM, Boris Galitsky wrote:

Hello
Attached are three packages which make up the current version of our proposed contribution of a syntactic match / text relevance component for OpenNLP.

Did everyone get the attachments? Usually we use JIRA for this, because mail attachments used to be removed when posted here. Not sure why I got them anyway.

I suggest that you additionally open a JIRA issue for this contribution, and then attach the zip files to it.


Here is the link to it:
https://issues.apache.org/jira/browse/OPENNLP

To start looking at it, please go to SyntMatcherTest.java and see the results of how commonality between sentences is computed. Then you can go to ParseTreeChunkTest.java and see how the operation of syntactic generalization is applied to particular chunks.
As an application, we selected the problem of content generation when relevance is critical. Please go to "RelatedSentenceFinder" and see which sentences might serve as seeds for content generation. The system goes on the web and finds somewhat relevant sentences to the seed ones and tries to "write an article".
As examples of auto-generated articles using this technology please see
http://www.allvoices.com/contributed-news/9423860-best-things-to-do-in-san-francisco-jazz-and-blues-festival

http://www.allvoices.com/contributed-news/9415063-britney-spears-femme-fatale-in-north-sf-bay-area

http://www.allvoices.com/contributed-news/9381803-cirque-du-soleil-quidam
These articles were generated using this class:
RelatedSentenceFinder.java

Hence the proposed structure of our contribution:
package opennlp.tools.similarity, main and test: implementation of syntactic match
package opennlp.tools.similarity.apps: the content generation app leveraging syntactic match for sentence-level similarity
package opennlp.tools.similarity.apps.utils: utils for the above.
What needs to be done before the contribution can be fully considered:
1) make it use the latest OpenNLP (it currently uses a modified version of the 2008 OpenNLP, which is pretty stable and has been working for 2 years in industrial settings)
2) fix all tests, add more tests
3) clean the implementation and application code
4) add more applications to show more working scenarios of syntactic match
5) in addition to academic papers, have better docs for developers

We have a sandbox where it could live for a while until it is ready to be released together with the current head code. I would suggest moving it there, and then maybe we have a good chance to release it with one of the coming 1.5 series releases or 1.6.


Would that work for you?

I will have a look at the code tomorrow.

Jörn
