Hello,
thanks for that.
Should we proceed with the contribution?
The next steps are roughly as follows:
- Create a jira issue for the contribution, and attach the source code to it
- Do a vote to accept it on the dev list
- Do IP clearance
IP clearance will most likely include signing these papers:
http://www.apache.org/licenses/software-grant.txt
http://www.apache.org/licenses/icla.txt
http://www.apache.org/licenses/cla-corporate.txt (if you work on the
code during your day job)
After we are through these steps we can import the code into our
subversion repository.
Jörn
On 8/15/11 8:39 PM, Boris Galitsky wrote:
Hi Jason and Jörn
I will briefly comment on how our approach is different from the authors
below: http://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/kim.coling10.pdf&citation=In+%3Ci%3EProceedings+of+the+23rd+International+Conference+on+Computational+Linguistics+%28COLING+2010%29%3C%2Fi%3E%2C+543--551%2C+Beijing%2C+China%2C+August+2010.

Sure, having something that maps trees to logical forms would be useful.
Boris, I would recommend you look at papers in Ray Mooney's group on
semantic parsing:
http://www.cs.utexas.edu/~ml/publications/area/77/learning_for_semantic_parsing
"The authors align natural language sentences to their correct meaning
representations given the ambiguous supervision provided by a grounded
language acquisition scenario."

This approach takes a vertical domain, applies statistical learning, and
learns to find a better meaning representation, taking into account, in
particular, parsing information. Mooney et al.'s approach can't directly map a
syntactic tree structure into a logical form 'structure'; at least it does not
intend to do so. If a vertical domain changes, one has to re-train. That is
adequate for a RoboCup competition but not really for an industrial app in a
horizontal domain, in my opinion.
What we are describing/proposing does not go as high semantically as Mooney et
al., but it is domain-independent and is linked directly (in a structured, not
statistical, way) to the syntactic parse tree, so a user does not have to
worry about re-training. After training, given a fixed set of meanings
(meaning representations in Mooney's terms), his system would give higher
accuracy than ours, but his setting is not really plausible for industrial
cases like search relevance and text relevance in a broader domain. What we
observed is that the overlap of syntactic trees, properly transformed, is
usually good enough to accept/reject relevance.
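To give a rough idea of what I mean by overlap, here is a toy sketch in Java
(my own simplified illustration, not our actual implementation; the labels,
the alignment of children by position, and the threshold are made up):

import java.util.*;

// Toy parse-tree node: a phrase/POS label plus ordered children.
class ParseNode {
    String label;
    List<ParseNode> children = new ArrayList<>();
    ParseNode(String label, ParseNode... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }
}

public class TreeOverlapRelevance {

    // Size of the common top-down fragment rooted at these two nodes.
    static int overlap(ParseNode a, ParseNode b) {
        if (!a.label.equals(b.label)) return 0;
        int score = 1;
        int n = Math.min(a.children.size(), b.children.size());
        for (int i = 0; i < n; i++) {
            score += overlap(a.children.get(i), b.children.get(i));
        }
        return score;
    }

    // Accept relevance if the shared fragment is large enough.
    static boolean relevant(ParseNode query, ParseNode candidate, int threshold) {
        return overlap(query, candidate) >= threshold;
    }

    public static void main(String[] args) {
        ParseNode q = new ParseNode("S",
                new ParseNode("NP", new ParseNode("NN")),
                new ParseNode("VP", new ParseNode("VB"), new ParseNode("NP")));
        ParseNode c = new ParseNode("S",
                new ParseNode("NP", new ParseNode("NN")),
                new ParseNode("VP", new ParseNode("VB"), new ParseNode("PP")));
        System.out.println(relevant(q, c, 4)); // shared fragment has 5 nodes -> true
    }
}

In the real setting the trees come from a parser and the transformations
matter a lot, but this is the basic accept/reject mechanism.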
In particular, Ruifang Ge (who is now at Facebook) did phrase-structure-to-logical-form
learning:
http://www.cs.utexas.edu/~ai-lab/pub-view.php?PubID=126959
I definitely enjoyed reading the PhD thesis; nice survey part! Earlier work of
Mooney et al. used Inductive Logic Programming to learn commonalities between
syntactic structures. Our approach kind of takes this to the extreme:
syntactic parse trees are considered a special case of logic formulas, and
Inductive Logic Programming's anti-unification is defined DIRECTLY on the
syntactic parse trees (see the toy sketch below).

I am more skeptical about the universality of a 'semantic grammar' unless we
focus on a given text classification domain. So my understanding is: let's not
go too far up in semantic representation unless the classification domain is
fixed; there is no such thing as the most accurate semantic representation for
everything (unless we are in a domain as restricted as querying a specific
database). So I can see a "Meaning Representation Language Grammar" as a
different component of OpenNLP, but it is hard for me to see how a search
engineer (not a linguist) can just plug it in and leverage it in an industrial
application.
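Here is a toy sketch of what I mean by anti-unification working directly on
the parse trees (again my own simplified illustration, not the contributed
code; the sentences and labels are made up): whatever the two trees agree on
is kept, and whatever they disagree on becomes a wildcard, i.e. a least
general generalization of the two trees.

import java.util.*;

// Minimal parse-tree node with a bracketed printout.
class Node {
    String label;
    List<Node> children = new ArrayList<>();
    Node(String label, Node... kids) {
        this.label = label;
        children.addAll(Arrays.asList(kids));
    }
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (Node c : children) sb.append(" ").append(c);
        return sb.append(")").toString();
    }
}

public class TreeAntiUnification {

    // Anti-unify two (sub)trees: matching labels are kept, mismatches become "*".
    static Node generalize(Node a, Node b) {
        if (!a.label.equals(b.label)) return new Node("*");
        Node g = new Node(a.label);
        int n = Math.min(a.children.size(), b.children.size());
        for (int i = 0; i < n; i++) {
            g.children.add(generalize(a.children.get(i), b.children.get(i)));
        }
        return g;
    }

    public static void main(String[] args) {
        Node t1 = new Node("S",
                new Node("NP", new Node("camera")),
                new Node("VP", new Node("takes"), new Node("NP", new Node("pictures"))));
        Node t2 = new Node("S",
                new Node("NP", new Node("phone")),
                new Node("VP", new Node("takes"), new Node("NP", new Node("photos"))));
        System.out.println(generalize(t1, t2)); // prints (S (NP *) (VP takes (NP *)))
    }
}

The generalization itself carries the common structure, instead of a
domain-specific meaning representation, which is why the domain can change
without re-training.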
Regards,
Boris