On 22 March 2011 17:04, Mohit Taneja <[email protected]> wrote:
> Hi,
>
> I have been digging into the use of flag diacritics in morphological
> analysis
> (http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox).
> I understand the need for them, which would mostly arise in languages
> that have prefix inflection as well as circumfix inflection, in
> addition to suffix inflection.
>
> So, when one is checking the different analyses/generations from a root word,
> there could be certain pairs of suffix and prefix inflections which are just
> not possible with each other, so to avoid them we use flag diacritics.
>
> But I am not able to understand how this is currently done at
> compile time, and how we can port this functionality to runtime. I have been
> trying to read up on it in the FSM book. Also, I checked out the code from
> svn and compiled lttoolbox. With that, I tried to lt-expand the
> dictionary given here
> http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code/Flag_diacritics_in_lttoolbox
> . When doing so, I get an error: Error (19): Invalid node '<cdefs>'.
That's just a speculation as to how it would look. Your mission,
should you choose to accept it, would be to /also/ implement the
change to the dictionary format.
Presumably, going by that page, the implementation would keep a second
Alphabet of symbols (the cdefs)[1], and each transduction would be
checked for those symbols. If any are present, 1) there must be more
than one of them and 2) they must match; otherwise the transduction is
discarded. The code to discard a transition is already implemented for
compounds, so all that remains is the change to the compiler to add the
two new XML elements, and the new runtime check for those symbols.
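
A rough, self-contained sketch of that runtime check, under one reading
of the rule above: a path with no cdef symbols is kept, a path with
exactly one is discarded, and a path with several is kept only if they
are all the same symbol. The name cdefsConsistent and the representation
of a path as a vector of symbol codes are assumptions for illustration,
not existing lttoolbox code.

```cpp
#include <set>
#include <vector>

// Hypothetical helper (not part of lttoolbox): decide whether a path,
// given as a sequence of integer symbol codes, passes the cdef check.
// `cdefs` holds the integer codes of the symbols declared as cdefs.
bool cdefsConsistent(const std::vector<int>& path, const std::set<int>& cdefs)
{
  int first = 0;   // first cdef symbol seen on this path
  int count = 0;   // number of cdef symbols seen so far

  for(int symbol : path)
  {
    if(cdefs.count(symbol) == 0)
    {
      continue;    // ordinary symbol, not subject to the check
    }
    if(count == 0)
    {
      first = symbol;
    }
    else if(symbol != first)
    {
      return false;  // two different cdef symbols: they do not match
    }
    ++count;
  }

  // Zero cdef symbols is fine; a single unpaired one is not.
  return count != 1;
}
```

With cdefs = {-5, -6}, the path {-5, 1, -5} would be kept, while
{-5, 1} and {-5, 1, -6} would be discarded.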
[1] This is probably a little impractical - it would probably be
better to just add them the same way as a regular sdef, and keep a
list of the integers corresponding to cdefs:
void
Compiler::procCDef()
{
  // Build the symbol name once; the same string is used for every call below
  wstring const symbol = L"<" + attrib(COMPILER_N_ATTR) + L">";

  // If it's already defined, it may have been as an sdef
  if(alphabet.isSymbolDefined(symbol))
  {
    wcerr << L"Error (" << xmlTextReaderGetParserLineNumber(reader);
    wcerr << L"): Symbol already defined: '" << symbol << L"'." << endl;
    exit(EXIT_FAILURE);
  }

  alphabet.includeSymbol(symbol);
  // Remember the integer code assigned to this cdef
  cdefs.push_back(alphabet(symbol));
}
(you would need to add
list<int> cdefs;
to compiler.h in lttoolbox, and add code to write the list to the
output, but that shouldn't take more than 5 minutes; doing it this
way, the code to handle <c/> would be exactly the same as for <s/>)
You should probably think of some other project to add to your
proposal, because I really don't think this would take 3 months (or
even 3 weeks) to implement. Jacob already did the hard part of it for
compounds.
--
<Leftmost> jimregan, that's because deep inside you, you are evil.
<Leftmost> Also not-so-deep inside you.
_______________________________________________
Apertium-stuff mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/apertium-stuff