Hi, Mikel,
Thanks for your prompt and informative reply!
I will think carefully about the questions and make updates to the
application.
Because of the time difference, I have to get some rest now.
Best,
Gang
2013/4/25 Mikel Forcada <m...@dlsi.ua.es>
> Gang,
> my second round of responses is at http://piratepad.net/gang-s-notes. I
> have to think harder because I don't yet know how to solve the problem you
> pose.
>
> Mikel
>
> On 04/24/2013 05:57 PM, Gang Chen wrote:
>
> Hi, Mikel,
>
> Thank you for your guidance!
>
> During the last 2 days, I mainly focused on reading the paper and writing
> my application. The good news is that I now understand the unsupervised
> training algorithm, which I think is the most mathematically heavy part,
> and that the first draft of the application is done :)
>
> First of all, would you please have a look at the application draft?
> Your advice is always welcome! Please see here:
> https://github.com/elephantgcc/gsoc-2013/blob/master/Application
>
>
> ----------------------------- split line -----------------------------
>
> OK, I can't wait to share with you the example that helped me go
> through all the maths in the unsupervised part of the paper.
>
> Suppose that we only have two sentences:
> A ^B/x/y$ C
> A ^D/x/y/z$ C
>
> where A, B, C and D are all words, x, y, and z are possible tags for the
> middle word in the context A_C, and we only focus on this context.
>
> At the initial stage,
>
> n_0 (A_x_C) = 1 * 1/2 + 1 * 1/3 = 5/6
> n_0 (A_y_C) = 1 * 1/2 + 1 * 1/3 = 5/6
> n_0 (A_z_C) = 1 * 1/3 = 1/3
>
> where n_0 (A_x_C) denotes the estimated count of the context A_C tagging
> the middle word as 'x' at the 0-th iteration.
>
> Then, using equation (10) in the paper, the iteration begins with the
> 1st iteration,
>
> n_1 (A_x_C) = 5/6 * ( 1 * 1/(5/6 + 5/6) + 1 * 1/(5/6 + 5/6 + 1/3) ) = 11/12
> n_1 (A_y_C) = 5/6 * ( 1 * 1/(5/6 + 5/6) + 1 * 1/(5/6 + 5/6 + 1/3) ) = 11/12
> n_1 (A_z_C) = 1/3 * ( 1 * 1/(5/6 + 5/6 + 1/3) ) = 1/6
>
> and then the 2nd iteration:
>
> n_2 (A_x_C) = 11/12 * ( 1 * 1/(11/12 + 11/12) + 1 * 1/(11/12 + 11/12 + 1/6) ) = 23/24
> n_2 (A_y_C) = 11/12 * ( 1 * 1/(11/12 + 11/12) + 1 * 1/(11/12 + 11/12 + 1/6) ) = 23/24
> n_2 (A_z_C) = 1/6 * ( 1 * 1/(11/12 + 11/12 + 1/6) ) = 1/12
>
> ...
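
The hand calculation above can be checked with a short script. This is only a sketch of the update as it is applied in this toy example (each occurrence is shared among its candidate tags in proportion to the current counts); it is not taken from the paper or from the Apertium code, and the names `occurrences`, `initial_counts` and `reestimate` are made up for illustration.

```python
from fractions import Fraction

# Candidate tags of the middle word in context A_C, one entry per sentence
# (toy data from the two example sentences; hypothetical representation).
occurrences = [("x", "y"), ("x", "y", "z")]

def initial_counts(occurrences):
    """Spread each occurrence uniformly over its candidate tags."""
    counts = {}
    for tags in occurrences:
        for tag in tags:
            counts[tag] = counts.get(tag, Fraction(0)) + Fraction(1, len(tags))
    return counts

def reestimate(counts, occurrences):
    """Share each occurrence among its tags in proportion to the current counts."""
    new_counts = {tag: Fraction(0) for tag in counts}
    for tags in occurrences:
        total = sum(counts[tag] for tag in tags)
        for tag in tags:
            new_counts[tag] += counts[tag] / total
    return new_counts

n = initial_counts(occurrences)
history = [n]
for _ in range(2):
    n = reestimate(n, occurrences)
    history.append(n)

# Print the counts for iterations 0, 1 and 2 as exact fractions.
for i, counts in enumerate(history):
    print(i, {tag: str(value) for tag, value in sorted(counts.items())})
```

Exact fractions are used so the results can be compared directly with the values written out by hand; the counts for each iteration always add up to 2, the number of occurrences of the context.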
>
> In this way, the wheels are running!
>
> So finally the A_C context will output either 'x' or 'y' as the best
> tag, which is consistent with intuition.
>
> I guess you must have been happy when you first invented the algorithm :)
>
> So far, the finite-state transducer *minimization* part still remains a
> problem for me. I think I still need to set aside some time to learn
> about it.
>
> ----------------------------- split line -----------------------------
>
> The following is the reply to your last mail.
>
> > For that, you will have to study the current .tsx format and make
> sense of it, as your tagger will use exactly that format.
>
> As for the .tsx format you mentioned, I have made sense of it by reading
> the example in the en-es package.
>
> > Forbid rules can be applied to the input text before actually training
> or running the tagger. You will also need to find a good way to store
> probabilities or turn them into rules which can be read and perhaps edited
> using linguistic knowledge.
>
> To be honest, I didn't quite catch your point here.
>
> For example, we have a FORBID rule that forbids the tag sequence "a x",
> and we have training text as:
>
> ^A/a/b$ ^B/x/y$
>
> My question is:
>
> It seems that we can't just drop 'a' in A and 'x' in B in the input
> text, because "a y" and "b x" are not forbidden. If 'a' of A and 'x' of B
> are dropped together, we will never see "a y" or "b x". So how can we
> apply forbid rules *before* training?
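
To make the concern concrete, here is a small sketch that compares two ways of applying the rule to this toy sentence: filtering whole tag sequences versus deleting the individual tags named in the rule. The data layout and the names `allowed_paths` and `prune_tags` are assumptions made for illustration only; this is not how Apertium handles forbid rules.

```python
from itertools import product

# Candidate tags for A and B (toy sentence above; hypothetical representation).
sentence = [("a", "b"), ("x", "y")]
# Hypothetical encoding of the FORBID rule "a x" as an adjacent tag pair.
forbidden = {("a", "x")}

def allowed_paths(sentence, forbidden):
    """Enumerate full tag sequences and drop those with a forbidden adjacent pair."""
    paths = []
    for path in product(*sentence):
        if all(pair not in forbidden for pair in zip(path, path[1:])):
            paths.append(path)
    return paths

def prune_tags(sentence, forbidden):
    """Naive alternative: delete every tag that appears in any forbidden pair."""
    banned = {tag for pair in forbidden for tag in pair}
    return [tuple(t for t in tags if t not in banned) for tags in sentence]

by_path = allowed_paths(sentence, forbidden)
by_tag = list(product(*prune_tags(sentence, forbidden)))

print("sequence filtering keeps:", by_path)
print("tag deletion keeps:", by_tag)
```

Filtering sequences keeps "a y", "b x" and "b y", while deleting the tags leaves only "b y", which is exactly the loss described above.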
>
> However, I came up with a way that seems able to apply forbid rules
> *during tagging*. Let me explain.
>
> For example, suppose we have the forbid rule "a x", and the input
> sentence to be tagged is as follows:
> ^A/a/b$ ^B/x/y$
>
> Firstly, A has been tagged as 'a' with the help of context #_B (# is the
> sentence start). So, for B, the candidate 'x' is directly FORBIDDEN. This
> is what I mean by "during tagging".
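
A minimal sketch of this "during tagging" idea, assuming a simple left-to-right tagger: the `scores` table is invented and only stands in for whatever context-based estimates the real tagger would use, and the fallback when every candidate is forbidden is my own assumption.

```python
def tag_sentence(sentence, scores, forbidden):
    """Tag left to right, skipping tags that form a forbidden pair with the previous tag."""
    chosen = []
    previous = None  # None stands for the sentence start
    for word, tags in sentence:
        candidates = [t for t in tags if (previous, t) not in forbidden]
        if not candidates:
            # Assumption: if every candidate is forbidden, fall back to all tags.
            candidates = list(tags)
        best = max(candidates, key=lambda t: scores.get((word, t), 0.0))
        chosen.append(best)
        previous = best
    return chosen

# Toy sentence ^A/a/b$ ^B/x/y$ with made-up scores (hypothetical values).
sentence = [("A", ("a", "b")), ("B", ("x", "y"))]
scores = {("A", "a"): 0.7, ("A", "b"): 0.3, ("B", "x"): 0.6, ("B", "y"): 0.4}

print(tag_sentence(sentence, scores, forbidden=set()))         # no rule applied
print(tag_sentence(sentence, scores, forbidden={("a", "x")}))  # rule "a x" applied
```

With these made-up scores, the tagger picks 'x' for B when no rule is given, and switches to 'y' once "a x" is forbidden, because A has already been tagged as 'a'.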
>
> However, there seems to be a problem with this approach: the forbidden
> pair "a x" still takes up some probability mass during and after the
> training procedure. Might this affect the precision of the tagger? What
> do you think?
>
>
> I look forward to your reply!
>
>
>
> Best,
>
> Gang
>
>
>
>
>
> 2013/4/21 Mikel Forcada <m...@dlsi.ua.es>
>
>> Gang,
>>
>> great stuff; I haven't checked it exhaustively but as far as I am testing
>> it seems to behave as expected.
>>
>> Now it is time to move on to preparing your application. For that, you
>> will have to study the current .tsx format and make sense of it, as your
>> tagger will use exactly that format.
>>
>> Forbid rules can be applied to the input text before actually training or
>> running the tagger. You will also need to find a good way to store
>> probabilities or turn them into rules which can be read and perhaps edited
>> using linguistic knowledge.
>>
>> Please do not hesitate to ask any questions to me or to the list.
>>
>> Best,
>>
>> Mikel
>>
>>
>> --
>> Mikel L. Forcada (http://www.dlsi.ua.es/~mlf/)
>> Departament de Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant
>> E-03071 Alacant, Spain
>> Phone: +34 96 590 9776
>> Fax: +34 96 590 9326
>>
>>
>
>
>
>
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff