Re: Categorising strings into random versus non-random

Christian Gollwitzer Mon, 21 Dec 2015 03:02:48 -0800

On 21.12.15 at 11:53, Christian Gollwitzer wrote:

So, for the spaces, either use proper training material (some long
corpus from Wikipedia or similar) with the punctuation removed; then the
model will pick up the correct probabilities at word boundaries. Or
preprocess the input by removing the spaces.
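A minimal sketch of the two preprocessing options, assuming plain ASCII
text. The helper names strip_punctuation and remove_spaces are
hypothetical, and "punctuation" is taken here to mean anything that is
not a lower-case letter, a digit or a space:

```python
import re

def strip_punctuation(text: str) -> str:
    """Option 1: clean a training corpus but keep word boundaries.

    Lower-cases the text, replaces every character that is not a-z, 0-9
    or a space (punctuation, newlines, tabs) with a space, then collapses
    runs of whitespace into a single space.
    """
    text = text.lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def remove_spaces(text: str) -> str:
    """Option 2: drop all whitespace so word boundaries never appear."""
    return re.sub(r"\s+", "", text)

print(strip_punctuation("Hello, World!  It's 2015."))  # hello world it s 2015
print(remove_spaces("hello world it s 2015"))          # helloworldits2015
```

Whichever option you pick, apply the same preprocessing to both the
training corpus and the strings you classify, otherwise the pair
statistics will not match.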


     Christian

PS: The true log-likelihood would become -infinity whenever some pair
does not appear at all in the training set (especially pairs involving
digits, for example). I used 1/total as the default value of the
defaultdict to mitigate that; you could tweak that value a bit. The
larger the corpus, the more sharply it will separate the two classes by
itself, too.
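A minimal sketch of a character-pair model with that 1/total floor, for
illustration only. It assumes the model stores joint pair frequencies
(count / total) rather than conditional probabilities, and scores a
string by the average log-probability of its adjacent character pairs;
train_pairs and avg_log_likelihood are hypothetical names:

```python
import math
from collections import Counter, defaultdict

def train_pairs(corpus: str) -> defaultdict:
    """Estimate adjacent-character pair frequencies from a corpus."""
    pairs = Counter(zip(corpus, corpus[1:]))
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("corpus needs at least two characters")
    # Unseen pairs fall back to 1/total instead of 0, so math.log never
    # sees zero and the score never becomes -infinity.
    probs = defaultdict(lambda: 1.0 / total)
    for pair, count in pairs.items():
        probs[pair] = count / total
    return probs

def avg_log_likelihood(s: str, probs: defaultdict) -> float:
    """Average log-probability per pair; higher means less random."""
    pairs = list(zip(s, s[1:]))
    if not pairs:
        return 0.0
    # Note: looking up an unseen pair in a defaultdict inserts the
    # floor value into it; the stored probabilities are not changed.
    return sum(math.log(probs[p]) for p in pairs) / len(pairs)

model = train_pairs("abababab")
print(avg_log_likelihood("abab", model) > avg_log_likelihood("xyzq", model))  # True
```

One consequence of this choice: a larger corpus makes 1/total smaller,
so pairs never seen in training are penalised more heavily, which is one
way the separation sharpens as the corpus grows.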


--
https://mail.python.org/mailman/listinfo/python-list
