On Mon, Jul 27, 2009 at 6:40 AM, Peter Bienstman wrote:
> Sounds very interesting for when the student is only interested in a
> relatively small corpus, like in that example of old testament Greek. I'm not
> so sure it's immediately applicable to living languages, where I guess you can
> just rely on standard frequency lists.

One of his points seems to be that just going down the frequency list loses you a lot: you may need to go a long way down the list before *any* sentences become understandable. So his algorithms try a mix of low- and high-frequency words, searching for whichever group of, say, 10 words leads to the most translated sentences. The result may not follow the frequency ranking. Imagine 5 sentences which are the sole usage of some rare word Z, ranked #1000: the best solution might be to learn #1, #2, #3, and Z to get those 5 sentences, while knowing just #1-#4 leaves the learner adrift, since those 4 words are all used in sentences with rarer verbs and nouns.

Since this sounds very much like an NP-hard problem, his code uses interesting search techniques like simulated annealing. That might imply that efficiency would be a concern on a large corpus like the Proust I suggested, but he doesn't seem to have looked into its scalability.

-- gwern
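
P.S. To make the "rare word Z" example concrete, here is a toy sketch in Python. It is not his code, just my guess at the general shape of the idea: pick k words so that as many sentences as possible consist only of known words, and search with a basic simulated annealing loop. The corpus, the word names (w1..w4, Z, r0..r8), the function names, and the cooling schedule are all made up for illustration, and Z is only rank #5 here rather than #1000.

```python
import math
import random
from collections import Counter

# Toy corpus (hypothetical): each sentence is a set of words.
# Five sentences use only w1..w3 plus the rarer word Z; every sentence
# containing w4 also needs a one-off rare word r0..r8.
SENTENCES = [frozenset(s.split()) for s in [
    "w1 w2 Z", "w2 w3 Z", "w1 w3 Z", "w1 w2 w3 Z", "w1 Z",
]] + [frozenset({f"w{i % 3 + 1}", "w4", f"r{i}"}) for i in range(9)]


def coverage(known, sentences):
    """Count sentences in which every word is in `known`."""
    return sum(1 for s in sentences if s <= known)


def top_by_frequency(sentences, k):
    """Baseline: the k words that occur in the most sentences."""
    counts = Counter(w for s in sentences for w in s)
    # Descending count, then alphabetical, so ties break deterministically.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return frozenset(ranked[:k])


def anneal(sentences, k, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over k-word sets; requires k < vocabulary size."""
    rng = random.Random(seed)
    vocab = sorted(set().union(*sentences))
    current = set(top_by_frequency(sentences, k))  # start from the baseline
    current_score = coverage(current, sentences)
    best, best_score = frozenset(current), current_score
    for i in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (i / steps)
        # Neighbour move: swap one known word for one unknown word.
        out_word = rng.choice(sorted(current))
        in_word = rng.choice([w for w in vocab if w not in current])
        candidate = (current - {out_word}) | {in_word}
        cand_score = coverage(candidate, sentences)
        delta = cand_score - current_score
        # Always accept non-worsening moves; accept worse ones with
        # probability exp(delta / temp), which shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = frozenset(current), current_score
    return best, best_score


if __name__ == "__main__":
    baseline = top_by_frequency(SENTENCES, 4)
    print("frequency list:", sorted(baseline), coverage(baseline, SENTENCES))
    found, score = anneal(SENTENCES, 4)
    print("annealing:", sorted(found), score)
```

In this corpus the four most frequent words are w1..w4, and knowing them unlocks 0 sentences, while {w1, w2, w3, Z} unlocks 5. Each step rescans every sentence, so the cost is roughly steps times corpus size, which is where the scalability worry for something Proust-sized comes from.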