Very quickly: I believe the problem you are facing is more "perceptual". As far as I can see, you need a very good and robust "feature extraction" step in order to compute the similarity (or "distance", in machine learning/pattern classification terms) between the texts, and that is quite difficult. I don't want to call it intractable and disappoint you, but "equivalent" problems that arise, for instance, in speech recognition are inherently pretty difficult. I have done some work on speech recognition, both applied and more theoretical.

From my perspective, feature extraction is essential. The features you end up with should be a mix of statistics, data mining, etc., and should also take into account the underlying lexical and grammatical structure and the type of the text (whether it is just a simple text, a more scholarly article, etc.; you would be surprised by the variability that is intrinsic to such applications). And of course you need a large corpus for training your algorithm, which means you also need to be a "virtuoso" with databases, especially when working with high-dimensional data.

At least this is my rough first thought on your problem. Good luck. If you need anything, do not hesitate to send me an email.
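P.S. To make the "features plus distance" idea concrete, here is a minimal sketch in Python. It is only a purely statistical baseline and an assumption on my part, not a recommendation of a specific method: it uses TF-IDF word weights as features and cosine similarity as the measure, and it ignores the lexical/grammatical structure and text type mentioned above. The function names (tokenize, tfidf_vectors, cosine_similarity) and the smoothed IDF formula are illustrative choices, not taken from any particular library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per text.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every document still get a non-zero weight.
    """
    token_lists = [tokenize(t) for t in texts]
    n_docs = len(token_lists)

    # Document frequency: in how many texts does each term occur?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty texts
        vectors.append({
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "a cat sat on a mat",
        "stock prices fell sharply today",
    ]
    vecs = tfidf_vectors(corpus)
    print(cosine_similarity(vecs[0], vecs[1]))  # texts sharing several words
    print(cosine_similarity(vecs[0], vecs[2]))  # texts sharing no words
```

A similarity near 1 means the two texts use almost the same weighted vocabulary, and 0 means they share no words at all. A "distance" can be taken as 1 minus the similarity. Anything that should reflect grammar or text type would have to be added as extra features on top of this.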