Very briefly: I believe the problem you face is largely a
"perceptual" one. As far as I can see, you need very good and robust
"feature extraction" in order to compute the similarity (or
"distance", in machine learning/pattern classification terms) between
the texts, which is quite difficult. I don't want to say intractable
and disappoint you, but "equivalent" problems, such as those that
arise in speech recognition, are inherently pretty difficult. I have
done some work on speech recognition, both applied and more
theoretical.
Feature extraction is essential, in my view, but the features you
extract should be a mix of statistics, data mining, etc., and should
also take into account the underlying lexical and grammatical
structure and the type of the text (whether it is just a simple text,
a more scholarly article, etc.; you would be surprised by the
variability that is intrinsic to such applications). And of course
you need a large corpus to train your algorithm, which means you also
need to be a "virtuoso" with databases, especially for working with
high-dimensional data.
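
To make the idea of features and distance concrete, here is a minimal
Python sketch of only the simplest, purely statistical end of it. The
function names extract_features and cosine_distance are hypothetical,
chosen for illustration. Word unigram and bigram counts stand in for
the richer lexical and grammatical features mentioned above, and no
training corpus is involved, so treat it as a starting point under
those assumptions rather than a solution.

```python
import math
import re
from collections import Counter


def extract_features(text):
    """Map a text to a sparse feature vector (hypothetical helper).

    Assumption: the features are just counts of lowercased word
    unigrams and adjacent-word bigrams.
    """
    words = re.findall(r"[a-z']+", text.lower())
    features = Counter(words)
    # Bigrams capture a little local word order, a crude proxy for structure.
    features.update(" ".join(pair) for pair in zip(words, words[1:]))
    return features


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text shares nothing with any other text.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    first = extract_features("The cat sat on the mat.")
    second = extract_features("The cat sat on the rug.")
    third = extract_features("Stock prices fell sharply today.")
    print(cosine_distance(first, second))
    print(cosine_distance(first, third))
```

A distance near 0 means the two texts share most of their features,
and 1 means they share none. Raw counts like these ignore grammar and
text type, which is exactly where the real difficulty lies.
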
At least, these are my first thoughts on your problem.
Good luck.
If you need anything, do not hesitate to send me an email.
