Dear Harry,

First of all, thanks to everybody who has reached out to us so far, either on or off list!

On Thu, May 8, 2025 at 2:23 PM Harry Spier <[email protected]> wrote:

> Dear Sebastian
>
> You wrote:
>
>> The Dharmamitra project is preparing the training of OCR models for
>> typeset Devanagari editions this summer,
>
> Can you say a little more about the project?
> 1) Is this to have different capabilities than SanskritCR (already
> available)? https://ocr.sanskritdictionary.com/ From my brief use of
> SanskritCR it seems to work well for printed editions from the first half
> of the 20th century.

If I understand correctly, this is a wrapper for Google (= Cloud Vision?) OCR? Cloud Vision still struggles to some extent with complex ligatures in less common fonts. We plan to train an end-to-end vision language model. There is no guarantee that this will work better than the best current solutions, but we want to give it a try.

> 2) Is it the actual fonts you want, or Sanskrit text written in different
> fonts?

Actual fonts, yes!

> 3) If it's fonts you are looking for, is it Unicode fonts you want? The
> reason I'm asking is that the bulk of the literature (older than the last
> 20 years or so) typeset by computer would be in non-Unicode fonts.

We really are interested in everything. Unicode is best, but we might invest time to make other fonts work if needed.

With many thanks,
Sebastian Nehrdich

> Thanks,
> Harry Spier

_______________________________________________
INDOLOGY mailing list
[email protected]
https://list.indology.info/mailman/listinfo/indology
