Thanks a lot, Harry, for explaining the technical background. This
definitely explains the strange encoding. For the time being, re-OCRing
the file seems to be the one manageable approach (as Tim Cahill already
pointed out), so I will try the Google Cloud SDK, which works in the
background of SanskritCR.

Best, Oliver

On 09/06/2022 16:40, Harry Spier wrote:
Oliver,
When I open the link you gave in a browser, it shows the file name in
the upper left corner as drahyayana_shrauta_sutra.qxd. The extension
qxd is for QuarkXPress files. QuarkXPress is publishing software. When
I download the file, it downloads as a PDF, but when I look at the
properties, the fonts in the file are embedded Type 1 PostScript fonts:

MSTT315b9a0609O15504302

MSTT319c623cc2O17006000

MSTT31ab77a7ccO21306200

MSTT31b3f9fa67O15204300

So it looks like QuarkXPress has disguised the names of the fonts it
used in creating the PDF.


So as far as I can see, this is a (probably quite old) PDF file created
from a QuarkXPress file. Since the fonts aren't Unicode fonts, and the
names of the fonts are disguised, the only thing I can think of is to
make a JPEG of each page, enter it into SanskritCR
(https://ocr.sanskritdictionary.com/), and then manually correct the
errors.

Quite laborious but less laborious than typing the whole thing by hand
again.
Harry Spier
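
The page-to-JPEG step described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the `pdftoppm` tool from Poppler is installed, and the helper names `pdftoppm_command` and `render_pages`, the output prefix `page`, and the 300 dpi resolution are illustrative choices, not something from this thread.

```python
import subprocess


def pdftoppm_command(pdf_path, out_prefix, dpi=300, first=None, last=None):
    """Build the pdftoppm call that renders PDF pages as JPEG images.

    Hypothetical helper: 300 dpi is an assumed resolution for OCR input.
    """
    cmd = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]  # first page to render
    if last is not None:
        cmd += ["-l", str(last)]   # last page to render
    cmd += [pdf_path, out_prefix]
    return cmd


def render_pages(pdf_path, out_prefix, **kwargs):
    """Run pdftoppm; raises CalledProcessError if the tool reports failure."""
    subprocess.run(pdftoppm_command(pdf_path, out_prefix, **kwargs), check=True)


# Show the command for the first three pages without running it.
print(pdftoppm_command("drahyayana_shrauta_sutra.pdf", "page", first=1, last=3))
```

Calling `render_pages("drahyayana_shrauta_sutra.pdf", "page")` would then write one JPEG per page, which can be fed to the OCR service and corrected by hand.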


On Thu, Jun 9, 2022 at 12:32 AM Oliver Hellwig via INDOLOGY
<[email protected]> wrote:

    Dear all,

    I came across this digitized version of the Drahyayana Srauta Sutra:

    http://www.hinduonline.co/vedicreserve/kalpa/shrauta/drahyayana_shrauta_sutra.pdf

    Everything seems fine, but when I try to copy-paste the text, the result
    for the first line looks like:

    {;Á;y,≈*tsU]m

    (This should be the name of the text.)

    Does anybody know how to obtain readable Devanagari from this kind of
    custom encoding?
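
One conceivable route for a custom encoding of this kind is a table-driven conversion, sketched below. Everything in it is an assumption made for illustration: the names `GUESSED_TABLE` and `decode_legacy` are hypothetical, and the table entries were guessed only by aligning the pasted string above with the expected title (assumed here to read drāhyāyaṇaśrautasūtram). They have not been checked against the fonts in the PDF, and a real table would need many more entries and probably reordering rules as well.

```python
# Hypothetical glyph table: guessed by aligning the pasted first line with
# the expected title; NOT verified against the fonts embedded in the PDF.
GUESSED_TABLE = {
    "{": "\u0926\u094d\u0930",       # dra (da + virama + ra)
    ";": "\u093e",                   # vowel sign aa
    "\u00c1": "\u0939\u094d\u092f",  # hya (ha + virama + ya)
    "y": "\u092f",                   # ya
    ",": "\u0923",                   # retroflex na
    "\u2248": "\u0936\u094d\u0930",  # shra (sha + virama + ra)
    "*": "\u094c",                   # vowel sign au
    "t": "\u0924",                   # ta
    "s": "\u0938",                   # sa
    "U": "\u0942",                   # vowel sign uu
    "]": "\u0924\u094d\u0930",       # tra (ta + virama + ra)
    "m": "\u092e\u094d",             # ma + virama
}


def decode_legacy(text, table=GUESSED_TABLE):
    """Replace each legacy character by its guessed Unicode sequence.

    Characters missing from the table are passed through unchanged and
    collected, so gaps in the table are easy to spot.
    """
    out = []
    unknown = set()
    for ch in text:
        if ch in table:
            out.append(table[ch])
        else:
            out.append(ch)
            unknown.add(ch)
    return "".join(out), unknown


# The pasted first line, written with escapes for the two non-ASCII characters.
sample = "{;\u00c1;y,\u2248*tsU]m"
decoded, unknown = decode_legacy(sample)
print(decoded)
print(sorted(unknown))
```

A one-to-one table like this only works where the legacy font stores glyphs in the same order as Unicode; signs that are written before their consonant would need an extra reordering pass.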

    Best, Oliver

    ---
    Oliver Hellwig, IVS Zürich/ILI Düsseldorf

    _______________________________________________
    INDOLOGY mailing list
    [email protected]
    https://list.indology.info/mailman/listinfo/indology

