I am new to Python. I am trying to extract text from the bookmarks in a PDF
file to provide the data for a Word template merge. I have gotten down
to a string of text pulled out of the list object that I got from the PyPDF2
module. I am stuck on how to get the data I need out of that string. I am
calling it a string, but Python recognizes it as a dictionary object.
Here is the string:
{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 -
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
What I want is for the following to end up as fields in my Word template merge:
MedSourceFirstName: "John"
MedSourceLastName: "Milani"
MedSourceLastTreatment: "05/28/2014"
If I use keys() on the dictionary I get this:
['/Title', '/Page', '/Type']
I was hoping "Src." and "Tmt. Dt." would be treated as keys. It seems like the
key/value pairs of a dictionary would translate nicely to field names and
field data for a Word document merge. Here is my code so far.
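[python]import PyPDF2

# Open the PDF in binary mode and read its bookmarks (outline).
pdfFileObj = open('x.pdf', 'rb')
pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
MyList = pdfReader.getOutlines()
# The last outline entry is a nested list; take its first bookmark.
MyDict = MyList[-1][0]
print(isinstance(MyDict, dict))
print(MyDict)
print(list(MyDict.keys()))[/python]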
I get this output in Sublime Text:
True
{'/Title': '1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 -
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}
['/Title', '/Page', '/Type']
[Finished in 0.4s]
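One possible direction, offered only as a sketch: the value stored under
'/Title' is an ordinary string, so a regular expression could pull the fields
out of it. This assumes every bookmark title follows the same
"Src.: LAST, FIRST ... Tmt. Dt.: start - end" layout as the one above. The
names TITLE_PATTERN and parse_title are hypothetical helpers, not part of
PyPDF2.

```python
import re

# Assumed layout (taken from the single title shown above):
# "... Src.: LAST, FIRST [MIDDLE] Tmt. Dt.: MM/DD/YYYY - MM/DD/YYYY (...)"
TITLE_PATTERN = re.compile(
    r"Src\.:\s*(?P<last>[^,]+),\s*(?P<first>\S+).*?"
    r"Tmt\. Dt\.:\s*(?P<start>\d{2}/\d{2}/\d{4})"
    r"(?:\s*-\s*(?P<end>\d{2}/\d{2}/\d{4}))?",
    re.DOTALL,
)


def parse_title(title):
    """Return merge fields parsed from a bookmark title, or None if no match."""
    match = TITLE_PATTERN.search(title)
    if match is None:
        return None
    return {
        "MedSourceFirstName": match.group("first").title(),
        "MedSourceLastName": match.group("last").strip().title(),
        # Fall back to the single date when the title has no date range.
        "MedSourceLastTreatment": match.group("end") or match.group("start"),
    }


# In the real script this string would come from MyDict['/Title'].
title = ("1F: Progress Notes Src.: MILANI, JOHN C Tmt. Dt.: 05/12/2014 - "
         "05/28/2014 (9 pages)")
fields = parse_title(title)
print(fields)
```

For the title above, this gives "John", "Milani" and "05/28/2014" for the
three fields. Titles that do not match the assumed layout return None, so
they can be skipped or logged rather than producing bad merge data.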
Thank you in advance for any suggestions.