On 13/02/18 13:11, Stanley Denman wrote:
I am trying to perform a regex on a "string" of text that python isinstance 
is telling me is a dictionary.  When I run the code I get the following error:

{'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}

Traceback (most recent call last):
   File "C:\Users\stand\Desktop\PythonSublimeText.py", line 9, in <module>
     x=MyRegex.findall(MyDict)
TypeError: expected string or bytes-like object

Here is the "string" of code I am working with:

{'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  05/12/2014 - 
05/28/2014 (9 pages)', '/Page': IndirectObject(465, 0), '/Type': '/FitB'}

I want to grab the name "MILANI, JOHN C" and the last date "-mm/dd/yyyy" as a 
pair such that if I have N strings like the above I will end up with N pairs of values 
(name and date).  Here is my code:
import PyPDF2,re
pdfFileObj=open('x.pdf','rb')
pdfReader=PyPDF2.PdfFileReader(pdfFileObj)
Result=pdfReader.getOutlines()
MyDict=(Result[-1][0])
print(MyDict)
print(isinstance(MyDict,dict))
MyRegex=re.compile(r"MILANI,")
x=MyRegex.findall(MyDict)
print(x)

As the error message says, re.findall() expects a string. A dictionary is in no sense a string, so passing it in whole like that won't work. If you know that the name will always show up in the title field, you can pass just the title:

  x = MyRegex.findall(MyDict['/Title'])

Otherwise you will have to loop through all the entries in the dictionary:

  for entry in MyDict.values():
    if isinstance(entry, str):  # skip non-string values such as the '/Page' IndirectObject
      x = MyRegex.findall(entry)
      # ...and do something with x

I rather suspect you are going to find that the titles aren't in a very systematic format, though.
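
If the titles do turn out to follow the layout of the one example above ("Src.:" then the name, then "Tmt. Dt.:" then one or more dates), a rough sketch of pulling out the (name, last date) pairs might look like the following. The patterns are guessed from that single title, the helper names name_and_last_date and pairs_from_entries are made up for illustration, and the plain dicts are stand-ins for the outline entries, so treat it as a starting point, not a tested solution for real PDFs:

```python
import re

# Assumed layout, inferred from the one title shown in this thread:
#   "... Src.:  NAME Tmt. Dt.:  mm/dd/yyyy - mm/dd/yyyy (N pages)"
NAME_RE = re.compile(r"Src\.:\s*(.+?)\s+Tmt\. Dt\.:")
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")


def name_and_last_date(title):
    """Return (name, last_date) for one title, or None if it does not fit."""
    name_match = NAME_RE.search(title)
    if name_match is None:
        return None
    # Only look for dates after the "Tmt. Dt.:" label.
    dates = DATE_RE.findall(title[name_match.end():])
    if not dates:
        return None
    return name_match.group(1), dates[-1]


def pairs_from_entries(entries):
    """Collect (name, last_date) pairs from dict-like outline entries."""
    pairs = []
    for entry in entries:
        title = entry.get('/Title')
        if not isinstance(title, str):
            continue  # no usable title on this entry
        pair = name_and_last_date(title)
        if pair is not None:
            pairs.append(pair)
    return pairs


# Stand-in data; the first title is copied from the example above.
entries = [
    {'/Title': '1F:  Progress Notes  Src.:  MILANI, JOHN C Tmt. Dt.:  '
               '05/12/2014 - 05/28/2014 (9 pages)', '/Type': '/FitB'},
    {'/Title': 'Table of Contents', '/Type': '/FitB'},
]
print(pairs_from_entries(entries))
```

Titles that don't match are skipped silently here; if the format really is unsystematic, printing the rejects would show which variations need handling.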

--
Rhodri James *-* Kynesim Ltd
