I am trying to create a Python script using the pyPdf module. The script takes the 'Root' folder, merges all the PDFs in it, and writes the merged PDF to an 'Output' folder, naming it 'Root.pdf' (after the folder that contains the split PDFs). It then does the same for each sub-directory, naming each merged file after its sub-directory.
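The folder-to-output mapping described above could be sketched roughly as follows. This is only an illustration under assumptions, not the script from this post: the function name `group_pdfs_by_folder` is hypothetical, it uses only the standard library (`os.walk` rather than `os.path.walk`), and it assumes the 'Output' folder lives outside 'Root' so that merged files are not picked up again.

```python
import os

def group_pdfs_by_folder(root):
    """Map each output name (e.g. 'Root.pdf') to the sorted PDF paths in that folder.

    Hypothetical helper: folders are keyed by their own name, so two
    sub-directories with the same name at different depths would collide.
    """
    groups = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Collect the PDFs that sit directly in this folder, in a stable order.
        pdfs = sorted(
            os.path.join(dirpath, name)
            for name in filenames
            if name.lower().endswith(".pdf")
        )
        if pdfs:
            folder_name = os.path.basename(os.path.abspath(dirpath))
            groups[folder_name + ".pdf"] = pdfs
    return groups
```

Each entry of the returned dictionary is then one merge job: the key is the name of the file to write into 'Output', and the value is the list of PDFs to append in order.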
I'm stuck when it comes to processing the sub-directories: I get an error related to some hex values (it seems to be reading a null value, which is not valid hex). Please note that this happens only with certain PDF files. None of them are corrupted, and all of them can be opened with any PDF viewer. This is the error I get:

Traceback (most recent call last):
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 76, in <module>
    files_recursively(path)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 74, in files_recursively
    os.path.walk(path, process_file, ())
  File "C:\Python27\lib\ntpath.py", line 263, in walk
    walk(name, func, arg)
  File "C:\Python27\lib\ntpath.py", line 259, in walk
    func(arg, top, names)
  File "C:\Documents and Settings\student3\Desktop\Test\pdfMerger… line 38, in process_file
    pdf = PdfFileReader(file( filename, "rb"))
  File "C:\Python27\lib\site-packages\pyPdf\pdf… line 374, in __init__
    self.read(stream)
  File "C:\Python27\lib\site-packages\pyPdf\pdf… line 775, in read
    newTrailer = readObject(stream, self)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 531, in readFromStream
    value = readObject(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 69, in readObject
    return readHexStringFromStream(stream)
  File "C:\Python27\lib\site-packages\pyPdf\gen… line 276, in readHexStringFromStream
    txt += chr(int(x, base=16))
ValueError: invalid literal for int() with base 16: '\x00\x00'
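Since the failure only shows up for certain files, one way to narrow it down might be a diagnostic pass that tries to open each PDF on its own and records which ones raise, instead of letting the first exception stop the whole walk. The sketch below is an assumption-laden illustration: `find_unreadable_pdfs` is a hypothetical helper, and `open_pdf` stands for whatever callable parses an open binary stream (with pyPdf that would presumably be `PdfFileReader`, as in the traceback).

```python
def find_unreadable_pdfs(paths, open_pdf):
    """Return (path, error description) pairs for files that open_pdf cannot parse.

    open_pdf is any callable taking a binary file object; it is passed in
    so this helper does not depend on a particular PDF library.
    """
    failures = []
    for path in paths:
        try:
            with open(path, "rb") as stream:
                open_pdf(stream)
        except Exception as exc:
            # Deliberately broad: this is a diagnostic pass, so every
            # failure is recorded and the loop moves on to the next file.
            failures.append((path, "%s: %s" % (type(exc).__name__, exc)))
    return failures
```

Calling it as `find_unreadable_pdfs(pdf_paths, PdfFileReader)` would list the files whose parsing fails, which makes it easier to compare them against the ones that merge cleanly. It only identifies the problem files; it does not work around the `ValueError` itself.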