srujana-kuntumalla opened a new pull request, #2892:
URL: https://github.com/apache/tika/pull/2892

   ## Summary
   
   - Adds `PDF.HAS_SIGNATURE_FIELDS` (`pdf:hasSignatureFields`) metadata 
property to report the presence of AcroForm `/FT /Sig` fields, regardless of 
whether a signature has been applied
   - Refactors `PDFParser.extractSignatures()` to iterate over 
`PDDocument.getSignatureFields()` instead of `getSignatureDictionaries()`, so 
unsigned signature fields are detected; `TikaCoreProperties.HAS_SIGNATURE` is 
still only set when a field has an actual `PDSignature` applied
   - Adds a minimal test PDF (`testPDF_unsigned_sig_field.pdf`) with an 
unsigned signature field and `/SigFlags 3`
   - Updates `testSignatureInAcroForm` to assert the new property and tightens 
its expectations so that it also confirms no `HAS_SIGNATURE` is set; adds 
`testUnsignedSignatureField` for TIKA-4756
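   
   The field-versus-signature distinction described above can be sketched as follows. This is a minimal, stdlib-only illustration and not the PR's actual code. `SignatureFlags`, `SigField`, `Result`, and `inspect` are hypothetical names: `SigField` stands in for an entry in the `PDSignatureField` list returned by `PDDocument.getSignatureFields()`, and its `signed` flag models whether a `PDSignature` has been applied to that field.
   
   ```java
   import java.util.List;
   
   // Hypothetical model only: the real parser works on PDFBox objects instead.
   final class SignatureFlags {
   
       // Hypothetical stand-in for one AcroForm /FT /Sig field (PDSignatureField);
       // `signed` models whether a PDSignature has been applied to it.
       record SigField(String name, boolean signed) {}
   
       // Hypothetical pair mirroring pdf:hasSignatureFields and HAS_SIGNATURE.
       record Result(boolean hasSignatureFields, boolean hasSignature) {}
   
       static Result inspect(List<SigField> fields) {
           // Any signature field counts here, whether or not it has been signed.
           boolean hasSignatureFields = !fields.isEmpty();
           // HAS_SIGNATURE only when at least one field actually carries a signature.
           boolean hasSignature = fields.stream().anyMatch(SigField::signed);
           return new Result(hasSignatureFields, hasSignature);
       }
   
       private SignatureFlags() {}
   }
   ```
   
   With a single unsigned field (the situation `testPDF_unsigned_sig_field.pdf` is meant to cover), `inspect` reports `hasSignatureFields=true` and `hasSignature=false`, which is the combination the new test asserts. Iterating over fields rather than signature dictionaries is what makes the first value observable at all.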
   
   ## Motivation
   
   PDFs that contain AcroForm signature fields (`/FT /Sig`) but have not yet 
been signed were previously indistinguishable from PDFs with no signature 
infrastructure. This matters for downstream applications such as PDF/A 
converters (OCRmyPDF, for example), which need to skip documents with signature 
fields to avoid invalidating future signatures.
   
   ## Test plan
   
   - [x] `PDFParserTest#testUnsignedSignatureField`: new test asserting 
`pdf:hasSignatureFields=true` and no `HAS_SIGNATURE` on a PDF with an unsigned 
signature field
   - [x] `PDFParserTest#testSignatureInAcroForm` — existing test updated to 
assert the new property; confirms no actual signature is set on 
`testPDF_acroform3.pdf`
   
   Fixes: https://issues.apache.org/jira/browse/TIKA-4756
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.