[
https://issues.apache.org/jira/browse/TIKA-4756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18089257#comment-18089257
]
ASF GitHub Bot commented on TIKA-4756:
--------------------------------------
srujana-kuntumalla opened a new pull request, #2892:
URL: https://github.com/apache/tika/pull/2892
## Summary
- Adds `PDF.HAS_SIGNATURE_FIELDS` (`pdf:hasSignatureFields`) metadata
property to report the presence of AcroForm `/FT /Sig` fields, regardless of
whether a signature has been applied
- Refactors `PDFParser.extractSignatures()` to iterate over
`PDDocument.getSignatureFields()` instead of `getSignatureDictionaries()`, so
unsigned signature fields are detected; `TikaCoreProperties.HAS_SIGNATURE` is
still only set when a field has an actual `PDSignature` applied
- Adds a minimal test PDF (`testPDF_unsigned_sig_field.pdf`) with an
unsigned signature field and `/SigFlags 3`
- Updates `testSignatureInAcroForm` to assert the new property and tightens
expectations; adds `testUnsignedSignatureField` for TIKA-4756
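
The flag semantics described above can be sketched without PDFBox. This is a minimal illustration, not the code in this PR: `SignatureFieldSketch`, `SigField`, and both helper methods are hypothetical names, and a `null` signer stands in for a field that has no `PDSignature` applied.

```java
import java.util.ArrayList;
import java.util.List;

public class SignatureFieldSketch {

    // Hypothetical stand-in for an AcroForm /FT /Sig field.
    // signer == null models a field with no signature applied.
    static final class SigField {
        final String name;
        final String signer;

        SigField(String name, String signer) {
            this.name = name;
            this.signer = signer;
        }
    }

    // Mirrors the intent of pdf:hasSignatureFields:
    // true as soon as any signature field exists, signed or not.
    static boolean hasSignatureFields(List<SigField> fields) {
        return !fields.isEmpty();
    }

    // Mirrors the intent of HAS_SIGNATURE:
    // true only when at least one field carries an applied signature.
    static boolean hasSignature(List<SigField> fields) {
        for (SigField field : fields) {
            if (field.signer != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SigField> fields = new ArrayList<>();
        fields.add(new SigField("Signature1", null)); // unsigned field

        System.out.println("hasSignatureFields=" + hasSignatureFields(fields));
        System.out.println("hasSignature=" + hasSignature(fields));
    }
}
```

Keeping the two checks separate is what lets an unsigned field be reported without claiming that the document is signed.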
## Motivation
PDFs that contain AcroForm signature fields (`/FT /Sig`) but have not yet
been signed were previously indistinguishable from PDFs with no signature
infrastructure. This matters for downstream applications such as PDF/A
converters (e.g. OCRmyPDF) that need to skip documents with signature fields to
avoid invalidating future signatures.
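
As an illustration of the downstream use case, a consumer could gate PDF/A conversion on the extracted metadata. This is a hedged sketch: `ConversionGate` and `shouldSkipConversion` are hypothetical, the metadata is modeled as a plain `Map<String, String>` rather than a Tika `Metadata` object, and the `hasSignature` key is assumed to match the name quoted in the Jira issue.

```java
import java.util.HashMap;
import java.util.Map;

public class ConversionGate {

    // Key introduced by this PR.
    static final String HAS_SIGNATURE_FIELDS = "pdf:hasSignatureFields";
    // Assumed key name for the existing signature flag, as quoted in the issue.
    static final String HAS_SIGNATURE = "hasSignature";

    // Skip conversion when the PDF is signed or merely prepared for signing,
    // since rewriting the file could invalidate current or future signatures.
    static boolean shouldSkipConversion(Map<String, String> metadata) {
        return "true".equalsIgnoreCase(metadata.get(HAS_SIGNATURE_FIELDS))
                || "true".equalsIgnoreCase(metadata.get(HAS_SIGNATURE));
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new HashMap<>();
        metadata.put(HAS_SIGNATURE_FIELDS, "true");
        System.out.println("skip=" + shouldSkipConversion(metadata));
    }
}
```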
## Test plan
- [x] `PDFParserTest#testUnsignedSignatureField` — new test asserting
`pdf:hasSignatureFields=true` and no `HAS_SIGNATURE` on a PDF with an unsigned
signature field
- [x] `PDFParserTest#testSignatureInAcroForm` — existing test updated to
assert the new property; confirms no actual signature is set on
`testPDF_acroform3.pdf`
Fixes: https://issues.apache.org/jira/browse/TIKA-4756
🤖 Generated with [Claude Code](https://claude.com/claude-code)
> Detecting Signatures in PDFs with AcroForm
> ------------------------------------------
>
> Key: TIKA-4756
> URL: https://issues.apache.org/jira/browse/TIKA-4756
> Project: Tika
> Issue Type: Improvement
> Components: metadata
> Reporter: Willy T. Koch
> Priority: Minor
> Labels: Signature
> Attachments: image-2026-06-11-18-05-01-275.png, sigflags_sample.pdf,
> signature.png
>
>
> We see that PDFs that have an AcroForm containing signature (/Sig) fields
> aren't detected by the /meta analysis. It detects the AcroForm with
> "pdf:hasAcroFormFields": "true", but nothing on the /Sig part. They are
> created directly in Adobe Acrobat, which is also possible in the free version.
> It would be very useful to also return "hasSignature": "true" (or some
> other signature: property) for these kinds of files, so we can handle it on
> our end. We use this to exclude PDFs with digital signatures from being
> reconverted to PDF/A.
>
> When I run it through OCRmyPDF, it flags the file as digitally signed and
> exits, which is how I first noticed the issue.
> _ocrmypdf sigflags_sample.pdf sigflags_sample_ocrmypdf.pdf_
> _DigitalSignatureError: Input PDF has a digital signature. OCR would alter
> the document,_
> _invalidating the signature._
>
> I've attached a small sample PDF with AcroForm and Signature to reproduce the
> issue.
>
> Willy T. Koch
> Technical Product manager,
> Public 360°
> Norway
--
This message was sent by Atlassian Jira
(v8.20.10#820010)