Dear Bert,

Thank you very much for the response.

I was aware of pdftools, but did not recall any such functionality. I have 
checked again (pdftools, qpdf and the 3rd one): unfortunately, none of them 
implements such functionality. There might be other packages which I missed.

However, the functionality is feasible. I will add a few more details - maybe 
someone picks up the task.

It is possible to edit the pdf-file manually, though it is quite cumbersome to 
find the right annotation.
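
A rough sketch of how the highlight annotations could be located from R, 
instead of searching by hand. The helper name find_highlights is hypothetical, 
and the sketch assumes that the annotation objects are stored uncompressed 
(not inside an object stream), as in the raw data quoted further down.

```r
# Hypothetical helper: return all "N G obj ... endobj" blocks
# that contain a /Subtype/Highlight entry.
find_highlights = function(x) {
    # x: raw vector with the content of the pdf file
    x[x == as.raw(0)] = as.raw(32)  # rawToChar() fails on embedded NULs
    txt = rawToChar(x)
    # (?s) lets "." match line breaks; .*? stops at the first endobj
    m = gregexpr("(?s)[0-9]+ [0-9]+ obj\\b.*?\\bendobj", txt,
        perl = TRUE, useBytes = TRUE)
    objs = regmatches(txt, m)[[1]]
    objs[grepl("/Subtype[[:space:]]*/Highlight", objs, useBytes = TRUE)]
}

# Small inline example (a real file would be read with readBin)
pdf = paste0(
    "1948 0 obj\n<</AP<</N 1949 0 R >>/Subtype/Highlight/Type/Annot>>\nendobj\n",
    "1949 0 obj\n<</Subtype/Form/Type/XObject>>\nendobj\n")
res = find_highlights(charToRaw(pdf))
print(res)
```

The regular expression may be slow on large files; it is only meant to show 
the idea.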

1. One needs to edit the values both in the /QuadPoints and the /Rect in the 
/AP object.

2. Modifying the color is trickier:
/C encodes the color and /CA the alpha channel (= 1), but neither Acrobat 
nor Microsoft Edge updates the color. The value of the color encoded in the 
stream is used instead.

It is possible to "trick" Edge: modify the /C color and set "/ca 1" (in the 
stream block) to a lower value (e.g. "/ca 0.99"). MS Edge will then accept the 
modified color (but Acrobat ignores it). Changing the value in the stream is 
the actual solution.
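
Step 1 could be sketched in R roughly as follows. The helper name 
shift_array_y is hypothetical, and the sketch assumes that the annotation 
dictionary is available as a single string. The /BBox and the path 
coordinates in the appearance stream probably need the same shift; this is 
not handled here.

```r
# Hypothetical helper: shift the y-coordinates of a numeric pdf array
# (e.g. /Rect[...] or /QuadPoints[...]) inside an object string.
shift_array_y = function(obj, key, dy) {
    pattern = paste0("/", key, "[[:space:]]*\\[([^]]*)\\]")
    m = regmatches(obj, regexec(pattern, obj))[[1]]
    if(length(m) == 0) stop("Key not found: ", key)
    vals = as.numeric(strsplit(trimws(m[2]), "[[:space:]]+")[[1]])
    # the values alternate: x, y, x, y, ...
    isY = (seq_along(vals) %% 2) == 0
    vals[isY] = round(vals[isY] + dy, 3)
    new = paste0("/", key, "[ ", paste(vals, collapse = " "), "]")
    sub(m[1], new, obj, fixed = TRUE)
}

obj = paste0("<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4",
    "/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]",
    "/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>")

# Move the highlight down by 2 units
out = shift_array_y(obj, "Rect", -2)
out = shift_array_y(out, "QuadPoints", -2)
cat(out, "\n")
```

If the length of the edited object changes, the byte offsets in the 
cross-reference table of the pdf-file are no longer correct and would need to 
be updated as well.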

Note: non-rectangular shapes can be specified as well.

I hope that some of the referenced packages pick up this task.

Sincerely,

Leonard

________________________________
From: Bert Gunter <bgunter.4...@gmail.com>
Sent: Saturday, June 1, 2024 9:23 PM
To: Leo Mada <leo.m...@syonic.eu>
Cc: r-help@r-project.org <r-help@r-project.org>
Subject: Re: [R] Tools to modify highlighted areas in pdf documents?

Search!

On rseek.org, the query "modify pdf documents in R" brought 
up the staplr package. A quick web search with the same query brought up the 
pdftools package.

These were cursory efforts, so you may well find more. You will have to 
determine whether and to what degree any meet your needs.

-- Bert

On Sat, Jun 1, 2024 at 9:16 AM Leo Mada via R-help 
<r-help@r-project.org> wrote:
Dear R-Users,

Are there any packages that enable the modifications of highlighted areas / 
annotations in pdf documents?

It seems feasible - I have explored some R code (see below). However, I would 
rather avoid reinventing the wheel.

The problem:
When highlighting pdf-documents with Microsoft Edge, the bounding box is 
sometimes misplaced, and quite ugly so. It also lacks the ability to draw lines 
or arrows.

On the other hand, I did not get used to Acrobat Reader: it usually involves 
much more effort to add specific highlights. Lines can be drawn, but are NOT 
straight!

Are there tools to change the size/position of highlights?
Or to add highlights and underline words?
Changing position/size manually by editing the data in the pdf-document is 
possible. Changing the color is trickier (somehow possible in Microsoft 
Edge; though the direct approach to rewrite the actual stream is better). 
Maybe there are some tools to do it?

Some R code is below.

Sincerely,

Leonard
#########

library(zip)  # provides inflate()

con = file("_some_pdf_.pdf", "rb")

NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
# - x and isNL keep only the last chunk that was read;
while(TRUE) {
    tmp = readBin(con, "raw", 1024*128 + 515)
    if(length(tmp) == 0) break
    x = tmp
    # isNL = (x == 10) | (x == 13);
    # mark each CR (13) that is immediately followed by LF (10)
    isNL = (x == 13) & (c(x[-1], as.raw(0)) == 10)
    NL = NL + sum(isNL)
}

close(con)

# positions of the CR-LF line breaks in the last chunk
idP = which(isNL)

idS = 935  # will vary with pdf and Annotations and ...;
nLast = 4  # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]

# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)

# skip the CR-LF (2 bytes) in front of the data, then decompress
tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)

# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"

# Note: /BBox[ 337.298 171.83 364.322 183.836]

The raw pdf data:

1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 
4/PDFIUM_HasGeneratedAP true/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 
364.322 174.6]/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1/Length 
86/Matrix[ 1 0 0 1 0 0]/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 
1/Type/ExtGState/ca 1>>>>>>/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed binary data; the decompressed content is shown 
in the "Output of inflate" example above]
endstream
endobj


        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
