Dear R-Users,
Are there any packages that can modify highlighted areas / annotations in
PDF documents?
It seems feasible: I have explored some R code (see below). However, I would
rather avoid reinventing the wheel.
The problem:
When highlighting PDF documents with Microsoft Edge, the bounding box is
sometimes misplaced, and quite badly so. Edge also lacks the ability to draw
lines or arrows.
On the other hand, I never got used to Acrobat Reader: adding specific
highlights usually takes much more effort. Lines can be drawn, but they are NOT
straight!
Are there tools to change the size/position of highlights?
Or to add highlights and underline words?
Changing the position/size manually by editing the data in the PDF document is
possible. Changing the color is trickier (it is somehow possible in Microsoft
Edge, though the direct approach of rewriting the actual stream is better).
Maybe there are some tools to do it?
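As a rough sketch of the direct approach (rewriting the stream), the fill color could be replaced in the decompressed appearance stream and the result compressed again. The helper name set.fill.color is hypothetical. Base R's memCompress() / memDecompress() are used here as an assumed stand-in for a deflate / inflate pair, so that the example is self-contained. The stream text is the inflate output quoted further below.

```r
# Decompressed appearance stream, copied from the example output in this message
stream = paste0(
  "/GS gs .56078434 .87058824 .97647059 rg\n",
  "337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n")

# Hypothetical helper: replace the 3 numbers in front of the first "rg" operator
set.fill.color = function(stream, rgb) {
  col = paste(rgb, collapse = " ")
  sub("[0-9.]+ +[0-9.]+ +[0-9.]+ +rg", paste(col, "rg"), stream)
}

new.stream = set.fill.color(stream, c(1, 0.9, 0.2))

# Compress again; type = "gzip" writes a zlib stream (assumed to be what
# /FlateDecode expects)
raw.new = memCompress(charToRaw(new.stream), type = "gzip")
length(raw.new)  # candidate value for the /Length entry

# Round trip as a sanity check
cat(rawToChar(memDecompress(raw.new, type = "gzip")))
```

The /Length entry of the stream object would have to be set to the new byte count, and the /C array of the annotation presumably needs the same color. If the size of an object changes, the byte offsets stored in the cross-reference table no longer match; some viewers repair this silently, others may not.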
Some R code is below.
Sincerely,
Leonard
#########
library(zip)  # provides inflate()

con = file("_some_pdf_.pdf", "rb")
NL = 0
# - very dirty hack;
# - assumes Annotations are in the last fragment/chunk;
while(TRUE) {
  tmp = readBin(con, "raw", 1024*128 + 515);
  if(length(tmp) == 0) break;
  x = tmp;
  # Positions of each CR that is immediately followed by LF (CR LF line ends);
  idCR = which(x == 13);
  idCR = idCR[idCR < length(x)];
  idP  = idCR[x[idCR + 1] == 10];
  NL = NL + length(idP);
}
close(con)

# x and idP now refer to the last chunk that was read;
idS = 935; # will vary with pdf and Annotations and ...;
nLast = 4; # usually 2 chunks
idx = idP[seq(idS, length.out = nLast)]

# Check: Right position?
# tmp = x[seq(idx[1] + 2, idx[1 + 2] - 1)]
# intToUtf8(tmp)

# Inflate the bytes between the first and the last selected CR LF;
tmp = inflate(x[seq(idx[1] + 2, idx[nLast] - 1)])
intToUtf8(tmp$output)
# Output of inflate: an Example
# "/GS gs .56078434 .87058824 .97647059 rg\n
# 337.298 183.836 m 364.322 183.836 l 364.322 171.83 l 337.298 171.83 l h f\n"
# Note: /BBox[ 337.298 171.83 364.322 183.836]
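Instead of counting line ends and hard-coding the start index, the streams could perhaps be located by searching for the "stream" / "endstream" keywords. This is only a sketch: the helper name inflate.streams is hypothetical, it assumes CR LF line ends around the data (as in this file), and it uses base R's memDecompress() in place of inflate() so that it runs without extra packages.

```r
# Hypothetical helper: inflated text of every stream found in a raw vector;
# returns NA for streams that cannot be inflated or converted to text
inflate.streams = function(x) {
  idEnd   = grepRaw("endstream", x, fixed = TRUE, all = TRUE)
  idStart = grepRaw("stream", x, fixed = TRUE, all = TRUE)
  # drop the "stream" matches that are only the tail of "endstream"
  idStart = setdiff(idStart, idEnd + 3L)
  sapply(idStart, function(id) {
    tryCatch({
      end = idEnd[idEnd > id][1]
      # skip "stream" + CR LF (8 bytes); stop before the CR LF
      # that precedes "endstream"
      dat = x[seq(id + 8, end - 3)]
      rawToChar(memDecompress(dat, type = "gzip"))
    }, error = function(e) NA_character_)
  })
}

# Small self-made test object (not a complete pdf)
txt = "/GS gs .56078434 .87058824 .97647059 rg\n"
z   = memCompress(charToRaw(txt), type = "gzip")
pdf = c(charToRaw("1 0 obj\r\n<</Filter/FlateDecode>>stream\r\n"), z,
        charToRaw("\r\nendstream\r\nendobj\r\n"))
res = inflate.streams(pdf)
cat(res)
```

Applied to the last chunk x from the code above, this would try every stream in that chunk; a stream that was cut at the chunk border, or compressed data that happens to contain the keyword bytes, would still need special handling.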
The raw pdf data:
1948 0 obj
<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4
/PDFIUM_HasGeneratedAP true
/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]
/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>
endobj
1949 0 obj
<</BBox[ 337.298 171.83 364.322 183.836]/Filter/FlateDecode/FormType 1
/Length 86/Matrix[ 1 0 0 1 0 0]
/Resources<</ExtGState<</GS<</AIS false/BM/Multiply/CA 1/Type/ExtGState/ca 1>>>>>>
/Subtype/Form/Type/XObject>>stream
[86 bytes of Flate-compressed binary data; not representable as text]
endstream
endobj
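In the data above, the /BBox of the appearance stream sits a few units lower than /Rect and /QuadPoints, which may be the misplacement. A minimal sketch of moving the annotation coordinates follows. The helper name shift.y and the offset of -2.5 are hypothetical, and the arrays are assumed to hold plain x, y pairs separated by spaces; which of the two boxes a given viewer actually uses is not something I have verified.

```r
# Hypothetical helper: add dy to every y coordinate of an array such as /Rect
shift.y = function(dict, key, dy) {
  pattern = paste0("/", key, "\\[([^]]*)\\]")
  m = regmatches(dict, regexpr(pattern, dict))
  v = as.numeric(strsplit(trimws(sub(pattern, "\\1", m)), " +")[[1]])
  iy = seq(2, length(v), by = 2)  # x, y pairs: y values sit at even positions
  v[iy] = v[iy] + dy
  v = round(v, 3)                 # avoid floating-point noise in the output
  sub(pattern, paste0("/", key, "[ ", paste(v, collapse = " "), "]"), dict)
}

# Annotation dictionary from object 1948 (PDFIUM entry omitted for brevity)
annot = paste0(
  "<</AP<</N 1949 0 R >>/C[ 0.560784 0.870588 0.976471]/CA 1/F 4",
  "/QuadPoints[ 337.298 186 364.322 186 337.298 174.6 364.322 174.6]",
  "/Rect[ 337.298 174.6 364.322 186]/Subtype/Highlight/Type/Annot>>")

annot = shift.y(annot, "Rect", -2.5)
annot = shift.y(annot, "QuadPoints", -2.5)
cat(annot, "\n")
```

The same helper could be pointed at /BBox in object 1949, although the path coordinates inside the compressed stream would then have to be shifted as well.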