[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2022-09-25 Thread G. Branden Robinson
Update of bug #55107 (project groff):

Planned Release: 1.23.0 => None


___

Reply to this item at:

  

___
Message sent via Savannah
https://savannah.gnu.org/




[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #13, bug #55107 (project groff):

Keith,

Sorry, you will have to help me here, if you want me to understand what you
meant.

> I am willing to pursue an extended lex/yacc implementation, (subject to Deri
> actually answering the question I've now asked twice, without a response...

This implies to me that your pursuit of an extended lex/yacc implementation
which copes with pdf 1.5 extensions to the standard is subject to me answering
your question, i.e. it cannot be done until you receive the answer. This is
why it looks like you were claiming I was holding you up.

It is progress that you realised I was pointing you to Branden's reference in
comment #4, not my silly mistake in the previous comment; I hope you have time
to read my answer to your CropBox question.

As to whether this ticket refers to PDFPIC or not: I think the answer is in
the title of the ticket and also the fact that you textually linked bug #61324
to this ticket in your email:-

https://lists.gnu.org/archive/html/groff/2021-09/msg00057.html

I.e., that this original ticket would solve the problems with using pdfinfo
with PDFPIC; the email chain is even referenced in bug #61324.

What upsets me is the impression you gave that I had been unhelpful in
not answering your question, whereas it was lack of diligence in not reading
an email which actually starts with "Hi Keith". I do try to be a helpful chap
because I have had so much help from so many people.

Cheers 

Deri






[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #12, bug #55107 (project groff):

Deri,

[comment #10:]
> If you read the email referenced by Branden in comment #4
> You will see that I have answered your question, I apologise if
> you had difficulty understanding my answer regarding CropBox,
> I'm happy to try and explain it again, if needed.
Sorry, but I never saw that e-mail, and I certainly don't see an answer to my
question, or indeed any reference to CropBox, in what Branden quoted in
comment #4.  You confused me, when you omitted an appropriate link in comment
#3; I will look at the reference Branden suggested, when I have time.
> My proposal (D) was never intended to be a .pdfbb replacement (nor .psbb),
> remember this ticket concerns problems with .PDFPIC
No, it doesn't ... it relates to an original suggestion, from four years ago,
to extend the functionality of the built-in psbb request, so that it could
return bounding box co-ordinates from single image PDF files, just as it
originally did for just EPS files, and so that an eventual implementation of
PDFPIC could avoid unsafe forks of third party tools, such as pdfinfo.
> If accusing someone on a public forum of holding you up by not answering
> your question (when they have helpfully done so), is considered disgusting
> language, perhaps we can be a bit less perl!
I accused you of nothing of the sort!  1) You are not holding me up in any
way, because this ticket is of only minor interest to me; I have plenty of
other tasks to occupy my time, and 2) it wasn't your apparent failure to
answer my question, (because I never saw your e-mail), that I considered to be
disgusting ... it is the perl language itself, to which I attribute that
description.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #11, bug #55107 (project groff):


> Am I correct in guessing that a bounding box/MediaBox extractor would have a
> lot of shared logic for PS and PDF?  If so, one preprocessor could perform
> both tasks.  I guess we could call it "grobb" or something, and assign a -B
> flag to it in groff(1).  In fact, if we have a preprocessor and claim an
> option letter, it's probably best if it's as general-purpose as is
> reasonable.


No, unfortunately, there are significant differences. However, my option (D)
does not envisage a .pdfbb request. Rather, in the manner of tbl, which scans
the source for .TS/.TE and replaces the intervening lines with its own code,
it would look for .PDFPIC and insert code to achieve what PDFPIC does today.
The same is not true for .PSPIC: we have to retain .psbb, because someone
may be using it outside of PSPIC.
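
To illustrate the tbl-like idiom described above, here is a minimal sketch in
Python of such a filter. It is an illustration under stated assumptions, not
the pdfbb or gropdf code: the names filter_pdfpic, expand and
placeholder_expand are hypothetical, the replacement text is left to a
caller-supplied callback because the actual expansion is not spelled out in
this ticket, and the sketch ignores details such as a changed control
character or spaces between the dot and the macro name.

```python
def filter_pdfpic(lines, expand):
    """Copy roff source lines through, replacing each .PDFPIC call.

    lines  -- iterable of source lines without trailing newlines
    expand -- hypothetical callback: takes the list of .PDFPIC arguments
              and returns the replacement lines to emit instead
    """
    for line in lines:
        fields = line.split()
        if fields and fields[0] == ".PDFPIC":
            # Hand the macro arguments to the callback and emit its output
            # in place of the original request line.
            yield from expand(fields[1:])
        else:
            # Everything else passes through unchanged.
            yield line


def placeholder_expand(args):
    # Stand-in expansion (an assumption): emits only a roff comment.
    return [r'.\" PDFPIC expansion for ' + " ".join(args)]


source = [".PP", "Some text.", ".PDFPIC -C picture.pdf 3i", ".PP"]
result = list(filter_pdfpic(source, placeholder_expand))
```

A real preprocessor would read standard input and write standard output, so
that it could sit in the pipeline alongside tbl and pic.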





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #10, bug #55107 (project groff):


> Personally, I don't see a justification for implementing psbb as a
> preprocessor.  I am willing to pursue an extended lex/yacc implementation,
> (subject to Deri actually answering the question I've now asked twice, without
> a response: should CropBoxes, or any of PDF's other bounding box attributes,
> have precedence over the MediaBox attributes?), but if you insist on pursuing
> a solution in Perl ... a disgusting language, IMO ... then I'm out.


If you read the email referenced by Branden in comment #4, you will see that
I have answered your question.  I apologise if you had difficulty
understanding my answer regarding CropBox; I'm happy to try and explain it
again, if needed.

My proposal (D) was never intended to be a .pdfbb replacement (nor .psbb);
remember, this ticket concerns problems with .PDFPIC, so a pre-processor would
simply add extra lines to the source to achieve what .PDFPIC currently does,
in the same idiom that pic and tbl do now.

If accusing someone on a public forum of holding you up by not answering your
question (when they have helpfully done so), is considered disgusting
language, perhaps we can be a bit less perl! :-)







[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #9, bug #55107 (project groff):

Branden,

[comment #7:]
> > but troff then incurs the overhead of setting up an IPC pipeline, to
> > capture the preprocessor output, then fork the preprocessor,
> 
> groff already handles all of this.
I don't think that it does...
> Some of the most important bits are:
> 
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n54
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n576
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/pipeline.c
> 
> ...or I am badly misunderstanding you.
Consider pre-grohtml; that *isn't* run within groff's normal pipeline; it is
forked, with its own subsidiary pipeline, as and when required.  One of my
earliest contributions to groff was to make that subsidiary pipeline setup,
and the associated fork, MS-Windows compatible, and I would anticipate a
similar overhead, if psbb were to be delegated to a preprocessor.






[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #8, bug #55107 (project groff):

[comment #5:]
> Am I correct in guessing that a bounding box/MediaBox extractor would have a
> lot of shared logic for PS and PDF?
No.  I wrote my proposed extractor as a lex/yacc parser; from its initial
state, it diverges into two entirely distinct branches of execution, on the
basis of whether the first few bytes of the image file are '%!PS-Adobe-' or
'%PDF-', (and aborts, if anything else); the two branches converge only at the
bitter end, when the yacc terminal rule (ultimately) assigns the troff
bounding box registers.
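
The initial dispatch described above can be sketched in a few lines of
Python. This is only an illustration of the idea, not the lex/yacc code of
the prototype; the function name classify_image is hypothetical.

```python
import io


def classify_image(stream):
    """Return 'eps' or 'pdf' according to the leading bytes of the file.

    stream -- a binary file object positioned at the start of the image
    Raises ValueError for anything else, mirroring the "abort" case.
    """
    head = stream.read(11)  # b'%!PS-Adobe-' is 11 bytes long
    if head.startswith(b"%!PS-Adobe-"):
        return "eps"
    if head.startswith(b"%PDF-"):
        return "pdf"
    raise ValueError("neither a PostScript nor a PDF file")


eps_kind = classify_image(io.BytesIO(b"%!PS-Adobe-3.0 EPSF-3.0\n"))
pdf_kind = classify_image(io.BytesIO(b"%PDF-1.4\n"))
```

From this point the two formats would be handled by entirely separate code,
converging again only when the four bounding box values are known.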

FWIW, my EPS parsing code is feature complete.  OTOH, the PDF parsing
works only for PDF-1.4 conformant files, (it lacks rules for interpretation of
XRefStm objects, Object streams, and deflated content).  The "significant
development", to which I referred, is to extend the existing lex pattern set,
and yacc grammar, to support those additional features, (if required).

Personally, I don't see a justification for implementing psbb as a
preprocessor.  I am willing to pursue an extended lex/yacc implementation,
(subject to Deri actually answering the question I've now asked twice, without
a response: should CropBoxes, or any of PDF's other bounding box attributes,
have precedence over the MediaBox attributes?), but if you insist on pursuing
a solution in Perl ... a disgusting language, IMO ... then I'm out.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #7, bug #55107 (project groff):

Hi, Keith,

[comment #6:]

> Sorry, I don't know; the choice was made, w.r.t. EPS files, before my
> association with the project began.  I can imagine, however, it may be a
> matter of convenience, because by having the psbb parsing code accessible as
> linked in functions, the assignment to troff's internal registers is
> straightforward,

I suppose, but putting


.nr llx 1234u
.nr lly 5678u
.nr urx 6789u
.nr ury 7890u


onto the output stream isn't too hard, either.  (Though while I'm here I'll
note the unnecessary intrusion into the user's name space.  There's no reason
these couldn't have been called groff*bbox*llx and similar, for instance.  If
the user desires short names for these, the `aln` request is at hand.  Beyond
that, isn't PSPIC pretty much the _only_ invoker of the `psbb` request in the
first place?)
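
As a rough sketch of how little a preprocessor would need to emit, the
following Python function formats the four register assignments shown above.
The function name bbox_requests is hypothetical, and the `u` scaling unit
simply mirrors the example; it is not a statement about the units the
existing `psbb` request uses.

```python
def bbox_requests(llx, lly, urx, ury):
    """Return roff source that assigns the four bounding box registers."""
    names = ("llx", "lly", "urx", "ury")
    values = (llx, lly, urx, ury)
    # One .nr request per register, each on its own line.
    return "".join(f".nr {name} {value}u\n" for name, value in zip(names, values))


text = bbox_requests(1234, 5678, 6789, 7890)
```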

> whereas, if the parsing is delegated to a preprocessor, not only does that
> preprocessor have to implement the same parsing logic,

I'm thinking more of "moving" than "reimplementing" here, though I admit that
my other spitball idea of writing the thing in Perl would indeed make it a
question of reimplementation.

> but troff then incurs the overhead of setting up an IPC pipeline, to capture
> the preprocessor output, then fork the preprocessor,

groff already handles all of this.  Some of the most important bits are:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n54
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n576
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/pipeline.c

...or I am badly misunderstanding you.

> and ultimately, reinterpret the preprocessor output to assign the requisite
> register values.

The contract of writing something called a groff preprocessor is that you will
emit output that GNU troff can parse.

Every time I've looked at the psbb code in GNU troff, I've become uneasy.  It
_feels_ strongly like something that should have a higher barrier of
interfacing around it.  It's about 5% of the source lines of the file, which
contains the implementations of about 100 requests _and_ the troff program's
command-line option parsing and diagnostic handling to boot.  The psbb part
of the file defines its own bespoke classes, and most of the psbb-related
functions call only each other or standard C library functions (like sscanf,
strlen, and fseek), not other troff functions.

I wonder if anyone else feels a similar unease.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #6, bug #55107 (project groff):

Branden,

[comment #2:]
> I have [an] historical question.
> 
> Does anyone know why `psbb` wasn't made a preprocessor in the first place? 
...
Sorry, I don't know; the choice was made, w.r.t. EPS files, before my
association with the project began.  I can imagine, however, it may be a
matter of convenience, because by having the psbb parsing code accessible as
linked in functions, the assignment to troff's internal registers is
straightforward, whereas, if the parsing is delegated to a preprocessor, not
only does that preprocessor have to implement the same parsing logic, but
troff then incurs the overhead of setting up an IPC pipeline, to capture the
preprocessor output, then fork the preprocessor, and ultimately, reinterpret
the preprocessor output to assign the requisite register values.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #5, bug #55107 (project groff):

I'll approximate the conclusion of Deri's mail of 13 October.

[Keith Marshall wrote:]
>> some further (non-trivial) development effort will be required, to support
>> concealment of trailer dictionaries and cross reference tables within
>> /XRefStm objects.

[Deri James wrote:]
> There are several options which would address this problem, i.e. non
> portability of grep and desirability of avoiding groff unsafe mode.

> A) Replace grep with sed/awk (still requires unsafe mode).

> B) Use psbb (requires "non-trivial development").

> C) Use pdfbb (requires hook in input.cpp to call pdfbb and return results).

> D) Convert pdfbb to be a pre-gropdf (i.e. a preprocessor like pre-grohtml)
> which would look for .PDFPIC and replace with the appropriate calls to
> \X'pdf: pdfpic' and add vertical space with .sp.

> (A) is obviously the easiest and quickest, (C) and (D) are not too much
> work, since the parser required is already in use.

Okay, it's Branden again.  My inclination is (A) to get a short-run fix in
place to get the splinter out of users' paws no matter when groff 1.23.0 gets
released, and (D) for the longer run.

I would emphasize, lest the point be overlooked, that a preprocessor's
interface to the rest of a troff system is a file stream.  Therefore we can
write them in Perl, the shell, or yet another language if necessary.  Perl
seems the most likely alternative since we already have a Perl v5.6.1
dependency, and none on any other popular scripting language except for the
shell, which is a pretty gross language to write a parser in (though I admit
I've done it).

Am I correct in guessing that a bounding box/MediaBox extractor would have a
lot of shared logic for PS and PDF?  If so, one preprocessor could perform
both tasks.  I guess we could call it "grobb" or something, and assign a -B
flag to it in groff(1).  In fact, if we have a preprocessor and claim an
option letter, it's probably best if it's as general-purpose as is reasonable.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #4, bug #55107 (project groff):


[comment #3:]
> This was my option (D) in this email (for PDFPIC, but also applies to
> .psbb):-
> 
> https://savannah.gnu.org/bugs/?55107
> 

I think the link intended was:

https://lists.gnu.org/archive/html/groff/2021-10/msg00044.html

...and thank you--that is indeed a helpful summary.  In fact, why don't I just
quote it for the sake of this bug's context?

But I'll do that in the next comment; I see Heinz isn't in the CC list for
this ticket, so I'll add him first.







[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #3, bug #55107 (project groff):

This was my option (D) in this email (for PDFPIC, but also applies to
.psbb):-

https://savannah.gnu.org/bugs/?55107






[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-27 Thread G. Branden Robinson
Update of bug #55107 (project groff):

Status: None => Need Info

___

Follow-up Comment #2:

I have a historical question.

Does anyone know why `psbb` wasn't made a preprocessor in the first place?  I
have an imperfect command of the issues, but it seems like a better fit.  A
preprocessor can run whatever external commands it needs, any images to be
included have to exist by the time troff(1) itself runs or .psbb wouldn't have
anything to work on anyway, and pushing this work to a preprocessor avoids the
need for .sy requests and unsafe mode.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-12 Thread Keith Marshall
Follow-up Comment #1, bug #55107 (project groff):

In this mailing-list message [1], Deri offered two PDF files, namely
Picture.pdf [2] and croptest.pdf [3], from which the original prototype code
[4], as referenced on this ticket, is unable to extract any valid MediaBox
specification.

In this follow-up message [5], I explained that the failure to extract the
MediaBox from Picture.pdf was caused by an omission from the groff-psbb
lexer's pattern matching rules for the PDF dictionary scanning state,
resulting in mishandling of nested dictionaries; this is readily resolved by
the attached patch [6].

OTOH, croptest.pdf uses new PDF (post PDF-1.5) features, and lacks any trailer
dictionary, or free-standing cross reference table, (both of which are
_required_ by the current groff-psbb prototype implementation); to support
these new PDF features, substantial additions to the current implementation
will be required.
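
To make the distinction concrete, here is a deliberately naive sketch in
Python. It is not the groff-psbb method, which follows the trailer dictionary
and cross reference table; it is a grep-style scan of the raw bytes, and the
name find_mediabox is hypothetical. It copes with nested dictionaries only
because it ignores dictionary structure altogether, and it finds nothing when
the page dictionary is held in a compressed object stream. It also ignores
indirect references and inherited MediaBox entries.

```python
import re

# A PDF number: optional sign, then digits with an optional fraction.
_NUM = rb"[-+]?(?:\d+\.?\d*|\.\d+)"
_MEDIABOX = re.compile(
    rb"/MediaBox\s*\[\s*(" + _NUM + rb")\s+(" + _NUM + rb")\s+("
    + _NUM + rb")\s+(" + _NUM + rb")\s*\]"
)


def find_mediabox(data):
    """Return the first literal MediaBox in data as four floats, or None."""
    match = _MEDIABOX.search(data)
    if match is None:
        return None
    return tuple(float(group.decode("ascii")) for group in match.groups())


# Uncompressed page dictionary containing a nested dictionary.
plain = b"<< /Type /Page /Resources << /ProcSet [/PDF] >> /MediaBox [0 0 612 792] >>"
# Object stream whose content is deflated: no literal MediaBox is visible.
packed = b"5 0 obj << /Type /ObjStm /Filter /FlateDecode >> stream\nx\x9c..."

plain_box = find_mediabox(plain)
packed_box = find_mediabox(packed)
```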

[1]: https://lists.nongnu.org/archive/html/groff/2021-09/msg00064.html
[2]: https://lists.nongnu.org/archive/html/groff/2021-09/pdf7tyGN4NLTE.pdf
[3]: https://lists.nongnu.org/archive/html/groff/2021-09/pdfBjudbNbwI2.pdf
[4]: https://osdn.net/users/keith/pf/groff-psbb/scm/tree/e25e11c6770a3d7a2e98cbcfce66dbffd7d8b5a0/
[5]: https://lists.nongnu.org/archive/html/groff/2021-10/msg00043.html
[6]: patch file #52093





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-12 Thread Keith Marshall
Additional Item Attachment, bug #55107 (project groff):

File name: nested-dictionary.patch  Size: 0 KB








[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2018-11-26 Thread Bertrand Garrigues
Update of bug #55107 (project groff):

Planned Release: None => 1.22.5




___
bug-groff mailing list
bug-groff@gnu.org
https://lists.gnu.org/mailman/listinfo/bug-groff


[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2018-11-26 Thread Bertrand Garrigues
URL:
  

         Summary: PDFPIC: .psbb: support extraction of MediaBox from pdf files
         Project: GNU troff
    Submitted by: bgarrigues
    Submitted on: Mon 26 Nov 2018 11:14:48 PM UTC
        Category: Core
        Severity: 3 - Normal
      Item Group: New feature
          Status: None
         Privacy: Public
     Assigned to: None
     Open/Closed: Open
 Discussion Lock: Any
 Planned Release: None

___

Details:

See the discussion on PDFPIC:

https://lists.gnu.org/archive/html/groff/2018-08/msg00080.html
https://lists.gnu.org/archive/html/groff/2017-10/msg00017.html

and the prototype proposed by Keith:

https://osdn.net/users/keith/pf/groff-psbb/wiki/FrontPage





