[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #3, bug #55107 (project groff):

This was my option (D) in this email (for PDFPIC, but also applies to
.psbb):-

https://savannah.gnu.org/bugs/?55107


___

Reply to this item at:

  

___
  Message sent via Savannah
  https://savannah.gnu.org/




[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #4, bug #55107 (project groff):


[comment #3 comment #3:]
> This was my option (D) in this email (for PDFPIC, but also applies to
.psbb):-
> 
> https://savannah.gnu.org/bugs/?55107
> 

I think the link intended was:

https://lists.gnu.org/archive/html/groff/2021-10/msg00044.html

...and thank you--that is indeed a helpful summary.  In fact, why don't I just
quote it for the sake of this bug's context?

But I'll do that in the next comment; I see Heinz isn't in the CC list for
this ticket, so I'll add him first.







[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #5, bug #55107 (project groff):

I'll approximate the conclusion of Deri's mail of 13 October.

[Keith Marshall wrote:]
>> some further (non-trivial) development effort will be required, to support
concealment of trailer dictionaries and cross reference tables within /XRefStm
objects.

[Deri James wrote:]
> There are several options which would address this problem, i.e. the
non-portability of grep and the desirability of avoiding groff unsafe mode.

> A) Replace grep with sed/awk (still requires unsafe mode).

> B) Use psbb (requires "non-trivial development").

> C) Use pdfbb (requires hook in input.cpp to call pdfbb and return results).

> D) Convert pdfbb to be a pre-gropdf (i.e. a preprocessor like pre-grohtml)
which would look for .PDFPIC and replace it with the appropriate calls to
\X'pdf: pdfpic' and add vertical space with .sp.

> (A) is obviously the easiest and quickest, (C) and (D) are not too much
work, since the parser required is already in use.

Okay, it's Branden again.  My inclination is (A) to get a short-run fix in
place to get the splinter out of users' paws no matter when groff 1.23.0 gets
released, and (D) for the longer run.

I would emphasize, lest the point be overlooked, that a preprocessor's
interface to the rest of a troff system is a file stream.  Therefore we can
write them in Perl, the shell, or yet another language if necessary.  Perl
seems the most likely alternative since we already have a Perl v5.6.1
dependency, and none on any other popular scripting language except for the
shell, which is a pretty gross language to write a parser in (though I admit
I've done it).

Am I correct in guessing that a bounding box/MediaBox extractor would have a
lot of shared logic for PS and PDF?  If so, one preprocessor could perform
both tasks.  I guess we could call it "grobb" or something, and assign a -B
flag to it in groff(1).  In fact, if we have a preprocessor and claim an
option letter, it's probably best if it's as general-purpose as is reasonable.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #6, bug #55107 (project groff):

Branden,

[comment #2 comment #2:]
> I have [an] historical question.
> 
> Does anyone know why `psbb` wasn't made a preprocessor in the first place? 
...
Sorry, I don't know; the choice was made, w.r.t. EPS files, before my
association with the project began.  I can imagine, however, it may be a
matter of convenience, because by having the psbb parsing code accessible as
linked in functions, the assignment to troff's internal registers is
straightforward, whereas, if the parsing is delegated to a preprocessor, not
only does that preprocessor have to implement the same parsing logic, but
troff then incurs the overhead of setting up an IPC pipeline, to capture the
preprocessor output, then fork the preprocessor, and ultimately, reinterpret
the preprocessor output to assign the requisite register values.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread G. Branden Robinson
Follow-up Comment #7, bug #55107 (project groff):

Hi, Keith,

[comment #6 comment #6:]

> Sorry, I don't know; the choice was made, w.r.t. EPS files, before my
association with the project began.  I can imagine, however, it may be a
matter of convenience, because by having the psbb parsing code accessible as
linked in functions, the assignment to troff's internal registers is
straightforward,

I suppose, but putting


.nr llx 1234u
.nr lly 5678u
.nr urx 6789u
.nr ury 7890u


onto the output stream isn't too hard, either.  (Though while I'm here I'll
note the unnecessary intrusion into the user's name space.  There's no reason
these couldn't have been called groff*bbox*llx and similar, for instance.  If
the user desires short names for these, the `aln` request is at hand.  Beyond
that, isn't PSPIC pretty much the _only_ invoker of the `psbb` request in the
first place?)
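
To make the point concrete, here is a minimal sketch of how a preprocessor might put such register assignments onto its output stream.  The function name and the `prefix` parameter are hypothetical; the `u` scaling indicator simply follows the example above, and the `groff*bbox*` prefix is only the naming suggested in this comment, not an existing groff convention.

```python
def bbox_register_requests(llx, lly, urx, ury, prefix=""):
    """Return troff `nr` requests assigning the four bounding-box registers.

    `prefix` is a hypothetical knob for namespacing the register names,
    e.g. "groff*bbox*"; an empty prefix reproduces the short names.
    """
    names = ("llx", "lly", "urx", "ury")
    values = (llx, lly, urx, ury)
    return [f".nr {prefix}{name} {value}u" for name, value in zip(names, values)]


if __name__ == "__main__":
    # Emit the same four requests shown in the comment above.
    print("\n".join(bbox_register_requests(1234, 5678, 6789, 7890)))
```

With a namespaced prefix, a document wanting the short names could still alias them with the `aln` request, as noted above.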

> whereas, if the parsing is delegated to a preprocessor, not only does that
preprocessor have to implement the same parsing logic,

I'm thinking more of "moving" than "reimplementing" here, though I admit my
other spitball idea of writing the thing in Perl would indeed make it a
question of reimplementation.

> but troff then incurs the overhead of setting up an IPC pipeline, to capture
the preprocessor output, then fork the preprocessor,

groff already handles all of this.  Some of the most important bits are:

https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n54
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n576
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/pipeline.c

...or I am badly misunderstanding you.

> and ultimately, reinterpret the preprocessor output to assign the requisite
register values.

The contract of writing something called a groff preprocessor is that you will
emit output that GNU troff can parse.

Every time I've looked at the psbb code in GNU troff
I've become uneasy.  It _feels_ strongly like something that should have a
higher barrier of interfacing around it.  It's about 5% of the source lines of
the file, which contains the implementations of about 100 requests _and_ the
troff program's command-line option parsing and diagnostic handling to boot. 
The psbb part of the file defines its own bespoke classes, and most of the
psbb-related functions call only each other or standard C library functions
(like sscanf, strlen, and fseek), not other troff functions.

I wonder if anyone else feels a similar unease.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #8, bug #55107 (project groff):

[comment #5 comment #5:]
> Am I correct in guessing that a bounding box/MediaBox extractor would have a
lot of shared logic for PS and PDF?
No.  I wrote my proposed extractor as a lex/yacc parser; from its initial
state, it diverges into two entirely distinct branches of execution, on the
basis of whether the first few bytes of the image file are '%!PS-Adobe-' or
'%PDF-', (and aborts, if anything else); the two branches converge only at the
bitter end, when the yacc terminal rule (ultimately) assigns the troff
bounding box registers.
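
The initial dispatch described above can be sketched as follows.  This is not the lex/yacc code under discussion, only an illustration in Python of branching on the leading bytes; the function name and return values are hypothetical.

```python
PS_MAGIC = b"%!PS-Adobe-"
PDF_MAGIC = b"%PDF-"


def classify_image(header: bytes) -> str:
    """Pick the parsing branch from the first few bytes of an image file.

    Returns "ps" or "pdf"; raises ValueError for anything else, which
    corresponds to the "aborts" case described above.
    """
    if header.startswith(PS_MAGIC):
        return "ps"
    if header.startswith(PDF_MAGIC):
        return "pdf"
    raise ValueError("unrecognized image file header")


if __name__ == "__main__":
    print(classify_image(b"%PDF-1.4\n"))
```

After this point the two branches share nothing until the final step that reports the bounding box.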

FWIW, my EPS parsing code is feature complete.  OTOH, the PDF parsing
works only for PDF-1.4 conformant files, (it lacks rules for interpretation of
XRefStm objects, Object streams, and deflated content).  The "significant
development", to which I referred, is to extend the existing lex pattern set, and
yacc grammar, to support those additional features, (if required).
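
For readers unfamiliar with why the PDF-1.4 restriction matters, the following is a deliberately naive sketch, closer in spirit to a grep than to the parser described here.  It assumes the page dictionary is stored as plain, uncompressed text, which is exactly the assumption that object streams and deflated content break.  The names are hypothetical, it returns the first textual match only, and it makes no attempt to handle inherited attributes or any precedence between CropBox and MediaBox.

```python
import re

# A PDF number: optional sign, digits with optional fraction, or a bare fraction.
_NUM = rb"[-+]?(?:\d+\.?\d*|\.\d+)"
_MEDIABOX = re.compile(
    rb"/MediaBox\s*\[\s*(" + _NUM + rb")\s+(" + _NUM + rb")\s+("
    + _NUM + rb")\s+(" + _NUM + rb")\s*\]"
)


def first_mediabox(data: bytes):
    """Return (llx, lly, urx, ury) from the first plain-text /MediaBox, or None."""
    match = _MEDIABOX.search(data)
    if match is None:
        return None
    return tuple(float(group.decode("ascii")) for group in match.groups())


if __name__ == "__main__":
    sample = b"<< /Type /Page /MediaBox [0 0 612 792] >>"
    print(first_mediabox(sample))
```

A file whose page dictionaries live inside a compressed object stream would yield `None` here, which is the gap the extended grammar would have to close.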

Personally, I don't see a justification for implementing psbb as a
preprocessor.  I am willing to pursue an extended lex/yacc implementation,
(subject to Deri actually answering the question I've now asked twice, without
a response: should CropBoxes, or any of PDF's other bounding box attributes,
have precedence over the MediaBox attributes?), but if you insist on pursuing
a solution in Perl ... a disgusting language, IMO ... then I'm out.





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #9, bug #55107 (project groff):

Branden,

[comment #7 comment #7:]
> > but troff then incurs the overhead of setting up an IPC pipeline, to
capture the preprocessor output, then fork the preprocessor,
> 
> groff already handles all of this.
I don't think that it does...
> Some of the most important bits are:
> 
>
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n54
>
https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/groff.cpp#n576
> https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/groff/pipeline.c
> 
> ...or I am badly misunderstanding you.
Consider pre-grohtml; that *isn't* run within groff's normal pipeline; it is
forked, with its own subsidiary pipeline, as and when required.  One of my
earliest contributions to groff was to make that subsidiary pipeline setup,
and the associated fork, MS-Windows compatible, and I would anticipate a
similar overhead, if psbb were to be delegated to a preprocessor.






[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #10, bug #55107 (project groff):


> Personally, I don't see a justification for implementing psbb as a
> preprocessor.  I am willing to pursue an extended lex/yacc implementation,
> (subject to Deri actually answering the question I've now asked twice, without
> a response: should CropBoxes, or any of PDF's other bounding box attributes,
> have precedence over the MediaBox attributes?), but if you insist on pursuing
> a solution in Perl ... a disgusting language, IMO ... then I'm out.


If you read the email referenced by Branden in comment #4, you will see that
I have answered your question.  I apologise if you had difficulty
understanding my answer regarding CropBox; I'm happy to try to explain it
again, if needed.

My proposal (D) was never intended to be a .pdfbb replacement (nor .psbb),
remember this ticket concerns problems with .PDFPIC so a pre-processor would
simply add extra lines to the source to achieve what .PDFPIC currently does,
in the same idiom that pic and tbl do now.

If accusing someone on a public forum of holding you up by not answering your
question (when they have helpfully done so), is considered disgusting
language, perhaps we can be a bit less perl! :-)







[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #11, bug #55107 (project groff):


> Am I correct in guessing that a bounding box/MediaBox extractor would have a
> lot of shared logic for PS and PDF?  If so, one preprocessor could perform
> both tasks.  I guess we could call it "grobb" or something, and assign a -B
> flag to it in groff(1).  In fact, if we have a preprocessor and claim an
> option letter, it's probably best if it's as general-purpose as is
> reasonable.


No, unfortunately, there are significant differences. However, my option (D)
does not envisage a .pdfbb, but rather in the manner of tbl which scans the
source looking for .TS/.TE and replacing intervening lines with its own code,
it would look for .PDFPIC and insert code to achieve what PDFPIC does today.
The same is not true for .PSPIC, since we have to retain .psbb since someone
may be using it outside of its use in PSPIC.
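
The tbl-style idiom described above can be sketched as a line filter.  This is an illustration only: `preprocess` and `expand_pdfpic` are hypothetical names, and the expansion is a placeholder that emits a roff comment rather than the real device control escape and spacing, whose exact arguments are not given in this thread.

```python
def expand_pdfpic(args):
    """Hypothetical stand-in for the real expansion of a .PDFPIC call.

    A real preprocessor would emit the device control escape and vertical
    spacing here; this placeholder emits only a roff comment line.
    """
    return ['.\\" PDFPIC expansion for: ' + " ".join(args)]


def preprocess(lines, expand=expand_pdfpic):
    """Copy input lines to output, replacing each .PDFPIC call in place."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] == ".PDFPIC":
            yield from expand(fields[1:])
        else:
            yield line.rstrip("\n")


if __name__ == "__main__":
    document = ["Some text.\n", ".PDFPIC figure.pdf\n", "More text.\n"]
    for out in preprocess(document):
        print(out)
```

Everything other than the `.PDFPIC` lines passes through untouched, which is what lets such a filter sit in front of troff the way pic and tbl do; passing `sys.stdin` as `lines` would turn the sketch into a stream filter.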





[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Keith Marshall
Follow-up Comment #12, bug #55107 (project groff):

Deri,

[comment #10 comment #10:]
> If you read the email referenced by Branden in comment #4
> You will see that I have answered your question, I apologise if
> you had difficulty understanding my answer regarding CropBox,
> I'm happy to try and explain it again, if needed.
Sorry, but I never saw that e-mail, and I certainly don't see an answer to my
question, or indeed any reference to CropBox, in what Branden quoted in
comment #4.  You confused me, when you omitted an appropriate link in comment
#3; I will look at the reference Branden suggested, when I have time.
> My proposal (D) was never intended to be a .pdfbb replacement (nor .psbb),
remember this ticket concerns problems with .PDFPIC
No, it doesn't ... it relates to an original suggestion, from four years ago,
to extend the functionality of the built-in psbb request, so that it could
return bounding box co-ordinates from single image PDF files, just as it
originally did for just EPS files, and so that an eventual implementation of
PDFPIC could avoid unsafe forks of third party tools, such as pdfinfo.
> If accusing someone on a public forum of holding you up by not answering
your question (when they have helpfully done so), is considered disgusting
language, perhaps we can be a bit less perl!
I accused you of nothing of the sort!  1) You are not holding me up in any
way, because this ticket is of only minor interest to me; I have plenty of
other tasks to occupy my time, and 2) it wasn't your apparent failure to
answer my question, (because I never saw your e-mail), that I considered to be
disgusting ... it is the perl language itself, to which I attribute that
description.





[bug #61379] [tmac] Detect use of preprocessor "tbl" and "table wider than line"

2021-10-30 Thread Bjarni Ingi Gislason
Follow-up Comment #3, bug #61379 (project groff):

  The subject 'Detect ... "table wider than line"' is misleading.

  "tbl" already reports that, but gives no information about the line
length and width of the table.

  I did not see any simple way to add that information to "tbl",
so I added it to the patch.






[bug #61401] [grohtml] doesn't remap \- in device escapes

2021-10-30 Thread G. Branden Robinson
Update of bug #61401 (project groff):

  Status: In Progress => Fixed  
 Open/Closed:Open => Closed 
 Planned Release:None => 1.23.0 

___

Follow-up Comment #2:


commit eb695ab2b5e2bae54afa102355c493bda6e29d3e
Author: G. Branden Robinson 
Date:   Sat Oct 30 15:45:29 2021 +1100

[troff]: Fix Savannah #61401.

[troff]: Handle special character escape sequences that map to basic
Latin glyphs in device control escape sequences consistently among
output devices.

* src/roff/troff/input.cpp (encode_char): Rearrange conditionals.  This
  is the logic that puts the "whatever" within a \X'whatever' escape
  sequence into GNU troff's intermediate output.  Handle stretchable and
  unstretchable space escape sequences ("\ " and "\~") first.  Then, if
  the token is a special character escape sequence, retrieve its
  "contents" (glyph name).  Move the basic Latin mapping for the seven
  glyph names '-', 'aq', 'dq', 'ga', 'ha', 'rs', and 'ti' here, before
  checking whether the device description issued the
  'use_charnames_in_special' directive.  This way, the 'html' and
  'xhtml' output devices can straightforwardly embed these basic Latin
  characters in device control escapes (notably, "html:", for which the
  present convention is to follow this tag immediately with a
  literal HTML URI, complete with `` element syntax).  If the
  special character is none of these and we should
  'use_charnames_in_special', proceed as groff 1.22.4 and earlier did.
  This is a behavior change, as was my addition of this translation
  mechanism in the first place, so...

* doc/groff.texi (Postprocessor Access): Document it.

* src/roff/groff/tests/device_control_escapes_express_basic_latin.sh:
  Test it.
* src/roff/groff/groff.am (groff_TESTS): Run test.

Fixes .






[bug #61402] [man] handles degenerate input poorly

2021-10-30 Thread G. Branden Robinson
Update of bug #61402 (project groff):

  Status: In Progress => Fixed  
 Open/Closed:Open => Closed 
 Planned Release:None => 1.23.0 

___

Follow-up Comment #1:


commit 2a9135c7146e121a07a8c63a9f74dfc60e04f98d
Author: G. Branden Robinson 
Date:   Sat Oct 30 14:02:11 2021 +1100

[man]: Handle degenerate input quietly.

* tmac/an.tmac (TH): Define new register `an-TH-was-called`.

  (an-end): Return immediately if that register is not defined; to
  format the default page footer we must have the information declared
  in a valid `TH` call.  (`TH` also initializes the type size and
  baseline spacing registers we use to prepare the page footer
  environment.)  If the register _is_ defined, remove it just prior to
  the end of this macro definition, in preparation for next page to be
  rendered.

* tmac/tests/an_handle-degenerate-input-quietly.sh: Test it.

Fixes , a regression from groff
1.22.4 (problem introduced by me in the course of many changes to trap
management and header/footer handling to work nicely in batch rendering
with -mandoc and mdoc(7) documents).






[bug #61403] [troff]: interpolates ^ for \[ti] in device control escapes

2021-10-30 Thread G. Branden Robinson
Update of bug #61403 (project groff):

  Status: In Progress => Fixed  
 Open/Closed:Open => Closed 
 Planned Release:None => 1.23.0 

___

Follow-up Comment #1:


commit 3d1988cabc90f3c4b0bbb4a809be61eeba3c
Author: G. Branden Robinson 
Date:   Sat Oct 30 15:48:42 2021 +1100

[troff]: Map \[ti] correctly in \X escapes.

[troff]: Map \[ti] correctly in device control escape sequences.

* src/roff/troff/input.cpp (encode_char): Fix copy-and-paste error.
  \[ti] should put '~', not '^', into a device control command.

Fixes ; problem introduced
by me in commit 9d61b3d1, 1 October.
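
As an illustration of the mapping these two commits concern, here is a small Python table of the seven glyph names and the basic Latin characters they stand for.  It is not the C++ in `encode_char`; the character values are taken from the usual groff_char(7) meanings of these names (only the `ti` case is stated explicitly in the commit message), and the function name is hypothetical.

```python
# Glyph name -> basic Latin character, per the usual groff_char(7) meanings.
BASIC_LATIN = {
    "-": "-",    # minus sign, written \- in input
    "aq": "'",   # neutral apostrophe
    "dq": '"',   # neutral double quote
    "ga": "`",   # grave accent
    "ha": "^",   # circumflex ("hat")
    "rs": "\\",  # reverse solidus (backslash)
    "ti": "~",   # tilde; the fix above is that this is not "^"
}


def basic_latin_for(glyph_name):
    """Return the basic Latin character for a glyph name, or None if unmapped."""
    return BASIC_LATIN.get(glyph_name)


if __name__ == "__main__":
    for name in ("ti", "ha", "em"):
        print(name, repr(basic_latin_for(name)))
```

A table of this shape makes the copy-and-paste error easy to see: `ha` and `ti` must map to different characters.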






[bug #55107] PDFPIC: .psbb: support extraction of MediaBox from pdf files

2021-10-30 Thread Deri James
Follow-up Comment #13, bug #55107 (project groff):

Keith,

Sorry, you will have to help me here, if you want me to understand what you
meant.

> I am willing to pursue an extended lex/yacc implementation, (subject to Deri
actually answering the question I've now asked twice, without a response...

This implies to me that your pursuit of an extended lex/yacc implementation
which copes with pdf 1.5 extensions to the standard is subject to me answering
your question, i.e. it cannot be done until you receive the answer. This is
why it looks like you were claiming I was holding you up.

It is progress that you realised I was pointing you to Branden's reference in
comment #4, not my silly mistake in the previous comment, I hope you have time
to read my answer to your cropbox question.

As to whether this ticket refers to PDFPIC or not: I think the answer is in
the title of the ticket and also the fact that you textually linked bug #61324
to this ticket in your email:-

https://lists.gnu.org/archive/html/groff/2021-09/msg00057.html

I.E. That this original ticket would solve the problems with using pdfinfo
with PDFPIC, the email chain is even referenced in bug #61324.

What upsets me is the impression you gave that I had been unhelpful in
not answering your question, whereas it was lack of diligence in not reading
an email which actually starts with "Hi Keith". I do try to be a helpful chap
because I have had so much help from so many people.

Cheers 

Deri






[bug #61371] [PATCH] [tbl]: tbl.1.man: use neutral wording "decimal separator" in place of English specific "decimal point" and "dot"

2021-10-30 Thread Bjarni Ingi Gislason
Follow-up Comment #4, bug #61371 (project groff):


diff -dpru --unidirectional-new-file groff/src/preproc/tbl/tbl.1.man
new-groff/src/preproc/tbl/tbl.1.man
--- groff/src/preproc/tbl/tbl.1.man 2021-10-31 00:03:10.0 +
+++ new-groff/src/preproc/tbl/tbl.1.man 2021-10-31 00:10:16.0 +
@@ -157,7 +157,7 @@ Each cell in a column is classified by b
 centered,
 left-aligned,
 numeric
-(aligned to a decimal point),
+(aligned to a decimal separator),
 and so on.
 .
 This specification can have several lines,
@@ -214,7 +214,7 @@ extension).
 .
 .TP
 .BI decimalpoint( c )
-Set the character to be recognized as the decimal point in numeric
+Set the character to be recognized as the decimal separator in numeric
 columns
 (GNU
 .I tbl \" exception
@@ -417,10 +417,11 @@ Numerically justify item in the column;
 that is,
 align columns of numbers vertically at the units place.
 .
-If there are one or more dots adjacent to a digit,
+If there are one or more decimal separators adjacent to a digit,
 use the rightmost one for vertical alignment.
 .
-If there is no dot, use the rightmost digit for vertical alignment;
+If there is no decimal separator,
+use the rightmost digit for vertical alignment;
 otherwise, center the item within the column.
 .
 Alignment can be forced to a certain position using \[oq]\[rs]&\[cq];
@@ -1141,16 +1142,16 @@ sees the input earlier than
 .IR \%@g@troff .
 .
 For example,
-number formatting with vertically aligned decimal points fails if those
-numbers are passed on as macro parameters because decimal point
-alignment is handled by
+number formatting with vertically aligned decimal separators fails
+if those numbers are passed on as macro parameters
+because decimal separator alignment is handled by
 .I \%@g@tbl
 itself:
 it only sees
 .BR \[rs]$1 ,
 .BR \[rs]$2 ,
 etc.,
-and therefore can't recognize the decimal point.
+and therefore can't recognize the decimal separator.
 .
 .
 .\" 
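
The alignment rule reworded in the third hunk of this patch can be sketched as below.  This is a reading of the manual text only, not tbl's implementation: the function name is hypothetical, the separator is passed in as a single character, and forced alignment with `\&` is ignored.

```python
def alignment_index(item, separator="."):
    """Return the index in `item` used for vertical alignment, or None.

    Rule sketched from the manual text: prefer the rightmost separator that
    is adjacent to a digit; failing that, the rightmost digit; if there is
    neither, return None (the item would be centered).
    """
    for i in range(len(item) - 1, -1, -1):
        if item[i] == separator:
            digit_before = i > 0 and item[i - 1].isdigit()
            digit_after = i + 1 < len(item) and item[i + 1].isdigit()
            if digit_before or digit_after:
                return i
    for i in range(len(item) - 1, -1, -1):
        if item[i].isdigit():
            return i
    return None


if __name__ == "__main__":
    for text in ("3.14", "1.2.3", "42", "n/a"):
        print(text, alignment_index(text))
```

Calling it with `separator=","` models a table that has set `decimalpoint(,)`, which is the case the neutral wording is meant to cover.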






[bug #61371] [PATCH] [tbl]: tbl.1.man: use neutral wording "decimal separator" in place of English specific "decimal point" and "dot"

2021-10-30 Thread Dave
Update of bug #61371 (project groff):

  Status:   Need Info => None   

___

Follow-up Comment #5:

Patch updated.
