Re: EPUB 3.3 spec conformity issues

2024-08-21 Thread Per Bothner

As I understand the situation (not having read the entire discussion carefully):

(1) It is very desirable for texinfo-generated epub to pass epubcheck.
It appears there are only a few minor changes needed to do so,
which increases the priority/desirability.

(2) Epubcheck complains about the border="0" attributes in table elements
used in the index. Changing this to border="" or border="1" would satisfy
epubcheck, but this changes the appearance of the index negatively.

(3) I agree that changing the longstanding appearance of the index
would be an unacceptable regression for many well-established manuals and users.

(4) I think the correct fix is to stop using tables to format the index,
and use more-appropriate elements along with CSS instead. I think most of
us agree.

(5) However, this may be a non-trivial non-local change. It seems unwise to
do this in a hurry or close to a release.

(6) Passing epubcheck is very desirable, but so is not breaking the appearance
of indexes. The latter would be a non-trivial regression. As a general (but
not universal) rule, not breaking something that currently works takes
precedence over fixing something that has never worked in the past.

(7) Re-writing indexes to use CSS instead of tables should be a high priority -
but most of us are unpaid volunteers. Unless you (Daniel) are willing to do
that (and work non-antagonistically with the texinfo maintainers), you cannot
demand or expect someone else will do the work. It will happen when it happens.
Hopefully soon.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-16 Thread Per Bothner




On 10/16/23 05:36, Gavin Smith wrote:

On Sun, Oct 15, 2023 at 12:00:51PM -0700, Per Bothner wrote:

I'm far from a C++ expert these days, but some ideas:

* First of course you can define some helper methods:

class TargetElement {
   Extra *extra;
   Command *unit_command() { return extra ? extra->unit_command() : nullptr; }
};


I don't think you can implement this language feature with helper
methods, at least not like this.  Consider the chain of access
a->maybe_b()->maybe_c()->d().  If maybe_b() returns null then the ->maybe_c()
call will be an error.

I just meant for the more common cases it might be reasonable to add
some helper methods. It's a case-by-case approach, not a general one.

For example, if a->maybe_b()->maybe_c()->d() occurs multiple times,
it might make sense to add a maybe_d method to A's class, as sketched below.
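
Here is a minimal sketch of that case-by-case idea (the A/B/C/D shapes are
placeholders taken from the example chain, not actual texi2any types):

struct D {};
struct C { D *d(); };
struct B { C *maybe_c(); };

struct A {
  B *maybe_b();
  // Collapse the recurring a->maybe_b()->maybe_c()->d() chain into a
  // single null-safe helper:
  D *maybe_d() {
    B *b = maybe_b();
    C *c = b ? b->maybe_c() : nullptr;
    return c ? c->d() : nullptr;
  }
};

Each intermediate result is checked once, and call sites shrink to
a->maybe_d().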


It's still more verbose.
...
This would require a lot of extra class definitions and doesn't seem that
easy to read.


Regardless, C++ has more tools than C does to deal with these issues.
Which is my point: if you're using C, you might as well use a C++ compiler.
Then you can decide which C++ features to use, as you go along.
No need to aim for full-blown idiomatic C++. C with some C++ features is OK too.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-15 Thread Per Bothner

On 10/15/23 05:41, Gavin Smith wrote:

texi2any is full of code like this:

  if ($target_element->{'extra'}
      and $target_element->{'extra'}->{'unit_command'}) {
    if ($target_element->{'extra'}->{'unit_command'}->{'cmdname'}
        eq 'node') {
      $command = $target_element->{'extra'}->{'unit_command'};
    } elsif ($target_element->{'extra'}->{'unit_command'}->{'extra'}
             and $target_element->{'extra'}->{'unit_command'}
                   ->{'extra'}->{'associated_node'}) {
      $command = $target_element->{'extra'}->{'unit_command'}
                   ->{'extra'}->{'associated_node'};
    }
  }


I'm far from a C++ expert these days, but some ideas:

* First of course you can define some helper methods:

class TargetElement {
  Extra *extra;
  Command *unit_command() { return extra ? extra->unit_command() : nullptr; }
};

* Declarations in 'if' statement:

if (auto unit_command = target_element->unit_command()) {
  if (unit_command->cmdname() == "node") ...
}

* Some kind of smart pointer may be useful.
However, I don't really have a good handle on smart pointers,
and I don't know if this goes further than you want to go in terms of C++-isms.
I can do some research, though.

* Perhaps use a subclass for the "extra" fields:

class TargetElementWithExtra : public TargetElement {
  Command *unit_command;
};

if (auto te = dynamic_cast<TargetElementWithExtra*>(target_element)) {
  // te is target_element safely cast to TargetElementWithExtra*.
}
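
Putting the helper and the if-declaration together, a hedged sketch of how
the quoted Perl logic might look in C++ (the struct shapes are invented
for illustration, not the actual texi2any data model):

#include <string>

struct Command;

struct Extra {
  Command *unit_command = nullptr;
  Command *associated_node = nullptr;
};

struct Command {
  std::string cmdname;
  Extra *extra = nullptr;
};

struct TargetElement {
  Extra *extra = nullptr;
  // Helper from the first suggestion:
  Command *unit_command() { return extra ? extra->unit_command : nullptr; }
};

// The nested existence checks flatten into early returns:
Command *associated_command(TargetElement *target_element) {
  if (auto *uc = target_element->unit_command()) {
    if (uc->cmdname == "node")
      return uc;
    if (uc->extra && uc->extra->associated_node)
      return uc->extra->associated_node;
  }
  return nullptr;
}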
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-14 Thread Per Bothner

On 10/14/23 09:12, Patrice Dumas wrote:

On Thu, Oct 12, 2023 at 10:25:23AM -0700, Per Bothner wrote:

C++ has a more extensive and useful standard library than C.


I guess there is a hash map, but I am not sure that we would need much
more.


In addition to hash maps and maybe some other container classes,
I suggest using C++ std::string as being safer and more convenient than
C strings.


I am not a good judge, but it is unclear to me that the rewriting in
perl was a mistake; it allowed the current design, which I believe is
much better than the design makeinfo in C had.  It is easier to
redesign in a high-level language and then translate the parts that
need to be sped up to a low-level language than to do everything in a
low-level language.


I think C++ is about as high-level as Perl, and I think you can
write programs that are similarly concise and legible.
My hunch is that tp/Texinfo written in C++ would be at most a small
amount (20%?) more verbose, and to many (including myself) it would be
much more readable.

Regardless, the one doing the work gets to choose their tools.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-14 Thread Per Bothner

On 10/14/23 05:36, Gavin Smith wrote:

many might have said that Perl was a natural fit for text processing.


It probably was and is - if performance isn't a high priority.
I notice in the texinfo source how much string concatenation is going on
- and it's hard to implement that efficiently.  You need some kind
of buffer you can append to, and only re-allocate when it is full.
(JavaScript has the same problem. Java, whose Strings are also immutable,
is better because it has the StringBuffer class.)
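
To illustrate with a small sketch (not texi2any code): std::string is
exactly such a growable buffer - it over-allocates geometrically, so
appending is amortized O(n) overall, where repeatedly concatenating
fresh immutable strings costs O(n^2).

#include <string>
#include <vector>

std::string join_lines(const std::vector<std::string> &lines) {
  std::string out;
  std::size_t total = 0;
  for (const auto &l : lines)
    total += l.size() + 1;
  out.reserve(total);    // one up-front allocation
  for (const auto &l : lines) {
    out += l;            // appends in place, no reallocation
    out += '\n';
  }
  return out;
}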

It was reasonable to expect the cost of interpreter overhead and overly-general
data structures would be modest. Perhaps if Perl had received the same
resources as JavaScript or Java, it would have been.


Could we not get those benefits by writing in Go instead?  (Not that I'm
proposing this).


All I am saying is: If you're looking into re-writing some code into C anyway,
why not allow yourself some of the benefits of C++?

I'm not saying C++ is the best language in the abstract. It does have the
virtues of being well-established, with an originator who is still active,
supported by an active standardization organization, with at least two Free
implementations, generating efficient code, and with mostly well-designed
and efficient libraries.

Also, it is mostly a superset of C, a language you are familiar with
and favorable to using.


I have a small amount of experience in C++ with the Qt library
(see https://github.com/GavinSmith0123/OpenBoard/commits/smart-import) and
downsides I have found include long compilation times,


I suspect the long compilation times are because C++ files tend to include
many and large header files. Qt applications are probably more likely to
suffer from this. You could potentially have the same problem with C.


long error messages in the case of templates,


That is a concern. It is one of the motivations for "concepts" - not for
application programmers, but for sophisticated library developers to provide
more meaningful error messages.

However, there is no reason to use templates much, except perhaps for some
container classes (such as hash tables) from the standard library.


Adding a function to a class forces recompilation of a large part of the
program due to header files changing.


Recommended practice in C is for all non-local functions to be declared
in a header file.  So you're going to have the same problem.


Editing C++ code is harder with a standard text editor and you are
pushed to use IDEs to do things like updating header files and getting
autocompletion.


I use plain Emacs. I'm not using the new "emacs as an IDE" packages that
are now available. It's hard for me to learn too many new tricks at a time.
Perhaps when all of this new-fangled stuff settles down I might try to learn it.


I agree RAII is definitely a win although I doubt that there is such a
thing as a "simple" subset of C++.  You list class inheritance as part of
this although it is only occasionally useful, as far as I understand.


For texi2any I can imagine using different classes for different output formats
might make sense.  You might also have a hierarchy for texinfo commands.


It's not necessary to have the possibility for polymorphism for every single
function call or operator that occurs in a program.


Of course not. In C++, functions are by default monomorphic (non-virtual).
In my DomTerm backend the keyword 'virtual' is not used at all.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-14 Thread Per Bothner

I'm not necessarily urging making use of any particular features of C++.
Just start small: use a C++ compiler and C++ file extensions.
(There is an unfortunate lack of standardization of extensions for C++
files. Gcc uses .cc - which seems as good as anything else.)

I do suggest when possible using std::string rather than C strings.
(There is pretty easy and convenient inter-operability between them.)
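
A small sketch of the round trip (the function and names here are made up
for illustration):

#include <cstdio>
#include <string>

void print_page_name(const char *base) {
  std::string name(base);           // C string -> std::string (copies)
  name += ".html";                  // growable, bounds-managed append
  std::fputs(name.c_str(), stdout); // std::string -> C string for a C API
  std::fputc('\n', stdout);
}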

Personal history/anecdote:

In an old life, working for Cygnus, I wrote a lot of C++ and was active in the
C++ community. Then for many years I basically wrote no C++ and very little C.
It was mostly Java, and Scheme. I started exploring using JavaScript to
implement ideas for "next-generation" terminals. This became DomTerm.

In 2017 I added a C-based backend to DomTerm, based on an existing C application
(ttyd) and a C networking library (libwebsockets).  That worked pretty well,
but in 2020 I converted the backend to C++. This has been a gradual conversion:
Some structs have become classes, and I've converted some functions to methods
(member functions).  I switched to using a C++ library for JSON. I use
templates for a lookup table that is used for 3 different value types.
As I change a section of code, I may replace C strings with std::string.

And so on. It still looks more like C than C++ in many places. There is no
inheritance and no virtual methods.  (I still use a table containing function
pointers in place of virtual methods.)  If I had started with C++, or if
I were more fluent in C++ than I am now, it would look different.
But the current hybrid is fine.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: implementation language [was: library for unicode collation in C for texi2any?]

2023-10-12 Thread Per Bothner

On 10/12/23 11:35, Gavin Smith wrote:

Calling [using Perl] a "mistake" is a very strong statement!  Why do you
say that?
Surely texi2any 7.0.3 (2023) is more functional than makeinfo 4.13 (2008) was.


That is not a meaningful comparison. The question is: if we had re-written
makeinfo/texi2any using C++ rather than Perl, there is every reason to
believe it would have been similarly functional at this point.

The most obvious reason Perl was the wrong choice was performance: As you
recently mentioned some people are still using the old pre-Perl tools
for that reason. And this is after years of tuning the Perl code.

Another problem with Perl is that it was a relatively obscure language,
in that not many people had much experience writing big complicated
programs using Perl. And that has gotten worse in the intervening years.
A related issue is tooling: There is a lot of available and up-to-date
tooling for C++, including gdb.


I'm not much of a fan of C++ tbh.


I have a lot of C++ experience, and I am a big fan. For almost any program
where C might be a reasonable choice, C++ is likely to be a better choice.
C++ is admittedly a big complicated language - but if you focus on
C and add classes, methods, and inheritance it's not very complicated
and it's already a big win over C. Memory management (RAII) is also easier
and less error-prone:
https://learn.microsoft.com/en-us/cpp/cpp/object-lifetime-and-resource-management-modern-cpp?view=msvc-170
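
A minimal RAII sketch along those lines: the FILE handle is closed on
every exit path (early returns included) with no explicit cleanup code.

#include <cstdio>
#include <memory>
#include <string>

std::string read_first_line(const char *path) {
  std::unique_ptr<std::FILE, int (*)(std::FILE *)>
      f(std::fopen(path, "r"), &std::fclose);
  if (!f)
    return "";
  char buf[256];
  if (!std::fgets(buf, sizeof buf, f.get()))
    return "";
  return buf;  // f is closed automatically when it goes out of scope
}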

It is much easier to make a large C++ program well-structured and maintainable:
Classes help immeasurably, as do namespaces.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



implementation language [was: library for unicode collation in C for texi2any?]

2023-10-12 Thread Per Bothner

On 10/12/23 02:39, Patrice Dumas wrote:

There is a translation to C of texi2any code going on, for the future,
after the next release, mainly for the conversion to HTML in a first step.


I've always thought that C++ is the obvious implementation language for
texi2any.  The structure of the Perl code is probably a lot easier and
cleaner to map into C++ (using classes and simple single inheritance)
than into plain C.

C++ has a more extensive and useful standard library than C.

One data point: Gcc was converted to C++ some years ago.

Re-writing texi2any in Perl turns out to have been a mistake; switching to C
seems like it would be another mistake. But hey - I'm not the one doing
the work.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: control over sectioning and splitting

2023-07-27 Thread Per Bothner

My priority issue was to be able to vary the splitting level,
but allow sidebar entries to go deeper than the splitting level.

This seems to be working acceptably at this point, by leaving out
@node commands.
See e.g. https://domterm.org/Settings.html
Source: https://github.com/PerBothner/DomTerm/blob/master/doc/DomTerm.texi

I don't remember what changes if any I had to make - probably to js/info.js
around November 2022.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



using hashes for internal URLs

2022-12-15 Thread Per Bothner

When using the js-info reader, the location bar displays these URLs for nodes:
  https://domterm.org/Wire-byte-protocol.html (if http/https)
  file:///home/bothner/DomTerm/web/index.html#Wire-byte-protocol (if file:)
These are also what you get if you right-click a link and select "Copy link".

(The reason for the difference if using file: scheme has to do with browser
security restrictions.)

"Inner" URL have the form:
  https://domterm.org/Wire-byte-protocol.html#Window-operations (if 
http:/https:)
  
file:///home/bothner/DomTerm/web/index.html#Wire-byte-protocol.Window-operations
 (if file:)

I suggest an option (perhaps the default) to always prefer the #hash form.
So the public link might be:
  https://domterm.org#Wire-byte-protocol
Most servers would treat the following as equivalent to the above:
  https://domterm.org/#Wire-byte-protocol
  https://domterm.org/index.html#Wire-byte-protocol

So far this makes for slightly shorter URLs, as well as consistency
when using file: URLs. The big win comes if we do the same for
inner URLs.  So the Window-operations link becomes:
  https://domterm.org#Window-operations

This requires a mapping table generated by texi2any so info.js can
translate #Window-operations to Wire-byte-protocol.html#Window-operations.
This table should include *all* @node names *and* @anchor names.
This would allow links to remain valid even if things are moved
around. I think this makes for a nicer solution than using
redirection pages.

To handle the case when JavaScript is missing/disabled, we can
make the re-mapping table human-readable. Perhaps:


For ``Window operations'' see here.


Then using the URL with the hash #Window-operations takes you to a line
with an easy-to-click link to the correct file and position.
The re-mapping table would be hidden if using JavaScript.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-30 Thread Per Bothner

On 11/30/22 06:44, Gavin Smith wrote:

There are a couple of other less important problems.  You can see
them at https://www.gnu.org/software/texinfo/manual/texinfo-html/index.html:

* A tooltip pops up saying "General Index (GNU Texinfo 7.0.1)" at the
   table of contents.
* The page title (displayed in a browser tab or the window title bar) at
   the Top node is "General Index (GNU Texinfo 7.0.1)".  However, go to
   either of the index nodes and the title becomes "Top (GNU Texinfo 7.0.1)".


A partial work-around is to test for INDEX_ID in Pages.prototype.render:

if (state.action.type === "window-title"
&& div.getAttribute("id") !== config.INDEX_ID) {
div.setAttribute("title", state.action.title);
}

However, this doesn't set the correct title for the Index nodes (which
isn't that big a deal).

The actual problem seems to be that state.current points to
the Top page when we are loading the indexes.  Thus we are trying to set the "title"
attributes of the Top node when it should be one of the Index nodes.

The logic depends on the "current" node - but that isn't always correct
when dealing with messages from sub-windows. Specifically, the sub-page
(iframe) sends a "window-title" message to the parent window. The title
should be set on the node corresponding to the sending sub-window. This is
not necessarily the same as the "current" window.
We can modify state.current before loading the sub-window, and it will
*probably* work correctly, but selecting the affected node from the
sub-window seems more correct than assuming the "current" node.

I find the "state machine" logic of info.js quite hard to deal with,
and I confess I don't understand what problem it is intended to solve.
I don't think it's the right logic for dealing with "window-title" messages.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-29 Thread Per Bothner




On 11/29/22 13:18, Gavin Smith wrote:

I have looked at the diff and found which parts of the code were
changed.  Can you edit the following to describe the changes, and
send it back, or commit it yourself with a ChangeLog entry,


I'll do that.

One question that occurred to me: Would it be clearer to use
semi-colons (instead of commas) to separate entries/sub-entries?  I.e.
instead of:
text = text0 + ", " + text;
do:
text = text0 + "; " + text;

I think it would be clearer, especially if index entries contain commas.


using the ChangeLog entry as the commit message.


I believe best practice is for the first line of a commit message
to be a self-contained short summary, followed by an empty line
and further details (if any). So some editing of a ChangeLog entry
is usually appropriate before using it as a commit message.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-29 Thread Per Bothner

On 11/28/22 13:28, Gavin Smith wrote:

The indices go to the right place and they update the entries, but the
index text is wrong now: for example, in the Texinfo manual, type in
"author" and there is nothing, but type in "@title" and
"@title @subtitle @author" is suggested, because there is an
index entry with that node as its target.


Could you try the attached patch?

I also implemented sub-entry handling while I was in that area.

Optionally, I suggest a wider width for the text-entry box - see the
info.css part of the patch.

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/

diff --git a/js/info.css b/js/info.css
index 72f46a3948..ac0e596484 100644
--- a/js/info.css
+++ b/js/info.css
@@ -230,6 +230,10 @@ table#keyboard-shortcuts th {
 right: 0;
 }
 
+.text-input input {
+width: 25em;
+}
+
 .error {
 background-color: orange;
 padding: 5px;
diff --git a/js/info.js b/js/info.js
index bf555bcf9d..7ff7ced161 100644
--- a/js/info.js
+++ b/js/info.js
@@ -146,10 +146,30 @@
 /** @arg {NodeListOf} links */
 cache_index_links: function (links) {
   var dict = {};
+  var text0 = "", text1 = ""; // for subentries
   for (var i = 0; i < links.length; i += 1)
 {
   var link = links[i];
-  dict[link.textContent] = href_hash (link_href (link));
+  var link_cl = link.classList;
+  var text = link.textContent;
+  if (link_cl.contains("index-entry-level-2"))
+{
+  text = text0 + ", " + text1 + ", " + text;
+}
+  else if (link_cl.contains("index-entry-level-1"))
+{
+  text1 = text;
+  text = text0 + ", " + text;
+}
+  else
+{
+  text0 = text;
+}
+  var sec_link = link.nextSibling
+  && link.nextSibling.classList.contains("printindex-index-section")
+  && link.nextSibling.firstChild;
+  if (sec_link)
+dict[text] = href_hash (link_href (sec_link));
 }
   return { type: "cache-index-links", links: dict };
 },
@@ -1457,9 +1477,9 @@
   if (linkid_contains_index (linkid))
 {
   /* Scan links that should be added to the index.  */
-  var index_links = document.querySelectorAll
-("td.printindex-index-section a");
-  store.dispatch (actions.cache_index_links (index_links));
+  var index_entries = document.querySelectorAll
+("td.printindex-index-entry");
+  store.dispatch (actions.cache_index_links (index_entries));
 }
 
   add_icons ();


Re: info.js demo on Texinfo manual updated on website

2022-11-27 Thread Per Bothner

I think it would make sense to add rules in the Makefile.am
to build the html versions of the texinfo manual(s). It doesn't
necessarily have to be run by default or installed.
However, having the rules with the same options as for the website
is useful for testing if nothing else. Plus it's useful as an example.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-27 Thread Per Bothner

On 11/27/22 12:48, Per Bothner wrote:


On 11/27/22 12:24, Gavin Smith wrote:

The main other problem is updating the contents sidebar after an
index search.  You can see it at
 It would be good to have this fixed before the 7.0.1 bug-fix release.
I do not know if I will have time or be able to fix it myself.


I checked in a fix.

The problem is we were using the query pattern "td.printindex-index-entry a"
which gives us URLs of the form
"Texinfo-File-Header.html#index-Beginning-line-of-a-Texinfo-file".
That loads the right page, but it doesn't match what is in the ToC, used
to generate the sidebar.

Instead we need to use "td.printindex-index-section a" which gives us
URLs of the form "Texinfo-File-Header.html#First-Line".

Note to handle sub-entries we need some more tweaking.  I don't know if you
want to fix this before the bug-fix release.  It shouldn't take long.
However, I would prefer to first change the index to 2-column rather than
using an extra column for the initial letter.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-27 Thread Per Bothner




On 11/27/22 12:24, Gavin Smith wrote:

The main other problem is updating the contents sidebar after an
index search.  You can see it at
 
It would be good to have this fixed before the 7.0.1 bug-fix release.

I do not know if I will have time or be able to fix it myself.


I'm looking at it.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @subentry, @seealso and @seenentry better formatted in HTML

2022-11-21 Thread Per Bothner




On 11/21/22 13:12, Patrice Dumas wrote:

The letter column allows the index entries to be aligned on the widest
letter.


I don't see any benefit to that.  Quite the opposite.
The letter headings are *headings*.  Lines below a heading are not
normally indented based on the size of the widest heading.

Some indexes group non-letters in a generic "special symbols"
category. You would certainly not want to indent all the entries
based on the width of "special symbols".
It is also common to have no heading for the initial letter,
but just add vertical white-space when there is a new initial character.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @subentry, @seealso and @seenentry better formatted in HTML

2022-11-20 Thread Per Bothner




On 11/20/22 13:47, Patrice Dumas wrote:

On Sun, Nov 20, 2022 at 01:35:33PM -0800, Per Bothner wrote:

Before I do that, I suggest cleaning up the generated html a bit, as discussed:
- Get rid of the dummy <td> table cells used for indentation.


The first column is the letter column.  It is not really indentation,
although it indeed leads to indentation.  What would you propose
instead?


Consider how things would look if you changed border="0" to border="1".
Probably not something one might want to do, but I think it helps
visualize things.

First, there should only be two columns, IMO.
There should be no "letter column".  The letter headings should be:

   B

The regular entries should use a style rule:

td.printindex-index-entry { padding-left: 1.5em }

or:

td.printindex-index-entry a { margin-left: 1.5em }

or similar.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @subentry, @seealso and @seenentry better formatted in HTML

2022-11-20 Thread Per Bothner

Looking at the latest checkins, I suggest replacing:

  /* Scan links that should be added to the index.  */
  var index_links = document.querySelectorAll
("td.printindex-index-entry a");

by:

  /* Scan links that should be added to the index.  */
  var index_links = document.querySelectorAll
("td.printindex-index-entry");

and passing the td elements to cache_index_links.

The latter can then do the appropriate logic to deal with entries and subentries:
(1) Keep track of the most recent level-0 and level-1 entries.
(2) If there is a child/grand-child <span> element, check to see if it is a
level-1 or level-2 entry. If so, prepend the saved higher-level text
(level-0, and for a level-2 entry also level-1), followed by ", ", to the
link text. Then add the entry to the dictionary.

I can implement this once the output is stable.

Before I do that, I suggest cleaning up the generated html a bit, as discussed:
- Get rid of the dummy <td> table cells used for indentation.
- Get rid of the extra <span> elements. Move the class="index-entry-level-1"
  to either the <td> element (preferably) or the <a> element.
- Consider getting rid of empty <td> elements, or at least removing the
  class attribute. (Whichever is preferable may depend on how the line
  is styled, such as when table borders are added.)
--

--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @subentry, @seealso and @seenentry better formatted in HTML

2022-11-19 Thread Per Bothner




On 11/19/22 08:58, Patrice Dumas wrote:

I have modified span.index-entry-level-1 to use padding-left instead of
margin-left, but to have some space between the td elements, shouldn't
margin-left be preferable?


Padding adds extra space "within" an element, while margin adds extra space
between elements.
Margin (applied to <td>) doesn't seem to affect the size/spacing of table cells.
Specifically, the following doesn't work:

td.printindex-index-section { margin-left: 2em }

However, this works:

td.printindex-index-section a { margin-left: 2em }
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @subentry, @seealso and @seenentry better formatted in HTML

2022-11-19 Thread Per Bothner




On 11/19/22 01:22, Patrice Dumas wrote:

@findex f---bb @subentry f---cc

the HTML is:

f---bb 
f---cc 1 
chapter


We really shouldn't be using '&nbsp;' or empty table cells for formatting.
Use CSS. For example:


  span.index-entry-level-1 {padding-left: 2em }
  td.printindex-index-section { padding-left: 1em }


Better of course if we could nest the subentries within the super entries,
and maybe avoid using tables altogether. It might be possible using grids
(and maybe subgrids), but they're relatively new, and I haven't used them.


Any proposition for a better formatting?  Any idea on how to help
javascript-ing, maybe with a custom attribute with the full entry with
commas separating the subentries?


That would be fairly easy to deal with.
However, JavaScript should be able to figure it out without that.
It can look at sibling entries without too much pain,
as long as there is a clean well-defined structure:
It is easy to tell entries, subentries, and subsubentries apart,
and they're in the "obvious" order.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-18 Thread Per Bothner

On 11/18/22 14:43, Gavin Smith wrote:

Unfortunately, it has various obvious problems.  Most significantly, the
index search does not work.


The problem is line 1459 in info.js:

 var index_links = document.querySelectorAll ("td[valign=top] a");
  
The <td> elements in the index no longer have the valign="top" attribute,
so this query fails to find anything.  Instead we can do:

 var index_links = document.querySelectorAll ("td a");

Being a bit stricter in the search seems to make sense to me.
Maybe something like:
 
 var index_links = document.querySelectorAll ("div.printindex td a");


I don't know if this is the right thing - i.e. if it matches all the indices
we need.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-18 Thread Per Bothner




On 11/18/22 15:48, Patrice Dumas wrote:

On Fri, Nov 18, 2022 at 03:19:23PM -0800, Per Bothner wrote:



Before, when I converted the DomTerm manual to html the output contained:

<h1 class="settitle">DomTerm - a terminal emulator and console
using DOM and JavaScript</h1>

This is now gone.

Looks like a change in the generated html broke this.


Yes:
2022-03-11  Patrice Dumas  

* doc/texinfo.texi (@code{@@settitle}): no title in the
document anymore in the default case in HTML.

It was redundant with the @top.


I'm not sure it is.  There is no place in index.html that contains the
actual title of the manual as a whole.  I see:

Top (GNU Texinfo 7.0)



...
Texinfo

There are 3 places that *include* the title of the manual: "Top (GNU Texinfo 7.0)".
However, doing pattern matching to extract the manual title seems like a
losing approach, especially if "Top" might be translated.

Note this was a compatibility-breaking change - it breaks people's
existing JavaScript or stylesheets.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js demo on Texinfo manual updated on website

2022-11-18 Thread Per Bothner




On 11/18/22 14:43, Gavin Smith wrote:

The "hide sidebar" button at the top of the
sidebar is also too prominent and leaves too much empty space at the top.


The intention is that the "wasted" space would contain a title or logo.
Like for the DomTerm manual: https://domterm.org/index.html

Currently, this is implemented using a JavaScript "hook" function.
Specifically, dt-manual.js contains:

function sidebarLinkAppendContents(a, h1) {
    a.innerHTML = "DomTerm terminal emulator"
}

If sidebarLinkAppendContents isn't defined, it should default to the title
from @settitle.
info.js looks for an <h1> element whose class contains "settitle".

Before, when I converted the DomTerm manual to html the output contained:

<h1 class="settitle">DomTerm - a terminal emulator and console
using DOM and JavaScript</h1>

This is now gone.

Looks like a change in the generated html broke this.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Status of texinfo/js/yarn.lock ?

2022-11-16 Thread Per Bothner

On 11/16/22 16:51, Per Bothner wrote:

For what it's worth, I just updated to Fedora 37, which broke 'git pull'
from Savannah:  ...
...
Same problem with GitLab, but GitHub works OK.  Probably
some new keys need to be uploaded - but not today.


Actually, GitHub was also broken.  Fixed (on Savannah, GitLab, GitHub) with:

ssh-keygen -t ed25519 -C 'p...@bothner.com'

and uploading the new ~/.ssh/id_ed25519.pub as needed.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Status of texinfo/js/yarn.lock ?

2022-11-16 Thread Per Bothner




On 11/15/22 13:16, Gavin Smith wrote:

"make lint" issues an error:
/home/g/src/texinfo/GIT/js/node_modules/.bin/eslint -c build-aux/eslint.json 
info.js -f unix
/home/g/src/texinfo/GIT/js/info.js:123:57: Parsing error: Unexpected token = 
[Error]


While I don't think supporting old browsers is a priority,
I'm pretty sure we can fix this one easily - just remove the default parameter:

set_current_url: function (linkid, history, clicked) {


"make check-types" gives a lot of errors, starting:


Based on the ones I looked at, these seem to be false alarms.
This code is written in JavaScript (which has dynamic typing),
not TypeScript (which uses static typing). It is no doubt possible
to fix the code so check-types passes, but I think it is very low priority.


Per, would you be able to take a look at this?


For what it's worth, I just updated to Fedora 37, which broke 'git pull'
from Savannah:

$ git pull
load pubkey "/home/bothner/.ssh/id_rsa": Invalid key length
both...@git.savannah.gnu.org: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.

Same problem with GitLab, but GitHub works OK.  Probably
some new keys need to be uploaded - but not today.

(I did do a 'git pull' before updating Fedora, so I do have a recent snapshot.)
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Status of texinfo/js/yarn.lock ?

2022-11-13 Thread Per Bothner




On 11/13/22 11:54, Gavin Smith wrote:

There are three files that may be removed: package.json, tsconfig.json
and yarn.lock.


I'm pretty sure we can also remove server.js.


I believe these are all related to the npm package manager.
npm was originally used when this code was being developed.  I'm not
completely sure if npm can or should still be used here for updating
dependencies.


I think the only 3rd-party dependency is modernizr.js, version 3.5.
The current version is 3.12: https://github.com/Modernizr/Modernizr/releases
If need be, it can always be updated "by hand". However, figuring out
the right files isn't completely obvious, so we may leave it as-is for now.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Status of texinfo/js/yarn.lock ?

2022-11-13 Thread Per Bothner




On 11/13/22 01:22, Hilmar Preuße wrote:

hopefully this is not a FAQ, but I found nothing in the archive.

What is the status of the code sitting in subdir "js"? To me this code
looks quite unmaintained.


It is definitely maintained. It hasn't seen much development recently
(it works pretty well), though I have some things I'd like to change
if/when I get time.

However, the directory seems to have some old crud and could probably be
cleaned up.
Specifically, I don't think yarn.lock is needed or used at this point.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: url protection

2022-08-05 Thread Per Bothner

On 8/4/22 23:15, Eli Zaretskii wrote:

So you mean the HTML file will have these file names encoded in UTF-8,
while the file itself will be created using the locale's encoding?


That seems to me to be the correct approach.
(At least if the HTML contents is UTF-8 - which it should be.)
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: url protection

2022-08-05 Thread Per Bothner

On 8/3/22 15:15, Patrice Dumas wrote:


In any case, it does not mean that using another encoding is fragile nor
dangerous.  There is a list of supported encodings in the Texinfo
manual
https://www.gnu.org/software/texinfo/manual/texinfo/html_node/_0040documentencoding.html
I think that we support them well, in a robust way in texi2any.  And if
it is not the case, it should be a bug.  We always emit a charset
information, too.


The question is output encodings for html (and xhtml and epub), and how well
browsers and other html-reading programs handle random "legacy" output
encodings.

Note the term "legacy" encoding (as used in W3C standards).  That implies
that new html files should avoid using these encodings.


I think that we should support setting the output encoding explictly to
a Texinfo supported encoding for a long time, even it UTF-8 becomes the
default output encoding for HTML.


Why? Is this useful for anything?


I do not imagine dropping that feature anytime soon.


Why not?  "We have done so in the past" isn't really a reason.
Is there any consumer of texi2any-produced html files that would
break if the output encoding changed to utf8 only?  I guess hypothetically
that might be the case, but I don't see why we should support it.

At the very least the default output should be utf8.  In the interest of
simplified code and documentation it makes sense to just remove the
support for other encodings - it is useless crud.

Certainly if we have to add a new option/switch to support overriding the
default output encoding then it is not worth it.  Just switch the output
to utf8, change documentation, and rip out the old code.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: url protection

2022-08-05 Thread Per Bothner

On 8/5/22 10:35, Gavin Smith wrote:

Could we write or copy the code for escaping a URL as it should
be very short and simple?  This would avoid an extra module dependency.


Here is C/C++ code written by me.
It works in two passes - the first counts the number of bytes
that need to be escaped.  For Perl a single pass may make more sense.
 
/* Returns either NULL or a freshly malloc'd urlencoding of 'in'. */

char *
url_encode(const char *in, int mode)
{
    static unsigned char b16[] = "0123456789ABCDEF";
    int bad_count = 0;
    char *out = NULL;
    for (int pass = 0; pass < 2; pass++) {
        const char *p = in;
        char *q = out;
        while (*p) {
            int ch = *p++;
            bool ok = (ch >= '0' && ch <= '9')
              || (ch >= 'a' && ch <= 'z')
              || (ch >= 'A' && ch <= 'Z')
              || (ch == '/') /* may depend on mode */
              || (ch == '.' || ch == '-' || ch == '_' || ch == '*');
            if (pass == 0) {
                if (! ok)
                    bad_count++;
            } else {
                if (ok)
                    *q++ = ch;
                else {
                    *q++ = '%';
                    *q++ = b16[(ch>>4) & 0xF];
                    *q++ = b16[ch & 0xF];
                }
            }
        }
        if (pass == 0) {
            if (bad_count == 0)
                return NULL;
            size_t in_size = (char*) p - in;
            /* challoc: the author's checked-malloc helper. */
            out = challoc(in_size + 2 * bad_count + 1);
        } else
            *q = 0;
    }
    return out;
}
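
A short usage sketch: url_encode returns NULL when nothing needed
escaping, so the caller falls back to the input.  It assumes challoc is
the checked-malloc helper from the same codebase, and (if compiled as C)
<stdbool.h> for the bool above.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const char *name = "Wire byte protocol";
    char *escaped = url_encode(name, 0);
    printf("%s\n", escaped ? escaped : name); /* Wire%20byte%20protocol */
    free(escaped);  /* free(NULL) is a no-op */
    return 0;
}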

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: images subdirectories in epub

2022-08-03 Thread Per Bothner




On 8/3/22 14:01, Patrice Dumas wrote:

On Wed, Aug 03, 2022 at 12:16:04PM -0700, Per Bothner wrote:

Again - why? More specifically: why are you putting the html/xhtml file
in a separate xhtml subdirectory?  If you get rid of that, it seems you
avoid the problem.


No, the problem is not avoided, because the additional image
subdirectory is still not created.


Agreed: when I generate HTML my Makefile has to explicitly copy over the
image files.  It would probably be better to have texi2any do that by
default (or as an option if we're concerned about compatibility).  Even
more so for epub: whether texi2any creates the epub file directly or uses
a separate script, that script needs to copy over the image files before
creating the epub archive.


As to why the xhtml file is not put directly in the EPUB directory
(which could be named differently), it is to have a structure that I find
cleaner.  That way in the EPUB directory, there are only the opf file and
directories, xhtml for the manual files, images for the images and
something like js for the javascript (and css for specific CSS files,
but I am not sure it is used like that in practice, but it would fit well
in this setup). ...

However, if this structure is not right, I can change it, it is in no
way dictated by the standard, but I would like to have more substantive
arguments.


The primary use-case for generating epub is to support e-readers, but another
useful application is as a compressed container format for the html representation
of a manual.  For example an application or server could store the manual
as an epub file, and then map http requests to members of the epub archive.
(A server doesn't even have to uncompress the member, but can send it
compressed to a browser that understands gzip compression.  I implemented
this for the libwebsockets library.)

A related use-case is to use epub as a distribution format for manuals.
Installation is then just a matter of running unzip.

In these applications, it is useful that the structure of the epub archive
mirrors the preferred layout of a directory for web-browsing. This also
reduces the risk of errors and inconsistencies, I believe.

(For these applications it is also preferable that files be valid html and
have the .html extension, rather than .xhtml - but that is a separate
discussion.)

The "index.html" (or index.xhtml) file should be in the top directory.
It is easier, and traditional, for the other pages to be in the same directory.
Similarly, images should either be in the same directory as index.html
(and the other html files) or in a sub-directory - *not* in a sibling directory.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: url protection

2022-08-03 Thread Per Bothner

On 8/3/22 13:46, Patrice Dumas wrote:

This is not what we do in general for html/xhtml.  For epub we always
emit utf8, as it is mandated by the standard, but for html/xhtml, we
use, in the default case, the input encoding for the output encoding.


I think that is a mistake.
It seems clear that in 2022 all publicly-visible html pages (i.e. on a public
web server) should use utf8.
It is also clear that a practical html-reading program is able to read
utf8-encoded html files (assuming a correct charset declaration), regardless
of the local character encoding, even for local file: urls or an internal
web-server.
Ergo, always emitting utf8 (with a charset declaration) is safer and very
unlikely to lead to problems, while using a native or input-based encoding
is fragile and dangerous.


The conversion should not have already been done at that point; we still
have character strings in internal perl unicode encoding.  But that was
not really my question, which was more about whether we should use the
output encoding to encode the string before doing the URI::Escape call, or
always use UTF-8, even if the document encoding is not UTF-8.


The question is irrelevant: we should always emit utf8 in both urls and in
the body of html/xhtml files.  That should certainly be the default (regardless of
native or input encoding) - and it is almost certainly a waste of time to
support anything else.

Here is another datapoint:
https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier#Compatibility
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: images subdirectories in epub

2022-08-03 Thread Per Bothner




On 8/3/22 11:48, Patrice Dumas wrote:

On Wed, Aug 03, 2022 at 09:13:31AM -0700, Per Bothner wrote:

On 8/3/22 05:22, Patrice Dumas wrote:

In EPUB, the images are copied to a directory, such as
my_manual_epub_package/EPUB/images/
The manual files are in
my_manual_epub_package/EPUB/xhtml/*.xhtml
and the paths to images in the XHTML files have ../images/ prepended.


Why?  I don't see any such requirement in
https://www.w3.org/publishing/epub3/epub-ocf.html


I am just describing how things are, not that there are any constraints
on where they should be.  The issue is not the requirement, simply the
implementation.


Again - why? More specifically: why are you putting the html/xhtml file
in a separate xhtml subdirectory?  If you get rid of that, it seems you
avoid the problem.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: url protection

2022-08-03 Thread Per Bothner

On 8/3/22 06:26, Patrice Dumas wrote:

The standard does not seem to be clear on the encoding to use for the %
encodings.  URI::Escape has uri_escape() and uri_escape_utf8().  My
feeling is that the best would be to first encode to the output
encoding and then call URI::Escape's uri_escape().


If I read https://metacpan.org/pod/URI::Escape correctly,
uri_escape_utf8 is equivalent to utf8::encode followed by uri_escape.

For html/xhtml output (including epub) I think we should keep it simple:
always emit utf8.  The input to url-encoding is a sequence
of utf8-bytes. So whether to use uri_escape_utf8 or uri_escape
depends on whether conversion to utf8 has already been done.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: images subdirectories in epub

2022-08-03 Thread Per Bothner

On 8/3/22 05:22, Patrice Dumas wrote:

In EPUB, the images are copied to a directory, such as
my_manual_epub_package/EPUB/images/
The manual files are in
my_manual_epub_package/EPUB/xhtml/*.xhtml
and the paths to images in the XHTML files have ../images/ prepended.


Why?  I don't see any such requirement in
https://www.w3.org/publishing/epub3/epub-ocf.html

3.2 File and Directory Structure

All other files within the OCF Abstract Container MAY be in any location
descendant from the Root Directory, provided they are not within the 
META-INF directory.

kawa-manual.epub (generated using the docbook xsl stylesheets) contains:

mimetype
META-INF/
META-INF/container.xml
OEBPS/
OEBPS/index.html
OEBPS/Unicode.xhtml
OEBPS/Sequences.xhtml
...
OEBPS/images/
OEBPS/images/border-1.png
OEBPS/images/polygon-1.png
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: control over sectioning and splitting

2022-01-23 Thread Per Bothner
 


On 1/22/22 02:07, Gavin Smith wrote:

@chapter or @section without @node would become a lot like
@heading except perhaps there could also be an @anchor generated
for the chapter heading as well as appearing in the table of
contents.  There would have to be a check that the name of the
automatic @anchor didn't clash with any other @node or @anchor.


If the @section/whatever is immediately preceded by an @anchor,
one can use the explicit anchor instead.  In a sense the @anchor
functions like the missing @node command, except in terms of splitting.

I assume you would disallow sub-nodes: I.e. the next @node command would
have to be followed by a sectioning command at the same or higher level.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: error with autogen.sh probably related to gnulib update

2022-01-16 Thread Per Bothner

On 1/16/22 05:13, Gavin Smith wrote:

I've tracked those files under tp/Texinfo/XS.  Hope it works now.


I got a merge conflict because gnulib/lib/malloc/dynarray-skeleton.gl.h
is now checked into git.  Was that intentional?  Normally we prefer
not to check in files that are generated automatically.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: change in the sectioning commands div extent classes

2022-01-12 Thread Per Bothner

On 1/12/22 02:50, Patrice Dumas wrote:

A heads up that the <div> showing the section extent, which can appear at
the @node location, now has a class name like "section-level-extent"
instead of a class name like "section".  The "section" class name
is now only associated with the section heading element.

It may require some change in the javascript, maybe?


I did a little testing, and found no problems.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: permanent links using #fragment names

2022-01-09 Thread Per Bothner

On 1/9/22 11:53, Gavin Smith wrote:

It's certainly more elegant, but we are not generating documentation
to be served as part of some large information processing service,
like Wikipedia or other websites which don't use .html extensions.
We're generating a self-contained set of files that represent one
manual, in HTML.  These files could be accessed in a variety of ways,
not just as an online service.

In other words, the output from texi2any is small, self-contained and
inanimate.  The files should be output with .html extensions and link
to each other accordingly.


Agree.  Old-style URLs without JavaScript, as well as static file: URLs,
need to work.


Any simplification of URL's should be treated carefully and be built
on top of the bare HTML layer if at all, not compromising the
functionality of this lower layer.


What I'm suggesting is:
(1) File splitting, file names, and intra-manual links are all unchanged
from how they are now. However, if enabled by an option, file names may
optionally use a standard-compatible %-encoding.
(2) JavaScript should automatically map #NAME to FILE.html#NAME (somehow -
see below).
(3) If enabled (by a texi2any option *and* JavaScript is enabled), the URL
displayed in the location bar (and when hovering over a link) will be the #NAME.
Note this is already the case with file: URLs, because that's the only
part of the location URL we can update due to browser security rules.
(4) Public links (from other manuals, web pages, an email responding to a
question) should use the #NAME form (assuming the option in (3) is enabled).
(5) When the option in (3) is enabled, texinfo should emit extra html so that
when JavaScript is unavailable the #NAME fragment goes somewhere sane, probably
a link in a ToC.


Okay, but if #NAME loaded a redirection page it could go to the right
place immediately.  The way you are thinking about it is that NAME is
passed to info.js which has to find where it is.  It all depends on
how the redirection pages are done.


For #NAME to load a redirection page there has to be some JavaScript involved.
For an intra-manual link there is no need to go via a redirection page:
During loading, JavaScript rewrites href="FILE.html#NAME" to "#NAME" -
but it can save the original "FILE.html#NAME" somewhere for future use.
However, when we *start* with a #NAME-style URL it's a bit trickier:
We could *assume* there is a redirection page NAME.html and load that,
and count on redirection to take us to the correct page.  However,
I'm not sure how robust that is - plus it requires another round-trip.
Probably better to put an index in the top page.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: permanent links using #fragment names

2022-01-09 Thread Per Bothner




On 1/9/22 11:01, Gavin Smith wrote:

It can't be, I'm afraid as we are not going to demand that users
use JavaScript in their browsers to access web documentation.


No, but I think it is reasonable that people who won't use JavaScript
get a clunkier user interface.

Navigating within a manual is not a problem and would not change.
I'm not proposing changing very much in the generated html:
Just adding some extra information to the ToC, plus an *option* to have the
URL in the location bar display as "emacs#Keys" (when JavaScript is
enabled).

The issue is what should happen if someone types or clicks
on a link to "emacs#Keys", doesn't have JavaScript, and Keys is in
a separate Keys.html file. In that case, I think it is OK if the fragment #Keys
takes them some place on the page (probably in the ToC) such that a click will
take them to the correct file and location.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: permanent links using #fragment names

2022-01-09 Thread Per Bothner




On 1/9/22 10:28, Gavin Smith wrote:

On Sun, Jan 09, 2022 at 09:32:38AM -0800, Per Bothner wrote:

An idea I've been mulling: Currently, we handle changes in node structure
by generating extra stub pages that redirect to the correct location.
This works, but it's a bit clunky.  For one thing, it doesn't update
the location bar (though that could be fixed).  Worse, it highlights a
two-level name hierarchy FILENAME.html#FRAGMENT that depends on
the splitting mode.


When you say "changes in node structure" does this refer to changing
the splitting setting?


That too, but I was including changing the texi source to (say) split
a chapter into sections, or change subsections to separate sections.


Renamed @node's in a document are often marked with @anchor's, which
may have their own redirection pages - is that what you mean?


In part.


Can this be done without JavaScript or is it only for info.js?


If the manual is not split, then JavaScript would not be needed.
Otherwise, JavaScript would be needed to resolve a fragment id
to load the correct page.  However, one could generate in the ToC
something like:
  See Keys
That way, a link like "emacs#Keys" would scroll to a link that
the user could click.  If JavaScript is enabled, we could hide that
link, or format it appropriately for the sidebar.

One could also do the opposite: Generate every anchor/node name Foo
as Foo.html. If Foo.html is only a re-direction page, the JavaScript
could optimize that.

However, there is an argument that ".html" is non-semantic
implementation information that should not appear in public URLs.
It is similar to a ".php" extension exposing implementation
information.  That is why "emacs#Keys" is better than "emacs/Keys.html".


Note this would also help with the issue discussed in the
"control over sectioning and splitting" thread, since info.js
could optionally put subheading (for example) in the side-bar.


I don't really understand the benefits of listing @anchor's (invisibly)
in the table of contents in index.html.

Somebody opens a redirection page for an anchor, at some point info.js
gets loaded and the page URL is changed to the correct location.  Is
a list of the anchors in index.html needed for this?


We want URLs (the ones that are visible publicly or externally)
to look the same whether it's an @anchor, a page-level @node
(i.e. @chapter if splitting by section), or a sub-page-level @node (a 
@subsection
if splitting by section).

You can do that by making all public URLs have the form #NAME or all
URLs have the form NAME.html. The former is shorter, more elegant,
more semantic - and doesn't hard-wire the use of html format.

If you use #NAME for your public URLs you'd like to automatically
map that to the correct location in the correct file - without
having to first load and parse all the pages. You avoid that
by having the mapping from #NAME to location somewhere on the home page.

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: permanent links using #fragment names

2022-01-09 Thread Per Bothner




On 1/9/22 09:47, Eli Zaretskii wrote:

For example, the Keys chapter in the Emacs manual would still be in the
file Keys.html, and you can still browse to "https://whatever/emacs/Keys.html".
However, the "perma-link" would be "https://whatever/emacs#Keys", and that
is what would show in the location bar.  Currently, "Keys" is a section,
but the perma-link would remain "emacs#Keys" regardless of whether
it is changed to a chapter or subsection, or of the splitting mode.
Even if the section is renamed, "emacs#Keys" would work as long as
an @anchor with the old name is added.


I'm not sure I understand what will happen with @anchor's.


The emacs manual contains @anchor{Outline Search} in text.texi.
This generates the URL "Outline-Visibility.html#Outline-Search".
This would still work, but so would "emacs#Outline-Search" and
the latter would be the preferred ("perma-link") URL.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: permanent links using #fragment names

2022-01-09 Thread Per Bothner




On 1/9/22 09:45, Eli Zaretskii wrote:

Date: Sun, 9 Jan 2022 09:32:38 -0800
From: Per Bothner 

A related change is that we should change how info node/anchor names are
mapped into URLs to use standard %-encoding.  This would make URLs cleaner
and more readable.  For continuity, we could map space to '-', '-' to '%2D',
and other characters as in standard percent-encoding.  An initial '-' could
be used for texinfo-internal ids such as the "-toc-foo" or "-def-printf".


What does this mean for letter-case clashes on case-insensitive
filesystems?  The most frequent case is Index.html vs index.html, but
there are others.


Well, I was mostly focusing on fragment identifiers, not file names.
Fragment identifiers of course don't care about file systems.
Currently, filenames are generated preserving the source case;
I don't know what we do if case-folding leads to a name clash,
but we can continue doing something similar.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



permanent links using #fragment names

2022-01-09 Thread Per Bothner

An idea I've been mulling: Currently, we handle changes in node structure
by generating extra stub pages that redirect to the correct location.
This works, but it's a bit clunky.  For one thing, it doesn't update
the location bar (though that could be fixed).  Worse, it highlights a
two-level name hierarchy FILENAME.html#FRAGMENT that depends on
the splitting mode.

Instead, I suggest that the primary and visible name for every node and anchor
use the fragment names (the part after the '#').

For example, the Keys chapter in the Emacs manual would still be in the
file Keys.html, and you can still browse to "https://whatever/emacs/Keys.html".
However, the "perma-link" would be "https://whatever/emacs#Keys", and that
is what would show in the location bar.  Currently, "Keys" is a section,
but the perma-link would remain "emacs#Keys" regardless of whether
it is changed to a chapter or subsection, or of the splitting mode.
Even if the section is renamed, "emacs#Keys" would work as long as
an @anchor with the old name is added.

Implementing this would be trivial in info.js - it already does this
for file: URLs.  That is assuming that info.js can map the fragment
name to the correct file.  Currently, this is done by looking
in the "contents" element in the initial index.html.  To handle
other anchors (not in the contents) I propose texi2any can add them
to the contents as invisible elements.  Perhaps something like:

  <li hidden><a href="Outline-Visibility.html#Outline-Search"></a></li>

Note this would also help with the issue discussed in the
"control over sectioning and splitting" thread, since info.js
could optionally put subheadings (for example) in the side-bar.

An optional-but-nice refinement: When scrolling in a page, update the
location bar *and* the side-bar.

Note this doesn't have to be something we force on users: We can still generate
the extra stub files, and we can make it a preference in info.js whether
to prefer the emacs#Keys style in the location bar.

A related change is that we should change how info node/anchor names are
mapped into URLs to use standard %-encoding.  This would make URLs cleaner
and more readable.  For continuity, we could map space to '-', '-' to '%2D',
and other characters as in standard percent-encoding.  An initial '-' could
be used for texinfo-internal ids such as the "-toc-foo" or "-def-printf".
(I've suggested this before, I'm pretty sure, but I think it makes sense to
co-ordinate these changes, at least for a specific manual.)
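
A sketch of the proposed encoding (illustrative only - this is not what
texi2any currently does):

// Proposed mapping: space -> '-', '-' -> '%2D', everything else as in
// standard percent-encoding.  (encodeURIComponent leaves a few punctuation
// characters unescaped, but handles non-ASCII via UTF-8 bytes.)
function encodeNodeName(name) {
  let out = "";
  for (const ch of name) {
    if (ch === " ") out += "-";
    else if (ch === "-") out += "%2D";
    else if (/^[A-Za-z0-9_.~]$/.test(ch)) out += ch;
    else out += encodeURIComponent(ch);
  }
  return out;
}
// encodeNodeName("Outline Search") => "Outline-Search"
// encodeNodeName("e-mail")         => "e%2Dmail"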
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: The HTML-Info initiative

2021-12-27 Thread Per Bothner




On 12/27/21 12:39, Gavin Smith wrote:

On Mon, Dec 27, 2021 at 09:07:21AM -0800, Per Bothner wrote:

It seems unaware of and not making use
of info.js.  Integration with info.js would be nice - at the very least disable
the latter's sidebar and other duplicated functionality.


Another detail of implementation.  For what it's worth I tried this approach
initially with QtWebEngine and embedding info.js but the special cases that
needed to be added to the code to support this weren't worth it, in my
opinion.  Again these decisions would have to be made by whoever is doing
the work to make the program work.  Programs with embedded web browsers have
an awkward structure to work with as you are no doubt aware from your work
with DomTerm.


Of course if someone wants to work on improving and maintaining webkitgtk-info 
that
is their choice.  But from the point of view of the texinfo project having
limited resources, I think it makes more sense for webkitgtk-info to be just a
thin wrapper running info.js in a webkitgtk window.  We want the web interface
to be pleasant to use, and I think we agree info.js both looks nice and works 
about
as well as stand-alone info when browsing/navigating a single manual.  Where a
local reader can be helpful (beyond not depending on the internet) is in things
like knowing where in the filesystem to look for manuals, cross-manual browsing,
and search. And of course quickly bringing up a window with an embedded browser.

But whether to write webkitgtk-info as a "pure C" application or as a hybrid
"C/JavaScript" application using info.js is up to whoever does the work, of 
course.

Another possibility:

While DomTerm supports multiple desktop browsers as well as embedded browsers
(including Electron and Qt), lately I've been experimenting with using the
Wry framework: https://github.com/tauri-apps/wry which is part of the larger 
Tauri project.
Wry is cross-platform, and is officially supported on GNU/Linux, Windows, and 
MacOS.
On GNU/Linux it makes use of WebKitGtk.  It seems well-thought-out, with 
responsive
maintainers, and a number of nice features.
Wry is a Rust "crate", which has both advantages and disadvantages.
(For me, having to learn Rust is both an advantage and a disadvantage!)

I have "dt-wry" working as one of the DomTerm front-ends, but it also works 
great
for running info.js.

The older webview project (https://github.com/webview/webview) is similar to Wry
(in terms of being an embeddable wrapper over a "system" browser component).
It works for C/C++ - but alas it seems to have become dormant (again).
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: The HTML-Info initiative

2021-12-27 Thread Per Bothner

On 12/27/21 01:02, Gavin Smith wrote:

I've started it in the TODO.HTML file in the git repo.  It can be read at

https://git.savannah.gnu.org/cgit/texinfo.git/plain/TODO.HTML


https://git.savannah.gnu.org/cgit/texinfo.git/plain/TODO.HTML

"However, this system is only appropriate for online manuals, not for
locally installed manuals."

I don't agree: It works perfectly fine for locally installed manuals,
either using file: URLs or http://localhost/.

However, there may be some issues relating to:

"It only handles one manual at a time and does not handle searching for a 
manual."

Cross-manual links could be handled pretty well by standardizing relative URLs.
I.e. if the emacs/texinfo manuals are
/usr/local/share/doc/{emacs,texinfo}/index.html
then you can get from one to the other using "../{texinfo,emacs}/index.html".
Handling a search path is trickier, but can be done using a small web-server.
(I use libwebsockets for DomTerm.)  However, the webkitgtk-info approach may
be better (though less portable).

I tried building the webkitgtk-info branch, but failed:

In file included from hard-locale.c:23:
./locale.h:718:11: fatal error: setlocale_null.h: No such file or directory
  718 | # include "setlocale_null.h"
  |   ^~
compilation terminated.
make[4]: *** [Makefile:1473: hard-locale.o] Error 1

Didn't investigate.  Looks promising, though.  It seems unaware of and not 
making use
of info.js.  Integration with info.js would be nice - at the very least disable
the latter's sidebar and other duplicated functionality.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: epub init file, and questions

2021-12-26 Thread Per Bothner




On 12/26/21 13:43, Gavin Smith wrote:

I just ran "emacs"
at the command line to see how long that took to start up, and I
counted 31 seconds for it to load up.


Wow.  /usr/bin/emacs (i.e. not emacs-server) takes about half a second.
(OTOH I have a recent laptop (bought this March) with lots of memory.)


Fact is, if you mandate using Emacs to access Texinfo documentation, a
lot of people just wouldn't bother.  I think using Emacs as an Info
replacement is just a non-starter.


Fair enough.  It was one idea for a possible solution.

In principle there is nothing to prevent extending the info reader to
read a restricted subset of HTML, just like it currently reads info files.
Especially since we control the generated HTML, so we can tweak it to
make it easier to parse (perhaps make it XML-compatible).
Or use some 3rd-party HTML-parser or text-mode browser like the old Lynx.

We don't have to solve all the issues at once, but the long-term plan
really needs to deprecate and then drop info format.

In any case, the importance and usefulness of a standalone info reader that 
works
in a plain terminal is becoming less and less. (You can't have images,
you can't have math, etc etc. - though a few terminals do support images.)
Instead, an info-reading application that uses an embedded browser (such as 
webkitgtk)
seems to make more sense.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: id attributes for header elements

2021-12-26 Thread Per Bothner




On 12/26/21 12:04, Patrice Dumas wrote:

On Sun, Dec 26, 2021 at 11:24:59AM -0800, Per Bothner wrote:

I believe putting the link on the sectioning command is slightly better (more
semantically meaningful and easier to work with) than putting it before
- and it is never worse.


I agree, and when there is no header in the sense of the header with
directions, I have made a change to have the id on the sectioning h*
element based on your report.


Much better.  Thanks!

I'll try implementing an info.js option to add the subheading elements in
the sidebar (though not today).  I think that should make things a little nicer.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: epub init file, and questions

2021-12-26 Thread Per Bothner

On 12/25/21 22:48, Eli Zaretskii wrote:

Your comments are all on-point, but I'll just add a few notes.


Date: Sat, 25 Dec 2021 11:44:34 -0800
From: Per Bothner 



Note that eww-mode running in a terminal displays texi2any-generated
html pretty decently, so it could potentially replace the
standalone info program, once we have an html-supporting info mode.


Two issues with this:

   . the stand-alone Info reader is a much leaner program than Emacs


Who cares? On modern computers, Emacs is relatively lean, and almost
everyone who would use 'info' is likely to have Emacs installed.
Using an emacs-server of course helps.


   . the eww mode is _slow_, especially on GUI displays, because it
 performs layout calculations in Lisp.  eww mode was never designed
 to handle large bodies of text.


It seems fast enough on my laptop.  (Though I notice an annoying bug,
in that eww doesn't seem to reflow on window resize.)

I also tried xwidget-webkit-browse-url, which worked pretty well,
and of course handles info.js as well.  That is of course less portable,
though Windows and MacOS have similar embeddable browsers.

To toot my own horn: if your terminal is DomTerm (https://domterm.org),
you can type:

  $ domterm --tab browse file:///path/to/manual.html
or:
  $ domterm --above browse file:///path/to/manual.html

and new tab or pane is opened containing the manual.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: id attributes for header elements

2021-12-26 Thread Per Bothner




On 12/26/21 10:29, Patrice Dumas wrote:

On Sun, Dec 26, 2021 at 10:08:50AM -0800, Per Bothner wrote:

I don't remember, and I don't see how it could be better for navigation.


It was some time ago, at least some people expected the link to point to
the start of the header, and not to the heading command.


I don't know of a browser which would show a user-visible difference
between navigating to an element vs navigating to an empty element just
before it.  (And if there were a difference, perhaps some
highlighting of the navigated-to element, I think navigating to the heading
command would be better than navigating to the empty element.)

I believe putting the link on the sectioning command is slightly better (more
semantically meaningful and easier to work with) than putting it before
- and it is never worse.


In my view, the @anchor{Electron} is not associated to the @subheading (except
for being before it).  The @subheading has its own id as a heading not
associated to a Texinfo element.


My point is that at least the id for the @subheading should be on the
generated heading.

I.e.

@subheading Electron

should generate:

<h4 class="subheading" id="Electron">Electron</h4>

It feels wrong to generate a separate empty <span> in this case.
It may be reasonable to do so for @anchor - but not for a @subheading.


In a case like

@node My node
@section Section

note that there is no id output especially for the sectioning command,
it is the id of the "element" (also called a tree unit) that encompasses
a unit of Texinfo.  In general it corresponds to a @node + sectioning
@-command + associated content.


When there is a "tree unit" for a sectioning command, it makes sense
to put the id on the corresponding .  But when we just have a @subheading,
there is no .  However, there is an  element that corresponds directly 
to
the @subheading.  In that case it makes sense to put the id on the h4.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: use <a> for anchors without content, not <span>

2021-12-26 Thread Per Bothner




On 12/26/21 10:12, Patrice Dumas wrote:

But there is nowhere where id in a lone element is proposed, so this
argument is not very compelling not to use <a> for that purpose.  <a>
was used for that semantically before, while <span> is explicitly
described as being relevant in relation to its content.  For those
reasons, it still seems to me that <a> is better than <span> and
actually the best choice among elements.


Using <a> this way makes me a little uncomfortable, but
I don't have a strong objection.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: id attributes for header elements

2021-12-26 Thread Per Bothner




On 12/26/21 09:30, Patrice Dumas wrote:

On Sat, Dec 25, 2021 at 02:10:29PM -0800, Per Bothner wrote:

Two requests:
(1) Don't generate the implicit id="Electron-1" when it immediately follows
an explicit id="Electron".


I do not think that it is a good idea: the @anchor anchor and id should
stay different from the @node or section @-command id and anchor.  It
is better to keep it, but it would probably be even better to have a
class to make it possible to select @anchor generated anchors.


Maybe - is there a use-case for this?


(2) Add the id attribute to the <h4> element rather than generate an empty
<span> node.


I did something along those lines, but not when there is a header, in
order to keep the anchor before the header, as it was agreed long ago
that having the anchor before the header was better for navigation.


I don't remember, and I don't see how it could be better for navigation.
I don't see any particular use for being able to distinguish references
generated by @anchor from others.

However, since @anchor can be anywhere (including the middle of paragraphs),
there is an argument from simplicity and consistency for just using an
empty <span>.  Though note that best practice for hand-written HTML would
be to attach the id to a semantically meaningful element.

Regardless, the implicit id that is generated for a heading command belongs
on the header element.  I.e. for

@anchor{Electron}
@subheading Electron

we should at least do:

<span id="Electron"></span><h4 class="subheading" id="Electron-1">Electron</h4>

Since id="Electron-1" is directly generated by the @subheading, it makes sense
to put the id attribute on the generated h4 element.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: use <a> for anchors without content, not <span>

2021-12-26 Thread Per Bothner




On 12/26/21 09:33, Patrice Dumas wrote:

Hello,

Currently <span id="..."></span> is used for anchors, whether they
originate from @anchor, in some cases from @node and sectioning
@-commands, @*index, ...  It is syntactically correct, but I think that
it would be semantically better to use <a> and keep <span> for inline
text that needs some kind of formatting information.

Also I propose to add a class to the anchors to distinguish their source.

Any opposition/idea?


I don't think using <a> for anchors is a good idea.  The WHATWG spec says:

   "If the a element has no href attribute, then the element represents a
   placeholder for where a link might otherwise have been placed, if it
   had been relevant, consisting of just the element's contents."

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a does not suggest
or give any examples of using <a> without an href attribute.

--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



id attributes for header elements

2021-12-25 Thread Per Bothner

(This is related to the "control over sectioning and splitting" thread,
as it may make it easier to provide a clean solution.)

Currently, a header element generates an empty <span> element.  I.e.:

@subheading Electron

generates:

<span id="Electron"></span><h4 class="subheading">Electron</h4>

while:

@anchor{Electron}
@subheading Electron

generates:

<span id="Electron"></span><span id="Electron-1"></span><h4 class="subheading">Electron</h4>

Two requests:
(1) Don't generate the implicit id="Electron-1" when it immediately follows
an explicit id="Electron".
(2) Add the id attribute to the <h4> element rather than generate an empty
<span> node.

Thus I'd like to see, for both inputs above:

<h4 class="subheading" id="Electron">Electron</h4>

I think this is both cleaner and it would make for easier post-processing,
such as by JavaScript or XSLT.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: epub init file, and questions

2021-12-25 Thread Per Bothner

On 12/25/21 11:01, Patrice Dumas wrote:

For cross manual references, there is no reason why it would work, I did
not fully test, but there does not seem to be a way to refer to another
epub book, for instance installed alongside, as far as I can tell.
calibre creates a link to an internal sandbox for external manuals,
but I do not know how it is supposed to work.

For list of tables, indices, it looks basically ok as web pages.

However, if cross manual links do not work, indices and list of floats
are not well integrated, it cannot really be a replacement for Info for
instance.


I don't think we can expect a non-info-aware epub reader to handle a
collection of makeinfo-generated e-books with the same functionality as info.
A more realistic goal is that a generic modern web-browser with info.js enabled
should support cross-manual links and other info functionality at least
as well as traditional info.

My proposed replacement for Emacs info mode would look for html files
and render them using eww mode.  Eww doesn't handle JavaScript, so it
couldn't use info.js, but the logic of info mode could be enhanced
to work with html files instead of (or rather in addition to) info files.
(Conceptually, it's a matter of changing eww's keybindings
to work like info mode.)

Note that eww-mode running in a terminal displays texi2any-generated
html pretty decently, so it could potentially replace the
standalone info program, once we have an html-supporting info mode.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: epub init file, and questions

2021-12-25 Thread Per Bothner




On 12/25/21 09:06, Patrice Dumas wrote:

Hello,

In the source there is now an init file epub3.pm to generate an epub
container, targeting epub 3.2.  In tp/init/epub3.pm.


Haven't tried it yet - but thanks!


Some questions for those knowledgable in epub:


I don't have a lot of experience or knowledge of epub readers,
so take this response as very tentative.


* is it better to have split or non split files for epub?


Is there any reason not to treat this like html output?
I.e. honor the --split and --no-split options in the same way?


* I did not output any header, so no mini_toc nor node
   directions.  Also no horizontal rules.  Is that how it should be?


Those seem reasonable.  An e-reader can supply extra mini-tocs or
horizontal rules, or add them using a stylesheet (or JavaScript).  Of course
that also applies to web documentation, so perhaps such things should be
optional for html, too.


* should the navigation nav category TOC be used as a table of contents?
   I do not think so since it uses <ol> which looks strange.


That can be fixed with styling.  I'm not very concerned about how things
look without css, as long as it's not too horrible.


* how are indices handled in epub?
* how are cross manual references handled in epub?
* how are list of tables/floats handled in epub?


Until we find a reason not to, just do like you would for web pages.

I tend to think of the js-info interface as a simple but functional
e-reader.  It seems useful and reasonable to include info.js in
an epub document.  You should be able to unzip an epub (or use a web server
that does so on-the-fly); if you then browse to the top page, you
should get an experience more-or-less the same as browsing plain html,
including using info.js if that was included.

--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: info.js bugs

2021-12-24 Thread Per Bothner

I checked in fixes for these problems. I also uploaded an updated 
https://domterm.org

On 12/21/21 12:28, Gavin Smith wrote:

Unrelated to your original message, but looking at those pages, I
noticed a few problems with info.js:

* Typing "i" brings up the index search box, but there is no index
in the document.  Showing a message "No indices" would be better
than a text entry box.


It now shows "No index in this document" by analogy with
'm' showing "No menu in this node".


* The browser tab is named starting "Top (DomTerm..." regardless
of which page is being shown.


Checked in code to update the window/tab title.


* When pressing "s" for search, it's very easy for the search box
to contain the letter "s", even if the default is something else.
To trigger: hold down the "s" key, then press Escape.  Then press
"s" again.  There is an "s" stuck in the search box.  (I couldn't
break this reliably.)


There was a related problem: Typing 'm' or 'i' would often initialize the
minibuffer with those letters.  The problem was likely that key
events would get passed to the minibuffer element after it got focus.
Seems to be fixed by adding a call to preventDefault on the keydown event.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: XHTML validation (was: texi to epub)

2021-12-20 Thread Per Bothner

Patrice Dumas wrote:

However it looks like HTML5 cannot be validated...


In the sense of validating using an old-style DTD validator.
However, that has limited usefulness regardless.
HTML5 is defined in terms of a very specific parsing algorithm, meant
to avoid the incompatible extensions and work-arounds of the old HTML4 browsers.

The W3C Markup Validator (https://github.com/w3c/markup-validator/) makes use of
and suggests the NU Validator (https://github.com/validator/validator/) for 
HTML5.

On 12/20/21 05:40, Kurt Hornik wrote:

You mean with with the W3C validator?  This indeed seems to be the case,
perhaps a result of the W3C and WHATWG conflict re HTML5 ...


There may be a little bit of a turf tension (I actually don't know),
but W3C does publish an HTML5 spec (https://dev.w3.org/html5/spec-LC/),
and it is kept in sync with the WHATWG specification:

The latest stable version of the editor's draft of this specification is 
always available on the W3C CVS server and in the WHATWG Subversion repository. 
The latest editor's working copy (which may contain unfinished text in the 
process of being prepared) contains the latest draft text of this specification 
(amongst others). For more details, please see the WHATWG FAQ.

Work on this specification is also done at the WHATWG. The W3C HTML working 
group actively pursues convergence with the WHATWG, as required by the W3C HTML 
working group charter.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-19 Thread Per Bothner

On 12/19/21 16:02, Patrice Dumas wrote:

It was much easier in the HTML converter.  It is actually a bit
cumbersome, but it works.  To use XML compatible syntax, it is possible
to use -c 'USE_XML_SYNTAX 1'.  It is also possible to use only numeric
entities, with -c 'USE_NUMERIC_ENTITY 1', and to add xmlns to the <html>
element with -c 'HTML_ROOT_ELEMENT_ATTRIBUTES xmlns="http://www.w3.org/1999/xhtml"'.


From looking at the code (without any testing), it appears that all
calls to close_html_lone_element_if_needed pass a string that ends with ">".

A simplification might be to leave off the ">" in the argument, and instead
have close_html_lone_element_if_needed append either ">" or "/>" depending on
USE_XML_SYNTAX.

I would guess this might be slightly slower in the non-USE_XML_SYNTAX case
(an extra concatenation with ">") but faster in the USE_XML_SYNTAX case
(avoid the regex substitution), but neither likely to be measurable since I'm 
also
guessing it's not called that frequently.
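
For illustration, the suggestion amounts to this (sketched in JavaScript for
brevity - the real function is Perl, in tp/Texinfo/Convert/HTML.pm, and the
names here are only illustrative):

// Callers pass the tag *without* the trailing ">"; the helper appends
// ">" or "/>" depending on USE_XML_SYNTAX.
function closeHtmlLoneElement(tag, useXmlSyntax) {
  return tag + (useXmlSyntax ? "/>" : ">");
}
// closeHtmlLoneElement('<hr', false) => '<hr>'
// closeHtmlLoneElement('<hr', true)  => '<hr/>'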
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-18 Thread Per Bothner




On 12/18/21 14:20, Patrice Dumas wrote:

I tried that with the https://validator.w3.org/, also using .xhtml as
extension as it seems that it is what makes the w3c validator consider
that it is XML, with and without the <?xml?> declaration, and I get:
 Schema Error: XML document with no namespace; cannot determine any schema 
to use for validation.

...

Also, what could I use for offline validation?  The w3c-markup-validator
package does not seem to handle HTML5, and the vnu 
https://validator.github.io/validator/
validator does not seem to be packaged in Debian.


I don't know if people even use old-style XML/HTML validators these days,
or if there are good free validators that match modern html.  Anything
that uses a DTD would seem to have limited usefulness.

The W3C Markup Validator (https://github.com/w3c/markup-validator/)
suggests the NU Validator (https://github.com/validator/validator/) for HTML5.
Maybe try the online version first: https://validator.w3.org/nu/
If it seems helpful you can download and build a local version.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-18 Thread Per Bothner




On 12/18/21 14:33, Patrice Dumas wrote:

On Wed, Dec 15, 2021 at 05:21:26PM -0800, Per Bothner wrote:


Emacs nxml mode should be sufficient to check well-formedness.


Is there documentation on how to use nxml mode to check well-formedness
from the command line?  I cannot find any information on that on the
web.  All the information I found is about using nxml mode from within
emacs.


If you want command-line/batch testing of well-formedness then nxml-mode
is probably not your best bet.

One possibility is to use xsltproc with a null or identity stylesheet.

Well-formedness of normal XML documents (without external entities
or other complications that nobody uses) is pretty trivial.  The main
issue is making sure that special characters are escaped properly.  And
of course all elements have to be properly nested and terminated.
This requires no semantic knowledge.
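
For example, a quick check using DOMParser (available in any browser, or via
jsdom) - just a sketch, not a texinfo tool:

const text = "<p>example <hr/> content</p>";  // the document to check
const doc = new DOMParser().parseFromString(text, "application/xml");
const err = doc.getElementsByTagName("parsererror")[0];
console.log(err ? "not well-formed: " + err.textContent : "well-formed");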
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-18 Thread Per Bothner




On 12/18/21 13:01, Patrice Dumas wrote:

It seems that it does not change the work required compared to XHTML 1.1
which is to have a correct XML document.  Something I cannot find,
however, is what to put at the beginning to be able to check validity
of the resulting document if it is HTML5 XML.  Can you please tell me?


As far as I can tell, you don't need to put anything at the start:
https://html.spec.whatwg.org/multipage/xhtml.html#the-xhtml-syntax
A validator should allow you to specify what you want to validate against.

This should be OK:

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE html>

For "polyglot" output, leave off the <?xml?> declaration.


It seems to me that XML_OUTPUT_MODE "polyglot" would only need changing
some customization variable for the header, so I am not so convinced
that it is really interesting.


A major difference between XML and HTML is the handling of empty elements.

The following are valid HTML:

<hr>
<hr/>

The following are valid XML:

<hr/>
<hr></hr>
<a id="foo"></a>
<a id="foo"/>

Note that the following is *invalid* HTML:

<a id="foo"/>  -- error if not followed by a closing </a> tag

while this is invalid XML:

<hr>

The following ("polyglot") output is suggested, as it is valid both as HTML
and XML:

<hr/>
<a id="foo"></a>

You have to know which elements have empty content (hr) and which don't (a).
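
That knowledge is small enough to write down (the list below is the set of
HTML void elements; the function name is just illustrative):

const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);
function emitEmptyElement(name, attrs = "") {
  return VOID_ELEMENTS.has(name)
    ? "<" + name + attrs + "/>"                 // e.g. <hr/>
    : "<" + name + attrs + "></" + name + ">";  // e.g. <a id="foo"></a>
}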
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Replace HTML4 doctype declaration

2021-12-18 Thread Per Bothner




On 12/18/21 07:46, Gavin Smith wrote:

We output HTML4 to get some flexibility in the output, but I am not sure
how useful the HTML4 doctype declaration is any more and perhaps we should
switch to the simpler HTML5 "<!DOCTYPE html>" header.  It looks like we are
trying to conform to a standard that nobody cares about anymore.

Does anybody object if I go and change texi2any to output this instead of
the HTML4 Transitional doctype?


I think it would make sense.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Directory names hard-coded in Texinfo Perl scripts

2021-12-18 Thread Per Bothner




On 12/18/21 06:59, Gavin Smith wrote:

On Sat, Dec 18, 2021 at 2:54 PM Eli Zaretskii  wrote:

Is this part supposed to work also when Texinfo is installed in a
directory different from the prefix with which it was configured?


No, it is only for running from the source/build directory, not for
the installed program.


It is highly desirable to be more general than that.

Note that automake-generated Makefiles allow overriding
both prefix and DESTDIR at install time.  I believe DESTDIR is for
creating install images, so handling DESTDIR correctly is a
factor in writing correct 'make install' rules, not runtime.

However, overriding prefix at install time is also useful.
I use it to create an AppImage of DomTerm: An AppImage is
basically an executable single-file partial file system:
It basically makes an archive wrapping the $prefix tree.

When you run the AppImage, the $prefix tree is mounted as a
temporary user file system, with a path that may differ each
time you run the AppImage.  For example, running DomTerm.AppImage,
the file domterm.jar appears to the domterm executable as:
/tmp/.mount_DomTerfjMAQ5/usr/bin/../share/domterm/domterm.jar
I.e. the effective $prefix is /tmp/.mount_DomTerfjMAQ5/usr/ .
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-18 Thread Per Bothner

On 12/18/21 07:38, Patrice Dumas wrote:

The idea is that it would be set if doing XHTML 1.1, be it for epub or
as an output, as it is a prerequisite for epub if I understood right.

For now, my plan is to do XHTML1.1 as a separate init file.


Why? XHTML 1.1 is an obsolete format.  It is not required for EPUB 3.x,
which is a 10-year old specification.

EPUB does require content documents to be XML:
http://idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml

However, my reading of the spec finds no indications that custom data 
attributes are
disallowed.  It notes that data attributes must be stripped before being fed
to a validator - which implies that they are allowed, but the validator does 
not handle them:
http://idpf.org/epub/30/spec/epub30-contentdocs.html#app-xhtml-schema
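
A sketch of what "stripped before being fed to a validator" could look like
in practice (illustrative only):

// Remove data-* attributes from a parsed document before validation.
function stripDataAttributes(doc) {
  for (const el of doc.querySelectorAll("*"))
    for (const name of el.getAttributeNames())
      if (name.startsWith("data-"))
        el.removeAttribute(name);
}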

So my recommendation is: Just leave the data attributes in, even for XHTML/EPUB.

I think my previous proposal is still reasonable:
https://lists.gnu.org/archive/html/bug-texinfo/2021-09/msg0.html
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Directory names hard-coded in Texinfo Perl scripts

2021-12-17 Thread Per Bothner




On 12/17/21 04:57, Eli Zaretskii wrote:

I mean relative to the directory of the makeinfo script, which is
installed in ${prefix}/bin (at least by default).


Right, but the problem is how does the script find out the directory
it is running in?  It is a non-trivial problem, which is why the Perl FindBin
module and the C whereami package exist.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Directory names hard-coded in Texinfo Perl scripts

2021-12-17 Thread Per Bothner

On 12/16/21 23:29, Eli Zaretskii wrote:

Personally, I find the relative-path method much easier, but I will go
with anything Gavin considers as the preferable solution.


I'm not sure what you mean by "the relative-path method" and how it differs
from my example code.  Reading the file "../share/texinfo/foo.info" doesn't
work, since that is relative to $PWD *unless* there is logic to change
the effective $PWD.  And how do you figure out what to change the $PWD to?
You do something like my example code - or use a library that does
something similar.
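
The same idea as a sketch (Node.js flavor; the paths are hypothetical):

const fs = require("fs");
const path = require("path");
// Resolve symlinks to find where the script really lives,
// then derive the install prefix from it.
const self = fs.realpathSync(process.argv[1]);
const prefix = path.dirname(path.dirname(self)); // strip ".../bin/SCRIPT"
const dataFile = path.join(prefix, "share", "texinfo", "foo.info");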
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-16 Thread Per Bothner

On 12/16/21 16:43, Patrice Dumas wrote:

Even if it is valid, I think that it is a good thing to be able to
produce HTML documents without custom attributes.  That being said, if
it seems obvious that such caution is not a needed feature, it is
possible to revert commit cbce0c098353451c0a35d740ba503ba124621272.


Well, every option or customization variable has a slight cognitive cost:
People have to read and understand the documentation and decide what
settings to use.  Setting NO_CUSTOM_HTML_ATTRIBUTE has a minor benefit
in terms of slightly leaner and cleaner HTML but is that enough?
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Directory names hard-coded in Texinfo Perl scripts

2021-12-16 Thread Per Bothner

For DomTerm (https://domterm.org) whose main executable
is C/C++ I use whereami: https://github.com/gpakosz/whereami
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-16 Thread Per Bothner

On 12/16/21 15:43, Jacob Bachmeyer wrote:

It is valid HTML5, but epub requires XHTML 1.1 and it is *not* valid there.


That does not appear to be correct, at least when talking about EPUB3:
http://idpf.org/epub/30/spec/epub30-contentdocs.html#sec-xhtml

XHTML 1.1 is officially a Superseded Recommendation: "A newer specification 
exists
that is recommended for new adoption in place of this specification."
https://www.w3.org/TR/xhtml11/


Most epub readers will *probably* silently ignore such invalid attributes, but 
I would not be surprised if there is an embedded reader out there that crashes 
in this case.


There may be all kinds of buggy or obsolete readers.  I would not
worry about them, especially with something so hypothetical.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Directory names hard-coded in Texinfo Perl scripts

2021-12-16 Thread Per Bothner

On 12/16/21 15:52, Jacob Bachmeyer wrote:

If they can have a location fixed relative to the script, the Perl core module 
FindBin and pragmatic module lib can help here:


Or if using bash or similar you can do something like this,
which is a simplified version of the Kawa start-up script:

#!/bin/bash
thisfile=`command -v $0`
case "$thisfile" in
  "") echo "installation error - can't find path to $0"; exit -1 ;;
  /*) ;;
  *) thisfile="$PWD/$thisfile"  ;;
esac
while test -L "$thisfile"; do thisfile=$(readlink -f "$thisfile"); done
kawadir=`dirname "$thisfile" | sed -e 's|/bin\(/\.\)*$||'`
CLASSPATH="$kawadir/lib/kawa.jar:${CLASSPATH}"
export CLASSPATH
exec ${JAVA-java} -Dkawa.home="${kawadir}" kawa.repl "$@"

The idea is you can have:

ANYDIR/bin/kawa
ANYDIR/lib/kawa.jar
ANYDIR/share/kawa/...

Then if you execute ANYDIR/bin/kawa directly, or via a symlink,
or if ANYDIR/bin (or a symlink) is on your PATH the script will set
$kawadir to ANYDIR.  Given that, it can find ANYDIR/lib/kawa.jar,
ANYDIR/share/kawa or other needed files.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-16 Thread Per Bothner




On 12/16/21 10:08, Patrice Dumas wrote:

On Thu, Dec 16, 2021 at 06:33:32PM +0100, Kurt Hornik wrote:


There are two major sources of warnings in my case (using texinfo 6.8):

* The data-manual attribute in the hyperlinks


I remember not being in favor of an attribute that is not defined in
HTML, even if it is acceptable in HTML5.  But if it has been added, I am
pretty sure that it is useful for somebody (maybe Per's javascript?).


I know nothing about it - it appears to have been added by Gavin:

2020-11-25  Gavin Smith  

data-manual attribute

* tp/Texinfo/Convert/HTML.pm (_convert_xref_commands):
Set data-manual attribute instead of class="texi-manual"
on links to other Texinfo manuals.


I propose to add a customization variable like 'NO_CUSTOM_ATTRIBUTE', to be
set to 1 to avoid data-manual, and to use it too for future custom
attributes.


Why bother? What problem does it solve?  It avoids some warnings from
an overly-picky validator. I don't see that as a strong reason to change 
anything.


* Tables which do their header and footer inside thead and tfoot, but
   not the content inside tbody (which seems valid in XHTML 1.0 but not
   in 1.1).


I think that it would make sense to always use tbody.


Again, it's just a warning - but if it's easy to fix, we might as well.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-15 Thread Per Bothner




On 12/15/21 12:43, Patrice Dumas wrote:

I could have a try, but before I would like to have an XHTML
command-line offline validator, is there something like that existing?


The critical first step is generating "well-formed XML".  I.e.
basic lexical/syntactic correctness, ignoring semantic constraints
("validation").  Most critical is no elements without closing tags.  For
example <hr> must be either <hr></hr> (valid XML, not valid HTML) or <hr/>
(valid either).

Emitting <hr/> has the advantage that it is correct for both XML and
(modern) HTML.

Once this is taken care of, we're a big step there.

Emacs nxml mode should be sufficient to check well-formedness.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texi to epub

2021-12-15 Thread Per Bothner

Some previous discussion:
https://lists.gnu.org/archive/html/bug-texinfo/2021-01/msg8.html
https://lists.gnu.org/archive/html/bug-texinfo/2021-08/msg00031.html
https://lists.gnu.org/archive/html/bug-texinfo/2021-09/msg0.html
and some other messages in those threads.

I suggested generating "polyglot" output (valid both as html and xml)
as an option (and perhaps in the future the default) for --html.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: control over sectioning and splitting

2021-10-10 Thread Per Bothner




On 10/10/21 07:10, Patrice Dumas wrote:

I am probably missing something, but isn't what you want obtained with
--split=chapter and using sectioning commands like @section,
@subsection?


The issue is that there are some topics (chapters) that are large enough
to want to split into multiple pages for various reasons: big pages are
more overwhelming; the chapter topic naturally divides into several
semi-independent sub-topics; there may be external links one would
prefer not to break.

So one could promote each sub-topic to its own chapter, but that loses the
organization and clutters up the sidebar.  At least for something
intended as a "landing" page (home page) it is highly undesible to have
too many "starting topics" in the initial sidebar, since an over-busy home page
may drive people away.

The fundamental problem is that texinfo assumes page-splitting at a
particular level on the hierarchy, but that is too inflexible: It makes
for a non-optimal browsing experience.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



control over sectioning and splitting

2021-10-09 Thread Per Bothner

Compare
https://domterm.org/Wire-byte-protocol.html
with
https://domterm.org/Frontends.html

The former is a section, it is divided into subsections,
which appear in the sidebar (when the page is selected) and in the Contents.

The latter is a chapter.  It is divided into pseudo-subsections,
using @subheading commands, none of which appear in the sidebar or Contents.

I'd like to have the subheadings appear in the sidebar and the Contents,
but I haven't figured out a good way to do that.

Is there a way to divide a "chapter" into "sub-chapters" such that they appear 
on the
same web-page, but show up in the sidebar and the Contents?

It seems possible for info.js to add extra entries in the side-bar by scanning
the page looking for <h4 class="subheading"> elements.  However, that seems
a bit kludgy and does not add the "sub-chapter" to the ToC.

One idea is to allow the children of a @chapter to be @subsections, skipping the
@section elements - but texi2any doesn't allow that.

Another idea is some kind of @samepage annotation that could be added to a
@section, to prevent page-splitting.  (This might also be useful for printed
manuals.)

Another idea is to use @part: Everything that should be a separate page
should be its own @chapter, but we use @part to group together chapters
that should not show in the sidebar until the @part is expanded.  (It is
desirable to avoid putting too much in the sidebar, to make it less
overwhelming.)

Another idea is to allow special handling for "single-section chapters".
In the source you could write @chapter immediately followed by @section,
with the same name and no separate @node command:

@node Frontends
@chapter
@section Frontends including browsers

This would be logically equivalent to:

@node Frontends
@chapter Frontends including browsers
@node Frontends-section-
@section Frontends including browsers

However, in the output (assuming --split=section) the chapter and section
would be merged into a single page, with similar merging in the sidebar and ToC.

Ideas?  Hacking info.js is something I could do, but it doesn't help for
traditional info and it doesn't solve the missing entries in the ToC.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @tieaccent{..} does not display the tie accent in HTML

2021-09-01 Thread Per Bothner

I'm thinking a customization variable XML_OUTPUT_MODE

-c XML_OUTPUT_MODE "html"
  [current default]
  Generate HTML files, with .html file names.
  Follow HTML specification and recommendations.

-c XML_OUTPUT_MODE "xhtml"
  Generate XHTML files (i.e. XML), with .xhtml file names.
  Follow XHTML specification.

-c XML_OUTPUT_MODE "polyglot" [or many "compatible"]
  [maybe future default?]
  generate HTML files with .html extension, and no  declaration,
  but in a way that would be XML-compatible - i.e. following "polyglot" markup.

In both "xhtml" and "polyglot" modes we do:
(1) Don't use named entities except the builtin XML ones.
(2) Close all tags.  Where HTML prohibits separate closing tags,
use the XML shorthand, e.g. <br/>.  This works everywhere I've tried it.

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @tieaccent{..} does not display the tie accent in HTML

2021-08-31 Thread Per Bothner




On 8/30/21 3:44 PM, Gavin Smith wrote:

On Mon, Aug 30, 2021 at 10:00:51AM -0700, Per Bothner wrote:

What I'm looking for is:
(1) Be able to post-process html output with xml tools, such as xslt.
(2) Generate valid epub3 ebooks.


These seem like valid goals so would be happy to see patches that produced
XML output, likely as an option.


What I would like is an option (and maybe even the default) for output that
is simultaneously valid both as HTML and as XML.  At least "valid as HTML"
in the sense that it will be parsed per the HTML5 parsing specification
by any modern browser.

This is called "polyglot markup" by the way:
https://www.w3.org/TR/html-polyglot/ (no longer standards-track but still 
useful)
https://en.wikibooks.org/wiki/Polyglot_markup,_how_to

The key issues are just these two:
(1) Don't use named entities except the builtin XML ones.
(2) Close all tags.  Where HTML prohibits separate closing tags,
use the XML shorthand, e.g. <br/>.  This works everywhere I've tried it.

You also have to be careful about invalid characters in inline <script>
and <style> elements.

Re: @tieaccent{..} does not display the tie accent in HTML

2021-08-30 Thread Per Bothner

On 8/27/21 1:16 PM, Patrice Dumas wrote:

I think that the HTML produced with named entities is much more legible,
and probably more portable.


It may be more legible, but I'm pretty sure it's not more portable.
In fact, it is probably less portable, unless you restrict yourself
to a very minimal set (maybe those from HTML 3.2?), since there may
be version issues - and then you have to be careful which ones
you use and how portable they are.  In which case, what is the point?

The named entities are more mnemonic of course, but looking up the
meaning may well be easier with a hex code than with a name.

And that is just when it comes to strict HTML, not XML/XHTML.


However, having a customization variable to
output only numerical entities would be ok to me, maybe something like

USE_ONLY_NUMERICAL_ENTITY or NO_NAMED_ENTITY to avoid confusion with
USE_NUMERICAL_ENTITY.


I think more valuable would be an "XML_COMPATIBLE" variable.
In addition to numeric entities, it would guarantee to close all tags.
E.g. instead of <br> it would emit <br/> - which also works with
most (all?) HTML parsers.  And possibly other issues.

What I'm looking for is:
(1) Be able to post-process html output with xml tools, such as xslt.
(2) Generate valid epub3 ebooks.

One might want more fine-grained control: Should  declaration
be emitted?  What doctype to emit?  What file extension to emit?
However, that level of control is less important as long as the
above 2 goals are met.
 

However, when it comes to decimal or hex numerical entities I think
hex is preferable, as that is much more common for Unicode values.
I.e. &#xA9; rather than &#169; for ©.


I have no precise idea on that, but the change should only be done if
needed.


It's not "needed" - it's just that hex values are used almost universally
for Unicode, and decimal values are rarely used.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @tieaccent{..} does not display the tie accent in HTML

2021-08-27 Thread Per Bothner




On 8/27/21 12:35 PM, Patrice Dumas wrote:

Just generate 'o&#865;o' and be done with it.
(I would prefer using the hex value - one reason is it's easier to search for 
its meaning.)


For now we use numerical entities everywhere.  If this changes, it
should be everywhere too.


Using numerical entities is my recommendation - except for the
standard XML ones: &lt; &gt; &quot; &amp; &apos;

Except for the above list, I'd avoid named entities, as they are not built
into XML, and medium-term I'd like our HTML output to be XML-compatible.

However, when it comes to decimal or hex numerical entities I think
hex is preferable, as that is much more common for Unicode values.
I.e. &#xA9; rather than &#169; for ©.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @tieaccent{..} does not display the tie accent in HTML

2021-08-27 Thread Per Bothner




On 8/27/21 10:21 AM, Patrice Dumas wrote:

I don't really have a good answer, of course.  I'd think to try U+0361 or its 
HTML equivalent, as my test here
   test o͡o test o&#865;o
but I honestly have no idea which browsers show what, and which fonts support 
the character ...


Actually, this looks quite good on firefox, both with the actual utf8
encoded diacritic and entity.  You can obtain the utf8 encoded diacritic
when converting to html with --enable-encoding.  I guess that we can
generate numerical diacritic entities in the default case, it'll
probably be better than the plain text accent markers.


It works fine also on Google Chrome, and on Epiphany (based on the 
Safari/WebKit engine).

Just generate 'o&#865;o' and be done with it.
(I would prefer using the hex value - one reason is it's easier to search for 
its meaning.)
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



scrolling js-info help page

2021-06-22 Thread Per Bothner

In the js-info reader, typing '?' brings up a page about "Keyboard Shortcuts".
Unfortunately, if the window height is too small, it is cut off and
there is no way to scroll it.

It was difficult to find a fix that didn't have glitches, but this seems to
work (tested in Chrome and Firefox):

2021-06-22  Per Bothner  

* js/info.css: Styling tweaks for js-info to make help scrollable.


diff --git a/js/info.css b/js/info.css
index d8d20e1723..14537b67ad 100644
--- a/js/info.css
+++ b/js/info.css
@@ -166,13 +166,12 @@ table#keyboard-shortcuts th {
 display: none;
 position: fixed;
 z-index: 1;
-padding-top: 100px;
+padding-top: 40px;
 left: 25%;
 top: 0;
+bottom: 0px;
 width: 75%;
-height: 100%;
-overflow: auto;
-background-color: rgb(0,0,0); /* Fallback color */
+background-color: #888; /* Fallback color */
 background-color: rgba(0,0,0,0.5); /* Black w/ opacity */
 }
 
@@ -182,6 +181,8 @@ table#keyboard-shortcuts th {

 margin: auto;
 padding: 20px;
 width: 80%;
+max-height: 100%;
+overflow-y: auto;
 }
 
 /*---.


Not urgent, but probably worth checking in before the release.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Fix js-info in iframe

2021-06-22 Thread Per Bothner

I tested out the js-info browser in a DomTerm browser window:
$ domterm --tab browser URL-TO-MANUAL
That failed, because domterm creates the requested browser window in an iframe 
(interior frame).
Luckily the fix is easy:

diff --git a/js/info.js b/js/info.js
index 4198a9ec5c..e4c53700d2 100644
--- a/js/info.js
+++ b/js/info.js
@@ -99,7 +99,7 @@
   Remote_store ()
   {
 /* The browsing context containing the real store.  */
-this.delegate = top;
+this.delegate = window.parent;
   }
 
   /** Dispatch ACTION to the delegate browing context.  This method must be


I realize we're close to a release, but this seems fairly safe - plus it's in
an "experimental" feature. Ok to check in?

--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texinfo-6.7.91 pretest

2021-06-19 Thread Per Bothner

Another thing worth mentioning in the "Full news" section:

* texi2any
  - changes to HTML output
- HTML is better structured and more modern, using <div> elements
  for nested sections, and 'id' attributes for references
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texinfo-6.7.91 pretest

2021-06-18 Thread Per Bothner

On 6/18/21 8:32 AM, Gavin Smith wrote:

If we were going to mess about with this code it would be better
to rewrite it to use neither strncat not strcat.


Indeed - the code is sufficiently complex that I could not be sure there is no
buffer overflow, for example.

Are we guaranteed that the 'description' is at least two bytes shorter
(to make room for final newline plus final null) than strlen(entry) ?
Probably, but there is no comment explaining why, and you'd have to look
carefully to make sure there is no case that could risk overflow.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: texinfo-6.7.91 pretest

2021-06-18 Thread Per Bothner




On 6/18/21 6:07 AM, Eli Zaretskii wrote:

From: Gavin Smith 
Date: Tue, 15 Jun 2021 21:44:54 +0100

The next pretest for the next Texinfo release has been uploaded to

https://alpha.gnu.org/gnu/texinfo/texinfo-6.7.91.tar.xz

We hope to release this fairly soon barring the discovery of any major issues.


This pretest builds cleanly with MinGW on MS-Windows, and passes all
the tests (with some tests skipped, as expected).

The only compilation warning is this:

  gcc -DHAVE_CONFIG_H -I. -I..  -I..   -I../gnulib/lib
 -I../gnulib/lib 
-DLOCALEDIR=\"d:/usr/share/locale\"  -Id:/usr/include  -O2 -gdwarf-4 -g3 -MT 
install-info.o -MD -MP -MF .deps/install-info.Tpo -c -o install-info.o install-info.c
  install-info.c: In function 'split_entry.constprop':
  install-info.c:1633:11: warning: 'strncat' specified bound depends on the 
length of the source argument [-Wstringop-overflow=]
   1633 |   strncat (*description, ptr, length);
   |   ^~~
  install-info.c:1632:27: note: length computed here
   1632 |   size_t length = strlen (ptr);
   |   ^~~~


This may be a warning that it is pointless to use strncat in a case
that has exactly the same effect as strcat.

The following avoids the warning.

diff --git a/install-info/install-info.c b/install-info/install-info.c
index bd74ff0d08..5c0eeb4b3b 100644
--- a/install-info/install-info.c
+++ b/install-info/install-info.c
@@ -1629,9 +1629,8 @@ split_entry (const char *entry, char **name, size_t 
*name_len,
   else
 {
   /* Just show the rest when there's no newline. */
-  size_t length = strlen (ptr);
-  strncat (*description, ptr, length);
-  ptr += length;
+  strcat (*description, ptr);
+  ptr += strlen (ptr);
 }
 }
   /* Descriptions end in a new line. */

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Unadorned cross-references

2021-06-02 Thread Per Bothner




On 6/2/21 11:53 AM, Augusto Stoffel wrote:

However, generating HTML files that look okay on anything other than a
full-fledged browser is much harder than converting some documentation
source to info with Pandoc, Sphinx or the like.


I disagree.  I think the output in eww is quite decent and
better than info mode.  At least for output generated by texi2any
from texinfo files.  (If you're looking into converting documents written in some
other non-texinfo format, that's not really a priority for this project.)


And how do you grep through a bunch of HTML files, by the way?


I don't think using grep is relevant.  Full-text search is nice
(though I seldom use it), but in principle full-text search in html
files isn't all that different from full-text search in info files;
just dealing with a slightly more complicated syntax.
And emacs already includes an HTML parser.

The new JavaScript "html-info" reader handles both index search and full-text
search (though that latter seems broken - haven't looked into it).
You can see a sample in action here:
https://per.bothner.com/tmp/Kawa-txjs/index.html

If JavaScript can do it, Emacs-Lisp can do it.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: Unadorned cross-references

2021-06-01 Thread Per Bothner

On 6/1/21 5:23 AM, Eli Zaretskii wrote:

If we are after an Emacs-only solution,


If we want an Emacs-only solution, why not use eww-mode on an HTML file?
It works tolerably well - try: M-x eww ENTER https://domterm.org ENTER
Voila - you're browsing a texinfo manual with rich text and images.

Of course this has some drawbacks compared to info-mode - most obviously the
keybindings. But it seems like it should be easy to create a sub-mode
of eww-mode and then add keybindings that work like info-mode.
We would probably also change link-handling, so that clicking on an
external link defaults to creating a new eww buffer.

We could call it "hinfo" mode, and describe it as "an experimental mode
for browsing HTML documents (especially ones generated from texinfo)
using info-style keybindings and behavior".

We just need a volunteer ...
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: how to skip default css format lines in html output

2021-05-06 Thread Per Bothner

On 5/6/21 1:13 PM, Gavin Smith wrote:

On 5/6/21 12:36 PM, Gavin Smith wrote:

On Sat, May 01, 2021 at 07:07:28PM -0700, Per Bothner wrote:

It seems wrong to include inline css in generated html files,
especially when using the --ccs-ref or -C INFO_JS_DIR options.
The documentation is complicated. The advice to use !important
to override the default style rules feels quite wrong-headed.



Which rules are the ones which are causing problems? Can you be more
specific? There are not many default CSS rules - only about 15 of them
(I counted).


I don't know if any of the rules are *problems* per se.  However,
some of them don't match my preference, especially not when combined
with other explicitly-requested rules in info.css and kawa.css.
Which means I have to override the default - in some cases
setting them back to default defaults.  For example I use @kbd
to mark the user input in a REPL, so the following style is
undesirable in that context:

kbd {font-style: oblique}

I suspect I can work around the problems without too much pain.
However, it feels ugly to have to do so ...


The default CSS output is not really for the appearance but the
minimum needed to represent the intended meaning of some Texinfo
constructs in HTML output.


It's a mix.


I see, thanks. I never read all of that; as you say, it is quite
complicated. I don't know what the thinking is here behind the special
processing of @import directives by texi2any. I assume it is for some
CSS standard or to allow some types of customization.


@import is basically #include for CSS.

I'm guessing the feeling was that inline CSS was preferable (so the
HTML could be used standalone), and then things got complicated by
the desire to deal with @import.  But I think if you're trying to
include (by copying) a stylesheet that uses @import, you're doing
the wrong thing and should just use --css-ref.  I.e. makeinfo is
over-engineered to solve a problem when doing the wrong thing ...
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: how to skip default css format lines in html output

2021-05-06 Thread Per Bothner




On 5/6/21 12:36 PM, Gavin Smith wrote:

On Sat, May 01, 2021 at 07:07:28PM -0700, Per Bothner wrote:

It seems wrong to include inline css in generated html files,
especially when using the --css-ref or -C INFO_JS_DIR options.
The documentation is complicated. The advice to use !important
to override the default style rules feels quite wrong-headed.


Is this causing a practical problem?  Can you not override the
inline CSS with a referenced CSS file?


Yes, I can override a specific rule.  However,
it makes the priority order of the various rules a bit fragile.
It also makes it difficult to add to or edit the default rules,
as they may interact with user css rules in hard-to-anticipate ways.

Plus it clutters up the HTML with stuff that doesn't belong there.
If/when we add an option to generate xml/xhtml (as needed for epub)
then the default rules will be inside an XML comment and hence
ignored.  It seems fragile if the appearance of a document
depends on whether it is html or xhtml.
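
To illustrate (assuming the comment-wrapped style element that makeinfo
emits for its default rules):

  <style type="text/css"><!--
  kbd {font-style: oblique}
  --></style>

An HTML parser ignores the comment markers inside <style>, so the rule
applies; an XML parser treats the body as a genuine comment, so in
xhtml the rule silently disappears.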


I don't know where the advice to use !important comes from
as I couldn't find this in the manual anywhere.


Search for '! important' (with a space after the '!') in
http://www.gnu.org/software/texinfo/manual/texinfo/html_node/HTML-CSS.html
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: how to skip default css format lines in html output

2021-05-04 Thread Per Bothner

[I missed your reply. Sorry for the delay.]

On 5/2/21 2:49 AM, Gavin Smith wrote:


Could you test the following and see if it works okay for using NO_CSS
with --css-ref?


Doesn't seem to work.  Specifically I tried:

  makeinfo -I=doc --html --split=section --no-number-sections \
    -c INFO_JS_DIR=style -c JS_WEBLABELS=omit -c EXTRA_HEAD='' \
    --css-ref=style/kawa.css -c NO_CSS=1 ./kawa.texi -o ./web/

While the builtin lines are skipped, the --css-ref option seems to
have also been ignored.

Furthermore, using NO_CSS seems to conflict with currently-documented meaning:
"Do not use CSS".
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-05-04 Thread Per Bothner

On 5/2/21 2:16 PM, Gavin Smith wrote:

I've noticed something else (sorry, don't know how hard it would be
to fix).  When on a page that has sub-sections in the same page (due
to the level of "split"), the keyboard commands "n" and "p" don't
work for the sub-sections that are in the same page.

For example, at

https://per.bothner.com/tmp/Kawa-txjs-plain/Strings.html#Basic-string-procedures


I checked in a fix for this.

I uploaded a new test site at:
https://per.bothner.com/tmp/Kawa-txjs
Note this doesn't have the "-plain" suffix, because it includes some
Kawa-specific styling.

Images in examples now work (thanks Patrice!):
https://per.bothner.com/tmp/Kawa-txjs/Composable-pictures.html#Filling-a-shape-with-a-color
Notice the styling of the prompt (light green) and user input (light yellow).
This is text (and hence can be selected and copied), achieved with a small
bit of JavaScript and CSS magic.
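
The CSS half is roughly of this shape (a sketch only - the class names
and colors here are assumptions, not the actual Kawa markup):

  /* Assume the generated HTML wraps prompt and input in spans: */
  span.repl-prompt { background-color: #ccffcc }  /* light green */
  span.repl-input  { background-color: #ffffcc }  /* light yellow */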

--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-05-02 Thread Per Bothner




On 5/2/21 2:16 PM, Gavin Smith wrote:

I've noticed something else (sorry, don't know how hard it would be
to fix).  When on a page that has sub-sections in the same page (due
to the level of "split"), the keyboard commands "n" and "p" don't
work for the sub-sections that are in the same page.

For example, at

https://per.bothner.com/tmp/Kawa-txjs-plain/Strings.html#Basic-string-procedures

the "Basic string procedures" entry is highlighted in the side bar.  I
expected pressing "n" to then take me to "Immutable String Constructors",
but it doesn't do anything.


Not sure if that is a bug.  These subsections are not nodes - i.e. in kawa.texi
they only have a @subsection command but no @node commands.

Note that the '[' and ']' commands work for these subsections.

Tell me what you would like to happen and I'll see if it can be done easily.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-05-02 Thread Per Bothner




On 5/2/21 10:10 AM, Gavin Smith wrote:

I saw that you've committed a change to this but it is not reflected for
me at

https://per.bothner.com/tmp/Kawa-txjs-plain/style/info.js

in the on_message function.


I'm confused about what happened, but it should be fixed now.

--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-05-02 Thread Per Bothner




On 4/30/21 4:06 AM, Per Bothner wrote:



On 4/29/21 9:40 AM, Gavin Smith wrote:

Another problem: Navigate to

https://per.bothner.com/tmp/Kawa-txjs-plain/index.html

type "i" then "constant-fold", select the option in the menu, then press  
Return.

Then the page

https://per.bothner.com/tmp/Kawa-txjs-plain/Application-and-Arguments-Lists.html#index-constant_002dfold

is correctly loaded, but the side bar menu is completely expanded, and scrolled to the top.



I checked in a fix for this, and updated 
https://per.bothner.com/tmp/Kawa-txjs-plain/index.html

It was slightly complicated.  When a child page is asked to "scroll-to" a
link, it checks if the requested link is a section element.  If not, it
looks for the closest outer sectioning element (an element whose class
contains chapter/section/etc).  Then it sends a message with the section
element's id back to the top-level, which uses that to update the sidebar.
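
In outline, the child-page side looks something like this (a sketch -
the function and message names are illustrative, not the actual info.js
code):

  // Called when the top-level frame asks this page to scroll to ID.
  function scrollToTarget(id) {
    var target = document.getElementById(id);
    if (!target)
      return;
    // Find the closest enclosing sectioning element: the target
    // itself, or an ancestor whose class contains chapter/section/etc.
    var section = target.closest(
      '[class*="chapter"], [class*="section"], [class*="appendix"]');
    if (section && section.id)
      // Report the section back so the sidebar can be updated.
      window.parent.postMessage(
        {message_kind: "update-sidebar", section_id: section.id}, "*");
    target.scrollIntoView();
  }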
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



how to skip default css format lines in html output

2021-05-01 Thread Per Bothner

It seems wrong to include inline css in generated html files,
especially when using the --css-ref or -c INFO_JS_DIR options.
The documentation is complicated. The advice to use !important
to override the default style rules feels quite wrong-headed.

However, looking at  _default_format_css_lines in HTML.pm
I don't see a clean/clear way to skip the default rules.
NO_CSS has other effects.  I don't understand this test:
  return if (!@{$self->{'css_import_lines'}}
             and !@{$self->{'css_rule_lines'}}
             and !keys(%{$self->{'css_map'}})
             and !@$css_refs);

One simple and clean idea would be for a --css-ref option to cause the
default rules to be skipped - but it would break backward compatibility.
I don't know how seriously.

One option is to piggyback on the --css-include option.
For example --css-include=NONE or --css-include=no-defaults
would skip the default rules.  Or an entirely new option.
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-04-30 Thread Per Bothner




On 4/29/21 9:40 AM, Gavin Smith wrote:

Another problem: Navigate to

https://per.bothner.com/tmp/Kawa-txjs-plain/index.html

type "i" then "constant-fold", select the option in the menu, then press  
Return.

Then the page

https://per.bothner.com/tmp/Kawa-txjs-plain/Application-and-Arguments-Lists.html#index-constant_002dfold

is correctly loaded, but the side bar menu is completely expanded, and scrolled
to the top.

The logic for updating the sidebar looks for matches to the URL in the ToC.
However, index links aren't in the ToC.  We could fix this by using the Section
from the index to update the sidebar.  Unfortunately, we have the same problem
for cross-references to @anchor pseudo-nodes - which also aren't in the ToC.

I think we have to scan the loaded page "up" from the linked element until
we get to a node-name that is in the ToC.  This is made more complicated
because the page and the sidebar are usually in separate frames, so
message-passing is needed.

As an aside, the fact that we "fixed" the structure of the HTML may help
here, since it makes it easier to scan out from the anchor link for
ancestor elements and collect their 'id' attributes as a list (that in
turn we can use when searching the ToC for a match).
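
Collecting the candidate ids could then be as simple as (sketch):

  // Collect the 'id' attributes of the anchor's ancestors, innermost
  // first; each can be tried against the ToC until a match is found.
  function ancestorIds(elem) {
    var ids = [];
    for (var e = elem; e; e = e.parentElement)
      if (e.id)
        ids.push(e.id);
    return ids;
  }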
--
--Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: @image in @example

2021-04-30 Thread Per Bothner

The issue reported below is a blocker that prevents me from generating
the Kawa manual using makeinfo --html and info.js.  Compare the
documentation of the with-paint function here:
https://per.bothner.com/tmp/Kawa-txjs-plain/Composable-pictures.html#Colors-and-paints
with here:
https://www.gnu.org/software/kawa/Composable-pictures.html#Colors-and-paints

I don't think this is a "regression" (thus perhaps not a release-blocker),
since I don't believe it works with the release texinfo either.
However, it does work when generating docbook or pdf.
The existing website uses docbook - but I would like to drop that dependency
at some point.

On 4/27/21 3:19 PM, Per Bothner wrote:

Suppose I have an image file images/paint-circ-1.png.

Consider the following k.texi program processed thus:

     makeinfo k.texi --html -o /tmp

\input texinfo
@settitle k-test

@example
$ echo foo
@image{images/paint-circ-1}
$ echo bar
@end example

@image{images/paint-circ-1}
@bye

The first @image just produces
     [ images/paint-circ-1 ]
while the second @image produces an actual <img> element.

It works when generating docbook or pdf, but not when generating html.


--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



Re: js-info polishing

2021-04-29 Thread Per Bothner




On 4/29/21 9:30 AM, Gavin Smith wrote:

On Thu, Apr 29, 2021 at 07:15:27PM +0300, Eli Zaretskii wrote:

One possible aspect for improvement: the way the input field for Index
search is placed it obscures other text, and there doesn't seem to be
a way of getting rid of it if I decide not to type anything into that
field.


You get rid of it by pressing Escape.


Yes - using Escape to close a popup or similar is pretty standard.


Maybe there should be another way in case the user doesn't realise this.


A Close button might make sense.

There is a minor misfeature in that if you've started typing you
have to type Escape twice (the first removes the drop-down and the
second closes the input area).  This is similar to how you sometimes
have to press Enter twice if you use the arrow keys to navigate the
drop-down menu.

Fixing these might require doing more custom JavaScript to implement
the input fields.  Not a priority or blocker, I think.
--
    --Per Bothner
p...@bothner.com   http://per.bothner.com/



  1   2   3   4   >