[tex4ht] [bug #560] problem using tex4ht with xr-hyper when other documents are in separate folders

2022-04-22 Thread Nasser M. Abbasi

             Summary: problem using tex4ht with xr-hyper when other documents are in separate folders
             Project: tex4ht
        Submitted by: nma123
        Submitted on: Sat Apr 23 05:23:31 2022
            Category: None
            Priority: 5 - Normal
            Severity: 5 - Normal
              Status: None
             Privacy: Public
         Assigned to: None
    Originator Email: 
         Open/Closed: Open
     Discussion Lock: Any

___

Details:

Reference and screenshots at:

https://tex.stackexchange.com/questions/641726/problem-using-tex4ht-with-xr-hyper-when-other-documents-are-in-separate-folders

I am splitting my large document into smaller separate documents, so that
each chapter is now compiled separately, as a separate PDF file (not using
\include or \input, but as a separate document with its own title and
\begin{document}).

But I still need references from a main document to the other documents
to build tables and references.

So I am using xr-hyper, which worked well. I have this MWE tree to show my
current setup:

   main.tex
   |
   +-- chapter_1/ch1.tex
   +-- chapter_2/ch2.tex

I put each chapter in a separate folder to reduce clutter. This is what my
main.tex, ch1.tex and ch2.tex look like:

%main.tex
\documentclass{book}
\usepackage{xr-hyper}
\usepackage{hyperref}
\externaldocument{chapter_1/ch1} %notice folder name included
\externaldocument{chapter_2/ch2}
\begin{document}
See problem \hyperref[1]{1} below and problem \hyperref[2]{2}.
\end{document}

and ch1.tex which is inside folder chapter_1/

\documentclass{book}
\usepackage{xr-hyper}
\usepackage{hyperref}

\begin{document}
\chapter{chapter 1 in document ch1.tex}
\section{some section name}
\subsection{problem 1 from some book}
\label{1}
This is problem 1
\end{document}

and ch2.tex

\documentclass{book}
\usepackage{xr-hyper}
\usepackage{hyperref}
\begin{document}

\chapter{chapter 2 in document ch2.tex}
\section{some section name}
\subsection{problem 2 from some book}
\label{2}
This is problem 2
\end{document}

Next, I compiled ch1.tex, ch2.tex and finally main.tex, in this order, using
lualatex. I then opened main.pdf, and the links there are correct and work.

Next, I did the same for tex4ht. I compiled ch1.tex, ch2.tex and then main.tex,
in this order, all using the commands

   make4ht ch1.tex 'mathjax,htm'
   make4ht ch2.tex 'mathjax,htm'
   make4ht main.tex 'mathjax,htm'

Then I opened main.htm, and it shows the correct output.

Now here is the problem: the links in main.htm drop the folder name.
The link for 1 says

ch1.htm#x1-30001.1.1 

instead of

 chapter_1/ch1.htm#x1-30001.1.1

So the link does not work.

If I manually edit the link and add the folder name chapter_1/ to it, it
works and opens the chapter 1 page correctly.
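
Until the generated links carry the folder name, the manual edit described above could be scripted as a post-processing step. The following is only a sketch of that workaround, not a fix in tex4ht itself. The helper name prefix_links is hypothetical, and the sketch assumes the links are written with double quotes, as in href="ch1.htm#x1-30001.1.1".

```shell
#!/bin/sh
# Hypothetical helper (not part of tex4ht or make4ht).
# usage: prefix_links FILE DOC DIR
# Rewrites every href="DOC.htm..." in FILE to href="DIR/DOC.htm...".
# Assumes double-quoted href attributes and plain names without
# characters that are special to sed (such as | or &).
# Running it twice is harmless, because an already prefixed link
# no longer starts with DOC.htm right after the quote.
prefix_links() {
    file=$1
    doc=$2
    dir=$3
    sed "s|href=\"${doc}\.htm|href=\"${dir}/${doc}.htm|g" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example, run in the folder that contains main.htm:
#   prefix_links main.htm ch1 chapter_1
#   prefix_links main.htm ch2 chapter_2
```

This only patches main.htm after the fact. It has to be repeated after every rebuild, so it does not replace a proper fix for the dropped folder name.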

I can fix all of this by putting all the separate documents (main.tex, ch1.tex
and ch2.tex) in the same top-level folder.

But I have many chapters that I now want to turn into completely separate PDF
files (while still having cross references from the main document to them),
and I need to compile each one separately. Keeping each chapter in its own
folder would make them much easier to manage and would reduce clutter.

This looks like a bug in tex4ht, in that it drops the folder name from the
reference. Or am I doing something wrong?

TL 2021






___
  Message sent via/by Puszcza
  http://puszcza.gnu.org.ua/



Re: [tex4ht] How to increase number of strings ? make4ht-lib: Fatal error. Command htlatex returned exit code 1

2022-04-22 Thread Nasser M. Abbasi

On 4/22/2022 10:11 AM, Michal Hoftich wrote:

> Hi Nasser,
>
>> It now compiled the MWE given below. It does take a long
>> time. This is another issue for another email. But tex4ht took
>> 110 minutes to compile this file, (which is empty, but has
>> lots of empty chapters and sections), while lualatex about 2 minutes.
>>
>> Tex4ht is really slow.
>
> The main problem is that your files are so extremely huge. TeX4ht
> needs some extra memory, because it declares link destinations for all
> sections and for each cut page.



Hello Michal. Thanks for taking the time to look at the problem.

I understand the document is long. (Actually, the MWE I gave
is empty, but it has lots of chapters/sections/subsections to
show the problem.)

But I do have more than enough physical RAM. This is a new PC with one of
the fastest Intel CPUs and the maximum of 128 GB RAM.

I monitor the RAM all the time while tex4ht is compiling, and
it does not use more than 8 GB to 10 GB. The output below shows the current
RAM usage while compiling with split level "4", including 2 other tex4ht
builds running at the same time in separate terminals:


free -mh

              total        used        free      shared  buff/cache   available
Mem:          100Gi       8.7Gi        83Gi       0.0Ki       8.4Gi        90Gi
Swap:          26Gi          0B        26Gi

So I fail to see how this is a physical memory issue at all.

Please help me understand how this is due to memory not being available for
tex4ht to compile the document. If tex4ht ran out of memory,
would it not give an error and stop?



>> Now for the issue of losing links.
>>
>> I used this command
>>
>> make4ht  -ulm default foo.tex "mathjax,htm,fn-in,4"
>
> Try
>
> make4ht  -ulm draft foo.tex "mathjax,htm,fn-in,3"
>
> It will produce separate files for sections, but not for subsections.
> You should save a lot of memory and compilation time using this.



Thanks for the suggestion. Yes, this compiles much faster and does
not produce the missing links that I get with "4" instead of "3".

Just to be clear what we are talking about:

===

make4ht  foo.tex "mathjax,htm,fn-in,4"

Takes over 2 hrs to compile, produces DOM errors

[INFO]make4ht-lib: parse_lg process file: fosu36492.htm
[WARNING] domfilter: DOM parsing of fosu36492.htm failed:
[WARNING] domfilter: ...ive/2021/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Error Parsing XML [char=5325]

And produces wrong links that lead to no pages, as discussed earlier.
The links go dead after chapter 19, section 3, subsection 93, as
I said before.


make4ht  foo.tex "mathjax,htm,fn-in,3"

Takes only 12 minutes and produces no bad links.


But this difference on its own clearly shows there is a serious problem
in tex4ht in this area.

Why would it take 10 or 20 times longer to compile when
using "4" vs. "3"?

A little more time is to be expected, but this much longer? Maybe
the algorithm or data structure used by tex4ht to handle and manage
the splitting is not optimized? Does this mean using "5" will now
take 100 times longer to compile? I am afraid to try :)

I remember you once mentioned that the code for splitting in tex4ht
is very old and hard to understand, which is something I understand
and appreciate.

This workaround of changing "4" to "3" on this document
can't really be a solution for me. The issue is not how long
it takes to compile. I can wait, as long as I get correct pages
that load fast.

The issue is the missing links, the missing sections in the table
of contents, and the seemingly corrupt pages that show up with large
blank spaces in them, which the MWE shows.

I have sections that have many subsections (up to 50 or 100) in
them. Each subsection is also large. Having them all on the same
web page means the page becomes very slow to open, even
with MathJax for math, which is not practical.

No one will want to browse a website that takes 5 minutes each time
to open a large page. Also, it is harder to browse very
long pages.

That is why I wanted to have each subsection on its own
smaller web page, so they open fast and are easier to view.


> Best regards,
> Michal


If you have any other ideas or something I should try, please
feel free to let me know. Right now, I am no longer able to build
my long document because of this.

Thanks,
--Nasser


Re: [tex4ht] How to control/optimize the number of times make4ht runs dvilualatex?

2022-04-22 Thread Nasser M. Abbasi

On 4/19/2022 4:30 AM, Michal Hoftich wrote:

> Hi Nasser
>
>> Thanks, but the issue is, how does one know if they
>> need to run it 1, 2 or 3 times? What code do I need to add
>> to the above to decide?
>> With lualatex, now this is what I do and it works (other option is to
>> use latexmk, but for some other reasons, I prefer to stick to makefiles
>> for now).
>
> One possibility is to use the latexmk_build extension, like in:
>
>    make4ht -f html5+latexmk_build sample.tex
>
> Another option is to check temporary files for changes. Using some
> hash function. Like in this build file:

local htlatex = require "make4ht-htlatex"

local function get_checksum(main_file, extensions)
  -- make checksum for temporary files
  local checksum = ""
  extensions = extensions or {"aux", "4tc", "xref"}
  for _, ext in ipairs(extensions) do
    local f = io.open(main_file .. "." .. ext, "r")
    if f then
      local content = f:read("*all")
      f:close()
      -- make checksum of the file and previous checksum
      -- this way, we will detect change in any file
      checksum = md5.sumhexa(checksum .. content)
    end
  end
  return checksum
end

Make:add("myhtlatex", function(par)
  -- get checksum of temp files before compilation
  local checksum = get_checksum(par.input)
  local status = htlatex.htlatex(par)
  -- stop processing on error
  if status ~= 0 then return status end
  -- get checksum after compilation
  local newchecksum = get_checksum(par.input)
  -- this is needed to prevent possible infinite loops
  local compilation_count = 1
  local max_compilations  = 3 -- <- change as you want
  while checksum ~= newchecksum do
    -- give up once the maximum number of compilations is reached
    if compilation_count >= max_compilations then return status end
    status = htlatex.htlatex(par)
    -- count this run, otherwise the limit above is never reached
    compilation_count = compilation_count + 1
    -- stop processing on error
    if status ~= 0 then return status end
    checksum = newchecksum
    -- get checksum after compilation
    newchecksum = get_checksum(par.input)
  end
  return status
end)

Make:myhtlatex {}
-

> Best regards,
> Michal


Hello Michal;

Just a clarification. I am now trying your build file above to optimize
the number of times tex4ht is called.

Do I need to also add

\Preamble{xhtml}
\typeout{(\jobname.4tc)}
\begin{document}
\EndPreamble

to my .cfg? Or is the above for use with the other method you showed,
which uses latexmk?

So if I just want to use the above build file, is it enough to just
load it using -e  build_file.mk4, where build_file.mk4 is what you
show above, or do I need to do something more to my .cfg?

By the way, when I try the above on a simple file, it calls
tex4ht twice starting from a clean folder. I assume this is normal?
I.e. from a clean folder:

--
cat report.tex
\documentclass[11pt]{book}
\begin{document}

test
\end{document}
---

and now


make4ht --shell-escape -ulm default -a debug \
  -c nma_mathjax.cfg \
  --build-file $TEXMFHOME/tex/latex/tex4ht_build_files/MAIN.mk4 \
  report.tex \
  "mathjax,htm,fn-in,1,notoc*,p-width,charset=utf-8" "-cunihtf -utf8" "" \
  "--interaction=batchmode"
---

Shows it calls dvilualatex twice

[INFO]htlatex: LaTeX call: dvilualatex --interaction=errorstopmode -jobname='report'  
--interaction=batchmode -shell-escape 
'\makeatletter\def\HCode{\futurelet\HCode\HChar}\def\HChar{\ifx"\HCode\def\HCode"##1"{\Link##1}\expandafter\HCode\else\expandafter\Link\fi}\def\Link#1.a.b.c.{\AddToHook{class/before}{\RequirePackage[#1,html]{tex4ht}}\let\HCode\documentstyle\def\documentstyle{\let\documentstyle\HCode\expandafter\def\csname
 
tex4ht\endcsname{#1,html}\def\HCode1{\documentstyle[tex4ht,}\@ifnextchar[{\HCode}{\documentstyle[tex4ht]}}}\makeatother\HCode
 nma_mathjax.cfg,mathjax,htm,fn-in,1,notoc*,p-width,charset=utf-8,charset=utf-8,html5.a.b.c.\input 
"\detokenize{report.tex}"'
This is LuaTeX, Version 1.15.0 (TeX Live 2022)
 system commands enabled.

[INFO]htlatex: LaTeX call: dvilualatex --interaction=errorstopmode 
-jobname='report'

--interaction=batchmode -shell-escape 
'\makeatletter\def\HCode{\futurelet\HCode\HChar}\def\HChar{\ifx"\HCode\def\HCode"##1"{\Link##1}\expandafter\HCode\else\expandafter\Link\fi}\def\Link#1.a.b.c.{\AddToHook{class/before}{\RequirePackage[#1,html]{tex4ht}}\let\HCode\documentstyle\def\documentstyle{\let\documentstyle\HCode\expandafter\def\csname
 
tex4ht\endcsname{#1,html}\def\HCode1{\documentstyle[tex4ht,}\@ifnextchar[{\HCode}{\documentstyle[tex4ht]}}}\makeatother\HCode
 nma_mathjax.cfg,mathjax,htm,fn-in,1,notoc*,p-width,charset=utf-8,charset=utf-8,html5.a.b.c.\input 
"\detokenize{report.tex}"'
This is LuaTeX, Version 1.15.0 (TeX Live 2022)
 system commands enabled.

.
[STATUS]  make4ht: Conversion finished

Is the above expected? Why 2 times on this simple file?

But when now I use the same command again, but do not
clear the folder and keep all intermediate files generated from earlier
command, it now ca

Re: [tex4ht] How to increase number of strings ? make4ht-lib: Fatal error. Command htlatex returned exit code 1

2022-04-22 Thread Michal Hoftich
Hi Nasser,


> It now compiled the MWE given below. It does take a long
> time. This is another issue for another email. But tex4ht took
> 110 minutes to compile this file, (which is empty, but has
> lots of empty chapters and sections), while lualatex about 2 minutes.
>
> Tex4ht is really slow.
>

The main problem is that your files are so extremely huge. TeX4ht
needs some extra memory, because it declares link destinations for all
sections and for each cut page.

> Now for the issue of losing links.
>
> I used this command
>
> make4ht  -ulm default foo.tex "mathjax,htm,fn-in,4"

Try

make4ht  -ulm draft foo.tex "mathjax,htm,fn-in,3"

It will produce separate files for sections, but not for subsections.
You should save a lot of memory and compilation time using this.

Best regards,
Michal


Re: [tex4ht] How to control/optimize the number of times make4ht runs dvilualatex?

2022-04-22 Thread Michal Hoftich
Hi Nasser,

> Reason I am asking, is that it is easy to count/find the
> number of times needed to call lualatex. So I could use that number of
> times, to also call tex4ht on same file.

It is hard to say. For correct cross-references, two runs are usually
enough. But there are some cases regarding tables where even more
than three runs are necessary: as many runs as there are columns.
I don't remember what feature it was, and I cannot
reproduce it now. Anyway, when you don't add new sections,
cross-refs, etc., one compilation is enough.

Best regards,
Michal
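
Since the number of runs needed is hard to predict, one option for a makefile-based setup is to rerun the compilation until the auxiliary files stop changing. The following is a minimal sketch of that idea in plain sh, not something provided by make4ht. The helper names run_until_stable and checksum_of are hypothetical, and the list of files to watch (aux, 4tc, xref) is taken from the build file posted earlier in this thread.

```shell
#!/bin/sh
# Sketch: rerun a command until a set of files stops changing.
# run_until_stable and checksum_of are hypothetical helper names.

# Print one checksum over whichever of the given files exist.
checksum_of() {
    for f in "$@"; do
        if [ -f "$f" ]; then cat "$f"; fi
    done | cksum
}

# usage: run_until_stable MAX "FILE1 FILE2 ..." COMMAND [ARGS...]
# Leaves the number of runs performed in the variable $runs.
# File names must not contain spaces (the list is split on whitespace).
run_until_stable() {
    max=$1
    files=$2
    shift 2
    runs=0
    before=$(checksum_of $files)
    while [ "$runs" -lt "$max" ]; do
        # stop immediately if the command fails, keeping its exit code
        "$@" || return $?
        runs=$((runs + 1))
        after=$(checksum_of $files)
        if [ "$after" = "$before" ]; then break; fi
        before=$after
    done
}

# Example (assumption: draft mode makes a single LaTeX pass per call):
#   run_until_stable 3 "report.aux report.4tc report.xref" \
#       make4ht -m draft report.tex "mathjax,htm"
```

Starting from a clean folder, such a loop always needs at least two runs: the first run creates the auxiliary files, so the checksum changes, and only the second run can confirm that nothing changed. Calling make4ht as a whole in the loop also repeats the tex4ht and t4ht steps each time, so a build file that repeats only the LaTeX step is likely to be cheaper.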