Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Nasser M. Abbasi

On 3/26/2016 5:44 PM, Karl Berry wrote:


 I have many many latex files this large

Is the one you provided already one of the smaller ones?  The smaller
the file that still exhibits the problem, the easier to debug.



The file I linked to is an average-size LaTeX file. I have some larger ones.
I picked this one since it showed the problem clearly.


 given that lualatex takes one hr or so.

Oh, lualatex is another important part of the story.  LuaTeX is already
significantly slower than standard TeX (or XeTeX), and depending on what
your document is doing, we may be hitting some kind of new/unusual
slowdown that is specific to luatex.  Must you use luatex?



Sorry, I meant lualatex takes one hr for _all_ the files (I have about
50-60 files like this one for this one test). I did not mean it
takes one hr for this one file.

Just to be clear: For the specific file I posted

  http://12000.org/tmp/032616/

pdflatex and lualatex take about 2-3 minutes to compile the above file
on native Linux on a new PC. Even on VBox and cygwin, pdflatex and lualatex
are really fast: no more than 4-5 minutes on this file.

tex4ht takes about 5 hrs on the new Linux PC. All of this is running a newly
installed TL 2015 on Linux Mint 17.3, 64-bit. tex4ht takes 14 hrs on VBox
and 10 hrs on cygwin, on the same file.

Anyone can download the zip file and check for themselves. Sorry it
is a little large, but it is only one command to run and see ;)


 Finally, is there a document that describes the passes/process
 that tex4ht uses to compile to HTML at some high level?

The htlatex script is six lines long, and is the clearest possible
summary of what is run.  I'll omit the TeX gobbledygook that Eitan
uses.

#!/bin/sh
 latex $5 ...
 latex $5 ...
 latex $5 ...
 tex4ht -f/$1  -i~/tex4ht.dir/texmf/tex4ht/ht-fonts/$3
 t4ht -f/$1 $4

I assume Michal's make4ht is fundamentally equivalent.

As CVR says, the reason for the three latex runs is simply to resolve
references.  Thus if you are repeatedly running the same doc, with all
aux files already in place, one run would suffice.



But I do not know how to decide if I need one run or two or three?
I call make4ht to compile the file, and it does its thing. How
is the user supposed to control this?
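
One way to decide this, as a minimal sketch rather than anything htlatex or make4ht is known to do: rerun the TeX pass until an auxiliary file stops changing. The function name run_until_stable, the variable stable_runs, and the choice to watch only the .aux file are all assumptions made for illustration; tex4ht writes other auxiliary files as well, which this sketch does not check.

```shell
#!/bin/sh
# Sketch only: rerun a command until a watched file stops changing.
# Usage: run_until_stable FILE MAX_RUNS CMD [ARGS...]
# On return, $stable_runs holds the number of runs that were made.
run_until_stable() {
    watched=$1
    max_runs=$2
    shift 2
    stable_runs=0
    # Checksum before the first run; empty if the file does not exist yet.
    prev=$(cksum "$watched" 2>/dev/null)
    while [ "$stable_runs" -lt "$max_runs" ]; do
        "$@" || return 1
        stable_runs=$((stable_runs + 1))
        cur=$(cksum "$watched" 2>/dev/null)
        # Unchanged checksum: the cross-reference data has settled.
        if [ "$cur" = "$prev" ]; then
            break
        fi
        prev=$cur
    done
    return 0
}

# Hypothetical use, with my_latex_pass standing in for one of the
# "latex $5 ..." lines of htlatex:
#   run_until_stable report.aux 3 my_latex_pass
#   echo "needed $stable_runs run(s)"
```

With the aux files already in place and unchanged, the loop stops after a single run, which matches the remark that one run would then suffice.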



For sure, a design document, among many others, would be extremely
desirable, not to mention many updates to the code, not to mention a new
release, not to mention ...  What's fundamentally needed are more
volunteers with time and ability to help develop and document this
highly complex system!

karl



Sure. tex4ht is very complicated and only a few experts know how
it works from the inside, and it is really the only tool we have
that converts LaTeX to HTML well enough, so it is important
to have a design document that describes how it works. I switched
to tex4ht from latex2html about 2 years ago, since latex2html is basically
no longer maintained and does not support many LaTeX packages.

Thank you for the input Karl.

--Nasser


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Karl Berry
Hi Nasser,

> TL is more optimized for native Linux vs. cygwin.

Just to remark: TL specifically is not "optimized" for any particular
platform (all binaries are built natively).  I think the difference you
are seeing here is an inevitable consequence of running a
resource-intensive job on an emulation layer (cygwin) vs. a native layer
(gnu/linux).  (As for native Windows, it is fundamentally inefficient,
so I'm not surprised it is slow too.  Cygwin or vbox is thus the worst
of both worlds.)

> buying new PC and installing Linux on it just in the hope

Wow.  I suspect you are the only person in the world buying hardware to
placate tex4ht!

> then it starts to slow down, the higher the number becomes

I think we need some kind of profiling of the TeX run to find the facts.
I don't have an easy recipe at hand.  (And I'm currently trying to get
the next TUGboat out the door, plus prepare for the TL pretest, so it's
going to be hard to devote significant time to this for a while,
unfortunately ...)
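
Short of real profiling, a first coarse step could be to time each stage of the pipeline separately and see which one dominates. The sketch below is only an illustration: time_stage and TIMING_LOG are made-up names, and it assumes a `date` that supports `+%s`, which gives one-second resolution only.

```shell
#!/bin/sh
# Sketch only: record how long each stage of a build takes.
# Usage: time_stage LABEL CMD [ARGS...]
# Appends "LABEL SECONDS EXIT_STATUS" to the file named by $TIMING_LOG.
TIMING_LOG=${TIMING_LOG:-timing.log}

time_stage() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    printf '%s %s %s\n' "$label" "$((end - start))" "$status" >> "$TIMING_LOG"
    # Pass the exit status of the timed command back to the caller.
    return "$status"
}

# Hypothetical use: wrap each line of the htlatex script, for example
#   time_stage latex-pass-1 <the first latex command>
#   time_stage tex4ht       <the tex4ht command>
#   time_stage t4ht         <the t4ht command>
```

If one of the TeX passes accounts for nearly all of the time, the postprocessors can be ruled out before looking any deeper.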

> But the issue is, pdflatex and lualatex take about 5 minutes
> on the same file to compile it to pdf !

Ok, so let's consider PDF first, since that is simpler to think about
than HTML.

> I have many many latex files this large

Is the one you provided already one of the smaller ones?  The smaller
the file that still exhibits the problem, the easier to debug.

> given that lualatex takes one hr or so.

Oh, lualatex is another important part of the story.  LuaTeX is already
significantly slower than standard TeX (or XeTeX), and depending on what
your document is doing, we may be hitting some kind of new/unusual
slowdown that is specific to luatex.  Must you use luatex?

> Finally, is there a document that describes the passes/process
> that tex4ht uses to compile to HTML at some high level?

The htlatex script is six lines long, and is the clearest possible
summary of what is run.  I'll omit the TeX gobbledygook that Eitan
uses.

#!/bin/sh
latex $5 ...
latex $5 ...
latex $5 ...
tex4ht -f/$1  -i~/tex4ht.dir/texmf/tex4ht/ht-fonts/$3
t4ht -f/$1 $4

I assume Michal's make4ht is fundamentally equivalent.

As CVR says, the reason for the three latex runs is simply to resolve
references.  Thus if you are repeatedly running the same doc, with all
aux files already in place, one run would suffice.


For sure, a design document, among many others, would be extremely
desirable, not to mention many updates to the code, not to mention a new
release, not to mention ...  What's fundamentally needed are more
volunteers with time and ability to help develop and document this
highly complex system!

karl


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Radhakrishnan CV
On Sat, Mar 26, 2016 at 3:38 PM, Nasser M. Abbasi wrote:

>
> The zip file is much larger, about 270 MB :( But now it
> contains everything to compile this one latex file.
>

I have downloaded the previous archive and will download this one if needed. I
will temporarily suspend svg generation and see if tex4ht runs faster. BTW,
please don't expect a quick response from me. I will try to respond as quickly
as possible.

Best regards
-- 
Radhakrishnan
River Valley



Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Nasser M. Abbasi



Thanks for the offer to look into it. I put the latex file
and all the needed include files I use and the .cfg and main.mk4
and the command in one zip file. Here it is, in this folder:

http://12000.org/tmp/032616/



Oops. I am sorry, I forgot to include the svg images for use
by tex4ht in the zip file earlier. (I use pdf2svg to convert pdf
image files to svg files for use in HTML in the \includegraphics call.)
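
For readers who want to reproduce that conversion step, here is a minimal sketch of a batch loop that calls pdf2svg only for PDFs whose SVG is missing or out of date. It is not the script actually used for the zip file. The function name convert_stale and the PDF2SVG override variable are assumptions made for illustration, and the `-nt` test, while supported by common shells such as dash and bash, is not required by POSIX.

```shell
#!/bin/sh
# Sketch only: convert every PDF in a directory to SVG, skipping
# files whose SVG already exists and is not older than the PDF.
# PDF2SVG may be overridden; pdf2svg is called as: pdf2svg IN.pdf OUT.svg
PDF2SVG=${PDF2SVG:-pdf2svg}

convert_stale() {
    dir=$1
    for pdf in "$dir"/*.pdf; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$pdf" ] || continue
        svg=${pdf%.pdf}.svg
        # Skip when the SVG exists and the PDF is not newer than it.
        if [ -e "$svg" ] && [ ! "$pdf" -nt "$svg" ]; then
            continue
        fi
        "$PDF2SVG" "$pdf" "$svg" || return 1
        # Report each file that was actually converted.
        echo "$svg"
    done
}

# Hypothetical use, assuming the figures live in ./images:
#   convert_stale ./images
```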

If someone happened to have downloaded the zip file already, please
delete it and obtain the updated zip file I just uploaded:

http://12000.org/tmp/032616/

The zip file is much larger, about 270 MB :( But now it
contains everything to compile this one latex file.


thank you,
--Nasser


Re: [tex4ht] problem with slow compilation of large latex file with large math content

2016-03-26 Thread Nasser M. Abbasi

On 3/26/2016 1:11 AM, Radhakrishnan CV wrote:

On Sat, Mar 26, 2016 at 3:23 AM, Nasser M. Abbasi wrote:

[...]

For example, for one file, using VBox, it took 14 hrs
for make4ht to compile the file to HTML. On cygwin, it took
a little less than that, about 10 hrs. This is on Windows 7, 64-bit,
16 GB RAM, fast Intel i7-3930K CPU.



That is terrible! But it contradicts my own experience. At work, we
process large documents (on average 300 pages long, 800-1000 bibliographic
items, 500 to 800 equations, very complex math, a large number of figures,
double-column output) on a daily basis, and it takes a few seconds to
generate Elsevier XML output. Recently, another article with 350 pages, ~70
figures, four or five very long tables each spanning several pages, 350 bib
items, and several hundred cross references, but very little math, took only
12 secs for three runs of TeX4ht to generate NLM XML output on a server where
at least 50 users are working simultaneously using the same resources. The only
documents that take, say, 60 secs or a bit more are documents with
atomic and nuclear data tables, each table typically running to 200 pages!
Otherwise, a tex4ht run is a breeze in my experience, and that on a server
shared by at least forty to fifty users at a time.

[...]



But the issue is, pdflatex and lualatex take about 5 minutes
on the same file to compile it to pdf !

I can understand converting to HTML will take more time,
since each equation is converted to svg image,



On the fly? Why don't you write out the math to a file and process
it separately to generate the svg images in one go?



Sorry, I do not understand what this means. I have a LaTeX
file that contains math, and I call tex4ht to
generate the HTML. I use make4ht to compile it and tell
it to use svg for math.


[...]



It also seems tex4ht makes more than one pass, as I see it
generating this sequence of numbers more than once.



tex4ht needs three passes for fixing cross links and multicolumns in
tables.


OK. But each pass is slow, as it seems to go through
all the pages over and over again.







I can make a zip file with a typical large LaTeX file,
all the images it uses, my .cfg and main.mk4,
and the command I used to compile the file, if
anyone wants to confirm this problem. Would this be OK?



I would love to debug your problem. Please do send it to me. If the archive
is too large, kindly put it at some location and give me the URL.


Thanks for the offer to look into it. I put the latex file
and all the needed include files I use and the .cfg and main.mk4
and the command in one zip file. Here it is, in this folder:

http://12000.org/tmp/032616/

There is a file called compile.sh, which has this line:

make4ht --lua -u -c ./nma.cfg -e ./main.mk4 report.tex "htm,3,pic-align,notoc*"

The report.tex there is 17 MB. You'll see the slowdown
once the page numbers get past [1000] or so. It will take a few hours
to compile.
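
To see where the page counters start to slow down, the terminal output could be piped through a small filter that prefixes each line with the seconds elapsed since the start. This is a sketch only: stamp_lines is a made-up name, the resolution is one second, it assumes a `date` that supports `+%s`, and it starts one `date` process per line of output.

```shell
#!/bin/sh
# Sketch only: prefix each line read from stdin with the number of
# seconds elapsed since the filter started.
stamp_lines() {
    start=$(date +%s)
    # The "|| [ -n ... ]" part keeps a final line with no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        now=$(date +%s)
        printf '%6d  %s\n' "$((now - start))" "$line"
    done
}

# Hypothetical use with the compile.sh from the zip file:
#   sh compile.sh 2>&1 | stamp_lines > stamped.log
```

Comparing the stamps next to [100], [1000], and [2000] in the resulting log would show whether the time per page really grows with the page number.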

Please let me know if you need anything else or if there is anything
I can try on my end. I made sure all the files needed are there.
If I missed something, I will update the zip file.




[...]


Finally, is there a document that describes the passes/process
that tex4ht uses to compile to HTML at some high level, such as a block
diagram? I am not able to find such a design document.



A schematic diagram of a tex4ht run, namely tex4ht.pdf, is attached to this
mail. Hope this helps.



Thanks for the diagram. But there should really be a more
detailed design document for something as
important as tex4ht.


Best regards



thank you,
--Nasser