Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-14 Thread Eli Zaretskii
 Date: Wed, 14 Jan 2015 10:42:08 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 int
 wcwidth (wchar_t wc)
 #undef wcwidth
 {
   /* In UTF-8 locales, use a Unicode aware width function.  */
   const char *encoding = locale_charset ();
   if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0))
     {
       /* We assume that in a UTF-8 locale, a wide character is the same as a
          Unicode character.  */
       return uc_width (wc, encoding);
     }
   else
     {
       /* Otherwise, fall back to the system's wcwidth function.  */
 #if HAVE_WCWIDTH
       return wcwidth (wc);
 #else
       return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
 #endif
     }
 }
 
 locale_charset is called on every invocation of wcwidth.

Yes, I know.  But only if gnulib's wcwidth is used.  Is it used on
your platform?  AFAIK, glibc provides wcwidth, so I'd expect the
gnulib version not to be used on your platform.

 It must be slower under a Windows system. The implementation of
 locale_charset is in the localcharset.c file from gnulib, although I
 haven't looked at it in detail, and don't know why it would be slow
 under Windows.

If I comment out the call to locale_charset in gnulib's wcwidth, and
show that the slow-down goes away, will that convince you?

In any case, I don't see why we need to call locale_charset for each
and every character, over and over and over again.  We should call it
once and then reuse the result, since it depends on the environment
outside the reader, and will not change during the session, right?
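
To make the "call it once and reuse the result" idea concrete, here is
a minimal sketch.  It is not gnulib or Info code: cached_charset and
charset_is_utf8 are hypothetical names, and POSIX nl_langinfo (CODESET)
stands in for gnulib's locale_charset so that the example is
self-contained.  It also assumes the locale does not change after the
first call.

```c
#include <assert.h>
#include <langinfo.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib or Info: query the locale's
   charset once and hand back the saved copy on every later call.
   nl_langinfo (CODESET) stands in for gnulib's locale_charset here.  */
const char *
cached_charset (void)
{
  static char saved[64];
  static int initialized = 0;

  if (!initialized)
    {
      const char *cs = nl_langinfo (CODESET);
      assert (cs != NULL);
      /* Copy the name, since later nl_langinfo calls may overwrite it.  */
      strncpy (saved, cs, sizeof saved - 1);
      saved[sizeof saved - 1] = '\0';
      initialized = 1;
    }
  return saved;
}

/* Per-character test that only compares against the saved name.  */
int
charset_is_utf8 (void)
{
  return strcmp (cached_charset (), "UTF-8") == 0;
}
```

A width function could then branch on charset_is_utf8 () for each
character without repeating the locale lookup.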



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-14 Thread Gavin Smith
On Wed, Jan 14, 2015 at 5:57 PM, Eli Zaretskii e...@gnu.org wrote:
 Date: Wed, 14 Jan 2015 10:42:08 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org

 int
 wcwidth (wchar_t wc)
 #undef wcwidth
 {
   /* In UTF-8 locales, use a Unicode aware width function.  */
   const char *encoding = locale_charset ();
   if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0))
     {
       /* We assume that in a UTF-8 locale, a wide character is the same as a
          Unicode character.  */
       return uc_width (wc, encoding);
     }
   else
     {
       /* Otherwise, fall back to the system's wcwidth function.  */
 #if HAVE_WCWIDTH
       return wcwidth (wc);
 #else
       return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
 #endif
     }
 }

 locale_charset is called on every invocation of wcwidth.

 Yes, I know.  But only if gnulib's wcwidth is used.  Is it used on
 your platform?  AFAIK, glibc provides wcwidth, so I'd expect the
 gnulib version not to be used on your platform.

Right, I was forgetting that the gnulib version won't be used if the
function exists.


 In any case, I don't see why we need to call locale_charset for each
 and every character, over and over and over again.  We should call it
 once and then reuse the result, since it depends on the environment
 outside the reader, and will not change during the session, right?

I raised this issue on the gnulib list here:
http://lists.gnu.org/archive/html/bug-gnulib/2015-01/msg00040.html.

One idea is to add another function to gnulib where the encoding is
passed as a parameter.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-14 Thread Gavin Smith
On Sat, Jan 10, 2015 at 3:31 PM, Eli Zaretskii e...@gnu.org wrote:
 Date: Thu, 8 Jan 2015 11:00:40 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org

 On Sat, Jan 3, 2015 at 3:29 PM, Eli Zaretskii e...@gnu.org wrote:
  As you see, wcwidth and locale_charset, both from gnulib in my build,
  take 75% of the time.
 
  I thought that perhaps the reason was that your locale is UTF-8, so
  Info doesn't need to convert text using libiconv in your locale.  But
  removing the UTF-8 encoding tag from the ELisp Info file didn't have
  any visible effect on the delay, so that's not it.
 
  Suggestions for further digging into this are welcome.

 One way to check would be to comment out the call to wcwidth, and
 replace with the value 1, and see if it is still slow.

 Doing this completely solves the problem.

 Moreover, if I replace the line that calls wcwidth:

   *pchars = wcwidth ((*iter).cur.wc);

 with what constitutes the body of the gnulib implementation, i.e.:

   *pchars = (*iter).cur.wc ? (iswprint ((*iter).cur.wc) ? 1 : -1) : 0;

 I don't see any slowdown, either.

 My conclusion is that the call to locale_charset inside gnulib's
 wcwidth is the culprit, because my locale's charset is not UTF-8.

 Thanks.

Here's the body of the gnulib wcwidth:

int
wcwidth (wchar_t wc)
#undef wcwidth
{
  /* In UTF-8 locales, use a Unicode aware width function.  */
  const char *encoding = locale_charset ();
  if (STREQ_OPT (encoding, "UTF-8", 'U', 'T', 'F', '-', '8', 0, 0, 0 ,0))
    {
      /* We assume that in a UTF-8 locale, a wide character is the same as a
         Unicode character.  */
      return uc_width (wc, encoding);
    }
  else
    {
      /* Otherwise, fall back to the system's wcwidth function.  */
#if HAVE_WCWIDTH
      return wcwidth (wc);
#else
      return wc == 0 ? 0 : iswprint (wc) ? 1 : -1;
#endif
    }
}

locale_charset is called on every invocation of wcwidth. It must be slower under a
Windows system. The implementation of locale_charset is in the
localcharset.c file from gnulib, although I haven't looked at it in
detail, and don't know why it would be slow under Windows.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-10 Thread Eli Zaretskii
 Date: Thu, 8 Jan 2015 11:00:40 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 On Sat, Jan 3, 2015 at 3:29 PM, Eli Zaretskii e...@gnu.org wrote:
  As you see, wcwidth and locale_charset, both from gnulib in my build,
  take 75% of the time.
 
  I thought that perhaps the reason was that your locale is UTF-8, so
  Info doesn't need to convert text using libiconv in your locale.  But
  removing the UTF-8 encoding tag from the ELisp Info file didn't have
  any visible effect on the delay, so that's not it.
 
  Suggestions for further digging into this are welcome.
 
 One way to check would be to comment out the call to wcwidth, and
 replace with the value 1, and see if it is still slow.

Doing this completely solves the problem.

Moreover, if I replace the line that calls wcwidth:

  *pchars = wcwidth ((*iter).cur.wc);

with what constitutes the body of the gnulib implementation, i.e.:
 
  *pchars = (*iter).cur.wc ? (iswprint ((*iter).cur.wc) ? 1 : -1) : 0;

I don't see any slowdown, either.

My conclusion is that the call to locale_charset inside gnulib's
wcwidth is the culprit, because my locale's charset is not UTF-8.
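
For reference, the replacement expression I tested can be written as a
standalone function.  This is only a sketch of the experiment, not a
proposed patch: fallback_char_width and count_columns are hypothetical
names, and the code ignores double-width characters entirely.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Width of one wide character without consulting the locale's
   charset: 0 for the null character, 1 for printable characters,
   -1 for everything else.  */
int
fallback_char_width (wchar_t wc)
{
  if (wc == L'\0')
    return 0;
  return iswprint ((wint_t) wc) ? 1 : -1;
}

/* Sum the columns of a wide string; characters reported as -1
   (non-printable) contribute nothing.  */
int
count_columns (const wchar_t *s)
{
  int columns = 0;

  assert (s != NULL);
  for (; *s != L'\0'; s++)
    {
      int w = fallback_char_width (*s);
      if (w > 0)
        columns += w;
    }
  return columns;
}
```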

Thanks.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-08 Thread Gavin Smith
On Sat, Jan 3, 2015 at 3:29 PM, Eli Zaretskii e...@gnu.org wrote:
 As you see, wcwidth and locale_charset, both from gnulib in my build,
 take 75% of the time.

 I thought that perhaps the reason was that your locale is UTF-8, so
 Info doesn't need to convert text using libiconv in your locale.  But
 removing the UTF-8 encoding tag from the ELisp Info file didn't have
 any visible effect on the delay, so that's not it.

 Suggestions for further digging into this are welcome.

One way to check would be to comment out the call to wcwidth, and
replace with the value 1, and see if it is still slow. The calls to
mb_isprint (from gnulib) could be slow as well, although they were
there in the last release, I believe.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-03 Thread Eli Zaretskii
 Date: Fri, 02 Jan 2015 18:05:40 +0200
 From: Eli Zaretskii e...@gnu.org
 Cc: bug-texinfo@gnu.org
 
  Date: Fri, 2 Jan 2015 15:47:50 +
  From: Gavin Smith gavinsmith0...@gmail.com
  Cc: Texinfo bug-texinfo@gnu.org
  
  On Fri, Jan 2, 2015 at 11:26 AM, Eli Zaretskii e...@gnu.org wrote:
   However, I see an annoying delay when going to a different node.  For
   example, in the Emacs Lisp manual, go to the Index node, move to the
   end of the index, type RET on one of the last entries, then type 'l'
   to go back.  Both going to Index and going back causes a visible delay,
   which is about 1.5 sec here (this is a Core-i7 box).
  
  I couldn't reproduce this. There is a slight delay for me when loading
  the Index node for the first time, but not when going back. I would
  guess it is because the Index node is a long node and it is
  doing something when displaying it that takes a long time (like
  calculating the positions of the starts of lines).
 
 I don't think so: the Info viewer from Texinfo 4.13 doesn't have this
 problem, I tried with the same files.
 
  I can't see how this would be related to CR removal. Do other nodes
  in the file also have this problem?
 
 OK, I will take a closer look.

The bottle-neck is clearly process_node_text, it takes more than 1 sec
when the node is Index in the ELisp manual.

I timed the loop in process_node_text, and it takes about 0.22 msec
per line on the average, and there are 5700 lines in that node.
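
As a rough check, those two figures multiply out to about 1.25 sec
(0.22 msec x 5700 lines), which matches the delay I see.  The sketch
below shows the arithmetic, plus one way to average over many
repetitions when a single call is too short to time.  All names in it
are hypothetical, sample_work is a stand-in workload, and clock ()
measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Total time in seconds for LINES lines at MSEC_PER_LINE each.  */
double
estimated_total_seconds (double msec_per_line, long lines)
{
  return msec_per_line * (double) lines / 1000.0;
}

/* Stand-in workload; replace with the code being measured.  */
void
sample_work (void)
{
  volatile unsigned sink = 0;

  for (unsigned i = 0; i < 1000; i++)
    sink += i;
}

/* Run FN REPETITIONS times and return the average CPU time per call
   in milliseconds.  Averaging hides the clock's coarse resolution.  */
double
average_msec_per_call (void (*fn) (void), long repetitions)
{
  clock_t start, end;

  assert (fn != NULL && repetitions > 0);
  start = clock ();
  for (long i = 0; i < repetitions; i++)
    fn ();
  end = clock ();
  return 1000.0 * (double) (end - start) / CLOCKS_PER_SEC
         / (double) repetitions;
}
```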

I tried to find the culprit in that loop, but it's hard to time such
small intervals reliably.  My gut feeling is that the call to
printed_representation is the reason: we call that function once for
each character on the line.  But I cannot prove that, and I cannot
explain why you don't see the same delay.  Perhaps the reason is that
some functions called by printed_representation, which in my build are
supplied by gnulib, are much faster in glibc.  This is based on the
following profile I get from gprof:

  Each sample counts as 0.01 seconds.
    %   cumulative   self              self     total
   time   seconds   seconds    calls  Ts/call  Ts/call  name
   50.00      0.02     0.02                             locale_charset
   25.00      0.03     0.01                             wcwidth
   12.50      0.04     0.01                             add_file_directory_to_path
   12.50      0.04     0.01                             main
    0.00      0.04     0.00   415064     0.00     0.00  printed_representation
    0.00      0.04     0.00   229065     0.00     0.00  reset_conversion
    0.00      0.04     0.00    61093     0.00     0.00  text_buffer_alloc
    0.00      0.04     0.00    51681     0.00     0.00  text_buffer_iconv
    0.00      0.04     0.00    51681     0.00     0.00  text_buffer_space_left
    0.00      0.04     0.00    31362     0.00     0.00  skip_whitespace
    0.00      0.04     0.00    19793     0.00     0.00  read_quoted_string

As you see, wcwidth and locale_charset, both from gnulib in my build,
take 75% of the time.

I thought that perhaps the reason was that your locale is UTF-8, so
Info doesn't need to convert text using libiconv in your locale.  But
removing the UTF-8 encoding tag from the ELisp Info file didn't have
any visible effect on the delay, so that's not it.

Suggestions for further digging into this are welcome.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-03 Thread Gavin Smith
On Sat, Jan 3, 2015 at 3:29 PM, Eli Zaretskii e...@gnu.org wrote:
 The bottle-neck is clearly process_node_text, it takes more than 1 sec
 when the node is Index in the ELisp manual.

 I timed the loop in process_node_text, and it takes about 0.22 msec
 per line on the average, and there are 5700 lines in that node.

 I tried to find the culprit in that loop, but it's hard to time such
 small intervals reliably.  My gut feeling is that the call to
 printed_representation is the reason: we call that function once for
 each character on the line.  But I cannot prove that, and I cannot
 explain why you don't see the same delay.  Perhaps the reason is that
 some functions called by printed_representation, which in my build are
 supplied by gnulib, are much faster in glibc.  This is based on the
 following profile I get from gprof:

   Each sample counts as 0.01 seconds.
     %   cumulative   self              self     total
    time   seconds   seconds    calls  Ts/call  Ts/call  name
    50.00      0.02     0.02                             locale_charset
    25.00      0.03     0.01                             wcwidth
    12.50      0.04     0.01                             add_file_directory_to_path
    12.50      0.04     0.01                             main
     0.00      0.04     0.00   415064     0.00     0.00  printed_representation
     0.00      0.04     0.00   229065     0.00     0.00  reset_conversion
     0.00      0.04     0.00    61093     0.00     0.00  text_buffer_alloc
     0.00      0.04     0.00    51681     0.00     0.00  text_buffer_iconv
     0.00      0.04     0.00    51681     0.00     0.00  text_buffer_space_left
     0.00      0.04     0.00    31362     0.00     0.00  skip_whitespace
     0.00      0.04     0.00    19793     0.00     0.00  read_quoted_string

 As you see, wcwidth and locale_charset, both from gnulib in my build,
 take 75% of the time.

The calls to wcwidth are new; previously there was no handling of
characters that span two display columns.

I am going to simplify the code in process_node_text to only deal with
calculating the line starts: it is generic code that was previously
used for the screen update as well. If that doesn't produce a speed-up
then it could be slow because it is calling wcwidth on every single
character.
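
To illustrate what "only calculating the line starts" could look like
in its simplest form, here is a sketch.  It is not the code in
process_node_text: collect_line_starts is a hypothetical name, and the
sketch only splits on newlines, whereas the real code also has to
account for character widths and lines that wrap at the window edge.

```c
#include <assert.h>
#include <stddef.h>

/* Record the byte offset of the start of each line of BUF in STARTS,
   storing at most MAX entries.  Returns the number of entries stored.
   A newline that is the last byte of BUF does not start a new line.  */
size_t
collect_line_starts (const char *buf, size_t len, size_t *starts, size_t max)
{
  size_t count = 0;

  assert (buf != NULL && starts != NULL);
  if (len > 0 && max > 0)
    starts[count++] = 0;
  for (size_t i = 0; i + 1 < len && count < max; i++)
    if (buf[i] == '\n')
      starts[count++] = i + 1;
  return count;
}
```

A single pass like this touches each byte once and makes no
per-character library calls.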



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-02 Thread Eli Zaretskii
 Date: Thu, 1 Jan 2015 18:38:07 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 On Sun, Dec 28, 2014 at 8:57 PM, Eli Zaretskii e...@gnu.org wrote:
 
  There's still a problem of Info files produced by makeinfo 4.x on
  Windows -- these will not be reliably readable with the new
  stand-alone Info.  I think we need to have a solution for that as
  well, at least as a user option, if not automatically.  (One way of
  doing that automatically would be to detect the 4.x version from the
  file's preamble.)
 
 I've committed some changes that will likely work for these files as
 well as the other files. It worked for me on the files I've tried.
 Please have a try.

Thanks, it seems to work.

However, I see an annoying delay when going to a different node.  For
example, in the Emacs Lisp manual, go to the Index node, move to the
end of the index, type RET on one of the last entries, then type 'l'
to go back.  Both going to Index and going back causes a visible delay,
which is about 1.5 sec here (this is a Core-i7 box).

Does this have something to do with how you remove the CR characters
in the current trunk?

 (While testing these changes I noticed that there doesn't seem to be
 any support in the standalone Info reader for varying lengths of file
 preambles in a split file (some mention of that here:
 http://lists.gnu.org/archive/html/bug-texinfo/2013-08/msg00040.html),
 so there could be some more changes coming to do with tag table
 processing. )

Yes, different-size preambles are a PITA.

Thanks.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-02 Thread Gavin Smith
On Fri, Jan 2, 2015 at 11:26 AM, Eli Zaretskii e...@gnu.org wrote:
 However, I see an annoying delay when going to a different node.  For
 example, in the Emacs Lisp manual, go to the Index node, move to the
 end of the index, type RET on one of the last entries, then type 'l'
 to go back.  Both going to Index and going back causes a visible delay,
 which is about 1.5 sec here (this is a Core-i7 box).

I couldn't reproduce this. There is a slight delay for me when loading
the Index node for the first time, but not when going back. I would
guess it is because the Index node is a long node and it is
doing something when displaying it that takes a long time (like
calculating the positions of the starts of lines). I can't see how
this would be related to CR removal. Do other nodes in the file also
have this problem? Have you tried any Info files without CR-LF line
endings?



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2015-01-02 Thread Eli Zaretskii
 Date: Fri, 2 Jan 2015 15:47:50 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 On Fri, Jan 2, 2015 at 11:26 AM, Eli Zaretskii e...@gnu.org wrote:
  However, I see an annoying delay when going to a different node.  For
  example, in the Emacs Lisp manual, go to the Index node, move to the
  end of the index, type RET on one of the last entries, then type 'l'
  to go back.  Both going to Index and going back causes a visible delay,
  which is about 1.5 sec here (this is a Core-i7 box).
 
 I couldn't reproduce this. There is a slight delay for me when loading
 the Index node for the first time, but not when going back. I would
 guess it is because the Index node is a long node and it is
 doing something when displaying it that takes a long time (like
 calculating the positions of the starts of lines).

I don't think so: the Info viewer from Texinfo 4.13 doesn't have this
problem, I tried with the same files.

 I can't see how this would be related to CR removal. Do other nodes
 in the file also have this problem?

OK, I will take a closer look.

 Have you tried any Info files without CR-LF line endings?

Yes, it doesn't matter.  The delay is there regardless.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2014-12-28 Thread Gavin Smith
On Fri, Dec 26, 2014 at 9:52 PM, Eli Zaretskii e...@gnu.org wrote:
 It's a broken file.  I have no idea how they produced it, but it
 wasn't by stock makeinfo 4.8 on Windows, because that version already
 both counted byte offsets in makeinfo disregarding the CR
 characters, and had the EOL conversion function in the Info reader.  I
 just checked its code, which I still have on my disk.


I couldn't quickly find the code in C makeinfo for this - is it
something to do with file modes under Windows?

You are probably right that it wasn't produced by makeinfo under
Windows, but I did reproduce something similar when running makeinfo
4.13 under GNU/Linux with a Texinfo source file with CR-LF line
endings. See the attached input and output files. The whitespace in
the output Info file doesn't make a lot of sense, but the point is
that the preamble of the info file does contain a line with a CR-LF
ending, but the tag table doesn't take this into account - the node
separator is at byte 113 of the file exactly. It's possible that this
file was produced in a similar way.

There may be similar results if a file has mixed kinds of line endings
(or if it includes other files with different line endings). We can't
exactly say that the tag tables in files like these are incorrect.
Same goes for files produced under Windows where the CR bytes aren't
counted. We're just left with the problem of loading the files that
are out there properly.

 Its tag table accounts for the CR characters, which is wrong.  That's
 why the Info reader from 4.13 cannot read it correctly.  And that's
 exactly what will happen with Info files created by makeinfo 5.2 when
 someone tries to read them with Info from 4.13.

 Moreover, the same problem will happen with the Emacs Info reader.
 Emacs removes the CR characters when it reads files into buffers (any
 files, not just Info files), so it must have the tag table with
 offsets that disregard the CRs.

If it turns out there are files out there where the 1000-byte slack in
looking for a node isn't enough, we could tweak it, maybe by
increasing the slack as we get later on in the file. Maybe something
similar could be done in Emacs Info. If we could stop makeinfo
producing files with CR bytes it would stop this problem for newly
produced files.
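
A sketch of the slack idea, to show what I mean: starting from the
offset given in the tag table, look outwards in both directions for
the node separator byte (^_, octal 037) and accept the nearest one
within the slack.  This is an illustration rather than the reader's
actual code, and find_node_near is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define INFO_COOKIE '\037'   /* ^_, the byte that introduces a node */

/* Return the offset of the node separator nearest to TAG_OFFSET that
   lies within SLACK bytes of it, or -1 if there is none.  */
long
find_node_near (const char *buf, size_t len, size_t tag_offset, size_t slack)
{
  assert (buf != NULL);
  for (size_t d = 0; d <= slack; d++)
    {
      /* Look D bytes after the recorded offset...  */
      if (tag_offset + d < len && buf[tag_offset + d] == INFO_COOKIE)
        return (long) (tag_offset + d);
      /* ...and D bytes before it.  */
      if (d <= tag_offset && tag_offset - d < len
          && buf[tag_offset - d] == INFO_COOKIE)
        return (long) (tag_offset - d);
    }
  return -1;
}
```

With a slack of 1000 this tolerates a tag table that is off by up to
1000 bytes in either direction; a slack that grows with the offset
would be a small change to the caller.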


cr-lf-endings-4.texi
Description: TeXInfo document


cr-lf-endings-4.info
Description: Binary data


Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2014-12-28 Thread Eli Zaretskii
 Date: Sun, 28 Dec 2014 20:06:54 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 On Fri, Dec 26, 2014 at 9:52 PM, Eli Zaretskii e...@gnu.org wrote:
  It's a broken file.  I have no idea how they produced it, but it
  wasn't by stock makeinfo 4.8 on Windows, because that version already
  both counted byte offsets in makeinfo disregarding the CR
  characters, and had the EOL conversion function in the Info reader.  I
  just checked its code, which I still have on my disk.
 
 
 I couldn't quickly find the code in C makeinfo for this - is it
 something to do with file modes under Windows?

Yes.  Makeinfo simply counted the bytes in memory, and the CR
characters were added by C library functions as a result of text-mode
writes.
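
In other words, the offsets in such a tag table refer to the text as
it was in memory, before the CRs were added.  The relation between
the two kinds of offset can be sketched as follows; this is an
illustration, not makeinfo code, and offset_ignoring_cr is a
hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Given the on-disk contents BUF (with CR-LF line endings) and a byte
   offset into it, return the corresponding offset in the text with
   every CR that precedes an LF removed.  */
size_t
offset_ignoring_cr (const char *buf, size_t len, size_t disk_offset)
{
  size_t removed = 0;

  assert (buf != NULL && disk_offset <= len);
  /* Count the CR-LF pairs whose CR lies before DISK_OFFSET.  */
  for (size_t i = 0; i + 1 < len && i < disk_offset; i++)
    if (buf[i] == '\r' && buf[i + 1] == '\n')
      removed++;
  return disk_offset - removed;
}
```

For "a\r\nb\r\nc", the 'c' is at byte 6 on disk but at byte 4 once
the CRs are gone, and byte 4 is what a tag table written this way
records.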

 You are probably right that it wasn't produced by makeinfo under
 Windows, but I did reproduce something similar when running makeinfo
 4.13 under GNU/Linux with a Texinfo source file with CR-LF line
 endings. See the attached input and output files. The whitespace in
 the output Info file doesn't make a lot of sense, but the point is
 that the preamble of the info file does contain a line with a CR-LF
 ending, but the tag table doesn't take this into account - the node
 separator is at byte 113 of the file exactly. It's possible that this
 file was produced in a similar way.

Maybe.  There are of course any number of ways to produce a broken Info
file.

 There may be similar results if a file has mixed kinds of line endings
 (or if it includes other files with different line endings). We can't
 exactly say that the tag tables in files like these are incorrect.
 Same goes for files produced under Windows where the CR bytes aren't
 counted. We're just left with the problem of loading the files that
 are out there properly.

The most important requirement is to be able to read and display files
that were produced from valid Texinfo sources, either on Unix or on
Windows, and do it in a way that will work in at least the stand-alone
Info and in Emacs.  The situation we have right now with texi2any
doesn't fulfill this requirement, which is not good, I think.

  Its tag table accounts for the CR characters, which is wrong.  That's
  why the Info reader from 4.13 cannot read it correctly.  And that's
  exactly what will happen with Info files created by makeinfo 5.2 when
  someone tries to read them with Info from 4.13.
 
  Moreover, the same problem will happen with the Emacs Info reader.
  Emacs removes the CR characters when it reads files into buffers (any
  files, not just Info files), so it must have the tag table with
  offsets that disregard the CRs.
 
 If it turns out there are files out there where the 1000-byte slack in
 looking for a node isn't enough, we could tweak it, maybe by
 increasing the slack as we get later on in the file. Maybe something
 similar could be done in Emacs Info. If we could stop makeinfo
 producing files with CR bytes it would stop this problem for newly
 produced files.

There's still a problem of Info files produced by makeinfo 4.x on
Windows -- these will not be reliably readable with the new
stand-alone Info.  I think we need to have a solution for that as
well, at least as a user option, if not automatically.  (One way of
doing that automatically would be to detect the 4.x version from the
file's preamble.)



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2014-12-26 Thread Eli Zaretskii
 Date: Thu, 25 Dec 2014 18:05:25 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 On Thu, Dec 25, 2014 at 3:51 PM, Eli Zaretskii e...@gnu.org wrote:
  Today I discovered that the Info reader built from the current trunk
  cannot display any Info file that was produced natively on Windows (as
  opposed to Info files that come from distribution tarballs, which were
  produced on Unix).  The reader says it cannot find the Top node in any
  such Info file.
 
 I've committed a change that seemed to fix the problem on a small test
 file I made.

Thanks, it mostly did work for me too, although I did only limited
testing with --strict-node-location.  I say mostly because I did
find a few glitches:

 . Info still cannot find the Top node in a split manual (a manual
   produced with --no-split does work).

 . A cross-reference whose target node crosses a newline cannot be
   followed.  The example I have is in the Emacs Lisp manual, node
   Lisp Data Types.  There's a cross-reference there to Editing
   Types which has a newline between the two words.  When I try
   following it, Info shows me this:

  File: *manpages*,  Node: Editing
   Types,  Up: (dir)

  No manual entry for Editing.
  No manual entry for Types.

 . The file's encoding is not recognized when the file has CR-LF EOLs.

I looked through the sources, and saw a few additional places where \r
before \n might need to be handled (some of them might be the reason
for the above-mentioned problems):

  'avoid_see_see', where it skips whitespace
  'scan_reference_label', where it turns off underlining at EOL
  'get_file_character_encoding', in the call to 'search_forward'

There might be more places, I'm not sure.
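
What I have in mind for those places is roughly the following: treat
"\r\n" as a single end-of-line wherever the code currently tests for
'\n' alone.  This is a sketch under that assumption, not a patch
against the functions named above; eol_length and
skip_whitespace_and_eol are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of the end-of-line sequence at S[POS]:
   1 for LF, 2 for CR-LF, 0 if POS is not at an end of line.
   A lone CR is not treated as an end of line.  */
size_t
eol_length (const char *s, size_t len, size_t pos)
{
  assert (s != NULL);
  if (pos < len && s[pos] == '\n')
    return 1;
  if (pos + 1 < len && s[pos] == '\r' && s[pos + 1] == '\n')
    return 2;
  return 0;
}

/* Skip spaces, tabs and end-of-line sequences starting at POS, and
   return the offset of the first byte that is none of those.  */
size_t
skip_whitespace_and_eol (const char *s, size_t len, size_t pos)
{
  while (pos < len)
    {
      size_t eol = eol_length (s, len, pos);

      if (eol > 0)
        pos += eol;
      else if (s[pos] == ' ' || s[pos] == '\t')
        pos++;
      else
        break;
    }
  return pos;
}
```

With this, a reference label such as "Editing\r\n  Types" is scanned
the same way as "Editing\n  Types".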

 I'll comment more in coming days.

Thanks, I hope we will be able to discuss the broader picture then.
As things stand, I don't see how fixing Info to skip/ignore CR will
solve the fundamental issue of incompatibility between the Emacs and
the stand-alone Info readers wrt byte offsets in tag tables.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2014-12-26 Thread Eli Zaretskii
 Date: Fri, 26 Dec 2014 16:48:00 +
 From: Gavin Smith gavinsmith0...@gmail.com
 Cc: Texinfo bug-texinfo@gnu.org
 
 I discovered this problem with the gnucobpg.info file that is part
 of GNU Cobol (downloadable at
 http://opencobol.add1tocobol.com/guides/), which has many CR-LF line
 endings (but not consistently). I don't know exactly how this file was
 generated - the file preamble says
 
 This is gnucobpg.info, produced by makeinfo version 4.8 from
 gnucobpg.texi.

It's a broken file.  I have no idea how they produced it, but it
wasn't by stock makeinfo 4.8 on Windows, because that version already
both counted byte offsets in makeinfo disregarding the CR
characters, and had the EOL conversion function in the Info reader.  I
just checked its code, which I still have on my disk.

 - anyway, I had the problem mentioned that I found I couldn't access
 later nodes in the file. I tested just now with info 4.13 and wasn't
 able to access the Alphabet-Name-Clause or anything later in the
 file. That's the only Info file I remember encountering containing
 many CR bytes.

Its tag table accounts for the CR characters, which is wrong.  That's
why the Info reader from 4.13 cannot read it correctly.  And that's
exactly what will happen with Info files created by makeinfo 5.2 when
someone tries to read them with Info from 4.13.

Moreover, the same problem will happen with the Emacs Info reader.
Emacs removes the CR characters when it reads files into buffers (any
files, not just Info files), so it must have the tag table with
offsets that disregard the CRs.

 Since this claims to be produced by the 4.8 version (not 5.x), whether
 the CR characters are counted in the tag table must depend on other,
 unknown factors.

I don't think we can or should try fixing broken Info files.  We
certainly shouldn't introduce new breakage into valid files because of
that.

 It could be helpful to make the GNU Cobol developers aware of this.

Agreed.

   . fix texi2any to produce tag tables that assume the CR characters
 are stripped from the Info file (my reading of the code is that it
 should not count CR characters before LF for the purposes of
 count_context value; or maybe it should simply open the Info output
 in 'unix' mode)
 
 The tag table containing the exact byte offsets is a lot simpler than
 having to remove all of the CR characters (or just CR characters
 before LF), and therefore less prone to incorrect implementation by
 any other Info-reading or -writing programs that might be written.

See above: we are breaking the Emacs Info reader, which is the other
reader important to the GNU project.  And we are creating an
interoperability problem vis-a-vis older versions of Texinfo.  I think
this is too high a price to pay.

 It enables accessing the correct place in the file without
 processing the entire file first. This could enable faster access of
 nodes by memory-mapping a file. Most of the time speed isn't an
 issue, but it's an idea I've had for speeding up searching the
 indices of all installed Info files at once. It could also be used
 to access a node of an Info file over a slow or expensive network
 connection without having to download the entire file.

I'm okay with these goals, but I don't think they are worth the
breakage mentioned above.  Some of the goals can be met even without
removing CRs, e.g., by using a larger slack when using the offsets
from the tag tables.

 I hope it's possible to make changes to the standalone Info reader to
 make it possible to access files with CR-LF line endings without
 having to interpret the tag table this way. At the same time, if it's
 easy to avoid outputting files with CR-LF line endings under Windows,
 then I think we should do so.

Changing makeinfo to output a Unix-style file will solve some of the
problems, yes.  I hope Patrice is reading this, and will comment.  But
the interoperability problem with files created by older makeinfo
versions will stay.  Maybe we should add an optional switch to Info to
give the user control of this.



Re: Standalone Info reader cannot read Info files with CR-LF EOLs

2014-12-25 Thread Gavin Smith
On Thu, Dec 25, 2014 at 3:51 PM, Eli Zaretskii e...@gnu.org wrote:
 Today I discovered that the Info reader built from the current trunk
 cannot display any Info file that was produced natively on Windows (as
 opposed to Info files that come from distribution tarballs, which were
 produced on Unix).  The reader says it cannot find the Top node in any
 such Info file.

I've committed a change that seemed to fix the problem on a small test
file I made. I'll comment more in coming days.