Re: Where can i get the design documentation for core utils?

2005-09-10 Thread David Feuer
On Sat, Sep 10, 2005 at 09:31:46AM +0200, Alfred M. Szmidt wrote:
> Design documents have nothing to do with free software, or open source
> for that matter.  For GNU coreutils there is little to talk about
> `design' documents, since such a document would be bigger than the
> actual source code (which I might add is quite easy to follow!) for
> GNU Coreutils.  In other words, the source code is your design
> document.

Unfortunately, the source code does not always have the internal
documentation it should, and can be quite hard to follow.  An excellent
example of this is sort.c.

The problem is not confined to coreutils.  GNU libc, for instance, does
not provide any documentation whatsoever on how it handles locales
internally.  Locales are compiled from a human-readable standard format to
a completely undocumented internal format.  For some spectacularly
unreadable code, look at the source for strcoll and strxfrm.

David Feuer


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: csplit corrupts files with long lines

2005-09-10 Thread Jim Meyering
Tristan Miller <[EMAIL PROTECTED]> wrote:
> I'm reporting a bug with csplit coreutils 5.2.1, compiled from sources on a
> SuSE 9.3 system.  It seems this bug was previously reported over a year
> ago (see
> )
> but it was never squashed.
>
> In short, csplit produces corrupt output when the input file contains very
> long lines.  An example file is at
> , an XML file
> containing three articles from Wikipedia.  The second article was
> vandalized by a spammer who inserted a ridiculously long line (42280
> characters) full of links.
>
> If I try to split this file with
>
> $ csplit wikipedia.xml '//' '{*}'
>
> then the file with the second article, xx02, is garbled at the beginning of
> the long line.  See .

Thanks for reporting that!
Here's the fix I've just checked in:
[there's still a minor memory leak, but we'll deal with that separately]

2005-09-10  Jim Meyering  <[EMAIL PROTECTED]>

csplit could produce corrupt output, given input lines longer than 8KB

* src/csplit.c (load_buffer): Don't read from free'd memory
when handling lines longer than the initial buffer length.
(save_to_hold_area): Don't leak the previous hold_area buffer.
Reported by Tristan Miller and Luke Kendall.
* NEWS: Mention this.
* tests/misc/csplit: New test for this.

* src/csplit.c (load_buffer): Avoid integer overflow in buffer
size calculations for very long lines.

Index: NEWS
===
RCS file: /fetish/cu/NEWS,v
retrieving revision 1.307
retrieving revision 1.308
diff -u -p -u -r1.307 -r1.308
--- NEWS 10 Sep 2005 00:08:28 -  1.307
+++ NEWS 10 Sep 2005 14:07:59 -  1.308
@@ -136,6 +136,8 @@ GNU coreutils NEWS
   permissions like =xX and =u, and did not properly diagnose some invalid
   strings like g+gr, ug,+x, and +1.  These bugs have been fixed.
 
+  csplit could produce corrupt output, given input lines longer than 8KB
+
   dd now computes statistics using a realtime clock (if available)
   rather than the time-of-day clock, to avoid glitches if the
   time-of-day is changed while dd is running.  Also, it avoids
Index: src/csplit.c
===
RCS file: /fetish/cu/src/csplit.c,v
retrieving revision 1.144
retrieving revision 1.145
diff -u -p -u -r1.144 -r1.145
--- src/csplit.c 9 Sep 2005 21:08:19 -   1.144
+++ src/csplit.c 10 Sep 2005 13:56:45 -  1.145
@@ -251,16 +251,12 @@ interrupt_handler (int sig)
 }
 
 /* Keep track of NUM bytes of a partial line in buffer START.
-   These bytes will be retrieved later when another large buffer is read.
-   It is not necessary to create a new buffer for these bytes; instead,
-   we keep a pointer to the existing buffer.  This buffer *is* on the
-   free list, and when the next buffer is obtained from this list
-   (even if it is this one), these bytes will be placed at the
-   start of the new buffer. */
+   These bytes will be retrieved later when another large buffer is read.  */
 
 static void
 save_to_hold_area (char *start, size_t num)
 {
+  free (hold_area);
   hold_area = start;
   hold_count = num;
 }
@@ -386,7 +382,7 @@ record_line_starts (struct buffer_record
           lines++;
         }
       else
-        save_to_hold_area (line_start, bytes_left);
+        save_to_hold_area (xmemdup (line_start, bytes_left), bytes_left);
     }
 
   b->num_lines = lines;
@@ -442,6 +438,7 @@ static void
 free_buffer (struct buffer_record *buf)
 {
   free (buf->buffer);
+  buf->buffer = NULL;
 }
 
 /* Append buffer BUF to the linked list of buffers that contain
@@ -495,7 +492,7 @@ load_buffer (void)
   if (bytes_wanted < hold_count)
     bytes_wanted = hold_count;
 
-  do
+  while (1)
     {
       b = get_new_buffer (bytes_wanted);
       bytes_avail = b->bytes_alloc; /* Size of buffer returned. */
@@ -504,8 +501,7 @@ load_buffer (void)
       /* First check the `holding' area for a partial line. */
       if (hold_count)
         {
-          if (p != hold_area)
-            memcpy (p, hold_area, hold_count);
+          memcpy (p, hold_area, hold_count);
           p += hold_count;
           b->bytes_used += hold_count;
           bytes_avail -= hold_count;
@@ -515,11 +511,18 @@ load_buffer (void)
       b->bytes_used += read_input (p, bytes_avail);
 
       lines_found = record_line_starts (b);
-      bytes_wanted = b->bytes_alloc * 2;
       if (!lines_found)
         free_buffer (b);
+
+      if (lines_found || have_read_eof)
+        break;
+
+      if (xalloc_oversized (2, b->bytes_alloc))
+        xalloc_die ();
+      bytes_wanted = 2 * b->bytes_alloc;
+      free_buffer (b);
+      free (b);
     }
-  while (!lines_found && !have_read_eof);
 
   if (lines_found)
     save_buffer (b);

Re: Where can i get the design documentation for core utils?

2005-09-10 Thread Alfred M. Szmidt
   Unfortunately, you've stumbled across one of the weaknesses of open
   source.

And unfortunately you are crediting GNU coreutils to a movement that
doesn't care about freedom.

Design documents have nothing to do with free software, or open source
for that matter.  For GNU coreutils there is little to talk about
`design' documents, since such a document would be bigger than the
actual source code (which I might add is quite easy to follow!) for
GNU Coreutils.  In other words, the source code is your design
document.

