Pádraig Brady <[email protected]> writes:
> On 24/08/2025 20:15, Collin Funk wrote:
>> Pádraig Brady <[email protected]> writes:
>>
>>> I don't think getline() is a generic enough interface,
>>> especially for a utility used to introduce new lines.
>>>
>>> Consider for example you want 80x10 lines of "y",
>>> you could do this previously with:
>>>
>>> $ yes | tr -d '\n' | fold -w 80 | head -n 10
>>>
>>> While now fold will just consume all of memory.
>>> Instead you might fread() to an IO_BUFSIZE buffer?
>>
>> Good point.
>>
>> Using fread is also faster since it doesn't check for the delimiter upon
>> reading a character, and doesn't need to lock the stream each call
>> (since we use fread_unlocked).
>>
>> Will push the attached in a bit.
>
> Cool, that was quick!
> You might include something like the following test.
Actually, I realized that it is pointless to malloc line_out now. Since
the input buffer is a fixed size and we do not do case conversion or
anything that may change the number of bytes needed to represent the
output.
Will push this v2 patch in a bit instead.
> diff --git a/tests/fold/fold-characters.sh
> b/tests/fold/fold-characters.shindex 0b22aad6b..a6cc39b73 100755
> --- a/tests/fold/fold-characters.sh
> +++ b/tests/fold/fold-characters.sh
> @@ -58,4 +58,10 @@ compare column-exp2 column-out2 || fail=1
> fold --characters -w 10 input2 > character-out2 || fail=1
> compare character-exp2 character-out2 || fail=1
>
> +# Ensure bounded memory operation
> +vm=$(get_min_ulimit_v_ fold /dev/null) && {
> + yes | tr -d '\n' | (ulimit -v $(($vm+8000)) && fold 2>err) | head -n10 ||
> fail=1
> + compare /dev/null err || fail=1
> +}
> +
> Exit $fail
Thanks, I'll probably add this if I implement the --unicode option.
Since that will certainly require allocating memory based on my skimming
of unilbrk.h.
Collin
>From fb9016d50505b2cb66179dd526249063bc393846 Mon Sep 17 00:00:00 2001
Message-ID: <fb9016d50505b2cb66179dd526249063bc393846.1756069082.git.collin.fu...@gmail.com>
From: Collin Funk <[email protected]>
Date: Sun, 24 Aug 2025 12:05:41 -0700
Subject: [PATCH v2] fold: use fread instead of getline
* src/fold.c: Include ioblksize.h.
(fold_file): Use two IO_BUFSIZE-sized buffers. Use fread instead of
getline. Check for if we reached the end of file.
---
src/fold.c | 16 +++++++---------
1 file changed, 7 insertions(+), 9 deletions(-)
diff --git a/src/fold.c b/src/fold.c
index 153ce15a6..7bf30cd0b 100644
--- a/src/fold.c
+++ b/src/fold.c
@@ -25,6 +25,7 @@
#include "system.h"
#include "fadvise.h"
+#include "ioblksize.h"
#include "mcel.h"
#include "xdectoint.h"
@@ -136,11 +137,9 @@ fold_file (char const *filename, size_t width)
FILE *istream;
size_t column = 0; /* Screen column where next char will go. */
idx_t offset_out = 0; /* Index in 'line_out' for next char. */
- static char *line_out = nullptr;
- static idx_t allocated_out = 0;
- static char *line_in = nullptr;
- static size_t allocated_in = 0;
- static ssize_t length_in = 0;
+ static char line_out[IO_BUFSIZE];
+ static char line_in[IO_BUFSIZE];
+ static size_t length_in = 0;
int saved_errno;
if (STREQ (filename, "-"))
@@ -159,7 +158,7 @@ fold_file (char const *filename, size_t width)
fadvise (istream, FADVISE_SEQUENTIAL);
- while (0 <= (length_in = getline (&line_in, &allocated_in, istream)))
+ while (0 < (length_in = fread (line_in, 1, sizeof line_in, istream)))
{
char *p = line_in;
char *lim = p + length_in;
@@ -167,9 +166,6 @@ fold_file (char const *filename, size_t width)
for (; p < lim; p += g.len)
{
g = mcel_scan (p, lim);
- if (allocated_out - offset_out <= g.len)
- line_out = xpalloc (line_out, &allocated_out, g.len, -1,
- sizeof *line_out);
if (g.ch == '\n')
{
memcpy (line_out + offset_out, p, g.len);
@@ -243,6 +239,8 @@ fold_file (char const *filename, size_t width)
memcpy (line_out + offset_out, p, g.len);
offset_out += g.len;
}
+ if (feof (istream))
+ break;
}
saved_errno = errno;
--
2.51.0