Re: [coreutils] tr: case mapping anomaly
Pádraig Brady wrote: On 25/09/10 07:53, Jim Meyering wrote: Eric Blake wrote: On 09/24/2010 04:47 PM, Pádraig Brady wrote: I was just looking at a bug reported to fedora there where this abort()s $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]' Ouch! Thanks for reporting it here. How many more bugs lurk in tr... Consolation: this one is a failure to diagnose invalid inputs. I found a few more issues: Nice catches. Thanks for all the work. Please mention the above abort-inducing case in the log along with the other three. While you're at it, please use LC_ALL= there, not LANG=, since the latter is a glibc-ism. This valid translation spec aborted: LANG=en_US tr '[:upper:]- ' '[:lower:]_' The above does not abort w/glibc when LC_ALL=C happens to be set. This misaligned conversion spec was allowed: LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]' This misaligned spec was allowed by extending the class: LANG=C tr '[:upper:] ' '[:lower:]' ... + tr: fix various issues with case conversion classes. In some locales + valid conversion specifications caused tr to abort, while some invalid + specifications were undiagnosed. + [bugs introduced in coreutils 6.9.90 and 6.9.92] Two separate bugs! Thanks for tracking them down. I'm surprised they are so recent. I suppose that means I introduced both. Let's see... These two seem like the only candidates: 6efd10462d8103208f4575f0b5edddf841c7d87c af5d0c363a52e787a4311a11f035209eecdc4115 Whichever they are, please list them in the commit log. ** New features cp now accepts the --attributes-only option to not copy file data, diff --git a/src/tr.c b/src/tr.c index a5b6810..3adc85f 100644 --- a/src/tr.c +++ b/src/tr.c @@ -1177,6 +1177,77 @@ card_of_complement (struct Spec_list *s) return cardinality; } +/* Discard the lengths associated with a case conversion, + as using the actual number of upper or lower case characters + is problematic when they don't match in some locales. + Also ensure the case conversion classes in string2 are + aligned correctly with those in string1. + Note POSIX says the behavior of `tr [:upper:] [:upper:]' + is undefined. Therefore we allow it (unlike Solaris) + and treat it as a no-op. */ + +static void +validate_case_classes (struct Spec_list *s1, struct Spec_list *s2) +{ + size_t upper_chars = 0; + size_t lower_chars = 0; Please name these n_upper and n_lower, so each is obviously a counter. Otherwise, seeing the name out of context might evoke an array. + int i; Please make this unsigned int, since it never needs negative values. + int c1=0, c2=0; These days, I prefer one per line, and with spaces: int c1 = 0; int c2 = 0; Hmm.. I see you're following existing style that did int c1, c2;, though in existing code, those didn't have initializers. And the int i; was essentially moved. And there are others. I blame the author ;-) (me, 18 years ago) + count old_s1_len = s1-length; + count old_s2_len = s2-length; + struct List_element *s1_tail = s1-tail; + struct List_element *s2_tail = s2-tail; + bool s1_new_element = true; + bool s2_new_element = true; ... + +# Before coreutils 8.6 the disparat number of upper and lower s/disparat/disparate/ +# characters in some locales, triggered abort()s and invalid behavior +if test $(LC_ALL=en_US.ISO-8859-1 locale charmap 2/dev/null) = ISO-8859-1; +then + ( No big deal, but why run this in a subshell? It doesn't seem necessary here. +export LC_ALL=en_US.ISO-8859-1 +tr '[:upper:] ' '[:lower:]' /dev/null 2out _fail=1 +echo 'tr: when translating with string1 longer than string2, +the latter string must not end with a character class' exp +compare out exp || _fail=1 + +# Ensure when there are a different number of elements +# in each string, we validate the case mapping correctly +tr 'ab[:lower:]' '0-1[:upper:]' /dev/null || _fail=1 It might be nice to ensure that e.g., abc.xyz maps to ABC.XYZ. +# Ensure we extend string2 appropriately +tr '[:upper:]- ' '[:lower:]_' /dev/null || _fail=1 Similarly, that ABC-XYZ maps to abc_xyz Thanks again! +# Ensure the size of the case classes are accounted +# for as a unit. +echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | +tr '[:upper:]A-B' '[:lower:]0' out || _fail=1 +echo '00cdefghijklmnopqrstuvwxyz' exp +compare out exp || _fail=1 ...
Re: [coreutils] tr: case mapping anomaly
On 29/09/10 07:45, Jim Meyering wrote: Pádraig Brady wrote: +# characters in some locales, triggered abort()s and invalid behavior +if test $(LC_ALL=en_US.ISO-8859-1 locale charmap 2/dev/null) = ISO-8859-1; +then + ( No big deal, but why run this in a subshell? It doesn't seem necessary here. Yes that is overkill in this case. I had copied the sub-shell bit from my locally improved join-i18n test. I've just pushed that. cheers, Pádraig.
Re: [coreutils] tr: case mapping anomaly
On 09/29/2010 06:40 AM, Pádraig Brady wrote: +# Ensure the size of the case classes are accounted +# for as a unit. +echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | +tr '[:upper:]A-B' '[:lower:]0'out || _fail=1 +echo '00cdefghijklmnopqrstuvwxyz' exp Huh? A and B are both in [:upper:]; when a character is listed more than once in string1, it is only transliterated according to the first listing. I think this should be 'abc...' not '00c...' for the expected results. Does POSIX specify that? Hmm - POSIX appears to be silent for both tr and m4's translit (but whereas 'tr long short' is unspecified and we extend the last byte of short to match, m4's translit(data,long,short) is explicitly documented as deleting bytes from long with no match in short). That's not what we do, nor what I would expect. $ echo 'A' | LANG=C tr 'AA' '01' 1 And Solaris' tr does this as well. But m4 behaves in the way I specified: $ echo 'translit(a,aa,01)' | m4 0 $ echo 'translit(a,aa,01)' | /usr/ccs/bin/m4 0 Time for me to ask for clarification from the Austin Group, I suppose. But given existing practice, I guess the argument should be that tr is explicitly different than m4. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org
bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Jim Meyering wrote: jeff.liu wrote: jeff.liu wrote: Hi Jim, Thanks for your prompt response, I will fix this issue when all review done. Hi Jim, For my current implementation, I just have another thought to remove the char *fname from struct extent_scan, and add a new item int errno to save the errno set by ioctl(2) or lseek(2) if either call failed. at first, I am intended to use char *fname for the debugging purpose inside, however, maybe its better to do such things outside of the module according to the return value and errno. this change can not only reduce the memory allocation for 'fname' but also make a neatly open_extent_scan() interface. /* Structure used to reserve extent scan information per file. */ struct extent_scan { int errno; }; void open_extent_scan (int src_fd, struct extent_scan **scan); Is it better? That would be better. Since there is only one diagnostic using that file name you can do the same from the caller. In fact, why store errno in that struct at all? Can't you just ensure that errno is set to the proper value when it fails? Indeed, there is no need to store 'errno'. Thanks, -Jeff
bug#6131: [PATCH]: fiemap support for efficient sparse file copy
Jim Meyering wrote: jeff.liu wrote: Sorry for the delay. This is the new patch to isolate the stuff regarding to extents reading to a new module. and teach cp(1) to make use of it. Jeff, I applied your patch to my rebased fiemap-copy branch. My first step was to run the usual ./bootstrap ./configure make make check make check failed on due to a double free in your new code: (x86_64, Fedora 13, ext4 working directory) To get details, I made this temporary modification: Hi Jim, I am sorry for the fault, it fixed at the patch below. Would you please revie at your convenience? Changes: 1. fix write_zeros() as Jim's comments, thanks for pointing this out. 2. remove char const *fname from struct extent_scan. 3. change the signature of open_extent_scan() from void open_extent_scan(struct extent_scan **scan) to void open_extent_scan(struct extent_scan *scan); the reason is I'd like to reduce once memory allocation for the extent_scan variable, instead, using stack to save it. 4. remove close_extent_scan() from a function defined at extent-scan.c to extent-scan.h as a Macro definination, but it does nothing for now, since initial extent scan defined at stack. 5. add a macro free_extents_info() defined at extent-scan.h to release the memory allocated to extent info which should be called combine with get_extents_info(), it just one line, so IMHO, define it as macro should be ok. I have done the memory check via `valgrind`, no issue found. make test against cp/sparse-fiemap failed at the extent compare stage, but the file content is identical to each other by comparing those two files j1/j2 manually. Is it make sense if we verify them through diff(1) since the testing file is in small size? or we have to merge the contig extents from the output of `filefrag', I admit I have not dig into the filefrag-extent-compare at the moment, I need to recall the perl language syntax. :-P. From 50a3338db06442fa2d789fd65175172d140cc96e Mon Sep 17 00:00:00 2001 From: Jie Liu jeff@oracle.com Date: Wed, 29 Sep 2010 15:35:43 +0800 Subject: [PATCH 1/1] cp: add a new module for scanning extents * src/extent-scan.c: Source code for scanning extents. Call open_extent_scan() to initialize extent scan. Call get_extents_info() to get a number of extents for each iteration. * src/extent-scan.h: Header file of extent-scan.c. Wrap free_extent_info() as macro define to release the space allocated extent_info per extent scan. Wrap close_extent_scan() as macro define but do nothing at the moment. * src/Makefile.am: Reference it and link it to copy_source. * src/copy.c: Make use of the new module, replace fiemap_copy() with extent_copy(). Signed-off-by: Jie Liu jeff@oracle.com --- src/Makefile.am |2 +- src/copy.c| 197 ++-- src/extent-scan.c | 113 ++ src/extent-scan.h | 68 ++ 4 files changed, 296 insertions(+), 84 deletions(-) create mode 100644 src/extent-scan.c create mode 100644 src/extent-scan.h diff --git a/src/Makefile.am b/src/Makefile.am index 7d56312..2412b16 100644 --- a/src/Makefile.am +++ b/src/Makefile.am @@ -459,7 +459,7 @@ uninstall-local: fi; \ fi -copy_sources = copy.c cp-hash.c +copy_sources = copy.c cp-hash.c extent-scan.c # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid # confusion with the `install' target. The install rule transforms `ginstall' diff --git a/src/copy.c b/src/copy.c index f48c74d..42ec300 100644 --- a/src/copy.c +++ b/src/copy.c @@ -35,6 +35,7 @@ #include buffer-lcm.h #include copy.h #include cp-hash.h +#include extent-scan.h #include error.h #include fcntl--.h #include file-set.h @@ -63,10 +64,6 @@ #include sys/ioctl.h -#ifndef HAVE_FIEMAP -# include fiemap.h -#endif - #ifndef HAVE_FCHOWN # define HAVE_FCHOWN false # define fchown(fd, uid, gid) (-1) @@ -153,74 +150,79 @@ clone_file (int dest_fd, int src_fd) #endif } -#ifdef __linux__ -# ifndef FS_IOC_FIEMAP -# define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap) -# endif -/* Perform a FIEMAP copy, if possible. - Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to - obtain a map of file extents excluding holes. This avoids the - overhead of detecting holes in a hole-introducing/preserving copy, - and thus makes copying sparse files much more efficient. Upon a - successful copy, return true. If the initial ioctl fails, set - *NORMAL_COPY_REQUIRED to true and return false. Upon any other - failure, set *NORMAL_COPY_REQUIRED to false and return false. */ static bool -fiemap_copy (int src_fd, int dest_fd, size_t buf_size, - off_t src_total_size, char const *src_name, - char const *dst_name, bool *normal_copy_required) +write_zeros (int fd, uint64_t n_bytes) { - bool last = false; - union { struct fiemap f; char c[4096]; } fiemap_buf; - struct fiemap *fiemap
bug#7129: french implementation of dh
Hello all, A minor problem in the french implementation of dh: The first column has an offset for example, with df -h i have: Sys. de fichiersTail. Occ. Disp.%Occ.Monté sur /dev/sdb1 14G4,8G8,4G 37% / none 2,0G 352K2,0G 1% /dev none 2,0G 2,9M2,0G 1% /dev/shm none 2,0G 96K 2,0G1% /var/run none 2,0G 0 2,0G0% /var/lock none 2,0G 0 2,0G0% /lib/init/rw /dev/sdc1 294G 40G 239G 15% /donnees /dev/sdb6 48G339M 45G1% /home /dev/sr1 644M 644M0100% /media/WD SmartWare /dev/sdd1 298G 265G33G90%/media/My Passport /dev/sda1 140G 24G 117G 17%/media/FAFCFC2DFCFBE1B5 but it should be: Sys. de fichiers Tail.Occ.Disp. %Occ. Monté sur /dev/sdb1 14G4,8G8,4G 37% / none 2,0G 352K2,0G 1% /dev none 2,0G 2,9M2,0G 1% /dev/shm none 2,0G 96K 2,0G1% /var/run none 2,0G 0 2,0G0% /var/lock none 2,0G 0 2,0G0% /lib/init/rw /dev/sdc1 294G 40G 239G 15% /donnees /dev/sdb6 48G339M 45G1% /home /dev/sr1 644M 644M0100% /media/WD SmartWare /dev/sdd1 298G 265G33G90%/media/My Passport /dev/sda1 140G 24G 117G 17%/media/FAFCFC2DFCFBE1B5 regards Rudy
bug#7129: french implementation of dh
On 29/09/10 07:33, rudy klein wrote: Hello all, A minor problem in the french implementation of dh: The first column has an offset for example, with df -h i have: Sys. de fichiersTail. Occ. Disp.%Occ.Monté sur /dev/sdb1 14G4,8G8,4G 37% / This is a translation issue that already looks fixed. In any case we're working on a more general column alignment fix. cheers, Pádraig.
bug#7085: fmt (GNU coreutils) 6.10
Isn't the more modular flow that Eric suggested more in keeping with the Unix philosophy and the better solution? If you want to do both stripping of carriage returns and word wrapping together then you can pipe them together. tr -d '\r' somefile | fmt Or if you are calling it as a stdin filter such as within an editor: tr -d '\r' | fmt Bob Yes, it is indeed. I don't know exactly why I chose to try (and ultimately fail) writing that patch. I think the koan ESR wrote for The Rootless Root: The Unix Koans Of Master Foo has some truth to it: “There is more Unix-nature in one line of shell script than there is in ten thousand lines of C.” Finally in true koan fashion, I am enlightened. :^) William