Re: [coreutils] tr: case mapping anomaly

2010-09-29 Thread Jim Meyering
Pádraig Brady wrote:
 On 25/09/10 07:53, Jim Meyering wrote:
 Eric Blake wrote:
 On 09/24/2010 04:47 PM, Pádraig Brady wrote:
 I was just looking at a bug reported to fedora there where this abort()s

   $ LC_ALL=en_US tr '[:upper:] ' '[:lower:]'

 Ouch!  Thanks for reporting it here.
 How many more bugs lurk in tr...
 Consolation: this one is a failure to diagnose invalid inputs.

 I found a few more issues:

Nice catches.  Thanks for all the work.

Please mention the above abort-inducing case in the log along with the
other three.  While you're at it, please use LC_ALL= there, not LANG=,
since the latter is a glibc-ism.

 This valid translation spec aborted:
   LANG=en_US tr '[:upper:]- ' '[:lower:]_'

The above does not abort w/glibc when LC_ALL=C happens to be set.

 This misaligned conversion spec was allowed:
   LANG=C tr 'A-Y[:lower:]' 'a-z[:upper:]'
 This misaligned spec was allowed by extending the class:
   LANG=C tr '[:upper:] ' '[:lower:]'

...
 +  tr: fix various issues with case conversion classes.  In some locales
 +  valid conversion specifications caused tr to abort, while some invalid
 +  specifications were undiagnosed.
 +  [bugs introduced in coreutils 6.9.90 and 6.9.92]

Two separate bugs!  Thanks for tracking them down.
I'm surprised they are so recent.
I suppose that means I introduced both.  Let's see...
These two seem like the only candidates:

  6efd10462d8103208f4575f0b5edddf841c7d87c
  af5d0c363a52e787a4311a11f035209eecdc4115

Whichever they are, please list them in the commit log.

  ** New features

cp now accepts the --attributes-only option to not copy file data,
 diff --git a/src/tr.c b/src/tr.c
 index a5b6810..3adc85f 100644
 --- a/src/tr.c
 +++ b/src/tr.c
 @@ -1177,6 +1177,77 @@ card_of_complement (struct Spec_list *s)
return cardinality;
  }

 +/* Discard the lengths associated with a case conversion,
 +   as using the actual number of upper or lower case characters
 +   is problematic when they don't match in some locales.
 +   Also ensure the case conversion classes in string2 are
 +   aligned correctly with those in string1.
 +   Note POSIX says the behavior of `tr [:upper:] [:upper:]'
 +   is undefined.  Therefore we allow it (unlike Solaris)
 +   and treat it as a no-op.  */
 +
 +static void
 +validate_case_classes (struct Spec_list *s1, struct Spec_list *s2)
 +{
 +  size_t upper_chars = 0;
 +  size_t lower_chars = 0;

Please name these n_upper and n_lower, so each is obviously a counter.
Otherwise, seeing the name out of context might evoke an array.

 +  int i;

Please make this unsigned int, since it never needs negative values.

 +  int c1=0, c2=0;

These days, I prefer one per line, and with spaces:

int c1 = 0;
int c2 = 0;

Hmm..  I see you're following existing style that did int c1, c2;,
though in existing code, those didn't have initializers.
And the int i; was essentially moved.  And there are others.
I blame the author ;-) (me, 18 years ago)

 +  count old_s1_len = s1-length;
 +  count old_s2_len = s2-length;
 +  struct List_element *s1_tail = s1-tail;
 +  struct List_element *s2_tail = s2-tail;
 +  bool s1_new_element = true;
 +  bool s2_new_element = true;
...
 +
 +# Before coreutils 8.6 the disparat number of upper and lower

s/disparat/disparate/

 +# characters in some locales, triggered abort()s and invalid behavior
 +if test $(LC_ALL=en_US.ISO-8859-1 locale charmap 2/dev/null) = ISO-8859-1;
 +then
 +  (

No big deal, but why run this in a subshell?
It doesn't seem necessary here.

 +export LC_ALL=en_US.ISO-8859-1
 +tr '[:upper:] ' '[:lower:]'  /dev/null 2out  _fail=1
 +echo 'tr: when translating with string1 longer than string2,
 +the latter string must not end with a character class'  exp
 +compare out exp || _fail=1
 +
 +# Ensure when there are a different number of elements
 +# in each string, we validate the case mapping correctly
 +tr 'ab[:lower:]' '0-1[:upper:]'  /dev/null || _fail=1

It might be nice to ensure that e.g., abc.xyz maps to ABC.XYZ.

 +# Ensure we extend string2 appropriately
 +tr '[:upper:]- ' '[:lower:]_'  /dev/null || _fail=1

Similarly, that ABC-XYZ maps to abc_xyz

Thanks again!

 +# Ensure the size of the case classes are accounted
 +# for as a unit.
 +echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
 +tr '[:upper:]A-B' '[:lower:]0' out ||  _fail=1
 +echo '00cdefghijklmnopqrstuvwxyz'  exp
 +compare out exp || _fail=1
...



Re: [coreutils] tr: case mapping anomaly

2010-09-29 Thread Pádraig Brady
On 29/09/10 07:45, Jim Meyering wrote:
 Pádraig Brady wrote:
 +# characters in some locales, triggered abort()s and invalid behavior
 +if test $(LC_ALL=en_US.ISO-8859-1 locale charmap 2/dev/null) = 
 ISO-8859-1;
 +then
 +  (
 
 No big deal, but why run this in a subshell?
 It doesn't seem necessary here.

Yes that is overkill in this case.
I had copied the sub-shell bit from my
locally improved join-i18n test.

I've just pushed that.

cheers,
Pádraig.



Re: [coreutils] tr: case mapping anomaly

2010-09-29 Thread Eric Blake

On 09/29/2010 06:40 AM, Pádraig Brady wrote:

+# Ensure the size of the case classes are accounted
+# for as a unit.
+echo 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' |
+tr '[:upper:]A-B' '[:lower:]0'out ||  _fail=1
+echo '00cdefghijklmnopqrstuvwxyz'  exp


Huh?  A and B are both in [:upper:]; when a character is listed more
than once in string1, it is only transliterated according to the first
listing.  I think this should be 'abc...' not '00c...' for the expected
results.


Does POSIX specify that?


Hmm - POSIX appears to be silent for both tr and m4's translit (but 
whereas 'tr long short' is unspecified and we extend the last byte of 
short to match, m4's translit(data,long,short) is explicitly documented 
as deleting bytes from long with no match in short).



That's not what we do, nor what I would expect.

$ echo 'A' | LANG=C tr 'AA' '01'
1


And Solaris' tr does this as well.

But m4 behaves in the way I specified:

$ echo 'translit(a,aa,01)' | m4
0
$ echo 'translit(a,aa,01)' | /usr/ccs/bin/m4
0

Time for me to ask for clarification from the Austin Group, I suppose. 
But given existing practice, I guess the argument should be that tr is 
explicitly different than m4.


--
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org



bug#6131: [PATCH]: fiemap support for efficient sparse file copy

2010-09-29 Thread jeff.liu
Jim Meyering wrote:
 jeff.liu wrote:
 jeff.liu wrote:
 Hi Jim,

 Thanks for your prompt response, I will fix this issue when all review done.
 Hi Jim,

 For my current implementation, I just have another thought to remove the 
 char *fname from struct
 extent_scan, and add a new item int errno to save the errno set by 
 ioctl(2) or lseek(2) if either
 call failed.
 at first, I am intended to use char *fname for the debugging purpose 
 inside, however, maybe its
 better to do such things outside of the module according to the return value 
 and errno. this change
 can not only reduce the memory allocation for 'fname' but also make a neatly 
 open_extent_scan()
 interface.

 /* Structure used to reserve extent scan information per file.  */
 struct extent_scan
 {
  
   int errno;
   
 };

 void
 open_extent_scan (int src_fd, struct extent_scan **scan);


 Is it better?
 
 That would be better.
 Since there is only one diagnostic using that file name
 you can do the same from the caller.
 
 In fact, why store errno in that struct at all?
 Can't you just ensure that errno is set to the proper value when it fails?
Indeed, there is no need to store 'errno'.

Thanks,
-Jeff
 
 
 






bug#6131: [PATCH]: fiemap support for efficient sparse file copy

2010-09-29 Thread jeff.liu
Jim Meyering wrote:
 jeff.liu wrote:
 Sorry for the delay.

 This is the new patch to isolate the stuff regarding to extents reading to a 
 new module. and teach
 cp(1) to make use of it.
 
 Jeff,
 
 I applied your patch to my rebased fiemap-copy branch.
 My first step was to run the usual
 
   ./bootstrap  ./configure  make  make check
 
 make check failed on due to a double free in your new code:
 (x86_64, Fedora 13, ext4 working directory)
 
 To get details, I made this temporary modification:
Hi Jim,

I am sorry for the fault, it fixed at the patch below.
Would you please revie at your convenience?

Changes:

1. fix write_zeros() as Jim's comments, thanks for pointing this out.
2. remove char const *fname from struct extent_scan.
3. change the signature of open_extent_scan() from void 
open_extent_scan(struct extent_scan
**scan) to void open_extent_scan(struct extent_scan *scan); the reason is 
I'd like to reduce once
memory allocation for the extent_scan variable, instead, using stack to save it.
4. remove close_extent_scan() from a function defined at extent-scan.c to 
extent-scan.h as a Macro
definination, but it does nothing for now, since initial extent scan defined at 
stack.
5. add a macro free_extents_info() defined at extent-scan.h to release the 
memory allocated to
extent info which should be called combine with get_extents_info(), it just one 
line, so IMHO,
define it as macro should be ok.

I have done the memory check via `valgrind`, no issue found.
make test against cp/sparse-fiemap failed at the extent compare stage, but the 
file content is
identical to each other by comparing those two files j1/j2 manually.
Is it make sense if we verify them through diff(1) since the testing file is in 
small size?
or we have to merge the contig extents from the output of `filefrag', I admit I 
have not dig into
the filefrag-extent-compare at the moment, I need to recall the perl language 
syntax. :-P.


From 50a3338db06442fa2d789fd65175172d140cc96e Mon Sep 17 00:00:00 2001
From: Jie Liu jeff@oracle.com
Date: Wed, 29 Sep 2010 15:35:43 +0800
Subject: [PATCH 1/1] cp: add a new module for scanning extents

* src/extent-scan.c: Source code for scanning extents.
  Call open_extent_scan() to initialize extent scan.
  Call get_extents_info() to get a number of extents for each iteration.
* src/extent-scan.h: Header file of extent-scan.c.
  Wrap free_extent_info() as macro define to release the space allocated 
extent_info per extent scan.
  Wrap close_extent_scan() as macro define but do nothing at the moment.
* src/Makefile.am: Reference it and link it to copy_source.
* src/copy.c: Make use of the new module, replace fiemap_copy() with 
extent_copy().

Signed-off-by: Jie Liu jeff@oracle.com
---
 src/Makefile.am   |2 +-
 src/copy.c|  197 ++--
 src/extent-scan.c |  113 ++
 src/extent-scan.h |   68 ++
 4 files changed, 296 insertions(+), 84 deletions(-)
 create mode 100644 src/extent-scan.c
 create mode 100644 src/extent-scan.h

diff --git a/src/Makefile.am b/src/Makefile.am
index 7d56312..2412b16 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -459,7 +459,7 @@ uninstall-local:
  fi; \
fi

-copy_sources = copy.c cp-hash.c
+copy_sources = copy.c cp-hash.c extent-scan.c

 # Use `ginstall' in the definition of PROGRAMS and in dependencies to avoid
 # confusion with the `install' target.  The install rule transforms `ginstall'
diff --git a/src/copy.c b/src/copy.c
index f48c74d..42ec300 100644
--- a/src/copy.c
+++ b/src/copy.c
@@ -35,6 +35,7 @@
 #include buffer-lcm.h
 #include copy.h
 #include cp-hash.h
+#include extent-scan.h
 #include error.h
 #include fcntl--.h
 #include file-set.h
@@ -63,10 +64,6 @@

 #include sys/ioctl.h

-#ifndef HAVE_FIEMAP
-# include fiemap.h
-#endif
-
 #ifndef HAVE_FCHOWN
 # define HAVE_FCHOWN false
 # define fchown(fd, uid, gid) (-1)
@@ -153,74 +150,79 @@ clone_file (int dest_fd, int src_fd)
 #endif
 }

-#ifdef __linux__
-# ifndef FS_IOC_FIEMAP
-#  define FS_IOC_FIEMAP _IOWR ('f', 11, struct fiemap)
-# endif
-/* Perform a FIEMAP copy, if possible.
-   Call ioctl(2) with FS_IOC_FIEMAP (available in linux 2.6.27) to
-   obtain a map of file extents excluding holes.  This avoids the
-   overhead of detecting holes in a hole-introducing/preserving copy,
-   and thus makes copying sparse files much more efficient.  Upon a
-   successful copy, return true.  If the initial ioctl fails, set
-   *NORMAL_COPY_REQUIRED to true and return false.  Upon any other
-   failure, set *NORMAL_COPY_REQUIRED to false and return false.  */
 static bool
-fiemap_copy (int src_fd, int dest_fd, size_t buf_size,
- off_t src_total_size, char const *src_name,
- char const *dst_name, bool *normal_copy_required)
+write_zeros (int fd, uint64_t n_bytes)
 {
-  bool last = false;
-  union { struct fiemap f; char c[4096]; } fiemap_buf;
-  struct fiemap *fiemap 

bug#7129: french implementation of dh

2010-09-29 Thread rudy klein
Hello all,

A minor problem in the french implementation of dh: The first column has an
offset

for example, with df -h i have:

Sys. de fichiersTail.  Occ.  Disp.%Occ.Monté sur
/dev/sdb1   14G4,8G8,4G   37% /
none  2,0G   352K2,0G   1%  /dev
none  2,0G   2,9M2,0G   1%  /dev/shm
none  2,0G   96K 2,0G1%  /var/run
none  2,0G   0 2,0G0%  /var/lock
none  2,0G   0 2,0G0%  /lib/init/rw
/dev/sdc1   294G  40G 239G  15% /donnees
/dev/sdb6   48G339M   45G1%   /home
/dev/sr1 644M  644M0100%   /media/WD SmartWare
/dev/sdd1   298G  265G33G90%/media/My Passport
/dev/sda1   140G  24G 117G   17%/media/FAFCFC2DFCFBE1B5

but it should be:

Sys. de fichiers  Tail.Occ.Disp.   %Occ. Monté sur
/dev/sdb1   14G4,8G8,4G   37% /
none  2,0G   352K2,0G   1%  /dev
none  2,0G   2,9M2,0G   1%  /dev/shm
none  2,0G   96K 2,0G1%  /var/run
none  2,0G   0 2,0G0%  /var/lock
none  2,0G   0 2,0G0%  /lib/init/rw
/dev/sdc1   294G  40G 239G  15% /donnees
/dev/sdb6   48G339M   45G1%   /home
/dev/sr1 644M  644M0100%   /media/WD SmartWare
/dev/sdd1   298G  265G33G90%/media/My Passport
/dev/sda1   140G  24G 117G   17%/media/FAFCFC2DFCFBE1B5


regards
Rudy


bug#7129: french implementation of dh

2010-09-29 Thread Pádraig Brady
On 29/09/10 07:33, rudy klein wrote:
 Hello all,
 
 A minor problem in the french implementation of dh: The first column has an
 offset

 
 for example, with df -h i have:
 
 Sys. de fichiersTail.  Occ.  Disp.%Occ.Monté sur
 /dev/sdb1   14G4,8G8,4G   37% /

This is a translation issue that already looks fixed.
In any case we're working on a more general column alignment fix.

cheers,
Pádraig.





bug#7085: fmt (GNU coreutils) 6.10

2010-09-29 Thread William Plusnick
 Isn't the more modular flow that Eric suggested more in keeping with
 the Unix philosophy and the better solution?  If you want to do both
 stripping of carriage returns and word wrapping together then you can
 pipe them together.

   tr -d '\r'  somefile | fmt

 Or if you are calling it as a stdin filter such as within an editor:

   tr -d '\r' | fmt

 Bob

Yes, it is indeed. I don't know exactly why I chose to try (and
ultimately fail) writing that patch. I think the koan ESR wrote for
The Rootless Root: The Unix Koans Of Master Foo has some truth to it:
“There is more Unix-nature in one line of shell script than there is
in ten thousand lines of C.”

Finally in true koan fashion, I am enlightened. :^)

William