Re: dd (coreutils) 5.97 used power of 10 not 2 for calculating MB

2007-01-23 Thread Pádraig Brady
Paul Eggert wrote:
> "Dat Head" <[EMAIL PROTECTED]> writes:
> 
>> dd if=/dev/zero of=/dev/null count=100 bs=1024k
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 0.00933139 seconds, 11.2 GB/s
>> ---^^^ should be 100 MB
> 
> No, "MB" means megabytes (i.e., 10**6 bytes).  I guess you want
> mebibytes (i.e., 2**20 bytes), but the standard abbreviation for that
> is "MiB", not "MB".  See .
> 
> It might be reasonable to add support for binary multiples to "dd",
> but for media the decimal numbers are probably more useful.  As you
> mentioned, most media are measured in decimal multiples nowadays.

There is support for binary multiples in dd,
as I've summarized in the help output from my truncate util¹

 is a number which may be optionally followed
by the following multiplicative suffixes:
  b   512
  KB  1000
  K   1024
  MB  1000*1000
  M   1024*1024
and so on for G, T, P, E, Z, Y
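
As a rough illustration of how the suffix table above turns an operand into a byte count, here is a minimal shell sketch. The function name size_to_bytes is hypothetical. It is not dd's own parser, and it stops at E because shell arithmetic is 64-bit, so Z and Y cannot be represented.

```sh
#!/bin/sh
# size_to_bytes SIZE: expand a size operand using the suffix table above
# (b, KB, K, MB, M, ... up to E). Hypothetical helper, not dd's parser.
# Shell arithmetic is 64-bit, so Z and Y (and large E values) are omitted.
size_to_bytes() {
  num=${1%%[!0-9]*}                   # leading digits
  suffix=${1#"$num"}                  # everything after them
  [ -n "$num" ] || return 1
  # Drop leading zeros so $(( )) does not read the number as octal.
  while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do num=${num#0}; done
  case $suffix in
    '') echo "$num"; return 0 ;;
    b)  echo $((num * 512)); return 0 ;;
  esac
  letter=${suffix%B}                  # "MB" -> "M"; "M" stays "M"
  if [ "$letter" = "$suffix" ]; then base=1024; else base=1000; fi
  case $letter in
    K) exp=1 ;; M) exp=2 ;; G) exp=3 ;; T) exp=4 ;; P) exp=5 ;; E) exp=6 ;;
    *) return 1 ;;                    # unknown suffix
  esac
  result=$num
  while [ "$exp" -gt 0 ]; do          # multiply by base, exp times
    result=$((result * base))
    exp=$((exp - 1))
  done
  echo "$result"
}
```

For example, size_to_bytes 512M gives 536870912, while size_to_bytes 512MB gives 512000000.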

> My favorite was the old "1.44 MB" floppy, which contained 1.44 * 1024
> * 1000 bytes.  Almost anything is better than that sort of confusion!

cool! You learn something new every day.
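
The floppy arithmetic can be checked directly. A 3.5-inch high-density floppy holds 2 sides * 80 tracks * 18 sectors * 512 bytes = 1474560 bytes. Dividing that by each of the three candidate "megabytes" shows that only the mixed 1000*1024 unit yields 1.44. The helper show_units is hypothetical and uses awk for the floating-point division.

```sh
#!/bin/sh
# show_units BYTES: print BYTES in three different "megabyte" units.
show_units() {
  awk -v b="$1" 'BEGIN {
    printf "decimal MB (1000*1000): %.3f\n", b / (1000 * 1000)
    printf "binary MiB (1024*1024): %.3f\n", b / (1024 * 1024)
    printf "floppy MB  (1000*1024): %.3f\n", b / (1000 * 1024)
  }'
}

# 2 sides * 80 tracks * 18 sectors * 512 bytes per sector
show_units $((2 * 80 * 18 * 512))
```

This prints 1.475 MB, 1.406 MiB, and 1.440 in the mixed unit.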

cheers,
Pádraig.

¹ http://www.pixelbeat.org/scripts/truncate



___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: dd (coreutils) 5.97 used power of 10 not 2 for calculating MB

2007-01-23 Thread Paul Eggert
Pádraig Brady <[EMAIL PROTECTED]> writes:

> There is support for binary multiples in dd,

Yes, but that's for the operands of dd, e.g., "dd bs=512M" talks about
a block size of 512 * 1024 * 1024 bytes, as opposed to "dd bs=512MB",
which uses 512 * 1000 * 1000 bytes.  But Dat Head is asking for binary
multiples in the stderr messages, e.g.,

   $ dd bs=512M count=1024 if=/dev/zero of=/dev/null
   1024+0 records in
   1024+0 records out
   549755813888 bytes (550 GB) copied, 13.7008 s, 40.1 GB/s

Currently these messages always use powers of 10, not 2, even if the
block size and counts are powers of 2.  Dat Head wants that last line
to say "(512 GiB)" and "37.4 GiB/s".  That will require a new option,
I think.
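
The two forms of the summary line differ only in the divisor. The following sketch reproduces both. It assumes dd rounds to roughly three significant digits, though dd's actual formatting code may differ, and human_size and human_rate are hypothetical names.

```sh
#!/bin/sh
# human_size BYTES BASE: divide BYTES by BASE (1000 or 1024) until it is
# below BASE, then print about three significant digits with the matching
# SI (kB, MB, ...) or IEC (KiB, MiB, ...) prefix.
# Hypothetical helper; dd's own formatting may round differently.
human_size() {
  awk -v n="$1" -v base="$2" 'BEGIN {
    n = n + 0; base = base + 0        # force numeric comparison
    if (base == 1024) split("B KiB MiB GiB TiB PiB EiB", unit, " ")
    else              split("B kB MB GB TB PB EB", unit, " ")
    i = 1
    while (n >= base && i < 7) { n /= base; i++ }
    if (i == 1)       { printf "%d %s\n", n, unit[i]; exit }
    if (n >= 100)     printf "%.0f %s\n", n, unit[i]
    else if (n >= 10) printf "%.1f %s\n", n, unit[i]
    else              printf "%.2f %s\n", n, unit[i]
  }'
}

# human_rate BYTES SECONDS BASE: the same, for a transfer rate.
human_rate() {
  rate=$(awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f", b / s }')
  echo "$(human_size "$rate" "$3")/s"
}
```

With BASE 1000, the transcript's figures (549755813888 bytes in 13.7008 s) come out as "550 GB" and "40.1 GB/s". With BASE 1024 they come out as "512 GiB" and "37.4 GiB/s", the values Dat Head asked for.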




Fwd: dd (coreutils) 5.97 used power of 10 not 2 for calculating MB

2007-01-23 Thread Dat Head

i'm not saying any of this needs to happen, just something that i noticed.

instead of another flag, it might be easiest to just make the stderr message
match whatever was specified in the bs= option (e.g. if you use M, then
use powers of 2 in stderr; if you use MB, then use powers of 10). of course,
what to do when ibs= specifies one way and obs= specifies another is a
whole 'nother story!
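
The proposal above, choosing the output units from the suffix of the size operands, could look roughly like this. report_base is a hypothetical helper that takes the operand values and prints the base to use. dd has no such logic, and falling back to decimal when ibs= and obs= disagree is an arbitrary choice made only for this sketch.

```sh
#!/bin/sh
# report_base OPERAND...: pick the multiple (1000 or 1024) for dd's
# summary line from the suffixes of the size operands.
# Hypothetical sketch; a bare number or the "b" suffix gives no hint,
# so it keeps today's decimal output.
report_base() {
  base=
  for op in "$@"; do
    case $op in
      *[KMGTPEZY]B) b=1000 ;;    # MB, GB, ...: decimal multiples
      *[KMGTPEZY])  b=1024 ;;    # M, G, ...:  binary multiples
      *)            continue ;;  # no decimal/binary hint in this operand
    esac
    if [ -n "$base" ] && [ "$base" != "$b" ]; then
      echo 1000                  # operands disagree: fall back to decimal
      return
    fi
    base=$b
  done
  echo "${base:-1000}"
}
```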

earlier somebody mentioned MiB usually stands for powers of 2. is that
right!? i always thought that the "i" between the M and the B was from
million (as opposed to mega)






sort behavior - Ubuntu problem?

2007-01-23 Thread Kevin Scannell

I suspect that the behavior I describe below is caused by broken
locale definition files, but I wanted to get an expert opinion on this
before I go trying to find who maintains those upstream.

I know about the "sort does not sort" FAQ, and I don't think that I've
fallen into that trap, so please keep reading!

Anyway, here's a sample file of UTF-8 encoded text:
http://borel.slu.edu/obair/test.txt

$ uname -a
Linux borel 2.6.17-10-generic #2 SMP Fri Oct 13 18:45:35 UTC 2006 i686 GNU/Linux

$ sort --version
sort (GNU coreutils) 5.96
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software.  You may redistribute copies of it under the terms of
the GNU General Public License .
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and Paul Eggert.

$ locale
LANG=
LC_CTYPE="en_US.utf8"
LC_NUMERIC="en_US.utf8"
LC_TIME="en_US.utf8"
LC_COLLATE="en_US.utf8"
LC_MONETARY="en_US.utf8"
LC_MESSAGES="en_US.utf8"
LC_PAPER="en_US.utf8"
LC_NAME="en_US.utf8"
LC_ADDRESS="en_US.utf8"
LC_TELEPHONE="en_US.utf8"
LC_MEASUREMENT="en_US.utf8"
LC_IDENTIFICATION="en_US.utf8"
LC_ALL=en_US.utf8

$ sort test.txt
a
á
áa
aá
az
áz
áa
aá

The acute a (á) collates after the plain "a" (correctly), except when there
are additional non-ASCII characters on the same line.  I also see this
with ga_IE.utf8, which is the locale I usually use and the one I care
about.  There, this sort order is definitely wrong.

What leads me to believe that the problem lies with the locale
definition file is that on a different machine running Gentoo, under the
same conditions as above, this file sorts as I want it to, in
dictionary order:

$ uname -a
Linux turing 2.6.17-gentoo-r4 #2 SMP Mon Aug 28 12:53:48 CDT 2006 x86_64 AMD Opteron(tm) Processor 246 AuthenticAMD GNU/Linux

$ sort test.txt
a
á
aá
áa
az
áz
aá
áa
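
A small script can make the comparison between machines and locales more systematic. It sorts the same file under each locale, with the C locale (plain byte order) as a baseline that does not depend on any locale definition file. The function name compare_collation is hypothetical, and the sketch assumes every named locale is installed (check with "locale -a").

```sh
#!/bin/sh
# compare_collation FILE LOCALE...: print FILE sorted under each LOCALE,
# one section per locale, so the orders can be diffed across machines.
compare_collation() {
  file=$1
  shift
  for loc in "$@"; do
    echo "== $loc"
    LC_ALL=$loc sort "$file"
  done
}

# Example (assumes these locales exist on the machine):
#   compare_collation test.txt C en_US.utf8 ga_IE.utf8 > "$(hostname).out"
# "od -c test.txt" shows exactly which bytes each line contains.
```

Running the same command on both machines and diffing the outputs shows whether the two systems disagree only under the UTF-8 locales. That would point at the locale data rather than at sort.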

Any advice would be appreciated.
Kevin


Re: feature request: gzip/bzip support for sort

2007-01-23 Thread Jim Meyering
Dan Hipschman <[EMAIL PROTECTED]> wrote:
> On Sun, Jan 21, 2007 at 07:14:03PM +0100, Jim Meyering wrote:
>> Not to look the gift horse in the mouth, but it'd be nice
>> if you wrote ChangeLog entries, too.  And even (gasp! :-)
>> a test case or two.  Of course, we'd expect such a test case
>> (probably named tests/misc/sort-compress, and based on
>> tests/sample-test) to have this line in it:
>>
>>   . $srcdir/../very-expensive
>>
>> If you don't have time for that, I'll take care of it, eventually.
>
> Here are some tests.  They're actually not very expensive.  Of course,
> you need to "chmod +x sort-compress".
>
>
> 2007-01-22  Dan Hipschman  <[EMAIL PROTECTED]>
>
>   Test sort compression.
>   * tests/misc/Makefile.am: Add the test.
>   * tests/misc/sort-compress: New file containing the tests.

Thanks for all the work!
I've checked in your changes, then changed NEWS a little:

** New features

  By default, sort now compresses any temporary file it writes.
  When sorting very large inputs, this usually results in sort using
  far less temporary disk space and in improved performance.
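
To make the idea concrete: an external merge sort writes sorted runs to temporary files and merges them at the end, and compressing those runs trades CPU time for disk space and I/O. The user-level shell sketch below imitates that with split, gzip, and sort -m reading from named pipes. compressed_sort is a hypothetical name. GNU sort does this work internally and does not use this script.

```sh
#!/bin/sh
# compressed_sort FILE [LINES_PER_RUN]
# Hypothetical sketch of an external merge sort whose temporary runs are
# stored gzip-compressed on disk; it only illustrates the idea.
compressed_sort() {
  input=$1
  lines=${2:-100000}
  tmpdir=$(mktemp -d) || return 1

  # Pass 1: cut the input into runs, sort each run, store it compressed.
  split -l "$lines" "$input" "$tmpdir/run."
  for run in "$tmpdir"/run.*; do
    [ -e "$run" ] || continue                # empty input: no runs at all
    sort "$run" | gzip -c > "$run.gz"
    rm -f "$run"
  done

  # Pass 2: decompress each run into its own FIFO, then merge them all.
  set --
  for gz in "$tmpdir"/run.*.gz; do
    [ -e "$gz" ] || continue
    fifo=${gz%.gz}.fifo
    mkfifo "$fifo" || return 1
    gzip -dc "$gz" > "$fifo" &               # writer blocks until sort reads
    set -- "$@" "$fifo"
  done
  [ "$#" -eq 0 ] || sort -m "$@"
  wait                                       # reap the gzip writers
  rm -rf "$tmpdir"
}
```

Usage: compressed_sort big.txt 1000000 > sorted.txt. Only the compressed runs ever sit on disk, which is the space saving the NEWS entry refers to.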

Additionally, I'm probably going to change the documentation so that
people will be less likely to depend on being able to run a separate
program.  To be precise, I'd like to document that the only valid values
of GNUSORT_COMPRESSOR are the empty string, "gzip" and "bzip2"[*].
Then we will have the liberty to remove the exec calls and use library
code instead, thus making the code a little more efficient -- but mainly,
more robust.
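
Restricting the documented values amounts to a whitelist check like the one below. Such a check would let the implementation later replace the exec calls with library code without breaking any documented setting. check_compressor is a hypothetical name. This only sketches the proposed rule, not sort's actual code, and treating an unset variable like the empty string is an assumption of the sketch.

```sh
#!/bin/sh
# check_compressor: succeed only for the GNUSORT_COMPRESSOR values the
# proposed documentation would allow: "", "gzip", "bzip2".
# An unset variable is treated like the empty string (an assumption).
check_compressor() {
  case ${GNUSORT_COMPRESSOR-} in
    '' | gzip | bzip2)
      return 0 ;;
    *)
      echo "unsupported GNUSORT_COMPRESSOR value: $GNUSORT_COMPRESSOR" >&2
      return 1 ;;
  esac
}
```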

If someone makes a good case for allowing an arbitrary compressor, we can
allow that later.  But if we were to add (and document) this feature now,
we might well be stuck with it for a long time.

[*] If gzip and bzip2 are good enough for tar, why should sort make any
compromise (exec'ing some other program) in order to be more flexible?

