bug#42034: option to truncate at end or should that be default?

2020-06-24 Thread L A Walsh
I think that would work in my specific use case.
I can think of other use cases where it probably wouldn't, but I'm not going
to worry about those right now.  :-)

Thanks!
-l




On Wed, Jun 24, 2020 at 1:07 PM Andreas Schwab 
wrote:

> On Jun 24 2020, L A Walsh wrote:
>
> > A second option would be to truncate the file to the last position
> > written.
>
> $ truncate -r $src $dest
>
> Andreas.
>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."
>


bug#42034: option to truncate at end or should that be default?

2020-06-24 Thread Bob Proulx
L A Walsh wrote:
> I allocated a large file of contiguous space (~3.6T), the size of a disk
> image I was going to copy into it with 'dd'.  I have the disk image
> 'overwrite' the existing file, in place ...

It's possible that you might want to be rescuing data from a failing
disk or doing other surgery upon it.  Therefore I want to mention
ddrescue here.

  https://www.gnu.org/software/ddrescue/

Of course it all depends upon the use case but ddrescue is a good tool
to have in the toolbox.  It might be just the right tool.

Take for example a RAID1 image on two failing drives that should be
identical but both are reporting errors.  If the failures do not
overlap then ddrescue can be used to merge the successful reads from
those two images producing one fully correct image.

Bob





bug#41792: Acknowledgement (dd function – message: "No boot sector on USB device")

2020-06-24 Thread Bob Proulx
close 41792
thanks

Since the discussion has moved away from anything GNU Coreutils
related and doesn't seem to be reporting any bugs in any of the
utilities I am going to close the bug ticket.  But discussion may
continue here regardless.  If we see a dd bug we can re-open the
ticket.

Ricky Tigg wrote:
> The difference of device path is due to the fact that the USB media was
> plugged out after the write-operation was achieved on the Linux computer
> then plugged into a computer –Asus– whose Windows OS has to be restored,
> then plugged back to the same computer but to a *different* USB port. It's
> safe to open the present issue-ticket.

Hmm...  There is no reason that the Linux kernel would renumber the
device simply because it was removed and inserted again.  Therefore me
thinks that it was not cleanly removed.  Me thinks that something in
the system had mounted it keeping it busy preventing it from cleanly
being ejected.  This "something" may have been an automatic mounting
of it as many Desktop Environments unfortunately default to doing.
IMNHO automated mounting is a bad idea and should never be enabled by
default.

> *Source media*:
> https://www.microsoft.com/en-us/software-download/windows10ISO

The source media doesn't matter to GNU utilities.  The 'dd' utility
treats files as raw bytes and does not treat MS-Windows-10 ISO images
any differently than any other raw data.  It might be that or pictures
of your dog or random cosmic noise recorded from your radio.  It
doesn't matter.  It's just data.

Your Desktop Environment may take action however.  It is possible that
your DE will probe the device, detect that it is an ISO image, and
automatically mount that ISO image.  That's bad.  But that's your
Desktop Environment and unrelated to 'dd'.  But it has always been a bad
idea, regardless of how many people do it.

> *Rufus v4.1.4* – I couldn't use it since The Windows OS installed is
> missing some system's files. Will convert it to fit on Fedora at release of
> version 33 which will update the uniformly mingw component and thus
> mingw64-headers which is old and is the cause of a known issue.
>
> I wrote the disc image as well using those tools then booted the USB device
> having the disc image written on.:
>
> *Fedora Media Writer v4.1.4* – Officially does not support Microsoft
> Windows disc images. I did not know that before writing.

My first thought was, huh?  Why would Fedora Media Writer not treat
files as raw files?  My second thought was that the question was for a
Fedora Media Writer mailing list as this bug ticket is not the place
to be discussing other random projects.

> *Unetbootin v677* – It writes partially the disc image thus the installer
> is operational partially. Issue was already reported by someone on Git.
>
> *Woeusb v3.3.1* – Installer is operational on BIOS but not on EFI systems.
> Issue was already reported by someone on Git.
>
> *Balena Etcher v1.5.98* x64 as AppImage format – The device is not listed
> at boot.

Gosh.  Reading your report makes MS-Windows seem like such a terrible
system!  I read about all of your pain of working on it.  You have
tried all of these tools and nothing is working for you.  It is
reading these types of reports that I am thankful I am working on a
Free(dom) Software operating system where things Just Work!

Meanwhile...  Let's get back to your information about 'dd'.

> $ file -b Win10_2004_Finnish_x64.iso
> ISO 9660 CD-ROM filesystem data 'CCCOMA_X64FRE_FI-FI_DV9' (bootable)

That looks like you were successfully able to write the ISO image to
the device.  Looks okay.

> *Component*: coreutils.x86_64  8.32-4.fc32.; *OS*: Linux Fedora

Good.

> Source of file:
> https://www.microsoft.com/en-us/software-download/windows10ISO
>
> Disc image file
> - checked against its SHA-256 checksum was correct
> - written successfully with that command:
> # dd if=Win10_2004_Finnish_x64.iso of=/dev/sdc bs=4M oflag=direct status=progress && sync

I don't see any error messages.  That's good.

The oflag=direct should use direct I/O.  Which means that the 'sync'
shouldn't matter since there should be no file system buffer to flush.
It will simply flush other unrelated buffers.  Won't hurt though.

The bs size of 4M seems very small to me, especially for use with a
NAND flash USB storage device.  I would select a much larger size.  I
would probably use 64M, which is likely to divide the size of your
original ISO image evenly, but that should be verified.
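
One way to do that verification, as a minimal sketch: compare the
image size against the candidate block size.  The file below is only a
stand-in created with truncate, the variable names are made up for
illustration, and `stat -c %s` assumes GNU stat.

```shell
# Stand-in for the real ISO: a sparse 128 MiB file (hypothetical).
image=$(mktemp)
truncate -s 128M "$image"

bs=$((64 * 1024 * 1024))          # candidate block size: 64M
size=$(stat -c %s "$image")       # image size in bytes (GNU stat)

if [ $((size % bs)) -eq 0 ]; then
    verdict="whole multiple"
else
    verdict="partial last block"
fi
printf '%s bytes, bs=%s: %s\n' "$size" "$bs" "$verdict"
rm -f "$image"
```

A partial final block is not an error in itself; dd writes whatever
remains.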

> Once written, the partition is as follows:
> $ mount | fgrep /run/media/$USER
> /dev/sdb on /run/media/yk/CCCOMA_X64FRE_FI-FI_DV9 type udf
> (ro,nosuid,nodev,relatime,uid=1000,gid=1000,iocharset=utf8,uhelper=udisks2)

WHY is this mounted?  That seems like a problem.

You said that the device was removed and replaced and went from sdc to
sdb?!  Probably because it was mounted.
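
A check along these lines before running dd can catch that situation.
This is only a sketch: the function name is_mounted is made up, it
assumes a Linux-style mounts table (/proc/mounts by default), and the
sample table is fabricated from the mount line in the report.

```shell
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if DEVICE, or a numbered partition of it, is listed as a mount
# source.  MOUNTS_FILE defaults to /proc/mounts (Linux-specific).
is_mounted() {
    awk -v dev="$1" '
        $1 == dev { found = 1 }
        index($1, dev) == 1 && substr($1, length(dev) + 1) ~ /^[0-9]+$/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

# Demonstration against a fabricated mounts table, modeled on the report.
sample=$(mktemp)
printf '%s\n' '/dev/sdb /run/media/yk/CCCOMA_X64FRE_FI-FI_DV9 udf ro 0 0' > "$sample"

if is_mounted /dev/sdb "$sample"; then sdb_state=mounted; else sdb_state=free; fi
if is_mounted /dev/sdc "$sample"; then sdc_state=mounted; else sdc_state=free; fi
printf 'sdb: %s, sdc: %s\n' "$sdb_state" "$sdc_state"
rm -f "$sample"
```

In real use one would call `is_mounted /dev/sdX` with no second
argument and refuse to write while it succeeds.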

This feels like the root cause of all of your problems.  It feels to
me that something is automatically mounting the device.  That's 

bug#41657: md5sum: odd escaping for input filename \

2020-06-24 Thread Bob Proulx
close 41657
thanks

No one else has commented therefore I am closing the bug ticket.  But
the discussion may continue here.

Michael Coleman wrote:
> Thanks very much for your prompt reply.  Certainly, if this is
> documented behavior, it's not a bug.  I would have never thought to
> check the documentation as the behavior seems so strange.

I am not always so generous about documented behavior *never* being a
bug. :-)

> If I understand correctly, the leading backslash in the first field
> is an indication that the second field is escaped.  (The first field
> never needs escapes, as far as I can see.)

Right.  But the first field was available to clue in md5sum and the
other utilities that the file name was an "unsafe" file name and was
going to be escaped.

> Not sure I would have chosen this, but it can't really be changed
> now.  But, I suspect that almost no real shell script would deal
> with this escaping correctly.  Really, I'd be surprised if there
> were even one example.  If so, perhaps it could be changed without
> trouble.

Let's talk about the shell scripting part.  Why would this ever need
to be parsed in a shell script?  And if so then that is precisely
where it would need to be done due to the file name!

Your own example was a file name that consisted of a single
backslash.  Since the backslash is the shell escape character then
handling that in a shell script would require escaping it properly
with a second backslash.

I will suggest that the primary use for the *sum utility output is as
input to the same utility later to check the content for differences.
That's arguably the primary use of it.
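
That round trip can be sketched as follows, assuming GNU md5sum.  The
directory and file names are throwaway examples; one of the files is
deliberately named with a single backslash so the escaped form passes
through both directions.

```shell
workdir=$(mktemp -d)
sums=$(mktemp)

printf 'hello\n' > "$workdir/plain.txt"
printf 'world\n' > "$workdir/\\"      # file whose name is a single backslash

# Record checksums; the names are written relative to $workdir.
( cd "$workdir" && md5sum -- * ) > "$sums"

# Later: feed the same list back to md5sum to look for differences.
if ( cd "$workdir" && md5sum -c "$sums" > /dev/null ); then
    check=ok
else
    check=failed
fi
printf 'check: %s\n' "$check"
rm -rf "$workdir" "$sums"
```

Nothing in the script has to parse the escaping; md5sum writes it and
md5sum reads it back.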

There are also cases where we will want to use the *sum utilities on a
single file.  That's fine.  I think the problematic case here might be
a usage like this.

  filename="\\"
  sum=$(md5sum "$filename" | awk '{print$1}')
  printf "%s\n" "$sum"
  \d41d8cd98f00b204e9800998ecf8427e

And then there is that extra backslash at the start of the hash.
Well, yes, that is unfortunate.  But in this case we already have the
filename in a variable and don't want the filename from md5sum.  This
is very similar to portability problems between different versions of
'wc' and other utilities too.  (Some 'wc' utils print leading spaces
and some do not.)
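
If the name does have to be passed on the command line, one hedge is to
strip the marker in the shell.  A minimal sketch, assuming the only
thing wanted is the hash; the variable names raw and workdir are just
for illustration:

```shell
workdir=$(mktemp -d)
filename="$workdir/\\"             # empty file named with a single backslash
: > "$filename"

raw=$(md5sum "$filename" | awk '{print $1}')
sum=${raw#\\}                      # drop one leading backslash, if present
printf '%s\n' "$sum"
rm -rf "$workdir"
```

The expansion is a no-op when the first field has no leading backslash,
so it is harmless for ordinary file names.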

As you already deduced if md5sum does not have a file name then it
does not know if it is escaped or not.  Reading standard input instead
doesn't have a name and therefore "-" is used as a placeholder as per
the tradition.

  filename="\\"
  sum=$(md5sum < "$filename" | awk '{print$1}')
  printf "%s\n" "$sum"
  d41d8cd98f00b204e9800998ecf8427e

And because this is discussion I will note that the name is just one
of the possible names to a file.  Let's hard link it to a different
name.  And of course symbolic links are the same too.  A name is just
a pointer to a file.

  ln "$filename" foo
  md5sum foo
  d41d8cd98f00b204e9800998ecf8427e  foo

But I drift...

I think it likely you have already educated your people about the
problems and the solution was to read from stdin when the file name is
potentially untrusted "tainted" data.  (Programming languages often
refer to unknown untrusted data as "tainted" data, for the purpose of
tracking which actions are safe upon it when taint checking is
enabled.)  Therefore if the name is unknown then it is safer to avoid
the name and use standard input.

And I suggest the same with other utilities such as 'wc' too.
Fortunately wc is not used to read back its own input.  Otherwise I am
sure someone would suggest that it would need the same escaping done
there too.  Example that thankfully does not actually exist:

  $ wc -l \\
  \0 \\

I am sure that if such a change were made it would result in large,
widespread breakage.  Let's hope that never happens.

Bob





bug#42034: option to truncate at end or should that be default?

2020-06-24 Thread Andreas Schwab
On Jun 24 2020, L A Walsh wrote:

> A second option would be to truncate the file to the last position
> written.

$ truncate -r $src $dest
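
Put together with the dd step from the report, that might look like the
sketch below.  It assumes GNU dd, truncate and stat; the file names and
sizes are small hypothetical stand-ins, and the destination is a sparse
file rather than a truly preallocated one.

```shell
workdir=$(mktemp -d)
src="$workdir/image.src"       # hypothetical stand-in for the disk image
dest="$workdir/image.dest"     # hypothetical stand-in for the preallocated file

head -c 3000 /dev/zero > "$src"
truncate -s 10000 "$dest"      # destination starts out larger than the source

# Overwrite in place without truncating, as in the original report.
dd if="$src" of="$dest" conv=nocreat,notrunc 2>/dev/null

before=$(stat -c %s "$dest")
truncate -r "$src" "$dest"     # set dest's size to src's size
after=$(stat -c %s "$dest")
printf 'before=%s after=%s\n' "$before" "$after"
rm -rf "$workdir"
```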

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."





bug#42034: option to truncate at end or should that be default?

2020-06-24 Thread L A Walsh
I allocated a large file of contiguous space (~3.6T), the size of a disk
image I was going to copy into it with 'dd'.  I have the disk image
'overwrite' the existing file, in place using "conv=nocreat,notrunc" (among
other switches) and that works with the final file still using max-sized
8GB extents.

I realize that I _do_ want it to truncate the file to the actual size when
done.

The 'notrunc' switch doesn't work for this purpose as its meaning is
overloaded: it specifies both the non-truncation effect and a directive
to preserve any blocks not written during that specific invocation of
'dd'.  A possible 3rd behavior arises from a vague definition of a
block.  Do they mean to preserve the data in the block, or do they mean
to preserve the position of the block on disk?  It seems they mean to
preserve data, but whether or not that also preserves a place on disk
isn't specified.
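
The data-level part of that can be seen with a small test.  This is
only a sketch with throwaway file names, and it says nothing about
where the blocks end up on disk:

```shell
workdir=$(mktemp -d)
src="$workdir/src"
dest="$workdir/dest"
printf 'BBB' > "$src"

printf 'AAAAAAAAAA' > "$dest"
dd if="$src" of="$dest" conv=notrunc 2>/dev/null
kept=$(cat "$dest")            # bytes past the written range survive

printf 'AAAAAAAAAA' > "$dest"
dd if="$src" of="$dest" 2>/dev/null
cut=$(cat "$dest")             # default: dest is truncated first

printf 'notrunc: %s\ndefault: %s\n' "$kept" "$cut"
rm -rf "$workdir"
```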

There really needs to be something to specify that writes occur "in-place"
such that no "_RE_-allocation" of blocks occurs (except to extend the file,
if needed).  A second option would be to truncate the file to the last
position written.


Maybe an oflag=overwrite, and an 'ftrunc' for 'final trunc' to the position
of the final byte written?