Bug#737498: [PATCH RFC] patch: when importing from email, RFC2047-decode From/Subject headers

2016-03-03 Thread Matt Mackall
On Thu, 2016-03-03 at 18:55 +0100, Julien Cristau wrote:
> # HG changeset patch
> # User Julien Cristau 
> # Date 1457026459 -3600
> #  Thu Mar 03 18:34:19 2016 +0100
> # Node ID 6c153cbad4a032861417dbba9d1d90332964ab5f
> # Parent  549ff28a345f595cad7e06fb08c2ac6973e2f030
> patch: when importing from email, RFC2047-decode From/Subject headers
> 
> I'm not too sure about the Subject part: it should be possible to use
> the charset information from the email (RFC2047 encoding and the
> Content-Type header), but mercurial seems to use its own encoding
> instead (in the test, that means the commit message ends up as ""
> if the import is done without --encoding utf-8).  Advice welcome.
> 
> Reported at https://bugs.debian.org/737498

You should probably immediately relay such reports upstream.

> diff --git a/mercurial/patch.py b/mercurial/patch.py
> --- a/mercurial/patch.py
> +++ b/mercurial/patch.py
> @@ -201,19 +201,28 @@ def extract(ui, fileobj):
>      # (this heuristic is borrowed from quilt)
>      diffre = re.compile(r'^(?:Index:[ \t]|diff[ \t]|RCS file: |'
>                          r'retrieving revision [0-9]+(\.[0-9]+)*$|'
>                          r'---[ \t].*?^\+\+\+[ \t]|'
>                          r'\*\*\*[ \t].*?^---[ \t])', re.MULTILINE|re.DOTALL)
> +def decode_header(header):

FYI, names with underbars are against our coding convention; contrib/check-commit
ought to warn about this.

> +    if header is None:
> +        return None
> +    parts = []
> +    for part, charset in email.Header.decode_header(header):
> +        if charset is None:
> +            charset = 'ascii'

This will almost certainly explode on some emails. We should probably do
something like this:

- attempt to decode based on header garbage
- attempt to decode with UTF-8
- assume Latin-1 (not ascii)

> +        parts.append(part.decode(charset))
> +    return encoding.tolocal(u' '.join(parts).encode('utf-8'))

Using Unicode objects outside of encoding.py is strongly discouraged. If you
must, it'd be great to unambiguously mark them all with a leading u on the
variable name. This isn't a good fit for encoding.py since it uses a third
encoding besides UTF-8 and local. Probably belongs in mail.py.
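A minimal sketch of the layered fallback suggested above: try the charset
declared in the RFC 2047 encoded-word, then UTF-8, then assume Latin-1 (which
accepts any byte sequence, so the function never raises). This is written in
modern Python 3, not the Python 2 Mercurial used at the time, and the
standalone function name `decodeheader` (no underbar, per the convention
above) is illustrative rather than Mercurial's actual API:

```python
# Sketch of the suggested decode strategy: declared charset, then UTF-8,
# then Latin-1 as a never-failing last resort.
from email.header import decode_header

def decodeheader(header):
    """Decode an RFC 2047 header to a plain str, never raising."""
    if header is None:
        return None
    parts = []
    for part, charset in decode_header(header):
        if isinstance(part, str):
            # Unencoded portions come back as str already.
            parts.append(part)
            continue
        for enc in filter(None, (charset, 'utf-8')):
            try:
                parts.append(part.decode(enc))
                break
            except (LookupError, UnicodeDecodeError):
                pass  # unknown or lying charset; try the next candidate
        else:
            # Latin-1 maps every byte to a code point, so this cannot fail.
            parts.append(part.decode('latin-1'))
    return ' '.join(parts)
```

A real version would still need the `encoding.tolocal` round-trip from the
patch to convert the result into Mercurial's local encoding.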

-- 
Mathematics is the supreme nostalgia of our time.



Bug#632917: libc6: getaddrinfo fails prematurely; gethostbyname does not

2011-07-06 Thread Matt Mackall
Package: libc6
Version: 2.13-4
Severity: normal


If there are multiple nameservers in /etc/resolv.conf and the first of
them refuses DNS queries for some domains, getaddrinfo will fail while
gethostbyname succeeds.

This can result in VERY mysterious that-shouldn't-happen problems: 

- host works
- ping works
- browser works
- ssh and various other "properly written" apps fail

Example:

$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 10.8.0.1   <- my VPN DNS server (any DNS server without recursion)
nameserver 192.168.43.1   <- my phone company's server

$ host bitbucket.org
bitbucket.org has address 207.223.240.182
bitbucket.org has address 207.223.240.181
bitbucket.org mail is handled by 5 alt1.aspmx.l.google.com.
bitbucket.org mail is handled by 5 alt2.aspmx.l.google.com.
bitbucket.org mail is handled by 10 aspmx2.googlemail.com.
bitbucket.org mail is handled by 10 aspmx3.googlemail.com.
bitbucket.org mail is handled by 1 aspmx.l.google.com.

$ host bitbucket.org 10.8.0.1
Using domain server:
Name: 10.8.0.1
Address: 10.8.0.1#53
Aliases: 

Host bitbucket.org not found: 5(REFUSED)

$ ping -c 1 bitbucket.org
PING bitbucket.org (207.223.240.182) 56(84) bytes of data.
64 bytes from 207.223.240.182: icmp_req=1 ttl=50 time=316 ms

--- bitbucket.org ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 316.571/316.571/316.571/0.000 ms

$ ssh bitbucket.org
ssh: Could not resolve hostname bitbucket.org: Name or service not known

$ python
>>> import socket
>>> socket.gethostbyname("bitbucket.org")
'207.223.240.181'
>>> socket.getaddrinfo("bitbucket.org", 80, 10)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
socket.gaierror: [Errno -5] No address associated with hostname

I've used ltrace to verify that Python is calling getaddrinfo directly
and that it fails. Swapping the order of DNS servers makes the above
test succeed.

Desired behavior: getaddrinfo() should have EXACTLY the same
robust DNS resolution algorithm as gethostbyname() to avoid mysterious,
hard-to-diagnose application-dependent failure modes.
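The divergence is easy to check with a small helper that queries both
interfaces for the same name; which hostname actually fails depends entirely
on the local resolver setup, so this only shows how to compare the two calls
side by side:

```python
# Query both resolver interfaces for the same name so their answers can be
# compared.  On a healthy resolver they agree; on the setup described in
# this report, only getaddrinfo() raises gaierror.
import socket

def resolve_both(host):
    results = {}
    try:
        results['gethostbyname'] = socket.gethostbyname(host)
    except socket.gaierror as e:
        results['gethostbyname'] = 'FAILED: %s' % e
    try:
        infos = socket.getaddrinfo(host, 80, socket.AF_INET,
                                   socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr).
        results['getaddrinfo'] = sorted({ai[4][0] for ai in infos})
    except socket.gaierror as e:
        results['getaddrinfo'] = 'FAILED: %s' % e
    return results
```

On the configuration above, `resolve_both("bitbucket.org")` would show an
address from gethostbyname and a FAILED entry from getaddrinfo.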


-- System Information:
Debian Release: wheezy/sid
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.39+ (SMP w/4 CPU cores)
Locale: LANG=C, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libc6 depends on:
ii  libc-bin  2.13-4 Embedded GNU C Library: Binaries
ii  libgcc1   1:4.6.0-10 GCC support library

libc6 recommends no packages.

Versions of packages libc6 suggests:
ii  debconf [debconf-2.0] 1.5.39 Debian configuration management sy
pn  glibc-doc  (no description available)
ii  locales   2.13-4 Embedded GNU C Library: National L

-- debconf information excluded

-- 
Mathematics is the supreme nostalgia of our time.





-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Bug#587665: [Pkg-sysvinit-devel] Bug#587665: Safety of early boot init of /dev/random seed

2010-07-15 Thread Matt Mackall
On Thu, 2010-07-15 at 20:33 -0300, Henrique de Moraes Holschuh wrote:
> On Mon, 05 Jul 2010, Matt Mackall wrote:
> > > > Here are our questions:
> > > > 
> > > > 1. How much data of unknown quality can we feed the random pool at
> > > >    boot, before it causes damage (i.e. what is the threshold where we
> > > >    violate the "you are not going to be any worse than you were
> > > >    before" rule)?
> > 
> > There is no limit. The mixing operations are computationally reversible,
> > which guarantees that no unknown degrees of freedom are clobbered when
> > mixing known data.
> 
> Good.  So, whatever we do, we are never worse off than we were before we did
> it, at least by design.
> 
> > > > 2. How dangerous is it to feed the pool with stale seed data on the
> > > >    next boot (i.e. in a failure mode where we do not regenerate the
> > > >    seed file)?
> > 
> > Not at all.
> >  
> > > > 3. What is the optimal size of the seed data based on the pool size?
> > 
> > 1:1.
> 
> We shall try to keep it at 1:1, then.
> 
> > > > 4. How dangerous is it to have functions that need randomness (like
> > > >    encrypted network and partitions, possibly encrypted swap with an
> > > >    ephemeral key) BEFORE initializing the random seed?
> > 
> > Depends on the platform. For instance, if you've got an unattended boot
> > off a Live CD on a machine with a predictable clock, you may get
> > duplicate outputs.
> 
> I.e. it is somewhat dangerous, and we should try to avoid it by design, so
> we should try to init it as early as possible.  Very well.
> 
> > > > 5. Is there an optimal size for the pool?  Does the quality of the
> > > >    randomness one extracts from the pool increase or decrease with
> > > >    pool size?
> > 
> > Don't bother fiddling with the pool size.
> 
> We don't, but local admins often do, probably in an attempt to better handle
> bursts of entropy drainage.  So, we do want to properly support non-standard
> pool sizes in Debian if we can.

Unless they're manually patching their kernel, they probably aren't
succeeding. The pool resize ioctl was disabled ages ago. But there's
really nothing to support here: even the largest polynomial in the
source is only 2048 bits, or 256 bytes.

> > > > Basically, we need these answers to find our way regarding the following
> > > > decisions:
> > > > 
> > > > a) Is it better to seed the pool as early as possible and risk a
> > > >    larger time window for problem (2) above, instead of the current
> > > >    behaviour where we have a large time window where (4) above
> > > >    happens?
> > 
> > Earlier is better.
> > 
> > > > b) Is it worth the effort to base the seed file size on the size of
> > > >    the pool, instead of just using a constant size?  If a constant
> > > >    size is better, which size would that be? 512 bytes? 4096 bytes?
> > > >    16384 bytes?
> > 
> > 512 bytes is plenty.
> >
> > > > c) What is the maximum seed file size we can allow (maybe based on
> > > >    the size of the pool) to try to avoid problem (1) above?
> > 
> > Anything larger than a sector is simply wasting CPU time, but is
> > otherwise harmless.
> 
> Well, a filesystem block is usually 1024 bytes, and a sector is 4096 bytes
> nowadays... :-)
> 


-- 
Mathematics is the supreme nostalgia of our time.








Bug#587665: [Pkg-sysvinit-devel] Bug#587665: Safety of early boot init of /dev/random seed

2010-07-05 Thread Matt Mackall
On Sat, 2010-07-03 at 13:08 -0300, Henrique de Moraes Holschuh wrote:
> (adding Petter Reinholdtsen to CC, stupid MUA...)
> 
> On Sat, 03 Jul 2010, Henrique de Moraes Holschuh wrote:
> > Hello,
> > 
> > We are trying to enhance the Debian support for /dev/random seeding at early
> > boot, and we need some expert help to do it right.  Maybe some of you could
> > give us some enlightenment on a few issues?
> > 
> > Apologies in advance if I got the list of Linux kernel maintainers wrong.  I
> > have also copied LKML just in case.
> > 
> > A bit of context:  Debian tries to initialize /dev/random, by restoring the
> > pool size and giving it some seed material (through a write to /dev/random)
> > from saved state stored in /var.
> > 
> > Since we store the seed data in /var, that means we only feed it to
> > /dev/random relatively late in the boot sequence, after remote filesystems
> > are available.  Thus, anything that needs random numbers earlier than that
> > point will run with whatever the kernel managed to harness without any sort
> > of userspace help (which is probably not much, especially on platforms that
> > clear RAM contents at reboot, or after a cold boot).
> > 
> > We take care of regenerating the stored seed data as soon as we use it, in
> > order to avoid as much as possible the possibility of reuse of seed data.
> > This means that we write the old seed data to /dev/random, and immediately
> > copy poolsize bytes from /dev/urandom to the seed data file.
> > 
> > The seed data file is also regenerated prior to shutdown.
> > 
> > We would like to clarify some points, so as to know how safe they are on
> > face of certain error modes, and also whether some of what we do is
> > necessary at all.  Unfortunately, real answers require more intimate
> > knowledge of the theory behind Linux' random pools than we have in the
> > Debian initscripts team.
> > 
> > Here are our questions:
> > 
> > 1. How much data of unknown quality can we feed the random pool at boot,
> >    before it causes damage (i.e. what is the threshold where we violate
> >    the "you are not going to be any worse than you were before" rule)?

There is no limit. The mixing operations are computationally reversible,
which guarantees that no unknown degrees of freedom are clobbered when
mixing known data.

> > 2. How dangerous is it to feed the pool with stale seed data on the next
> >    boot (i.e. in a failure mode where we do not regenerate the seed
> >    file)?

Not at all.
 
> > 3. What is the optimal size of the seed data based on the pool size?

1:1.

> > 4. How dangerous is it to have functions that need randomness (like
> >    encrypted network and partitions, possibly encrypted swap with an
> >    ephemeral key) BEFORE initializing the random seed?

Depends on the platform. For instance, if you've got an unattended boot
off a Live CD on a machine with a predictable clock, you may get
duplicate outputs.

> > 5. Is there an optimal size for the pool?  Does the quality of the
> >    randomness one extracts from the pool increase or decrease with pool
> >    size?

Don't bother fiddling with the pool size.

> > Basically, we need these answers to find our way regarding the following
> > decisions:
> > 
> > a) Is it better to seed the pool as early as possible and risk a larger
> >    time window for problem (2) above, instead of the current behaviour
> >    where we have a large time window where (4) above happens?

Earlier is better.

> > b) Is it worth the effort to base the seed file size on the size of the
> >    pool, instead of just using a constant size?  If a constant size is
> >    better, which size would that be? 512 bytes? 4096 bytes? 16384 bytes?

512 bytes is plenty.
 
> > c) What is the maximum seed file size we can allow (maybe based on the
> >    size of the pool) to try to avoid problem (1) above?

Anything larger than a sector is simply wasting CPU time, but is
otherwise harmless.
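The save/restore cycle described in the original questions can be sketched as
follows. The seed file path here is hypothetical (not Debian's actual
location), the 512-byte size follows the answer above, and the `pool`
parameter exists only so the sketch can be exercised without touching the
real device:

```python
# Sketch of the initscript seed cycle discussed in this thread: writing the
# old seed into the pool mixes it in without crediting entropy (so stale or
# attacker-known data is harmless), and the seed file is rewritten
# immediately so the same bytes are never replayed on a later boot.
import os

SEED_FILE = '/var/lib/random-seed'  # hypothetical path
SEED_BYTES = 512                    # "512 bytes is plenty"

def restore_and_regenerate(seed_file=SEED_FILE, nbytes=SEED_BYTES,
                           pool='/dev/random'):
    # Feed any stored seed into the pool.
    if os.path.exists(seed_file):
        with open(seed_file, 'rb') as sf, open(pool, 'wb') as rnd:
            rnd.write(sf.read(nbytes))
    # Regenerate the seed file from /dev/urandom right away, writing to a
    # temporary file and renaming so a crash never leaves a partial seed.
    with open('/dev/urandom', 'rb') as rnd:
        fresh = rnd.read(nbytes)
    tmp = seed_file + '.new'
    with open(tmp, 'wb') as sf:
        sf.write(fresh)
    os.rename(tmp, seed_file)
```

The same routine would also run at shutdown to refresh the seed file one last
time, matching the procedure described in the original mail.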

-- 
Mathematics is the supreme nostalgia of our time.








Bug#475168: [Pkg-gnutls-maint] Bug#475168: certtool --generate-dh-params is ridiculously wasteful of entropy

2008-04-11 Thread Matt Mackall

On Fri, 2008-04-11 at 16:03 +0200, Simon Josefsson wrote:
> [EMAIL PROTECTED] writes:
> 
> > Simon Josefsson <[EMAIL PROTECTED]> wrote:
> >> [EMAIL PROTECTED] writes:
> >>> That paper deserves a longer reply, but even granting every claim it
> >>> makes, the only things it complains about are forward secrecy (is it
> >>> feasible to reproduce earlier /dev/*random outputs after capturing the
> >>> internal state of the pool from kernel memory?) and entropy estimation
> >>> (is there really as much seed entropy as /dev/random estimates?).
> >>>
> >>> The latter is only relevant to /dev/random.
> >>
> >> Why's that?  If the entropy estimation are wrong, you may have too
> >> little or no entropy.  /dev/urandom can't give you more entropy than
> >> /dev/random in that case.
> >
> > The quality or lack thereof of the kernel's entropy estimation is relevant
> > only to /dev/random because /dev/urandom's operation doesn't depend on
> > the entropy estimates.  If you're using /dev/urandom, it doesn't matter
> > if the kernel's entropy estimation is right, wrong, or commented out of
> > the source code.
> 
> My point is that I believe that the quality of /dev/urandom depends on
> the quality of /dev/random.  If you have found a problem in /dev/random,
> I believe it would affect /dev/urandom as well.

Again, the /dev/random entropy estimate is irrelevant to /dev/urandom
because it degrades to a conventional PRNG.

> > Well, he calls /dev/random "blindingly fast" in that thread, which appears
> > to differ from your opinion. :-)
> 
> It was /dev/urandom, but well, you are right.  On my machine, I get
> about 3.9MB/s from /dev/urandom sustained.  That is slow.  /dev/zero
> yields around 1.4GB/s.  I'm not sure David had understood this.  The
> real problem is that reading a lot of data from /dev/urandom makes the
> /dev/random unusable, so any process that reads a lot of data from
> /dev/urandom will receive complaints from applications that reads data
> from /dev/random.  I would instead consider this a design problem in the
> kernel.

...one that's long since been fixed. Reading from /dev/urandom always
leaves enough entropy for /dev/random to reseed. If you have a steady
input of environmental entropy, /dev/random will not be starved.

-- 
Mathematics is the supreme nostalgia of our time.



