from:"Phil Frost"

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-07-10 Thread Phil Frost



Phil Frost  added the comment:

Not yet. I had a hell of a time getting gdb to follow gunicorn forks so it 
would watch the workers but not also every other thing using popen or 
os.system(). And the lldb package on Alpine doesn't seem to work.

So we're currently in the process of testing a Debian container for production, 
hoping the issue just goes away. If not, I'll continue digging in with a 
debugger.

--
nosy: +Phil Frost2

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-07 Thread Phil Frost



Phil Frost  added the comment:

I'm afraid that won't be of much use since the object is statically allocated. 
Besides, the trouble isn't finding what created the object, but what mutated an 
int object which should be immutable.

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost



Phil Frost  added the comment:

Probably too many C extensions to feasibly audit them all. Also we can't rule 
out something busted in Alpine. So I'm going to set a watchpoint on 
small_ints[1]->ob_ival with a little gdb script to dump core right when it 
changes; hopefully inspecting that backtrace will lead directly to the culprit.

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost



Phil Frost  added the comment:

So this is uminteresting

(gdb) p small_ints[0]
$2 = (PyIntObject *) 0x558ce65df528
(gdb) p *small_ints[0]
$3 = {ob_refcnt = 18, ob_type = 0x7fe019b694c0 , ob_ival = -5}
(gdb) p *small_ints[1]
$4 = {ob_refcnt = 65, ob_type = 0x7fe019b694c0 , ob_ival = -3}
(gdb) p *small_ints[2]
$5 = {ob_refcnt = 204, ob_type = 0x7fe019b694c0 , ob_ival = -3}
(gdb) p *small_ints[3]
$6 = {ob_refcnt = 1872, ob_type = 0x7fe019b694c0 , ob_ival = -2}

When is 1/1=10? When -5 + 1 = -3.

So now I suppose the question is how this comes to be.

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost



Phil Frost  added the comment:

> Alpine Linux apparently uses musl.  Is that well supported and tested?

Heh. Not really.

Switching to a Debian container is probably just about as much work as building 
Python from source. Would that be preferable?

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-06 Thread Phil Frost



Phil Frost  added the comment:

skrah: Yes, that's correct. Since I can only produce this bug in production it 
will take me some days to build and validate a source build. But absent any 
better ideas, I will try.

tim.peters: I've observed this bug across hundreds of EC2 hosts, in dozens of 
code paths, with all kinds of inputs. Moreover, the hosts aren't displaying any 
other symptoms of hardware failure such as random segfaults or mysteriously 
corrupted data.

I've also deeply investigated two cores now which show specifically that `exp` 
seems to get 2 added when it should have been 1. I have a hard time explaining 
how a hardware failure can cause precisely the same failure so reliably.

So I doubt hardware is to blame.

Although, it does seem the issue occurs in "clumps" on individual hosts. So we 
might go 10 hours without seeing the issue, then it may happen 5 times within 
30 minutes on one host. We might observe 1 or 2 more such clumps on the same 
host until the next deploy of the application, at which point all the 
containers are replaced with fresh ones. So this suggests there is some 
ephemeral state within a host that creates a propensity for the issue.

I've also been unable to reproduce the problem in a development environment, 
even when that development environment is using the same kernel, instance 
class, and docker container as production. So I suspect the bug is precipitated 
by some particular concurrency or interaction that I haven't been able to 
replicate.

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Phil Frost



Phil Frost  added the comment:

I should note that while we do observe this issue with Decimal, my 
troubleshooting so far indicates the logic in decimal.py is fine. It seems 
instead the instructions in decimal.py aren't being correctly executed.

I'm not sure if the problem is limited to only Decimal, or if we just notice it 
there because it's correlated with financial code, which when encountering the 
bug, introduces an error which is eventually caught by the accounting 
department. There could very well be errors happening at least frequently 
elsewhere which simply go unnoticed.

--

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

2019-06-05 Thread Phil Frost



New submission from Phil Frost :

We've observed instances of Decimal divisions being incorrect a few times per 
day in a web application serving millions of requests daily. I've been unable 
to reproduce the issue but have been investigating core dumps which suggest 
either some issue in the C Python implementation or its execution environment.

Python 2.7.15 on Alpine Linux Docker containers.

A slightly modified decimal.py is attached. It's been modified to check that 
the answer is approximately correct and dumps core if not. It also has a few 
local variables added to provide additional insight. I've annotated the file 
with values from the core dump and my reasoning about what should happen.

The crux of the problem is the loop at line 1389. There are 3 ways to determine 
how many times this loop executed:

1. the number of zeros removed from coeff
2. the number of times exp is incremented
3. the number of times division_counter is incremented

Oddly, #1 and #3 indicate the loop happened 27 times, while #2 indicates the 
loop happened 28 times.

One possible explanation (which makes about as much sense as any other) is that 
`exp += 1` sometimes adds 2 instead of 1. This means the loop happens 1 time 
fewer than it should, leaving `coeff` 10 times bigger than it should properly 
be. Or (very rarely) this happens twice and the result is 100 times bigger.

I find it very odd that something as basic as `+= 1` should not work, but at 
the moment it is the best explanation I have.

Unfortunately I can't share the core dump, as I've only been able to observe 
the issue in production so the core contains private information. But I'd 
welcome any ideas for further exploration. Or perhaps someone is aware of an 
existing bug that would explain this behavior.

--
components: Interpreter Core
files: decimal.py
messages: 344772
nosy: Phil Frost
priority: normal
severity: normal
status: open
title: Decimal divisions sometimes 10x or 100x too large
versions: Python 2.7
Added file: https://bugs.python.org/file48396/decimal.py

___
Python tracker 
<https://bugs.python.org/issue37168>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

[issue37168] Decimal divisions sometimes 10x or 100x too large

8 matches

Site Navigation

Mail list logo

Footer information