[issue32475] Add ability to query number of buffered bytes available on buffered I/O

2018-01-01 Thread Tim Savannah

New submission from Tim Savannah:

Hello!

This is my first time submitting to the Python bug tracker, so please bear with 
me if I miss or mess something up.

A little bit of relevant background: I'm an avid Python developer with many 
open-source projects.

One of the projects I wrote and maintain is called "python-nonblock", which 
provides pure-Python non-blocking I/O methods. It is available at:

https://github.com/kata198/python-nonblock


I'll only include the relevant details to this topic.

So, one of the features provided by python-nonblock is the nonblock_read 
function. This allows you to read from a stream whilst ensuring the operation 
does not block.

It achieves this by basically following this pattern (a minimal sketch in code 
follows below):

1. Call "select" on the stream and see if any data is available. If not, sleep 
and reiterate.

2. If there is data available, read a single byte from the stream and store it 
to return at the end.

It supports most streams and sockets which have a real fd backing (and thus 
support "select").


There are a couple of reasons you may need to do this, e.g. certain interactive 
scenarios; I won't go into them too much.


The python-nonblock library also bundles a layer which sits on top of that 
method, called BackgroundRead. This interface launches a thread into the 
background, which reads blocks of arbitrary (user-provided) size into a 
variable on an object. So you could have a processing app which reads blocks of 
data from a source and processes them in the foreground whilst they continue to 
load up in the background (see the sketch below).
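
In spirit it looks something like this plain-threading sketch (this is not 
BackgroundRead's actual API, just the shape of the idea):

import threading

class BackgroundReaderSketch:
    def __init__(self, stream, block_size=4096):
        self.blocks = []    # the background thread appends completed blocks here
        self.finished = False
        self._thread = threading.Thread(
            target=self._run, args=(stream, block_size), daemon=True)
        self._thread.start()

    def _run(self, stream, block_size):
        while True:
            block = stream.read(block_size)
            if not block:   # EOF
                break
            self.blocks.append(block)
        self.finished = True

The foreground can pop finished blocks off .blocks and process them while the 
thread keeps loading more.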


That's all well and good, but now we get to the meat of the issue: for 
large sources of available data (like a file on disk), while this method of 
operation is effective, it is SLOW, due to the overhead of a select syscall and 
a trip through libpython for every single byte. This is especially true on a 
loaded system, as it makes us a prime candidate for the scheduler to preempt 
our task and context-switch us off the CPU!


I've been looking into ways to improve this, and I seem to have struck gold. 
On a standard Linux HDD filesystem, the I/O block size is 4096 bytes. So, 
thanks to readahead, on a non-fragmented file a read call for 1 byte will 
actually load up to 4096 bytes. libpython holds this extra data, and calls like 
read1 will return it if available, but it does not expose the count itself. 
Thus, libraries like mine can't take advantage of it, which means that for a 
filesystem I/O read on Linux, 4095 out of 4096 iterations of the two-step loop 
above are wasted effort.
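
You can observe the buffering behavior from Python directly (a hypothetical 
demonstration; "data.bin" stands in for any local file larger than one block):

with open('data.bin', 'rb', buffering=4096) as f:
    first = f.read(1)      # the raw read underneath fetches up to 4096 bytes
    rest = f.read1(4096)   # served straight from the buffer, no new syscall
    print(len(first), len(rest))   # typically prints: 1 4095

Those 4095 bytes were already sitting in the buffer, but before calling read1 
there is no way to ask how many are there.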

So I've written an additional function in the C code for BufferedReader, 
"getbuffn", which returns the number of bytes currently buffered in libpython 
but not yet returned to the application, and modified python-nonblock (in the 
4.0branch_getbuffn branch) with simple additions to take advantage of this 
extra information when available. So if we detect that there are 4095 bytes 
already read and pending, we know for certain we can grab all 4095 bytes at 
once without blocking, or even needing a call to select!

So with the new pattern, for buffered streams that have getbuffn available, we 
can (as sketched after this list):

1. Select to see if data is available; if not, sleep and reiterate.

2. Read a single byte off the stream.

3. Check getbuffn, and if it returns >0, read that many bytes off the stream 
(guaranteed not to block).

The performance improvements are *MASSIVE* with this change in place.

On a 5 MB file, from a VM running on an SSD, I average the following:

Loaded system, non-cached I/O:

  One-fell-swoop file.read()           - 0.3 seconds
  getbuffn-patched Python + this impl  - 3.1 seconds
  Unpatched Python + this impl         - 41 to 55 seconds (avg min to avg max), 
averaging 44 seconds

Unloaded system, cached I/O:

  One-fell-swoop file.read()           - 0.0017 seconds
  getbuffn-patched Python + this impl  - 0.034 seconds
  Unpatched Python + this impl         - 45 seconds (not as variable as the 
loaded system)

That's a 13,235% (thirteen-thousand two-hundred and thirty-five percent) 
performance boost on just a 5 MB file, and the gap only grows as the size of 
the dataset increases. These gains are simply not possible without this 
information (the amount remaining in the buffer) being available.


So I've attached the simple patch (only additions, no modifications to existing 
functions) against Python 3.6.4. The version of python-nonblock which supports 
this enhanced approach when available (with more details at the top of the 
README) can be found here: 

https://github.com/kata198/python-nonblock/tree/4.0branch_getbuffn

I've looked at the Python 2.7 code too, and it seems that with minimal effort 
the same functionality could be provided there as well!


So, I'm an experienced developer who is willing to put in the legwork. Is this 
something that is possible to get merged upstream?

[issue6178] Core error in Py_EvalFrameEx 2.6.2

2012-09-07 Thread Tim Savannah

Tim Savannah added the comment:

As an update (since someone else has this problem): this issue stopped once we 
converted from CentOS to Arch Linux (www.archlinux.org). It may be an 
underlying issue with something in the CentOS environment. We used the same 
modules, the same configuration, and basically the same compilation for Python.



[issue6178] Core error in Py_EvalFrameEx 2.6.2

2009-06-03 Thread Tim Savannah

Tim Savannah added the comment:

To update: no additional output was seen from pydebug.




[issue6178] Core error in Py_EvalFrameEx 2.6.2

2009-06-03 Thread Tim Savannah

Tim Savannah added the comment:

Recompiled with pydebug enabled, and recompiled all site-packages. Still 
getting exceptions; however, they are now occurring within the python binary 
and not libpython2.6.1.

pythonLaunch.py[25914]: segfault at 0068 rip 004c7694 rsp 4181a4c0 error 4
pythonLaunch.py[1421]: segfault at 0068 rip 004c7694 rsp 432914c0 error 4
pythonLaunch.py[2552]: segfault at 0068 rip 004c7694 rsp 41f7d4c0 error 4




[issue6178] Core error in Py_EvalFrameEx 2.6.2

2009-06-02 Thread Tim Savannah

Tim Savannah added the comment:

All site-packages were compiled against Python 2.6.1, and Python was
upgraded later to 2.6.2 (though upon running a make install with Python
2.6.2, it seemed to recompile site-packages at the byte-code level).

And no, there are still segfaults without optimizations; I've tried -O2,
-O, and -O0 (-O0 being no optimization). Judging by the invalid read
always being at 0x58, and the line of assembly accessing offset 0x58
from a register, tstate->frame must be being initialized to NULL (or
always being corrupted to point at other NULL data).

The compiler used is gcc version 4.1.2 20071124 (Red Hat 4.1.2-42).

The setup we are using is 8-core Xeon 64-bit servers. (We have about 14
of these, CentOS-based systems; all are experiencing the segfaults.)




[issue6178] Core error in Py_EvalFrameEx 2.6.2

2009-06-02 Thread Tim Savannah

Tim Savannah added the comment:

Yes, I compiled Python myself, using ./configure
--prefix=/usr/local/python2.6/ --with-pth --enable-shared

It is a 64-bit compile.

I've done this with both the standard config and a config that I modified
to produce the optimization options -ggdb3 -O0. Both exhibit the
segfault.

We are including some external site packages, but there is no consistent
site-package import or usage that causes the segfault; it just seems
that heavy stress with many threads going at once has a chance to race and 
cause it.

I can send any additional info that can help debug this issue.




[issue6178] Core error in Py_EvalFrameEx 2.6.2

2009-06-02 Thread Tim Savannah

New submission from Tim Savannah:

I'm getting many segmentation faults (about one per half hour) from within
the core of Python 2.6.2 on 64-bit machines.

Examples from dmesg:

pythonLaunch.py[13307]: segfault at 0058 rip 2b845cfb3550 rsp 41809930 error 4
pythonLaunch.py[27589]: segfault at 0058 rip 2b4112287906 rsp 42dab930 error 4
pythonLaunch.py[14436]: segfault at 0058 rip 2ae0a4f68550 rsp 42cd9930 error 4
pythonLaunch.py[10374]: segfault at 0058 rip 2af43f966906 rsp 4214b930 error 4
pythonLaunch.py[17656]: segfault at 0058 rip 2aed0cfe8906 rsp 417f0930 error 4

pythonLaunch.py is a symbolic link to the python 2.6.2 binary.
From disassembling the python binary, I've found the corresponding line
in source to be ceval.c:2717:

if (tstate->frame->f_exc_type != NULL)

tstate->frame is NULL, and the access of f_exc_type causes a segfault
(trying to access memory at 0x58, f_exc_type's offset within the frame;
see the segfaults above).

I can't find any clear code path that could cause tstate->frame to go
NULL. Any suggestions? This is preventing us from moving from 32-bit
Python 2.4 to 64-bit Python 2.6.

--
components: Interpreter Core
messages: 88748
nosy: tsavannah
severity: normal
status: open
title: Core error in Py_EvalFrameEx 2.6.2
type: crash
versions: Python 2.6
