Re: [python-tulip] Async iterators

2015-03-24 Thread Guido van Rossum
On Tue, Mar 24, 2015 at 4:10 AM, Victor Stinner wrote:

> 2015-03-24 2:44 GMT+01:00 Guido van Rossum :
> > For seekable() I couldn't find any dynamic implementations,
>
> The first call to io.FileIO.seekable() calls lseek(0, SEEK_CUR).
>

Oops. :(


> It's safer to expect that any file method can block on I/O.
>

Yup.


> If you doubt that syscalls can block, try unbuffered FileIO on an NFS
> share with metadata cache disabled ("mount -o noac" on Linux). Unplug
> the network cable and enjoy :-)
>
> I checked yesterday with fstat(): the syscall blocks until the network
> cable is plugged again. At least on Linux, it's not possible to
> interrupt fstat() with a signal like CTRL+c :-(


That's a sad state of the world. NFS just sucks in so many ways... This
also means that if you use a thread pool for this, it might fill up with
tasks that won't make progress, and eventually your thread pool will block
all tasks (unless it's not really a thread pool :-). I guess we need
timeouts on everything and eventually just kill the process. :-(
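
A minimal sketch of the "timeouts on everything" idea, assuming only the
standard asyncio API (the helper name here is made up for illustration).
Note that the worker thread stays stuck on the blocked syscall; the timeout
only releases the waiting coroutine:

import asyncio
import os

@asyncio.coroutine
def stat_with_timeout(path, timeout=5.0, loop=None):
    # Run the blocking os.stat() in the default executor, but stop
    # waiting after 'timeout' seconds (asyncio.TimeoutError is raised;
    # the executor thread keeps running until the syscall returns).
    loop = loop or asyncio.get_event_loop()
    fut = loop.run_in_executor(None, os.stat, path)
    return (yield from asyncio.wait_for(fut, timeout))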

-- 
--Guido van Rossum (python.org/~guido)


Re: [python-tulip] Async iterators

2015-03-24 Thread Victor Stinner
2015-03-24 2:44 GMT+01:00 Guido van Rossum :
> For seekable() I couldn't find any dynamic implementations,

The first call to io.FileIO.seekable() calls lseek(0, SEEK_CUR).

It's safer to expect that any file method can block on I/O.
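
In practice that assumption leads to a wrapper along these lines (names are
illustrative only, not the actual aiofiles API): every method of the
underlying file object is delegated to the loop's executor, and each call
returns a future that the caller can "yield from", just like in Tin's
snippet.

import asyncio
import functools

class AsyncFileWrapper:
    # Illustrative sketch: treat *every* file method as potentially
    # blocking and run it in an executor.
    def __init__(self, f, loop=None, executor=None):
        self._f = f
        self._loop = loop or asyncio.get_event_loop()
        self._executor = executor

    def _delegate(self, func, *args):
        # Returns an asyncio future; callers do "yield from" on it.
        return self._loop.run_in_executor(
            self._executor, functools.partial(func, *args))

    def read(self, size=-1):
        return self._delegate(self._f.read, size)

    def seekable(self):
        return self._delegate(self._f.seekable)  # may call lseek() on first use

    def close(self):
        return self._delegate(self._f.close)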

If you doubt that syscalls can block, try unbuffered FileIO on an NFS
share with metadata cache disabled ("mount -o noac" on Linux). Unplug
the network cable and enjoy :-)

I checked yesterday with fstat(): the syscall blocks until the network
cable is plugged again. At least on Linux, it's not possible to
interrupt fstat() with a signal like CTRL+c :-(

Victor


Re: [python-tulip] Async iterators

2015-03-24 Thread Luciano Ramalho
On Mon, Mar 23, 2015 at 8:39 PM, Tin Tvrtković  wrote:
> f = yield from aiofiles.open('test.bin', mode='rb')
> try:
>     data = yield from f.read(512)
> finally:
>     yield from f.close()

That's awesome, Tin!

> I've run into two difficulties - first, it's difficult for me to tell which
> calls may actually block (does 'isatty' block? does 'seekable' block [I
> think so]?) and which don't have to go through an executor. But this is a
> question for another day. :)

I'd recommend taking a look at the Node.js filesystem API. Their
philosophy is: anything that needs to go to disk is blocking, and
everything that is blocking must have a callback. Just look at the API
for their fs module:

https://nodejs.org/api/fs.html

For convenience, some functions have a non-callback "synchronous"
version. Those have a Sync suffix, e.g.

fs.stat(path, callback)
fs.statSync(path)
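
For comparison, a rough (purely illustrative) way the same naming convention
could look on the asyncio side -- the bare name goes through an executor,
while the *_sync variant calls the OS directly and may block the event loop:

import asyncio
import os

@asyncio.coroutine
def stat(path, loop=None):
    # Non-blocking flavor: run os.stat() in the default executor.
    loop = loop or asyncio.get_event_loop()
    return (yield from loop.run_in_executor(None, os.stat, path))

def stat_sync(path):
    # Blocking flavor, analogous to Node's fs.statSync().
    return os.stat(path)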


Cheers,

Luciano

-- 
Luciano Ramalho
|  Author of Fluent Python (O'Reilly, 2015)
| http://shop.oreilly.com/product/0636920032519.do
|  Professor em: http://python.pro.br
|  Twitter: @ramalhoorg


Re: [python-tulip] Async iterators

2015-03-23 Thread Guido van Rossum
On Mon, Mar 23, 2015 at 4:39 PM, Tin Tvrtković  wrote:

> Hello,
>
> following the discussion from
> https://groups.google.com/forum/?fromgroups#!topic/python-tulip/iGPv24gTpAI,
> I've been working on a small library for async access to files through a
> thread pool. I've been aiming to emulate the existing file API as much as
> possible:
>
> f = yield from aiofiles.open('test.bin', mode='rb')
> try:
>     data = yield from f.read(512)
> finally:
>     yield from f.close()
>

Cool project!


> I've run into two difficulties - first, it's difficult for me to tell
> which calls may actually block (does 'isatty' block? does 'seekable' block
> [I think so]?) and which don't have to go through an executor. But this is
> a question for another day. :)
>

isatty() can definitely make a system call -- e.g.
http://opensource.apple.com/source/Libc/Libc-167/gen.subproj/isatty.c -- so
it should be considered blocking. For seekable() I couldn't find any
dynamic implementations, but IIRC it's possible to implement this as trying
to seek to the current position and catching the error (and then caching the
result so subsequent calls won't have to do this). You should probably try to
find at least one such implementation -- if you can't find one, assume it
won't be needed. (After all, you're *defining* how things will behave in your
version here.)
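
A sketch of that probe-and-cache approach (not taken from any real
implementation -- it just illustrates the description above, and assumes the
class it is mixed into provides a seek() method):

import io

class SeekProbeMixin:
    _seekable = None

    def seekable(self):
        if self._seekable is None:
            try:
                self.seek(0, io.SEEK_CUR)  # no-op seek as a probe
            except (OSError, io.UnsupportedOperation):
                self._seekable = False
            else:
                self._seekable = True
        return self._seekable  # cached, so only the first call can block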


> The second is that certain nifty file operations can't really be ported to
> the async world; for example, context managers. A file close may block, I
> believe, so __exit__ would need to be yielded from, and that's currently
> impossible, right?
>

Right.


> Also, iterating over the file is presenting me with difficulties.
> There's no way for __next__ to be a coroutine, right? So __next__ would
> have to return futures. But how to know when to raise StopIteration without
> actually doing IO? Also, all the futures would basically be the same -
> calling readline() in an executor. So if a user accidentally (or on purpose,
> maybe) doesn't actually yield from the futures right away, the iteration
> would spin infinitely.
>
> I'm thinking implementing something like this isn't worth the trouble, and
> users should just be instructed to use a while loop and readline() until an
> empty result comes back. I'd appreciate comments on my conclusions from the
> experts. :)
>

Sounds like a plan. This is where I left it with the asyncio.streams API as
well.
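
For the record, the recommended pattern would look something like this (the
aiofiles method names are assumed from the snippets earlier in the thread):

import asyncio
import aiofiles

@asyncio.coroutine
def read_lines(path):
    f = yield from aiofiles.open(path, mode='rb')
    try:
        lines = []
        while True:
            line = yield from f.readline()
            if not line:        # b'' signals EOF
                break
            lines.append(line)
        return lines
    finally:
        yield from f.close()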


> I will say one thing: I've learned a lot about Python 3's file IO stack :)
>

You're welcome!

-- 
--Guido van Rossum (python.org/~guido)