Re: [Tutor] reading an input stream

2016-01-07 Thread James Chapman
Hi Richard

There are a number of considerations you need to take into account here.

Raw sockets are almost never the right solution. While a basic socket-to-
socket connection is easy enough to program, handling failure and
concurrency can very quickly make the solution a lot more complex than it
needs to be, so perhaps you could supply more information? (I realise I'm
venturing outside the realm of learning Python, but I'm a pedant for doing
things right.)

You said you need to read XML in from a socket connection, but you haven't
mentioned what's generating the data. Is the data sent over HTTP, in which
case is this part of a SOAP or REST API? Is the data being generated by
something you've written or by a 3rd-party software package? Is REST an
option? Is there a reason to serialise to XML? (If I were performing the
serialisation I would go with JSON if being human readable was a
requirement.)

If the method of receiving that data is optional, have you considered using
something like AMQP (RabbitMQ) which would eliminate your need to support
concurrency? It would also handle failure well.

James



On 29 December 2015 at 20:14, richard kappler  wrote:

> [...]

Re: [Tutor] reading an input stream

2016-01-07 Thread Cameron Simpson

On 07Jan2016 12:14, richard kappler  wrote:
>On Thu, Jan 7, 2016 at 12:07 PM, James Chapman  wrote:
>>From an architectural POV I'd have a few listener threads that upon
>>receipt would spawn (or take from a pool is a better approach) a worker
>>thread to process the received data.

As would I.

>That's the plan, if I'm understanding you correctly. We've brainstormed the
>threading, haven't written any of it yet.


The code you've posted should be fine for testing a single connection.

I'd be doing 2 things to what you posted, myself:

 - use plain old .read to collect the data and assemble the XML packets

 - decouple your XML parsing from the collection and packet parsing

To the first: I suspect that when you have packets arriving rapidly you are 
either dropping data because the data overflows your 8192 recv size, or you're 
getting multiple logical packets stuffed into a single buffer:


 recv #1:
   \x02xml...\x03\x02partial-xml

 recv #2:
   tail-of-previous-xml\x03\x02more-xml...

which would definitely make your XML parser unhappy.
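To make that concrete, here is a small standalone sketch (the function name and driver are mine, not from the thread) showing how STX/ETX frames can be reassembled even when a frame is split across two recv-sized chunks, exactly as in the two recvs sketched above:

```python
def split_frames(chunks, stx=b'\x02', etx=b'\x03'):
    """Reassemble STX...ETX frames from arbitrarily split byte chunks."""
    buf = b''
    for chunk in chunks:
        buf += chunk
        while True:
            start = buf.find(stx)
            end = buf.find(etx, start + 1)
            if start < 0 or end < 0:
                break                    # no complete frame buffered yet
            yield buf[start + 1:end]
            buf = buf[end + 1:]

# The two recvs sketched above reassemble cleanly:
recvs = [b'\x02xml...\x03\x02partial-xml', b'tail-of-previous-xml\x03']
frames = list(split_frames(recvs))
# frames == [b'xml...', b'partial-xmltail-of-previous-xml']
```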

Instead, gather the data progressively and emit XML chunks. You've got a TCP 
stream - the TCPServer class will do an accept and hand you an _unbuffered_ 
binary stream file from which you can just .read(), ignoring any arbitrary 
"packet" sizes.  For example (totally untested) using a generator:


 from logging import error, warning

 def xml_extractor(fp):
   ''' Read a continuous stream of bytes from `fp`, yield the bytes of each
   XML packet found between the STX (0x02) and ETX (0x03) delimiters, to be
   parsed elsewhere. The arbitrary size of 8192 bytes merely bounds each
   read request; it doesn't say anything about any underlying network
   packet size.
   '''
   # a (buffer, offset) pair of ungathered data
   buffer = b''
   offset = 0
   while True:
     # locate start of XML chunk
     while True:
       if offset >= len(buffer):
         buffer = fp.read1(8192)
         offset = 0
         if not buffer:
           # EOF: exit generator
           return
       # examine the next byte
       b = buffer[offset]
       offset += 1
       if b == 0x02:
         # opening delimiter
         break
       warning("discard byte 0x%02x", b)
     # gather XML chunk
     chunks = []
     while True:
       endpos = buffer.find(b'\x03', offset)
       if endpos >= 0:
         # collect up to the closing delimiter
         chunks.append(buffer[offset:endpos])
         offset = endpos + 1
         break
       # no delimiter yet: collect the whole tail and keep reading
       chunks.append(buffer[offset:])
       buffer = fp.read1(8192)
       offset = 0
       if not buffer:
         error("EOF: incomplete final XML packet found: %r", b''.join(chunks))
         return
     # yield the XML bytes
     yield b''.join(chunks)
     chunks = None   # release chunks so memory can be freed promptly

This reads bytes into a buffer and locates the 0x02...0x03 boundaries and 
yields the bytes in between. Then your main stream decoder just looks like 
this:


 for xml_bytes in xml_extractor(fp):
   # decode the bytes into a str
   xml_s = xml_bytes.decode('utf-8')
   ... pass xml_s to your XML parser ...

All of this presumes you have a binary file-like object reading from your TCP 
stream. And since we're suggesting you spawn a Thread per connection, I'm 
suggesting you use the TCPServer class from the socketserver module with its 
ThreadingMixIn. That gets you a threading TCP server.


Query: do the cameras connect to you, or do you connect to the cameras? I'm 
presuming the former.


So the surrounding framework would create a TCPServer instance listening on your 
ip:port, and have a handler method which is given a "request" parameter by 
TCPServer. That object has a .rfile property which is a read-only binary stream 
for reading from the socket, and _that_ is what we refer to as `fp` in the code 
above.


Setting up the TCPServer is pretty simple. Lifting the essential bits from some 
code of my own (again, untested):


 from socketserver import TCPServer, ThreadingMixIn, StreamRequestHandler

 class MyServer(ThreadingMixIn, TCPServer):
   def __init__(self, bind_addr):
 TCPServer.__init__(self, bind_addr, MyRequestHandler)

 class MyRequestHandler(StreamRequestHandler):
   def handle(self):
 fp = self.rfile
 for xml_bytes in xml_extractor(fp):
   # decode the bytes into a str
   xml_s = xml_bytes.decode('utf-8')
   ... pass xml_s to your XML parser ...

 # start the server
 S = MyServer( ("hostname", ) )
 S.serve_forever()

One critical bit in the above is the use of .read1() in the xml_extractor 
function: that calls the underlying stream's .read() method at most once, so 
that it behaves like a UNIX read() call and may return a "short" read - less 
than the maximum supplied. This is what you need to return data as soon as it 
is received. By contrast, the traditional Python .read() call will try to 
gather bytes until it has the amount asked for, which means that it will block.  
You definitely need read1() for this kind of work.
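The short-read behaviour of read1() can be seen with an ordinary pipe; this small demonstration (mine, not from the thread) shows read1() returning the bytes already available rather than waiting for the full 8192:

```python
import io
import os

# Create a pipe and write a few bytes into it before reading.
r_fd, w_fd = os.pipe()
os.write(w_fd, b'partial')

fp = io.open(r_fd, 'rb')   # buffered binary reader over the pipe
data = fp.read1(8192)      # at most one underlying read: returns what's there
# data == b'partial' -- read1 did not wait for 8192 bytes.
# A plain fp.read(8192) here would block until 8192 bytes arrived or EOF.
os.close(w_fd)
fp.close()
```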



Re: [Tutor] reading an input stream

2016-01-07 Thread Cameron Simpson

On 08Jan2016 08:52, Cameron Simpson  wrote:
[...]
Instead, gather the data progressively and emit XML chunks. You've got a TCP 
stream - the TCPServer class will do an accept and hand you an _unbuffered_ 
binary stream file from which you can just .read(), ignoring any arbitrary 
"packet" sizes.  For example (totally untested) using a generator:

[...]

Just a few followup remarks:

This is all Python 3, where bytes and strings are cleanly separated. You've got 
a binary stream with binary delimiters, so we're reading binary data and 
returning the binary XML in between. We separately decode this into a string 
for handing to your XML parser. Just avoid Python 2 altogether; this can all be 
done in Python 2 but it is not as clean, and more confusing.


The socketserver module is... annoyingly vague about what the .rfile property 
gets you. It says "a file-like object". That should be a nice buffered binary 
stream (an io.BufferedIOBase) with a .read1() method, but conceivably it is 
not. I'm mentioning this because I've noticed that the code I lifted the 
TCPServer setup from seems to make a buffered binary file from whole cloth by 
doing:


 fp = os.fdopen(os.dup(request.fileno()),"rb")

You'd hope that isn't necessary here, and that request.rfile is a suitable 
buffered reader already.


In xml_extractor, the "# locate start of XML chunk" loop could be better by 
using .find exactly as the "# gather XML chunk" loop does; I started with 
.read(1) instead of .read1(8192), which is why it does things byte by byte.


Cheers,
Cameron Simpson 
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] reading an input stream

2016-01-07 Thread richard kappler
On Thu, Jan 7, 2016 at 5:07 PM, Cameron Simpson  wrote:

>
> Just a few followup remarks:
>
> This is all Python 3, where bytes and strings are cleanly separated. [...]
> Just avoid Python 2 altogether; this can all be done in Python 2 but it is
> not as clean, and more confusing.
>

Love to, can't. Splunk uses 2.7 so that's what we have to work with. That
will not change in the foreseeable future. Doing other homework right now,
but will more closely review this and the other posts that have come in
since I left work later tonight or first thing in the morning.

regards, Richard


Re: [Tutor] reading an input stream

2016-01-07 Thread Cameron Simpson

On 07Jan2016 17:22, richard kappler  wrote:

>On Thu, Jan 7, 2016 at 5:07 PM, Cameron Simpson  wrote:
>>This is all Python 3, where bytes and strings are cleanly separated. [...]
>>Just avoid Python 2 altogether; this can all be done in Python 2 but it is
>>not as clean, and more confusing.
>
>Love to, can't. Splunk uses 2.7 so that's what we have to work with. That
>will not change in the foreseeable future. Doing other homework right now,
>but will more closely review this and the other posts that have come in
>since I left work later tonight or first thing in the morning.


Ok. You should still be ok, but things like bs[0] == 0x02 will need to be bs[0] 
== '\x02' and so forth, because you get str objects back from reads.


The rest of the suggested code should still broadly work.
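A short illustration of the indexing difference Cameron describes (the sample bytes are invented). In Python 3 indexing a bytes object yields an int, whereas in Python 2 indexing a str yields a one-character string; slicing behaves the same in both, which makes it the portable choice:

```python
bs = b'\x02<msg/>\x03'

# Python 3: indexing bytes yields an int, so compare against 0x02
assert bs[0] == 0x02
# Python 2: indexing a str yields a one-character str: bs[0] == '\x02'
# Portable across both: slice instead, which yields a length-1 bytes/str
assert bs[0:1] == b'\x02'
```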

Cheers,
Cameron Simpson 


Re: [Tutor] reading an input stream

2015-12-29 Thread richard kappler
Sorry it took so long to respond, just getting back from the holidays. You
all have given me much to think about. I've read all the messages through
once, now I need to go through them again and try to apply the ideas. I'll
be posting other questions as I run into problems. BTW, Danny, best
explanation of generators I've heard, well done and thank you.

regards, Richard

On Thu, Dec 24, 2015 at 4:54 PM, Danny Yoo  wrote:

> [...]

Re: [Tutor] reading an input stream

2015-12-24 Thread Cameron Simpson

On 24Dec2015 13:54, richard kappler  wrote:

[...]

My concern is, there will actually be numerous machines sending data to the
tcp socket, so it's entirely likely the messages will come in fragmented
and the fragments will need to be held until complete so they can be sent
on whole to the parser. While this is the job of tcp, my script needs to

I think what I need to do would be analogous to (pardon if I'm using the
wrong terminology, at this point in the discussion I am officially out of
my depth) sending the input stream to a buffer(s) until the ETX for that
message comes in, shoot the buffer contents to the parser while accepting
the next STX + message fragment into the buffer, or something analogous.

Any guidance here?


Since a TCP stream runs from one machine to another (which may be the same 
machine), presumably you actually have multiple TCP streams to manage, and at 
the same time, as otherwise you could just process one until EOF, then the 
next, and so on. Correct?


My personal inclination would be to start a Thread for each stream, have that 
thread simply read the stream extracting XML chunks, and then .put each chunk 
on a Queue used by whatever does stuff with the XML (accept chunk, parse, etc).  
If you need to know where the chunk came from, .put a tuple with the chunk and 
some context information.
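A minimal sketch of that shape (stream names and contents invented; in-memory lists of chunks stand in for the per-connection XML extraction):

```python
import threading
import queue  # this module is named Queue in Python 2

def reader(chunks, out_q, source):
    # stand-in for "read the stream, extracting XML chunks"
    for chunk in chunks:
        out_q.put((source, chunk))   # tuple carries the context information
    out_q.put((source, None))        # sentinel: this stream reached EOF

q = queue.Queue()
streams = {'cam1': [b'<a/>', b'<b/>'], 'cam2': [b'<c/>']}
threads = [threading.Thread(target=reader, args=(chunks, q, name))
           for name, chunks in streams.items()]
for t in threads:
    t.start()

# The consumer drains the queue until every stream has sent its sentinel.
results = []
finished = 0
while finished < len(streams):
    source, chunk = q.get()
    if chunk is None:
        finished += 1
    else:
        results.append((source, chunk))
for t in threads:
    t.join()
```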


Does that help you move forward?

Cheers,
Cameron Simpson 


Re: [Tutor] reading an input stream

2015-12-24 Thread Alan Gauld
On 24/12/15 18:54, richard kappler wrote:

> I think what I need to do would be analogous to (pardon if I'm using the
> wrong terminology, at this point in the discussion I am officially out of
> my depth) sending the input stream to a buffer(s) until the ETX for that
> message comes in, shoot the buffer contents to the parser while accepting
> the next STX + message fragment into the buffer, or something analogous.

You could use a StringIO buffer in memory. But simpler still is just to
append the bits to a file, one per incoming source. Any reason that
wouldn't work?
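A minimal sketch of the file-per-source idea (directory layout, names, and sample bytes are all invented for illustration):

```python
import os
import tempfile

spool_dir = tempfile.mkdtemp()   # hypothetical per-run spool directory

def append_fragment(source, data):
    # append each incoming fragment to a file named after its source
    with open(os.path.join(spool_dir, source + '.spool'), 'ab') as f:
        f.write(data)

# Two fragments of one message arrive separately and accumulate in order.
append_fragment('cam1', b'\x02<a>')
append_fragment('cam1', b'1</a>\x03')

with open(os.path.join(spool_dir, 'cam1.spool'), 'rb') as f:
    assembled = f.read()
# assembled == b'\x02<a>1</a>\x03'
```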

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] reading an input stream

2015-12-24 Thread eryk sun
On Thu, Dec 24, 2015 at 9:13 PM, boB Stepp  wrote:
> AttributeError: 'generator' object has no attribute 'next'

The iterator protocol was added in Python 2.2 (circa 2001) as a
generalization for use in "for" loops, but the language didn't have
built-in next() at the time. Instead the method to get the next item
from an iterator was defined without double underscores. You'd simply
call it.next() to manually get the next item of iterator "it".

Python 3 added built-in next() and changed the method name to
"__next__". The built-in function was backported to 2.6 to have a
common idiom even though the method is still named "next" in Python 2.

The name change in Python 3 reflects that "__next__" is a special
method that's looked up on the type (in CPython it's the tp_iternext
field of the PyTypeObject). You can't simply add a bound next method
to an instance to make Python think it's an iterator. The same applies
in Python 2, but the name "next" doesn't suggest that this is the
case.

For example, let's start out with a normal Python 2 iterator that
simply iterates a count from some initial value.

class Iterator(object):
def __init__(self, start):
self.value = start - 1
def __iter__(self):
return self
def next(self):
self.value += 1
return self.value

>>> it = Iterator(0)
>>> it.next()
0
>>> next(it)
1

Now store the bound next method directly on the instance

>>> it.next = it.next
>>> it.next.__self__ is it
True

and remove the method from the class:

>>> del Iterator.next

The bound method still works:

>>> it.next()
2

But the interpreter doesn't look for "next" on the instance:

>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Iterator object is not an iterator

>>> for i in it:
... if i == 3: break
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: iter() returned non-iterator of type 'Iterator'

Since "next" is a special method, it should have the special name
"__next__". So let it be written. So let it be done... in Python 3.
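One common way to make a single class satisfy both protocols (my sketch, not from the post) is to define the Python 3 special method and alias it under the Python 2 name:

```python
class Counter(object):
    """An iterator that works under both Python 2 and Python 3 protocols."""
    def __init__(self, start):
        self.value = start - 1
    def __iter__(self):
        return self
    def __next__(self):      # the Python 3 special method
        self.value += 1
        return self.value
    next = __next__          # alias so Python 2's iteration machinery finds it

it = Counter(0)
# next(it) and it.next() both work, in either interpreter
```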


Re: [Tutor] reading an input stream

2015-12-24 Thread boB Stepp
On Thu, Dec 24, 2015 at 3:54 PM, Danny Yoo  wrote:

I tried to follow your example:
>
> For example, here's a generator that knows how to produce an infinite
> stream of numbers:
>
> ##
> def nums():
>     n = 0
>     while True:
>         yield n
>         n += 1
> ##
>
> What distinguishes a generator from a regular function?  The use of
> "yield".  A "yield" is like a return, but rather than completely
> escape out of the function with the return value, this generator will
> remember what it was doing  at that time.  Why?  Because it can
> *resume* itself when we try to get another value out of the generator.
>
> Let's try it out:
>
> #
>
> >>> numStream = nums()
> >>> numStream.next()
> 0

But I got an exception:

Python 3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 20:20:57) [MSC v.1600
64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> def nums():
        n = 0
        while True:
            yield n
            n += 1


>>> numStream = nums()
>>> numStream.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
numStream.next()
AttributeError: 'generator' object has no attribute 'next'


If I instead do this:

>>> next(numStream)
0
>>> next(numStream)
1
>>> next(numStream)
2

Things work as you described.  Is your example from Python 2?  If yes,
is this something that changed between Python 2 and 3?  I have not
made it to generators yet, but you have now whetted my appetite!

TIA!
boB


Re: [Tutor] reading an input stream

2015-12-24 Thread Danny Yoo
> I think what I need to do would be analogous to (pardon if I'm using the
> wrong terminology, at this point in the discussion I am officially out of
> my depth) sending the input stream to a buffer(s) until the ETX for that
> message comes in, shoot the buffer contents to the parser while accepting
> the next STX + message fragment into the buffer, or something analogous.

Yes, I agree.  It sounds like you have one process read the socket and
collect chunks of bytes delimited by the STX markers.  It can then
send those chunks to the XML parser.


We can imagine one process that reads the socket and spits out a list
of byte chunks:

chunks = readDelimitedChunks(socket)

and another process that parses those chunks and does something with them:

for chunk in chunks:



It would be nice if we could organize the program like this.  But one
problem is that chunks might not be finite!  The socket might keep on
returning bytes.  If it keeps returning bytes, we can't possibly
return a finite list of the chunked bytes.


What we really want is something like:

chunkStream = readDelimitedChunks(socket)
for chunk in chunkStream:


where chunkStream is itself like a socket: it should be something that
we can repeatedly read from as if it were potentially infinite.


We can actually do this, and it isn't too bad.  There's a mechanism in
Python called a generator that allows us to write function-like things
that consume streams of input and produce streams of output.  Here's a
brief introduction to them.

For example, here's a generator that knows how to produce an infinite
stream of numbers:

##
def nums():
    n = 0
    while True:
        yield n
        n += 1
##

What distinguishes a generator from a regular function?  The use of
"yield".  A "yield" is like a return, but rather than completely
escape out of the function with the return value, this generator will
remember what it was doing  at that time.  Why?  Because it can
*resume* itself when we try to get another value out of the generator.

Let's try it out:

#

>>> numStream = nums()
>>> numStream.next()
0
>>> numStream.next()
1
>>> numStream.next()
2
>>> numStream.next()
3
>>> numStream.next()
4
#

Every next() we call on a generator will restart it from where it left
off, until it reaches its next "yield".  That's how we get this
generator to return an infinite sequence of things.
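As an aside not in the original message: itertools.islice is a handy way to take a finite prefix of such an infinite generator without calling next() by hand:

```python
from itertools import islice

def nums():
    n = 0
    while True:
        yield n
        n += 1

# take only the first five values of the infinite stream
first_five = list(islice(nums(), 5))
# first_five == [0, 1, 2, 3, 4]
```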


That's how we produce infinite sequences.  And we can write another
generator that knows how to take a stream of numbers, and square each
one.


def squaring(stream):
    for n in stream:
        yield n * n



Let's try it.




>>> numStream = nums()
>>> squaredNums = squaring(numStream)
>>> squaredNums.next()
0
>>> squaredNums.next()
1
>>> squaredNums.next()
4
>>> squaredNums.next()
9
>>> squaredNums.next()
16



If you have experience with other programming languages, you may have
heard of the term "co-routine".  What we're doing with this should be
reminiscent of coroutine-style programming.  We have one generator
feeding input into the other, with program control bouncing back and
forth between the generators as necessary.


So that's a basic idea of generators.  It lets us write processes that
can deal with and produce streams of data.  In the context of sockets,
this is particularly helpful, because sockets can be considered a
stream of bytes.


Here's another toy example that's closer to the problem you're trying
to solve.  Let's say that we're working on a program to alphabetize
the words of a sentence.  Very useless, of course.  :P  We might pass
it in the input:

this
is
a
test
of
the
emergency
broadcast
system

and expect to get back the following sentence:

 hist
 is
 a
 estt
 fo
 eht
 ceeegmnry
 aabcdorst
 emssty

We can imagine one process doing chunking, going from a sequence of
characters to a sequence of words:

###
def extract_words(seq):
    """Yield the words in a sequence of characters."""
    buffer = []
    for ch in seq:
        if ch.isalpha():
            buffer.append(ch)
        elif buffer:
            yield ''.join(buffer)
            del buffer[:]
    # If we hit the end of the buffer, we still might
    # need to yield one more result.
    if buffer:
        yield ''.join(buffer)
###


and a function that transforms words to their munged counterpart:

#
def transform(word):
    """Munges a word into its alphabetized form."""
    chars = list(word)
    chars.sort()
    return ''.join(chars)
#

This forms the major components of a program that can do the munging
on a file... or a socket!


Here's the complete example:
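The archive truncates the message at this point. A sketch of how the pieces might combine (the driver loop below is mine, not Danny's; it reuses the two helpers just shown and reads one character at a time, the way a socket reader might):

```python
import io

def extract_words(seq):
    """Yield the words in a sequence of characters."""
    buffer = []
    for ch in seq:
        if ch.isalpha():
            buffer.append(ch)
        elif buffer:
            yield ''.join(buffer)
            del buffer[:]
    if buffer:
        yield ''.join(buffer)

def transform(word):
    """Munge a word into its alphabetized form."""
    return ''.join(sorted(word))

# Drive the pipeline from any character stream -- here an in-memory file.
source = io.StringIO("this is a test")
chars = iter(lambda: source.read(1), '')   # one character per read, '' at EOF
munged = ' '.join(transform(w) for w in extract_words(chars))
# munged == 'hist is a estt'
```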



Re: [Tutor] reading an input stream

2015-12-24 Thread Danny Yoo
> >>> numStream.next()
> Traceback (most recent call last):
>   File "", line 1, in 
> numStream.next()
> AttributeError: 'generator' object has no attribute 'next'
>
>
> If I instead do this:
>
> >>> next(numStream)
> 0
> >>> next(numStream)
> 1
> >>> next(numStream)
> 2
>
> Things work as you described.  Is your example from Python 2?  If yes,
> is this something that changed between Python 2 and 3?  I have not
> made it to generators yet, but you have now whetted my appetite!


Hi BoB,

Ah, yes, thank you!  Yes, I was using Python 2.  I'll have to set up
Python 3 on my server and get some more experience with it during the
break, then!

Let me double check the docs... ok, yeah, I should be using the next()
function, since that's available in Python 2 as well.  Reference:
https://docs.python.org/2/library/functions.html#next


[Tutor] reading an input stream

2015-12-24 Thread richard kappler
I have to create a script that reads XML data over a TCP socket, parses it
and outputs it to console. Not so bad, most of which I already know how to
do. I know how to set up the socket, though I am using a file for
development and testing, am using lxml and have created an xslt that does
what I want with the xml, and it outputs it to console.

What I'm not really sure of: each XML 'message' is preceded by an STX
(\x02) and ends with an ETX (\x03). These 'messages' (Danny, are you noting
I don't say -lines- anymore? :-)  ) need to be parsed and output whole as
opposed to partial.

My concern is, there will actually be numerous machines sending data to the
tcp socket, so it's entirely likely the messages will come in fragmented
and the fragments will need to be held until complete so they can be sent
on whole to the parser. While this is the job of tcp, my script needs to

I think what I need to do would be analogous to (pardon if I'm using the
wrong terminology, at this point in the discussion I am officially out of
my depth) sending the input stream to a buffer(s) until the ETX for that
message comes in, shoot the buffer contents to the parser while accepting
the next STX + message fragment into the buffer, or something analogous.

Any guidance here?

And Merry Christmas / Happy Holidays / Festive whatever you celebrate!!!

regards, Richard

-- 

"I want to makes shoes!" -> elf fleeing the fire engulfed Keebler Tree