Re: [pypy-dev] Pypy's facelift and certificate issues

2018-01-02 Thread John Camara
http://packages.pypy.org/

Question about the web site -- does PyPy currently have anything similar to
> this page for Python 3?
> http://py3readiness.org/
>
> I think a page like this, showing which major libraries are compatible
> with PyPy, could really help drive adoption of PyPy. I know for our team,
> the Python 3 page was a strong reason we felt "safe" starting to make the
> switch to Python 3.
> I'm not sure how we'd get this information about PyPy library
> compatibility. One idea would be to install each library on PyPy, run the
> automated tests, and compare the results against those for CPython.
> Barry
>
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
It turns out there is some work in progress in the Spark project to share
its memory with non-JVM programs. See
https://issues.apache.org/jira/browse/SPARK-10399.  Once this is completed
it should be fairly trivial to expose it to Python, and then maybe JIT
integration could be discussed at that time.  This is a huge step forward
over sharing Java objects.  From the title of the ticket it appears it
would be a C++ interface, but looking at the pull request it looks like it
will be a C interface.
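
For illustration only, here is a rough sketch of how such a C interface
might be consumed from Python through cffi once it exists.  The function
and library names below are invented placeholders, not the actual
SPARK-10399 API:

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("""
        void *shared_block_addr(long block_id);   /* hypothetical */
        long  shared_block_len(long block_id);    /* hypothetical */
    """)
    lib = ffi.dlopen("sparkshm")  # hypothetical shared library name

    def view_block(block_id):
        # Wrap the off-heap memory without copying it into the Python VM.
        addr = lib.shared_block_addr(block_id)
        size = lib.shared_block_len(block_id)
        return ffi.buffer(addr, size)  # zero-copy view; slice it to copy bytes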

In the end the blocker may just come down to PyPy having complete support
for NumPy. Without NumPy the success of this would be somewhat limited
based on user expectations, and without PyPy it may be too slow for many
applications.

On Thu, Mar 24, 2016 at 1:11 PM, John Camara 
wrote:

> Hi Armin,
>
> At a minimum, tighter execution of the Python and Java VMs is required, as
> well as shared memory.  But on the other hand you have raised the bar so
> high with cffi, having a clean and unbloated interface, that it would be
> nice if a library with a similar spirit existed for Java. Having support
> in PyPy's JIT to remove all the marshalling types would be a big plus on
> top of the shared memory, and some integration between the 2 GCs would
> likely be required.
>
> Maybe the best approach would be a combination of existing libraries and a
> new interface that allows for sharing of memory.  Maybe something similar
> to numpy arrays, with a better API that avoids the pitfalls of numpy
> relying on CPython semantics/implementation details.  After all, the only
> thing that needs to be eliminated is the copying/serialization of large
> data arrays/structures.
>
> John
>
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
Hi Armin,

At a minimum, tighter execution of the Python and Java VMs is required, as
well as shared memory.  But on the other hand you have raised the bar so
high with cffi, having a clean and unbloated interface, that it would be
nice if a library with a similar spirit existed for Java. Having support in
PyPy's JIT to remove all the marshalling types would be a big plus on top of
the shared memory, and some integration between the 2 GCs would likely be
required.

Maybe the best approach would be a combination of existing libraries and a
new interface that allows for sharing of memory.  Maybe something similar to
numpy arrays, with a better API that avoids the pitfalls of numpy relying on
CPython semantics/implementation details.  After all, the only thing that
needs to be eliminated is the copying/serialization of large data
arrays/structures.

John

On Thu, Mar 24, 2016 at 12:20 PM, Armin Rigo  wrote:

> Hi John,
>
> On 24 March 2016 at 13:22, John Camara  wrote:
> > (...)  Thus the need for a jffi library.
>
> When I hear "a jffi library" I'm thinking about a new library with a
> new API.  I think what you would really like instead is to keep the
> existing libraries, but adapt them internally to allow tighter
> execution of the Python and Java VMs.
>
> I may be completely wrong about that, but you're also talking to the
> wrong guys in the first place :-)
>
>
> A bientôt,
>
> Armin.
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
Hi Fijal,

I understand where you're coming from and I'm not trying to convince you to
work on it.  I'm mainly trying to point out a need that may not be obvious
to this community.  I don't spend much time on big data and analytics so I
don't have a lot of time to devote to this task.  That could change in the
future, so you never know, I may end up getting involved with this.

At the end of the day I think it is the PSF which needs to do an honest
assessment of the current state of Python and of programming in general, so
that they can help direct the future of Python.  I think with an honest
assessment it should be clear that it is absolutely necessary that a
dynamic language have a JIT. Otherwise, a language like Node would not be
growing so quickly on the server side.  An honest assessment would conclude
that Python needs to play a major role in big data and analytics, as we
don't want this to be another area where Python misses the boat.  Like
every language other than JavaScript, we missed playing an important role
on the web front end.  More recently we missed out on mobile.  I don't
think it is good for us to miss out on big data.  It would be a shame,
since we had such a strong scientific community which initially gave us a
huge advantage over other communities.  Missing out on big data might also
be the driver that moves the scientific community in a different direction,
which would be a big loss to Python.

I personally don't see any particular companies or industries that are
willing to fund the tasks needed to solve these issues.  It's not to say
there are no more funds for Python projects; it's just that no one company
is likely to be willing to fund these kinds of projects on its own.  It
really needs the PSF to coordinate these efforts, but they seem to be more
focused on trying to make Python 3 a success instead of improving the
overall health of the community.

I believe that Python is in pretty good shape in being able to solve these
issues but it just needs some funding and focus to get there.

Hopefully the workshop will be successful and help create some focus.

John

On Thu, Mar 24, 2016 at 8:56 AM, Maciej Fijalkowski 
wrote:

> Hi John
>
> Thanks for explaining the current situation of the ecosystem. I'm not
> quite sure what your intention is. PyPy (and CPython) is very easy to
> embed through any C-level API, especially with the latest additions to
> cffi embedding. If someone feels like doing the work to share stuff
> that way (as I presume a lot of data presented in JVM can be
> represented as some pointer and shape how to access it), then he's
> obviously more than free to do so, I'm even willing to help with that.
> Now this seems like a medium-to-big size project that additionally
> will require quite a bit of community will to endorse. Are you willing
> to volunteer to work on such a project and dedicate a lot of time to
> it? If not, then there is no way you can convince us to volunteer our
> own time to do it - it's just too big and quite a bit far out of our
> usual areas of interest. If there is some commercial interest (and I
> think there might be) in pushing python and especially pypy further in
> that area, we might want to have a better story for numpy first, but
> then feel free to send those corporate interest people my way, we can
> maybe organize something. If you want us to do community service to
> push Python solutions in the area I have very little clue about
> however, I would like to politely decline.
>
> Cheers,
> fijal
>
> On Thu, Mar 24, 2016 at 2:22 PM, John Camara 
> wrote:
> > Besides JPype and PyJNIus there is also https://www.py4j.org/.  I haven't
> > heard of JPype being used in any recent projects so I assume it is
> > outdated by now.  PyJNIus gets used but I tend to only see it used on
> > Android projects.  The Py4J project gets used often in
> > numerical/scientific projects, mainly due to its use in PySpark.  The
> > problem with all these libraries is that they don't have a way to share
> > large amounts of memory between the JVM and Python VMs, and so large
> > chunks of data have to be copied/serialized when going between the 2 VMs.
> >
> > Spark is the de facto standard in cluster computing at this point in
> > time.  At a high level Spark executes code that is distributed
> > throughout a cluster so that the code being executed is as close as
> > possible to where the data lives, so as to minimize transferring of
> > large amounts of data.  The units of code that need to be executed are
> > packaged up into Resilient Distributed Datasets (RDDs).  RDDs are lazily
> > evaluated and are essentially graphs of the operations that need to be
> > performed on the data.  They are capable
> >

Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-24 Thread John Camara
.  What it strongly lacks today is
the connection to C/legacy code and numerical/scientific modules, and of
course it also does not have a solution to the data copying overhead it has
with the JVM.

Anyway, this is just my 2 cents on what is currently holding Python back
from taking off in this space.

On Thu, Mar 24, 2016 at 2:32 AM, Hakan Ardo  wrote:

>
> On Mar 23, 2016 21:49, "Armin Rigo"  wrote:
> >
> > Hi John,
> >
> > On 23 March 2016 at 19:16, John Camara  wrote:
> > > I would like to suggest one more topic for the workshop. I see a big
> need
> > > for a library (jffi) similar to cffi but that provides a bridge to Java
> > > instead of C code. The ability to seamlessly work with native Java
> data/code
> > > would offer a huge improvement (...)
> >
> > Isn't it what JPype does?  Can you describe how it isn't suitable for
> > your needs?
>
> There is also PyJNIus:
>
> https://pyjnius.readthedocs.org/en/latest/
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread John Camara
Hi Fijal,

I agree that jffi would be both a large project and that without someone
leading it, it would likely not get anywhere.  But I tend to disagree that
it would be a separate goal for the conference.  I realize the goal of the
summit is to talk about native-code compilation for Python, and most would
argue that means executing C code, assembly, or at the very least executing
code at the speed of "C code".  But the reality now is that
numerical/scientific programming increasingly needs to execute in a
clustered environment.  So I think we need to be careful to not only solve
yesterday's problems but make sure we are covering the current and future
ones.

Today, big data and analytics, which is driving most numerical/scientific
programming, is run almost exclusively in clustered environments, with the
Apache Spark ecosystem as the de facto standard.  A few years back,
Python's ace up its sleeve for the scientific community was the numpy/scipy
ecosystem, but we have recently lost that edge by falling behind in
clustered computing.  At this point in time our best move forward on the
numerical/scientific front is to become best buddies with the Spark
ecosystem and make sure we can bridge the numpy/scipy ecosystem to it.
That is, we merge the best of both worlds and suddenly Python becomes the
go-to language again for numerical/scientific computing.  Of course we
still need to address what should have been yesterday's problem and deal
with the "native-code compilation" issues.

John

On Wed, Mar 23, 2016 at 2:47 PM, Maciej Fijalkowski 
wrote:

> Hi John
>
> I understand why you're bringing this up, but it's a huge project on
> it's own, worth at least a couple months worth of work. Without  a
> dedicated effort from someone I'm worried it would not go anywhere.
> It's kind of separated from the other goals of the summit
>
> On Wed, Mar 23, 2016 at 8:16 PM, John Camara 
> wrote:
> > Hi Nathaniel,
> >
> > I would like to suggest one more topic for the workshop. I see a big need
> > for a library (jffi) similar to cffi but that provides a bridge to Java
> > instead of C code. The ability to seamlessly work with native Java
> data/code
> > would offer a huge improvement when python code needs to work with the
> > Spark/Hadoop ecosystem. The current mechanisms which involve serializing
> > data to/from Java can kill performance for some applications and can
> render
> > Python unsuitable for these cases.
> >
> > John
> >
> > ___
> > pypy-dev mailing list
> > pypy-dev@python.org
> > https://mail.python.org/mailman/listinfo/pypy-dev
> >
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


[pypy-dev] [ANN] Python compilers workshop at SciPy this year

2016-03-23 Thread John Camara
Hi Nathaniel,

I would like to suggest one more topic for the workshop. I see a big need
for a library (jffi) similar to cffi but that provides a bridge to Java
instead of C code. The ability to seamlessly work with native Java
data/code would offer a huge improvement when python code needs to work
with the Spark/Hadoop ecosystem. The current mechanisms which involve
serializing data to/from Java can kill performance for some applications
and can render Python unsuitable for these cases.

John
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] vmprof compression

2015-03-28 Thread John Camara
I meant to mention them in my email as both of them are great options when
you don't mind sacrificing some compression for significant improvements in
compression and decompression speeds.  These libraries are I/O bound when
saving to a hard drive unless you are using a very low powered processor.
Generally the compression ratio is 0.5-0.75 of that achieved by gzip.
Compression speeds can approach 0.5 GB/s.

These libraries don't offer any advanced compression techniques, so anything
you do to help create long strings of 0s and 1s, like compressing the deltas
as I mentioned in the earlier email, will go a long way toward significantly
improving the compression ratio while also maintaining high performance.
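
For reference, driving those two codecs from Python is a one-liner each,
assuming the third-party python-snappy and lz4 packages (their current
module paths; treat the exact names as an assumption to check):

    import snappy      # pip install python-snappy
    import lz4.frame   # pip install lz4

    with open("vmprof.out", "rb") as f:   # placeholder input file
        data = f.read()

    print("raw %d  snappy %d  lz4 %d" % (
        len(data), len(snappy.compress(data)), len(lz4.frame.compress(data))))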

On Fri, Mar 27, 2015 at 9:15 PM, Leonardo Santagada 
wrote:

> snappy and lz4 are good algos to try too.
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


[pypy-dev] vmprof compression

2015-03-26 Thread John Camara
Hi Fijal,

To recap and continue the discussion from irc.

We already discussed that the stack ids are based on a counter, which is
good, but I also want to confirm that the ids have locality associated with
the code.  That is, similar areas of the code will have similar ids.  I just
want to make sure they are not random with respect to the code, otherwise
compression will not be helpful.  If the ids are random that would need to
be corrected first.

Right now the stack traces are written to the file repeating the following
sequence

MARKER_STACKTRACE
count
depth
stack
stack
...
stack

In order to get a high compression ratio it would be better to combine
multiple stacktraces and rearrange the data as follows

MARKER_COMPRESSED_STACKTRACES
counts_compressed_length
counts_compressed
depths_compressed_length
depths_compressed
stacks_compressed_length
stacks_compressed

In order to build the compressed data you will want 3 pairs of buffers: a
pair of buffers each for counts, depths, and stacks.  Your profiler would be
writing to one set of buffers and another thread would be responsible for
compressing buffers that are full and writing them to the file.  Once a set
of buffers is full the profiler would start filling up the other set of
buffers.

For each set of buffers you need a variable to hold the previous count,
depth, and stack id.  They will be initialized to 0 before any data is
written to an empty buffer.  Instead of writing the actual count value
into the counts buffer you will write the difference between the current
count and the previous count.  The reason for doing this is that the delta
values will mostly be around 0, which will significantly improve the
compression ratio without adding much overhead.  Of course you would do the
same for depths and stack ids.
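
A minimal sketch of that delta step, in plain Python just to show the idea
(buffer management and the writer thread are left out):

    def delta_encode(values, previous=0):
        # Store each value as its difference from the previous one; with
        # good locality the deltas cluster around 0, which compresses well.
        deltas = []
        for value in values:
            deltas.append(value - previous)
            previous = value
        return deltas, previous  # carry the last value into the next buffer

    def delta_decode(deltas, previous=0):
        values = []
        for delta in deltas:
            previous += delta
            values.append(previous)
        return values, previous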

When you compress the data you compress each buffer individually to make
sure like data is being compressed.  Like data compresses better than unlike
data, and by saving deltas very few bits will be required to represent the
data, so you are likely to have long strings of 0s and 1s.

I'm sure now you can see why I don't want stack ids to be random.  If they
are random then the deltas will be all over the place, so you won't end up
with long strings of 0s and 1s, and random data itself does not compress.

To test this out I wouldn't bother modifying the C code but instead try it
out in Python, to first make sure the compression is providing huge gains
and figure out how to tune the algorithm, without having to mess with the
signal handlers, write the code for the separate thread, and deal with
issues such as making sure you don't start writing to a buffer before the
thread has finished writing the data to the file, etc.  I would just read an
existing profile file and rewrite it to a different file by rearranging the
data and compressing the deltas as I described.  You can get away with one
set of buffers as you wouldn't be profiling at the same time.

To tune this process you will need to determine the appropriate number of
stack traces that is small enough to keep memory down but large enough so
that the overhead associated with compression is small.  Maybe start off
with about 8000 stack traces.  I would try gzip, bz2, and lzma and look at
their compression ratios and times.  Gzip is generally faster than bz2 and
lzma is the slowest.  On the other hand lzma provides the best compression
and gzip the worst.  Since you will be compressing deltas you most likely
can get away with using the fastest compression options under each
compressor and not affect the compression ratio.  But I would test it to
verify this, as it does depend on the data being compressed whether or not
this is true.  Also, one option that is available in lzma is the ability to
set the width of the data to look at when looking for patterns.  Since you
are saving 32- or 64-bit ints depending on the platform you can set the
option to either 4 or 8 bytes based on the platform.  I don't believe gzip
or bz2 have this option.  By setting this option in lzma you will likely
improve the compression ratio.
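
A rough way to run that comparison offline on a buffer of deltas, using the
stdlib compressors (lzma is in the Python 3 stdlib, or available as a
backport).  The lzma filter chain below is my guess at the "width" setting
described above -- a delta filter distance of 4 bytes -- so verify it
against the lzma docs:

    import bz2, lzma, struct, zlib

    def pack_deltas(deltas):
        # Pack signed 32-bit deltas into bytes; switch to "<%dq" for 64-bit.
        return struct.pack("<%di" % len(deltas), *deltas)

    def compare(raw):
        candidates = {
            "zlib, fastest": zlib.compress(raw, 1),
            "bz2, fastest": bz2.compress(raw, 1),
            "lzma, 4-byte delta filter": lzma.compress(
                raw,
                format=lzma.FORMAT_XZ,
                filters=[{"id": lzma.FILTER_DELTA, "dist": 4},
                         {"id": lzma.FILTER_LZMA2, "preset": 1}],
            ),
        }
        for name in sorted(candidates):
            print("%-28s %7d -> %7d bytes"
                  % (name, len(raw), len(candidates[name])))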

You may find that counts and depths give similar compression between the 3
compression types, in which case just use the fastest, which will likely be
gzip.  On the other hand maybe the stack ids will be better off using
lzma.  This is also another reason to separate out like data, as it gives
you the option to use the fastest compressor for some data types while
using others that provide better compression.

I would not be surprised if this approach achieves a compression ratio
better than 100x, but that will be heavily dependent on how local the stack
ids are.  Also don't forget about simple things like not using 64-bit ints
when you can get away with smaller ones.

Also, as a slight variation on the above: if you find most of your deltas
are < 127 you could write them out as 1 byte, and when greater than 127
write them out as a 4-byte int with the high bit set.  If you do this then
don't set the lzma opti
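
A small sketch of that one-byte/four-byte scheme, with the high bit of the
leading byte acting as the long-form marker (one possible framing, not the
vmprof format):

    import struct

    def encode_delta(value):
        # Deltas 0..127 fit in a single byte; anything else is written as a
        # marker byte with the high bit set, followed by a 4-byte signed int.
        if 0 <= value < 128:
            return struct.pack("<B", value)
        return struct.pack("<Bi", 0x80, value)

    def decode_delta(buf, offset):
        first, = struct.unpack_from("<B", buf, offset)
        if not first & 0x80:
            return first, offset + 1
        value, = struct.unpack_from("<i", buf, offset + 1)
        return value, offset + 5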

[pypy-dev] Question about extension support

2014-03-25 Thread John Camara
Hi Kevin,

More up to date information can be found on the FAQ page

http://doc.pypy.org/en/latest/faq.html#do-cpython-extension-modules-work-with-pypy

The best approach for PyPy is either to use a pure Python module if possible
or to use a cffi-wrapped extension instead of an extension that uses the
CPython C API.  Often CPython C API extensions are wrapping some C library,
and creating a cffi wrapper for the library is actually much simpler than
writing a CPython C API wrapper.  Quite a few CPython C API extensions have
already been wrapped for cffi, so make sure to search for one before
creating your own wrapper. If you need to create a wrapper, refer to the
cffi documentation at

http://cffi.readthedocs.org/en/release-0.8/

Extensions wrapped with cffi are compatible with both CPython and PyPy.  On
CPython the performance is similar to what you would get if you used
ctypes.  However, under PyPy, the performance is much closer to a native
C call plus the overhead of releasing and acquiring the GIL.
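
As a taste of what a cffi binding looks like, here is a minimal ABI-mode
sketch that calls printf from the standard C library (POSIX; the
declaration is simply pasted from the header):

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("int printf(const char *format, ...);")  # pasted C declaration

    C = ffi.dlopen(None)                 # load the standard C library
    arg = ffi.new("char[]", b"world")    # a C char[] owned from Python
    C.printf(b"hi there, %s.\n", arg)    # prints "hi there, world."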

John
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Question about extension support

2014-03-25 Thread John Camara
Hi Kevin,

Here is another link about writing extensions for PyPy.

http://doc.pypy.org/en/latest/extending.html

John


On Tue, Mar 25, 2014 at 9:48 PM, John Camara wrote:

> Hi Kevin,
>
> More up to date information can be found on the FAQ page
>
>
> http://doc.pypy.org/en/latest/faq.html#do-cpython-extension-modules-work-with-pypy
>
> The best approach for PyPy is either to use a pure Python module if
> possible or to use a cffi-wrapped extension instead of an extension that
> uses the CPython C API.  Often CPython C API extensions are wrapping some
> C library, and creating a cffi wrapper for the library is actually much
> simpler than writing a CPython C API wrapper.  Quite a few CPython C API
> extensions have already been wrapped for cffi, so make sure to search for
> one before creating your own wrapper. If you need to create a wrapper,
> refer to the cffi documentation at
>
> http://cffi.readthedocs.org/en/release-0.8/
>
> Extensions wrapped with cffi are compatible with both CPython and PyPy.
> On CPython the performance is similar to what you would get if you used
> ctypes.  However, under PyPy, the performance is much closer to a native
> C call plus the overhead of releasing and acquiring the GIL.
>
> John
>
___
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Parallella open hardware platform

2013-02-07 Thread John Camara
Fijal,

Whether someone works full time on a project is a separate issue.

Being popular helps attract additional resources, and PyPy is a project that
could use additional resources.  How many additional optimizations could
PyPy add to get to a similar level of optimization to, say, the JVM?  We are
talking many, many man-years of work.  How much additional work is it to
develop and maintain backends for the various ARM, PPC, MIPS, etc.
processors?  How much work would it take to have PyPy support multi-cores?
What if RPython needs to be significantly refactored or replaced?  And we
can go on and on.

Typically every 10 years or so a new language becomes dominant, but
that hasn't happened lately.  Java had been in that role for quite some
time, and for quite a few years it has been on the decline, but yet no
language has taken its place in terms of dominance.  The main reason why
this hasn't happened so far is that no language has successfully dealt with
the multi-core issue in a way that also keeps the other desirable features
we currently have in popular languages.  But at some point, a language will
prevail and become dominant, and when that happens there will be a mass
migration to this language.  It doesn't mean that Python and other
currently popular languages are just going to go away, it's just that their
use will decline.  If Python's popularity declines significantly it will in
turn impact PyPy.  Also, many of the earlier adopters of PyPy are more
likely to move on to the new dominant language. So where does that leave
you?  I expect you earn a living by doing PyPy consulting and thus you need
PyPy to be popular.

Now you don't have to believe that a new dominant language will emerge, but
history says otherwise, and many have been fooled into thinking otherwise
in the past.

I feel PyPy is Python's best chance at being able to survive this change in
language dominance, as it has the best chance of being able to do something
about the multi-core situation.  I'm glad you mentioned the web stack the
other day; if you hadn't, I likely would not have thought about the PyPy
hypervisor scenario. I'm starting to believe that approach may have some
decent merit to it and could allow a way to kick the can down the road on
the multi-core issues.  I don't have the time to get into it right now, but
I'll start a new thread on the topic, maybe within the next few days.

John


On Thu, Feb 7, 2013 at 4:33 AM, Maciej Fijalkowski  wrote:

> On Thu, Feb 7, 2013 at 6:41 AM, John Camara 
> wrote:
> > Fijal,
> >
> > In the past you have complained about it being hard to make money in open
> > source. One way to make it easier for you is grow the popularity of PyPy.
> > So I would think you would at least have some interest in thinking of
> ways
> > to accomplish that.
>
> Before even reading further - how is being popular making money? Being
> popular is being popular. Do you know any CPython developer working
> full time on CPython? CPython is definitely popular by my standards
>
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Parallella open hardware platform

2013-02-06 Thread John Camara
Fijal,

In the past you have complained about it being hard to make money in open
source. One way to make it easier for you is to grow the popularity of PyPy.
 So I would think you would at least have some interest in thinking of ways
to accomplish that.

I'm not trying to dictate what PyPy should do but merely offering an
opinion of mine, that I see an opportunity that potentially could be a great
thing for PyPy.

A year ago, if someone had asked me if PyPy should support embedded systems,
I would have given a firm no, but I see the market changing in ways I didn't
expect. The people hacking on these devices are fairly similar to open
source developers and in some cases they even do open source development.
They do things differently from the establishment, which has provided a new
way to think about manufacturing.  Their ways are so different from the
establishment, and such a game changer, that they have ignited what is
becoming a manufacturing revolution. Because many who are involved in
hacking on this hardware have no prior experience with the established
ways of doing this type of business, they are moving in directions that
differ in how these devices get programmed.  They are also in need of tools
and new infrastructure, and I feel that what PyPy has to offer can give them
a starting point.

Now at the end of the day I don't believe many of their requirements are
going to be much different from the requirements of other markets, and not
likely too different from the direction PyPy will likely take.  So why not
go where all the big money is going to be?

OK, enough of that. Let's take a look at your example of a web stack.  I
believe right now PyPy is in a position to be used in this market.  Sure,
PyPy could use some additional optimizations to improve the situation, but I
think in general it's already able to kick ass compared to CPython in terms
of performance when a light web framework is used, which
is becoming increasingly popular as web apps push the front ends to do most
of the layout/presentation work.  Also, with the web becoming more
dynamic and the number of requests increasing at a substantial rate, it
becomes more important to reduce latencies, which tends to give PyPy an
advantage.

This is all great while the web stacks are running on traditional servers,
but servers are changing.  There are some servers being sold today that
have hundreds of small cores, and in the not too distant future there will
be systems that have a number of full cores and a much larger number of
smaller cores which may or may not have similar architectures.  For
instance, servers with Phi coprocessors (8 GB of memory, 60 1 GHz cores
with, I believe, 4 threads each, and a PCIe3 interface) have
recently become available. How is PyPy going to handle this?  Is this any
different from the needs of the embedded systems? No.  PyPy is going to
have to start paying attention to how data is accessed and will have to
make optimizations based on the access patterns.  That is, you have to make
sure computational loads can offset the data transfer overhead.  Today PyPy
does not take this overhead cost into account, which is not an issue when
running on one core.

For a web application it would be nice to run multiple sessions on a given
core and save session-related data locally to that core so as to minimize
data transfer to the smaller cores, which means directing all requests for
the session to the same core, doing any necessary encryption on these small
cores, etc.  But there may also be some work for a particular request which
might not be appropriate to run on a small core and may have to run on the
main core, maybe due to it requiring access to too much data.  How is this
going to work?  Is PyPy going to do all the analysis itself or will the
programmer provide some hints to PyPy as to how to break up the work?  Who
is going to be responsible for the scheduling and for cleaning up the
session data that is cached locally to the cores, and a boatload of other
issues I'm not sure about?  It's a tough problem, and one that is just
around the corner.

Another option would be to run an HTTP load balancer on the main cores, with
PyPy web stacks running on, say, dedicated Phi cores and the HTTP requests
forwarded over the PCIe bus.  That way each Phi core acts like
an independent web server.  But running 60-240 PyPy processes in 8 GB of
memory is quite the challenge.  Maybe some sort of PyPy hypervisor that is
able to run virtualized PyPy instances, so that each instance can share all
the JITed code but have its own data.  I'm sure many issues and questions
exist, like who would do the JITting, the hypervisor or the virtualized PyPy
instances?

Now even if you feel right now is not the time to start worrying about
these new server architectures, there are still other issues PyPy will start
to run into in the web stack market.  Typically for a web application that
is being accessed from the Internet there is a certain amount of latency
that is acceptable.  But what hap

Re: [pypy-dev] Parallella open hardware platform

2013-02-05 Thread John Camara
Hi Armin,

It's even worse: I'm asking you to support it and I don't even need it.

When I posted this thread it was getting rather long and unfortunately I
didn't really make all the points I wanted to make.  At this point, and
even for some time now, PyPy has had a great foundation but its use remains
low.  Every now and then it's good to step back a little bit and reflect on
the current situation and come up with a strategy that helps the project's
popularity grow.  I know that PyPy has done things to help with the growth,
such as writing blog posts, being quick to fix bugs, helping others with
their performance issues and even rapidly adding optimizations to PyPy,
presenting at conferences, and often actively engaging with any posts or
comments made about PyPy.

So PyPy is doing a lot of things right to help its PR, but yet there is
this issue of slow growth.  Now, we know the main issue with its
growth is the fact that the Python ecosystem relies on a lot of libraries
that use the CPython API and PyPy just doesn't have full support for this
interface.  I understand the reasons why PyPy is not going to support the
full interface, and PyPy has come up with the cffi library as a way to
bridge the gap.  And of course I don't expect the PyPy project to take on
the responsibility of porting all the popular 3rd party libraries that use
the CPython API to cffi.  It's going to have to be a community effort.  One
thing that could help would be more marketing of cffi, as very few Python
developers know it exists.  But that alone is not going to be enough.

History tells us that most successful products/projects that become popular
do so by first supporting the needs of some niche market.  As time goes by
that niche market starts providing PR that helps other markets to discover
the product/project and the cycle can sometimes continue until there is
mass adoption.  Now when PyPy started to place a focus on NumPy I had hoped
that the market it serves would turn out to be the market that would help
PyPy grow.  But at this point in time it does not appear like that is going
to happen.

For a while I have been trying to think of a niche market that may be
helpful. But to do so you have to consider the current state of PyPy, which
means eliminating markets that heavily rely on libraries that use the
CPython API; I'm also going to avoid the NumPy market as that's currently
being worked on; there is the mobile market but that's a tough one to get
into; maybe the gaming market could be a good one; etc.  It turns out that
with the current state of PyPy many markets need to be eliminated if you're
looking for one that is going to help with growth. The Parallella project,
on the other hand, looks like it could be a promising one, and I'll share
some thoughts a little later in this post as to why I feel this way.

Right now you have been putting a lot of effort into STM, in which you're
trying to solve what is likely the biggest challenge that the developer
community is facing: how to write software that effectively leverages
many cores in a way that is straightforward and in the spirit of
Python. When you solve this problem, and I have faith that you will,
most would think that it would cause PyPy's popularity to skyrocket.

What most likely will happen is that PyPy gets a temporary boost in
popularity, as there is another lesson in history to be concerned about.
Often the first to solve a problem does not become popular in the long
run.  Usually the first to solve the problem does so via a unique
solution, but once people start using it, issues with the approach get
discovered.  Then often many others will use the original solution
as a starting point and modify it to eliminate these new issues.  Then one
of the second-generation solutions ends up being the de facto standard.

Now PyPy is able to move fairly quickly in terms of implementing new
approaches so it may in fact be able to compete just fine against other 2nd
generation solutions. But there may be some benefits to exposing STM for a
smaller market to help PyPy buy some additional time before releasing it as
a solution for the general developer community.

So why the Parallella project?  Well, I think it can be helpful in a number
of ways. First, I don't believe that this market is going to need much from
the libraries that use the CPython API.  Many who are in this market are
used to having to program for embedded systems, are more likely to have the
skills to help out the PyPy project in a number of areas, and would likely
also have a financial incentive to contribute back to PyPy, such as helping
keep the various back ends up to date, for example ARM, PPC, and additional
architectures.  Some in this market are used to using a number of graphical
languages to program their devices, but unfortunately for them some of the
new products that need to enter the market can't be built fully with these
graphical languages.  Well, with the PyPy framework it's possible for 

Re: [pypy-dev] Should jitviewer come with a warning?

2013-02-04 Thread John Camara
On Mon, Feb 4, 2013 at 3:42 AM, Maciej Fijalkowski  wrote:


> Seriously which ones? I think msgpack usage is absolutely legit. You
> seem to have different opinions about the design of that software, but
> you did not respond to my concerns even, not to mention the fact that
> it sounds like it's not "obfuscated by jitviewer".
>
> Cheers,
> fijal
>

First I would have tried using cffi with the msgpack C library.  If I wasn't
happy with it I would do a Python port.  So for now let's forget about cffi
and just deal with the current design of this library.

I had tried to minimize the discussion about this library on this forum, as
I had already written extensive comments on the original blog [1].  Now I
didn't do an extensive review of the code, as I only concentrated on a small
portion of it, namely the area of unpacking the msgpack messages.  I'll
just highlight a couple of concerns I had.

The first thing that shocked me was the use of the struct.pack and
struct.unpack functions.  Normally when you need to pack and unpack often
with the same format, you would create a Struct object with the desired
format and use that object's pack and unpack methods.  That way the
format string is not parsed on every call, but only once, when the Struct
object is created.

As Bas pointed out, PyPy is able to optimize the parsing of the format,
which is great, but why would you prefer to write code that would run with
horrible performance under CPython when there is an alternative available?
Now, toward the end of the comments on the blog, Bas stated he tried the
Struct object under PyPy and found it ran slower.  So there is likely
an opportunity for PyPy to add another optimization: if PyPy can optimize
the struct functions it should be able to handle Struct objects, which I
would think would be an easier case to handle, purely looking at it from a
high-level perspective.
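
To make the point concrete, a precompiled Struct object parses its format
string once and reuses it (a small sketch, not the msgpack-python code
itself):

    import struct

    # Re-parses the format string on every call:
    length, = struct.unpack(">I", b"\x00\x00\x00\x2a")

    # Parses the format once, at import time, and reuses it afterwards:
    UINT32 = struct.Struct(">I")
    length, = UINT32.unpack(b"\x00\x00\x00\x2a")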

Another issue I had is that the msgpack spec is designed in a way to
minimize the need for copying data.  That is, you should be able to just use
the data directly from the message buffers.  The normal way to do this with
the struct module is to use the unpack_from and pack_into methods instead of
the pack and unpack methods.  These methods take a buffer and an offset, as
opposed to pack and unpack, which would require you to slice out a copy
of the original buffer to pass into the unpack method.  As Bas pointed out,
again PyPy is able to optimize away the copy created from slicing, which is
great, but again why code it in a way that will be slow on CPython when
there is an alternative?

The other issue I mentioned on the blog was the large number of if/elif
statements used to handle each type of msgpack message.  I instead
suggested creating essentially a list that holds references to Struct
objects, so that the message type can be used as an index into this list.
That way you remove all the if/elif statements and end up with
something like

struct_objects[message_type].unpack_from()

Now I understand that PyPy is able to optimize all these if and elif
statements by creating bridges for the various paths through this code, but
again why code it this way when it will be slow on CPython?  I would also
assume that using the if/elif statements would still have more overhead in
PyPy compared to using a list of references, although maybe there is not
much of a difference.
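
A sketch of that table-driven dispatch; the type codes and formats below
are invented for illustration, not the real msgpack format bytes:

    import struct

    # One Struct per message type, indexed by the type byte instead of an
    # if/elif chain.  Codes 0x01-0x03 are placeholders.
    FORMATS = [None] * 256
    FORMATS[0x01] = struct.Struct(">B")   # e.g. an 8-bit unsigned int
    FORMATS[0x02] = struct.Struct(">I")   # e.g. a 32-bit unsigned int
    FORMATS[0x03] = struct.Struct(">d")   # e.g. a 64-bit float

    def decode_value(buf, offset):
        message_type, = struct.unpack_from(">B", buf, offset)
        fmt = FORMATS[message_type]
        value, = fmt.unpack_from(buf, offset + 1)
        return value, offset + 1 + fmt.size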

Anyway, these are just the issues I saw with this library, which by the way
is nowhere near as bad as other code I have seen written as a result of
users using the jitviewer.  Unfortunately, I cannot discuss these other
projects as they are closed source.

Anyway, to get to the other part of your reply, I assume not responding to
your concerns is about the following

"python is nicer. It does not segfault. Besides, how do you get a
string out of a C library? if you do raw malloc it's prone to be bad.
Etc. etc."

Sorry, that was an oversight.  I feel the same way about Python, but what's
the real issue with taking the practical approach of using a C library that
is written well and is robust?  I would love to see everything written in
Python, but who has the time to port everything over?

In this design the msgpack C library would have the responsibility of
maintaining the buffers.  Its API supports creating and freeing these
buffers. The msgpack library would be doing most of the work, and the only
data that has to go back and forth between the Python code and the library
are just basic types like int, float, double, strings, etc.  To get a string
out of the C library, just slice a cffi buffer to create a copy of it in
Python before calling the function to clear the msgpack buffer.
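
Concretely, that copy-out step might look like the following; the
msgpack-side function names are invented for illustration, and only
ffi.buffer() itself is real cffi API:

    # Hypothetical cffi-wrapped msgpack C library ("lib"/"ffi" come from
    # the wrapper; the function names below are placeholders).
    ptr = lib.unpacked_str_ptr(unpacker)   # char* into the C-owned buffer
    size = lib.unpacked_str_len(unpacker)

    text = ffi.buffer(ptr, size)[:]        # slicing copies the bytes into Python
    lib.clear_buffer(unpacker)             # now the C buffer can be released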

With cffi, this slicing to create copies of strings in Python and
the overhead of calling into the C functions does add extra work over what
would be done with the code written purely in Python, assuming PyPy does
have all the optimizations in place to get you to match the performance of
the msgpack C library.  The 

Re: [pypy-dev] Should jitviewer come with a warning?

2013-02-03 Thread John Camara
> that is definitely a no (my screen is too small to have some noise
> there, if for no other reason), it might have a warning in the
> documentation though, if it's any useful. But honestly, I doubt such a
> warning makes any sense. People who are capable of using jitviewer
> already "know better".

I agree it should not be part of the normal output.  I would say add it to
the docstring in app.py and to the README file.  As far as people using the
jitviewer already "knowing better": if that were the case I wouldn't have
started this thread.  Like you said earlier, the use of jitviewer is only
promoted on IRC, and yet I have come across 3 people working on different
projects who are using it for the wrong reasons over the last 2 weeks.
It's like this is the new RPython, where people start using it for the
wrong reasons.
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Should jitviewer come with a warning?

2013-02-03 Thread John Camara
> Also, looking at the msgpack - this code is maybe not ideal, but if
> you're dealing with buffer-level protocols, you end up with code
> looking like C a lot.

I do agree that this type of code will likely end up looking like C, but
it's not necessary for all of it to look like C.  For example, there
shouldn't be a need for long chains of if/elif statements, and using
pack_into and unpack_from instead of the pack and unpack methods means the
code deals directly with the buffer instead of making substrings.  Even if
PyPy can optimize this away, why write Python code like this when it's not
necessary?

Plus I felt that, initially, the code should just use cffi and connect to
the native C library.  I believe this approach is likely to give very close
to the best performance you could get on PyPy for this type of library.  I'm
not sure how much of a performance increase would be gained by writing
the library completely in Python vs using cffi.  Is there anything wrong
with this line of thinking?  Do you feel a pure Python approach could
achieve better results than using cffi under PyPy?

John
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Should jitviewer come with a warning?

2013-02-03 Thread John Camara
> Let me rephrase it. Where did you look for such a warning and you did
> not find it so you assumed it's ok?

> Cheers,
> fijal

Having a warning on https://bitbucket.org/pypy/jitviewer would be good.



On Sun, Feb 3, 2013 at 3:08 PM, John Camara  wrote:

> > What makes you think people will even read this warning, let alone
> > prioritize it over their immediate desire to make their program run
> > faster?
>
> > (Not that I am objecting to adding the warning, but I think you might be
> > fooling yourself if you think it will have any impact)
>
> > Jean-Paul
>
> I agree with you and was not naively thinking this alone was going to
> solve the problem, but it does give us something to point to when we see
> someone abusing the jitviewer.
>
> Maybe a more effective approach is not to advertise the jitviewer to
> everyone who has performance issues and only tell those who are experienced
> programmers who have already done the obvious in fixing any design issues
> that existed in their code.  Having inexperienced developers use the
> normal profiling tools will still help them find the hot spots in their code
> and help prevent them from picking up habits that lead them to writing
> un-Pythonic code.
>
> I'm sure we all agree that code with a better design will run faster in pypy 
> than trying to add optimizations that work only for pypy to help out a poor 
> design.
>
> I don't think we want to end up with a lot of Python code that looks like C
> code.  This is what happens when the inexperienced start relying on the
> jitviewer.
>
> For instance take a look at this code [1] and blog [2] which led me to post
> this. This is not the first time I have come across this issue, and
> unfortunately it appears to be increasing at an alarming rate.
>
> I guess I feel we have a responsibility to try to promote good programming 
> practices when we can.
>
> [1] - 
> https://github.com/msgpack/msgpack-python/blob/master/msgpack/fallback.py
>
> [2] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/
>
> John
>
>
>
> On Sun, Feb 3, 2013 at 12:39 PM, John Camara wrote:
>
>> I have been noticing a pattern where many who are writing Python code to
>> run on PyPy are relying more and more on using the jitviewer to help them
>> write faster code.  Unfortunately, many of them who do so don't look at
>> improving the design of their code as a way to improve the speed at which
>> it will run under PyPy but instead start writing obscure Python code that
>> happens to run faster under PyPy.
>>
>> I know that at least the PyPy core developers would like to see
>> everyone just create good clean Python code, and that often code that has
>> been made into obscure Python was done so to try to optimize it for CPython,
>> which in many cases causes it to run slower on PyPy than it would if
>> the code just followed typical Python idioms.
>>
>> I feel that a normal developer should be using tools like cProfiler and
>> runsnakerun and cleaning up design issues way before they should even
>> consider using jitviewer.
>>
>> In a recent case I saw someone using the jitviewer who likely
>> doesn't need to use it, at least not considering the
>> current design of the code, and I said the following:
>>
>> "The jitviewer should be mainly used by PyPy core developers and those
>> building PyPy VMs. A normal developer writing Python code to run on PyPy
>> shouldn’t have a need to use it. They can use it to point out an
>> inefficiency that PyPy has to the core developers but it should not be used
>> as a way to get you to write Python code in a way that has a better chance
>> of being optimized under PyPy except for very rare occasions and even then
>> it should only be made by those who follow closely and understand PyPy’s
>> development."
>>
>>
>> Do others here share this same opinion and should some warning be added
>> to the jitviewer?
>>
>> John
>>
>
>
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


Re: [pypy-dev] Should jitviewer come with a warning?

2013-02-03 Thread John Camara
> What makes you think people will even read this warning, let alone
> prioritize it over their immediate desire to make their program run
> faster?

> (Not that I am objecting to adding the warning, but I think you might be
> fooling yourself if you think it will have any impact)

> Jean-Paul

I agree with you and was not naively thinking this alone was
going to solve the problem, but it does give us something to point to
when we see someone abusing the jitviewer.

Maybe a more effective approach is not to advertise the
jitviewer to everyone who has performance issues and only tell those
who are experienced programmers who have already done the obvious in
fixing any design issues that existed in their code.  Having
inexperienced developers use the normal profiling tools will still help
them find the hot spots in their code and help prevent them from
picking up habits that lead them to writing un-Pythonic code.

I'm sure we all agree that code with a better design will run faster
in pypy than trying to add optimizations that work only for pypy to
help out a poor design.

I don't think we want to end up with a lot of Python code that looks
like C code.  This is what happens when the inexperienced start relying
on the jitviewer.

For instance take a look at this code [1] and blog [2] which led me
to post this. This is not the first time I have come across this
issue, and unfortunately it appears to be increasing at an alarming
rate.

I guess I feel we have a responsibility to try to promote good
programming practices when we can.

[1] - https://github.com/msgpack/msgpack-python/blob/master/msgpack/fallback.py

[2] - http://blog.affien.com/archives/2013/01/29/msgpack-for-pypy/

John



On Sun, Feb 3, 2013 at 12:39 PM, John Camara wrote:

> I have been noticing a pattern where many who are writing Python code to
> run on PyPy are relying more and more on using the jitviewer to help them
> write faster code.  Unfortunately, many of them who do so don't look at
> improving the design of their code as a way to improve the speed at which
> it will run under PyPy but instead start writing obscure Python code that
> happens to run faster under PyPy.
>
> I know that at least the PyPy core developers would like to see
> everyone just create good clean Python code, and that often code that has
> been made into obscure Python was done so to try to optimize it for CPython,
> which in many cases causes it to run slower on PyPy than it would if
> the code just followed typical Python idioms.
>
> I feel that a normal developer should be using tools like cProfiler and
> runsnakerun and cleaning up design issues way before they should even
> consider using jitviewer.
>
> In a recent case I saw someone using the jitviewer who likely
> doesn't need to use it, at least not considering the
> current design of the code, and I said the following:
>
> "The jitviewer should be mainly used by PyPy core developers and those
> building PyPy VMs. A normal developer writing Python code to run on PyPy
> shouldn’t have a need to use it. They can use it to point out an
> inefficiency that PyPy has to the core developers but it should not be used
> as a way to get you to write Python code in a way that has a better chance
> of being optimized under PyPy except for very rare occasions and even then
> it should only be made by those who follow closely and understand PyPy’s
> development."
>
>
> Do others here share this same opinion and should some warning be added to
> the jitviewer?
>
> John
>
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


[pypy-dev] Should jitviewer come with a warning?

2013-02-03 Thread John Camara
I have been noticing a pattern where many who are writing Python code to
run on PyPy are relying more and more on using the jitviewer to help them
write faster code.  Unfortunately, many of them who do so don't look at
improving the design of their code as a way to improve the speed at which
it will run under PyPy but instead start writing obscure Python code that
happens to run faster under PyPy.

I know that at least the PyPy core developers would like to see
everyone just create good clean Python code, and that often code that has
been made into obscure Python was done so to try to optimize it for CPython,
which in many cases causes it to run slower on PyPy than it would if
the code just followed typical Python idioms.

I feel that a normal developer should be using tools like cProfile and
runsnakerun, and cleaning up design issues, well before they should even
consider using jitviewer.
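
For reference, the usual workflow with those tools is along these lines
(file names here are placeholders):

    # Profile a script and dump the stats to a file:
    #
    #   python -m cProfile -o myscript.prof myscript.py
    #
    # then open myscript.prof in runsnakerun ("runsnake myscript.prof"),
    # or inspect it with pstats:

    import pstats

    stats = pstats.Stats("myscript.prof")
    stats.sort_stats("cumulative").print_stats(20)   # top 20 hot spots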

In a recent case I saw someone using the jitviewer who likely doesn't
need to use it, at least not considering the current
design of the code, and I said the following:

"The jitviewer should be mainly used by PyPy core developers and those
building PyPy VMs. A normal developer writing Python code to run on PyPy
shouldn’t have a need to use it. They can use it to point out an
inefficiency that PyPy has to the core developers but it should not be used
as a way to get you to write Python code in a way that has a better chance
of being optimized under PyPy except for very rare occasions and even then
it should only be made by those who follow closely and understand PyPy’s
development."


Do others here share this same opinion and should some warning be added to
the jitviewer?

John
___
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev


[pypy-dev] Parallella open hardware platform

2013-01-31 Thread John Camara
A couple of days ago I heard about the Parallella [1] project, which is an
open hardware platform similar to the Raspberry Pi but with much higher
capabilities.  It has a Zynq Z-7010, which has both a dual-core ARM A9 (800
MHz) processor and an Artix-7 FPGA, a 16-core Epiphany multicore
accelerator, and 1 GB of RAM (see [2] for more info), and it currently
boots Ubuntu.

The goal of the Parallella project is to develop an open parallel hardware
platform and development tools.  Recently they announced support for Python,
with Mark Dewing [3] leading the effort.  I had asked Mark if he had
considered PyPy, but at this time he doesn't have time for this
investigation, and he reposted my comment on the forum [4] with a couple of
questions. Maybe one of you could answer them.

Working with the Parallella project may be a good opportunity for the PyPy
project, from both a PR perspective and for the
technical challenges it would present.  On the technical side it would give
the opportunity to test STM on a reasonable number of cores while also
dealing with cores from different architectures (ARM and Epiphany).  I
could see all the JITting occurring on the ARM cores, with it producing
output for both architectures based on which type of core STM decides to
use for a chunk of work to execute on. Of course there is also
the challenge of bridging between the 2 architectures.  Maybe even some of
the more expensive STM operations could be offloaded to the FPGA, or even a
limited amount of very hot sections of code could be JITted to the FPGA
(although this might be more work than it's worth).

From a PR perspective PyPy needs to excel at some niche market so that the
PyPy platform can take off.  When PyPy started concentrating on the
scientific market with increasing support for NumPy I thought this would be
the niche market that would get PyPy to take off.  But there have been a
couple of issues with this approach. There is a tremendous amount of work
that needs to be done so that PyPy can look attractive to this niche
market.  It requires supporting both NumPy and SciPy, and there was an
expectation that if PyPy supports NumPy others would come to help out with
the SciPy support.  The problem is that there don't seem to be many who
are eager to pitch in for the SciPy effort, and there also have not been a
whole lot of people willing to help with the ongoing NumPy work.  I think in
general the ratio of people who use NumPy and SciPy to those willing to
contribute is quite small.  So going after this market was a good idea, and
it definitely has the opportunity to show the strength of the PyPy project,
but so far it hasn't done much to improve the image of the PyPy project.  It
also doesn't help that some commercial interests have popped up
recently that have decided to play hardball against PyPy by spreading FUD.

Unlike the Raspberry Pi hardware, which can only support hobbyists, the
Parallella hardware can support both hobbyists and commercial interests.
The boards cost $100, which is more than the $35 for the Raspberry Pi but
still within reach of most hobbyists, and they didn't cut out the many
features that are needed for commercial interests.  The Parallella project
raised nearly $0.9 million on Kickstarter [5] with nearly 5000
backers.  Since many who will use the Parallella hardware also have
experience with embedded systems, they are more likely used to writing
low-level code in assembly, FPGAs, and even lots of C code, and I'm sure
they have hit many issues with parallel/multithreaded programming and would
welcome a better developer experience.  I bet many of them would be willing
to contribute both money and time to supporting such an effort.  I
believe the architecture of PyPy could lend itself to becoming the core of
such a development system and would allow Python to be used in this space.
This could provide a lot of good PR for the PyPy project.

Now I'm not saying PyPy shouldn't devote any more time to supporting NumPy,
as I'm sure when PyPy has very good support for both NumPy and SciPy it's
going to be a very good day for all Python supporters.  I just think that
the PyPy team needs to think about a strategy that in the end will help its
PR and gain support from a much larger community.  This project is doing a
lot of good things technically and now it just needs to get the attention
of the development community at large.  Now I can't predict whether working
with the Parallella project would be the breakthrough in PR that PyPy needs,
but it's at least an option that's out there.

BTW, I don't have any commercial interest in the Parallella project.  If
some time in the future I use their hardware it would likely be as
a hobbyist, and it would be nice to program it in Python.  My real objective
with this post is to see the PyPy project gain wider interest, as it would
be a good thing for Python.

[1] - http://www.parallella.org/
[2] - http://www.parallella.org/board/
[3] - http://forums.parallella.org/memberlist.php?m