Re: [Numpy-discussion] py2/py3 pickling

2015-08-24 Thread Chris Laumann
Hi-

Would it be possible then (in relatively short order) to create a py2 -> py3 
numpy pickle converter? This would run in py2, np.load or unpickle a pickle in 
the usual way, and then repickle and/or save using a pickler that uses an 
explicit pickle type for encoding the bytes associated with numpy dtypes. The 
numpy unpickler in py3 would then know what to do. I.e., is there a way to make 
the numpy py2 pickler be explicit about byte strings? Presumably this would 
cover most use cases even for complicated pickled objects and could be used 
transparently within py2 or py3.
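
A minimal sketch of the py2 side of such a conversion, for the simple case where 
the pickle holds a plain ndarray (this just re-saves to the .npy format, which 
loads fine on py3 -- a workaround rather than the explicit-bytes pickler 
described above; the filenames are hypothetical):

# run under py2
import pickle
import numpy as np

with open('legacy.pkl', 'rb') as f:
    arr = pickle.load(f)          # ordinary py2 unpickling
np.save('converted.npy', arr)     # .npy (non-object dtypes) is portable to py3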

Best, C

 On Aug 24, 2015, at 2:30 PM, Nathaniel Smith n...@pobox.com wrote:
 
 On Aug 24, 2015 9:29 AM, Pauli Virtanen p...@iki.fi wrote:
 
  24.08.2015, 01:02, Chris Laumann wrote:
  [clip]
   Is there documentation about the limits and workarounds for py2/py3
   pickle/np.save/load compatibility? I haven't found anything except
   developer bug tracking discussions (eg. #4879 in github numpy).
 
  Not sure if it's written down somewhere but:
 
  - You should consider pickles not portable between Py2/3.
 
  - Setting encoding='bytes' or encoding='latin1' should produce correct
  results for numerical data. However, neither is safe, because the
  option also affects data other than numpy arrays that you may have
  saved.
 
 For those wondering what's going on here: if you pickled a str in python 2, 
 then python 3 wants to unpickle it as a str. But in python 2 str was a vector 
 of arbitrary bytes in some assumed encoding, and in python 3 str is a vector 
 of Unicode characters. So it needs to know what encoding to use, which is 
 fine and what you'd expect for the py2-py3 transition.
 
 But: when pickling arrays, numpy on py2 used a str to store the raw memory of 
 your array. Trying to run this data through a character decoder then 
 obviously makes a mess of everything. So the fundamental problem is that on 
 py2, there's no way to distinguish between a string of text and a string of 
 bytes -- they're encoded in exactly the same way in the pickle file -- and 
 the python 3 unpickler just has to guess. You can tell it to guess in a way 
 that works for raw bytes -- that's what the encoding= options Pauli mentions 
 above do -- but obviously this will then be incorrect if you have any actual 
 non-latin1 textual strings in your pickle, and you can't get it to handle 
 both correctly at the same time.
 
 If you're desperate, it should be possible to get your data out of py2 
 pickles by loading then with one of the encoding options above, and then 
 going through the resulting object and converting all the actual textual 
 strings back to the correct encoding by hand. No data is actually lost. And 
 of course even this is unnecessary if your file contains only ASCII/latin1.
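 
 For concreteness, a minimal sketch of that recovery path on py3, assuming the
 pickle holds a dict whose keys were UTF-8 text strings on py2 (a hypothetical
 layout -- adjust the by-hand re-encoding to whatever your object actually
 contains):
 
 import pickle
 
 with open('test.pkl', 'rb') as f:
     # latin1 maps each byte to one character, so nothing is lost at this step
     obj = pickle.load(f, encoding='latin1')
 
 # Array data comes through correctly; only genuine text needs fixing by hand.
 fixed = {k.encode('latin1').decode('utf-8'): v for k, v in obj.items()}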
 
 -n
 

[Numpy-discussion] py2/py3 pickling

2015-08-23 Thread Chris Laumann
Hi all-

Is there documentation about the limits and workarounds for py2/py3 
pickle/np.save/load compatibility? I haven't found anything except developer 
bug tracking discussions (eg. #4879 in github numpy).

The kinds of errors you get can be really obscure when saving/loading complicated 
objects or pickles containing numpy scalars. It's really unclear to me why the 
following shouldn't work -- it doesn't have anything apparent to do with string 
handling and unicode.

Run in py2:

import pickle
import numpy as np

a = np.float64(0.99)
pickle.dump(a, open('test.pkl', 'wb'))

And then in py3:

import pickle
import numpy as np

b = pickle.load(open('test.pkl', 'rb'))

And you get:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xae in position 0: ordinal 
not in range(128)

If you force encoding='bytes' in the load, it works.
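
That is (py3, same file as above):

b = pickle.load(open('test.pkl', 'rb'), encoding='bytes')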

Is this explained anywhere?

Best, C


Re: [Numpy-discussion] It looks like Py 3.5 will include a dedicated infix matrix multiply operator

2014-03-14 Thread Chris Laumann
That’s great. 

Does this mean that, in the not-so-distant future, the matrix class will go the 
way of the dodo? I have had more subtle, hard-to-fix bugs sneak into code 
because something returns a matrix instead of an array than from almost any 
other single source I can think of. Having two almost indistinguishable types 
for 2d arrays with slightly different semantics for a small subset of 
operations is terrible.
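
For example (standard numpy behavior):

import numpy as np

M = np.matrix([[1, 2], [3, 4]])
A = np.array([[1, 2], [3, 4]])

M * M    # matrix product for np.matrix
A * A    # elementwise product for ndarray
M[0]     # still 2-d: matrix([[1, 2]])
A[0]     # 1-d: array([1, 2])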

Best, C


-- 
Chris Laumann
Sent with Airmail

On March 14, 2014 at 7:16:24 PM, Christophe Bal (projet...@gmail.com) wrote:

This is good for Numpyists, but it will also be another operator that can help 
in other contexts.

As a math user, I was at first very skeptical, but in the end this is good news 
for non-Numpyists too.

Christophe BAL

On 15 Mar 2014 02:01, Frédéric Bastien no...@nouiz.org wrote:
This is great news. Excellent work Nathaniel and all others!

Frédéric

On Fri, Mar 14, 2014 at 8:57 PM, Aron Ahmadia a...@ahmadia.net wrote:
 That's the best news I've had all week.

 Thanks for all your work on this Nathan.

 -A


 On Fri, Mar 14, 2014 at 8:51 PM, Nathaniel Smith n...@pobox.com wrote:

 Well, that was fast. Guido says he'll accept the addition of '@' as an
 infix operator for matrix multiplication, once some details are ironed
 out:
   https://mail.python.org/pipermail/python-ideas/2014-March/027109.html
   http://legacy.python.org/dev/peps/pep-0465/

 Specifically, we need to figure out whether we want to make an
 argument for a matrix power operator (@@), and what
 precedence/associativity we want '@' to have. I'll post two separate
 threads to get feedback on those in an organized way -- this is just a
 heads-up.

 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org


Re: [Numpy-discussion] [help needed] associativity and precedence of '@'

2014-03-14 Thread Chris Laumann
Hi all,

Let me preface my two cents by saying that I think the best part of @ being 
accepted is the potential for deprecating the matrix class — the syntactic 
beauty of infix for matrix multiply is a nice side effect IMHO :) This may be 
why my basic attitude is:

I don’t think it matters very much but I would vote (weakly) for weak-right. 
Where there is ambiguity, I suspect most practitioners will just put in 
parentheses anyway — especially with combinations of * and @, where I don’t 
think there is a natural intuitive precedence relationship. At least, 
element-wise multiplication is very rare in math/physics texts as an explicitly 
defined elementary operation so I’d be surprised if anybody had a strong 
intuition about the precedence of the ‘*’ operator. And the binding order 
doesn’t matter if it is scalar multiplication.

I have quite a bit of code with large matrices where the order of matrix-vector 
multiplies is an important optimization, and I would certainly have a few 
simpler-looking expressions for op @ op @ vec, hence the weak preference for 
right-associativity. That said, I routinely come across situations where the 
optimal matrix multiplication order is more complicated than can be expressed 
as left-to-right or right-to-left (because some matrices might be diagonal, 
CSR, or CSC), which is why the preference is only weak. I don't see a downside 
in the use case where it is actually associative (as in matrix-matrix-vector).
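
To make the op @ op @ vec point concrete, a small sketch (written with the 
eventual '@' syntax; both groupings give the same result, but at very different 
cost for large n):

import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)
v = np.random.rand(n)

r1 = A @ (B @ v)      # two matrix-vector products, O(n^2) work each
r2 = (A @ B) @ v      # a full matrix-matrix product first, O(n^3) work
np.allclose(r1, r2)   # True, up to floating-point roundoff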

Best, Chris



-- 
Chris Laumann
Sent with Airmail

On March 14, 2014 at 8:42:00 PM, Nathaniel Smith (n...@pobox.com) wrote:

Hi all,

Here's the main blocker for adding a matrix multiply operator '@' to Python: we 
need to decide what we think its precedence and associativity should be. I'll 
explain what that means so we're on the same page, and what the choices are, 
and then we can all argue about it. But even better would be if we could get 
some data to guide our decision, and this would be a lot easier if some of you 
all can help; I'll suggest some ways you might be able to do that.

So! Precedence and left- versus right-associativity. If you already know what 
these are you can skim down until you see CAPITAL LETTERS.

We all know what precedence is. Code like this:
  a + b * c
gets evaluated as:
  a + (b * c)
because * has higher precedence than +. It binds more tightly, as they say. 
Python's complete precedence table is here:
  http://docs.python.org/3/reference/expressions.html#operator-precedence

Associativity, in the parsing sense, is less well known, though it's just as 
important. It's about deciding how to evaluate code like this:
  a * b * c
Do we use
  a * (b * c)    # * is right associative
or
  (a * b) * c    # * is left associative
? Here all the operators have the same precedence (because, uh... they're the 
same operator), so precedence doesn't help. And mostly we can ignore this in 
day-to-day life, because both versions give the same answer, so who cares. But 
a programming language has to pick one (consider what happens if one of those 
objects has a non-default __mul__ implementation). And of course it matters a 
lot for non-associative operations like
  a - b - c
or
  a / b / c
So when figuring out order of evaluations, what you do first is check the 
precedence, and then if you have multiple operators next to each other with the 
same precedence, you check their associativity. Notice that this means that if 
you have different operators that share the same precedence level (like + and 
-, or * and /), then they have to all have the same associativity. All else 
being equal, it's generally considered nice to have fewer precedence levels, 
because these have to be memorized by users.

Right now in Python, every precedence level is left-associative, except for 
'**'. If you write these formulas without any parentheses, then what the 
interpreter will actually execute is:
  (a * b) * c
  (a - b) - c
  (a / b) / c
but
  a ** (b ** c)
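
One way to see which grouping Python actually uses is a tiny throwaway class 
whose operators just record the parenthesization (a sketch, not part of the 
proposal):

class Op:
    def __init__(self, name):
        self.name = name
    def __mul__(self, other):
        return Op('({} * {})'.format(self.name, other.name))
    def __pow__(self, other):
        return Op('({} ** {})'.format(self.name, other.name))

a, b, c = Op('a'), Op('b'), Op('c')
print((a * b * c).name)     # ((a * b) * c)   -- * is left-associative
print((a ** b ** c).name)   # (a ** (b ** c)) -- ** is right-associative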

Okay, that's the background. Here's the question. We need to decide on 
precedence and associativity for '@'. In particular, there are three different 
options that are interesting:

OPTION 1 FOR @:
Precedence: same as *
Associativity: left
My shorthand name for it: same-left (yes, very creative)

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  (a @ b) @ c
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  (a @ b) * c

OPTION 2 FOR @:
Precedence: more-weakly-binding than *
Associativity: right
My shorthand name for it: weak-right

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  (a * b) @ c
   a @ b * c  ->  a @ (b * c)

OPTION 3 FOR @:
Precedence: more-tightly-binding than *
Associativity: right
My shorthand name for it: tight-right

This means that if you don't use parentheses, you get:
   a @ b @ c  ->  a @ (b @ c)
   a * b @ c  ->  a * (b @ c)
   a @ b * c  ->  (a @ b) * c

We need to pick which of these options we think is best, based

Re: [Numpy-discussion] Memory leak?

2014-02-04 Thread Chris Laumann
Hi all-

Thanks for the info re: memory leak. In trying to work around it, I think I’ve 
discovered another (still using SuperPack). This leaks ~30MB / run:

from numpy import zeros

hists = zeros((50,64), dtype=int)
for i in range(50):
    for j in range(2**13):
        hists[i,j%64] += 1

The code leaks using hists[i,j] = hists[i,j] + 1 as well. 

Is this the same leak or different? Doesn’t seem to have much in common.

Incidentally, using 

a = ones(v.shape[0])
a.dot(v)

Instead of np.sum (in the previous example that i sent) does not leak. 

Re: superpack.. As a fairly technically proficient user, I’m aware that the 
super pack installs dev builds and that they may therefore be somewhat less 
reliable. I’m okay with that tradeoff and I don’t expect you guys to actually 
treat the super pack as a stable release — I also try to report that I’m using 
the superpack when I report bugs. I sometimes run git versions of ipython, 
numpy, etc in order to fiddle with the code and make tiny bug 
fixes/contributions myself. I don’t know the statistics re: superpack users but 
there is no link from scipy.org’s main install page so most new users won’t 
find it easily. Fonnesbeck’s webpage does say they are dev builds only two 
sentences into the paragraph.

Best, Chris








-- 
Chris Laumann
Sent with Airmail

On January 31, 2014 at 9:31:40 AM, Julian Taylor 
(jtaylor.deb...@googlemail.com) wrote:

On 31.01.2014 18:12, Nathaniel Smith wrote:  
 On Fri, Jan 31, 2014 at 4:29 PM, Benjamin Root ben.r...@ou.edu wrote:  
 Just to chime in here about the SciPy Superpack... this distribution tracks  
 the master branch of many projects, and then puts out releases, on the  
 assumption that master contains pristine code, I guess. I have gone down  
 strange rabbit holes thinking that a particular bug was fixed already and  
 the user telling me a version number that would confirm that, only to  
 discover that the superpack actually packaged matplotlib about a month prior
 to releasing a version.
  
 I will not comment on how good or bad of an idea it is for the Superpack to  
 do that, but I just wanted to make other developers aware of this to keep  
 them from falling down the same rabbit hole.  
  
 Wow, that is good to know. Esp. since the web page:  
 http://fonnesbeck.github.io/ScipySuperpack/  
 simply advertises that it gives you things like numpy 1.9 and scipy  
 0.14, which don't exist. (With some note about dev versions buried in  
 prose a few sentences later.)  
  
 Empirically, development versions of numpy have always contained bugs,  
 regressions, and compatibility breaks that were fixed in the released  
 version; and we make absolutely no guarantees about compatibility  
 between dev versions and any release versions. And it sort of has to  
 be that way for us to be able to make progress. But if too many people  
 start using dev versions for daily use, then we and downstream  
 dependencies will have to start adding compatibility hacks and stuff  
 to support those dev versions. Which would be a nightmare for  
 developers and users both.  
  
 Recommending this build for daily use by non-developers strikes me as  
 dangerous for both users and the wider ecosystem.  
  

While probably not good for the user, I think it's very good for us.
This is the second bug I introduced that was found by superpack users.
This one might have gone unnoticed into the next release, as it is pretty
much impossible to find via tests. Even in valgrind reports it's hard to
find, as it's lumped in with all of Python's hundreds of memory-arena
still-reachable leaks.

Concerning the fix, it seems that if Python sees tp_free == PyObject_Del/Free
it replaces it with the tp_free of the base type, which is int_free in
this case. int_free uses a special allocator for even lower overhead, so
we start leaking.
We either need to find the right flag to set for our scalars so it stops
doing that, add an indirection so the function pointers don't match, or
stop using the object allocator, as we are apparently digging too deep
into Python's internal implementation details by doing so.


Re: [Numpy-discussion] Memory leak?

2014-01-31 Thread Chris Laumann
Current scipy superpack for osx so probably pretty close to master. So it's a 
known leak? Hmm. Maybe I'll have to work on a different machine for a bit.

Chris

---
Sent from my iPhone using Mail Ninja

--- Original Message ---

Which version of numpy are you using? There seems to be a leak in the scalar 
return due to the PyObject_Malloc usage in git master, but it doesn't affect 
1.8.0.

On Fri, Jan 31, 2014 at 7:20 AM, Chris Laumann chris.laum...@gmail.com wrote:

Hi all-

The following snippet appears to leak memory badly (about 10 MB per execution):

P = randint(0,2,(30,13))

for i in range(50):
    print "\r", i, "/", 50
    for ai in ndindex((2,)*13):
        j = np.sum(P.dot(ai))

If instead you execute (no np.sum call):

P = randint(0,2,(30,13))

for i in range(50):
    print "\r", i, "/", 50
    for ai in ndindex((2,)*13):
        j = P.dot(ai)

There is no leak.

Any thoughts? I'm stumped.

Best, Chris

-- 
Chris Laumann
Sent with Airmail


[Numpy-discussion] Memory leak?

2014-01-30 Thread Chris Laumann
Hi all-

The following snippet appears to leak memory badly (about 10 MB per execution):

import numpy as np
from numpy import ndindex
from numpy.random import randint

P = randint(0,2,(30,13))

for i in range(50):
    print "\r", i, "/", 50
    for ai in ndindex((2,)*13):
        j = np.sum(P.dot(ai))

If instead you execute (no np.sum call):

P = randint(0,2,(30,13))

for i in range(50):
    print "\r", i, "/", 50
    for ai in ndindex((2,)*13):
        j = P.dot(ai)

There is no leak. 

Any thoughts? I’m stumped.

Best, Chris

-- 
Chris Laumann
Sent with Airmail


[Numpy-discussion] Memory leak in numpy?

2014-01-26 Thread Chris Laumann
Hi all-

I think I just found a memory leak in numpy, or maybe I just don’t understand 
generators. Anyway, the following snippet will quickly eat a ton of RAM:

from numpy import ndindex
from numpy.random import randint

P = randint(0,2, (20,13))

for i in range(50):
    for ai in ndindex((2,)*13):
        j = P.dot(ai)


If you replace the last line with something like j = ai, the memory leak goes 
away. I’m not exactly sure what’s going on but the .dot seems to be causing the 
memory taken by the tuple ai to be held.

This devours RAM in python 2.7.5 (OS X Mavericks default I believe), numpy 
version 1.8.0.dev-3084618. I’m upgrading to the latest Superpack (numpy 1.9) 
right now but I somehow doubt this behavior will change.

Any thoughts?

Best, Chris

-- 
Chris Laumann
Sent with Airmail


Re: [Numpy-discussion] Bitwise operations and unsigned types

2012-04-06 Thread Chris Laumann
Good morning all-- didn't realize this would generate quite such a buzz.

To answer a direct question, I'm using the github master. A few thoughts (from 
a fairly heavy numpy user for numerical simulations and analysis):

The current behavior is confusing and (as far as I can tell) undocumented. 
Scalars act up only if they are big:

In [152]: np.uint32(1) & 1
Out[152]: 1

In [153]: np.uint64(1) & 1
---
TypeError Traceback (most recent call last)
/Users/claumann/<ipython-input-153-191a0b5fe216> in <module>()
----> 1 np.uint64(1) & 1

TypeError: ufunc 'bitwise_and' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''

But arrays don't seem to mind:

In [154]: ones(3, dtype=np.uint32) & 1
Out[154]: array([1, 1, 1], dtype=uint32)

In [155]: ones(3, dtype=np.uint64) & 1
Out[155]: array([1, 1, 1], dtype=uint64)

As you mentioned, explicitly casting 1 to np.uint makes the above scalar case 
work, but I don't understand why this is unnecessary for the arrays. I could 
understand a general argument that type casting rules should always be the same 
independent of the underlying ufunc, but I'm not sure if that is sufficiently 
smart. Bitwise ops probably really ought to treat nonnegative python integers 
as unsigned.
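
For reference, the explicit-cast workaround mentioned above looks like this 
(both operands end up uint64, so no 'safe' upcast of a signed type is needed):

import numpy as np

x = np.uint64(5)
x & np.uint64(3)            # works
x & np.uint64(0xffffffff)   # masking with an explicit unsigned constant also works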

 I disagree, promoting to object kind of destroys the whole idea of bitwise 
 operations. I think we *fixed* a bug.
 
 That is an interesting point of view. I could see that point of view.  
 But, was this discussed as a bug prior to this change occurring?  

I'm not sure what 'promoting to object' constitutes in the new numpy, but just 
a small thought. I can think of two reasons to go to the trouble of using 
bitfields over more pythonic (higher level) representations: speed/memory 
overhead and interfacing with external hardware/software. For me, it's mostly 
the former -- I've already implemented this program once using a much more 
pythonic approach but it just has too much memory overhead to scale to where I 
want it. If a coder goes to the trouble of using bitfields, there's probably a 
good reason they wanted a lower level representation in which bitfield ops 
happen in parallel as integer operations.

But, what do you mean that bitwise operations are destroyed by promotion to 
objects?

Best, Chris



On Apr 6, 2012, at 5:57 AM, Nathaniel Smith wrote:

 On Fri, Apr 6, 2012 at 7:19 AM, Travis Oliphant tra...@continuum.io wrote:
 That is an interesting point of view. I could see that point of view.
  But, was this discussed as a bug prior to this change occurring?
 
 I just heard from a very heavy user of NumPy that they are nervous about
 upgrading because of little changes like this one.   I don't know if this
 particular issue would affect them or not, but I will re-iterate my view
 that we should be very careful of these kinds of changes.
 
 I agree -- these changes make me very nervous as well, especially
 since I haven't seen any short, simple description of what changed or
 what the rules actually are now (comparable to the old "scalars do not
 affect the type of arrays").
 
 But, I also want to speak up in favor in one respect, since real world
 data points are always good. I had some code that did
  def do_something(a):
      a = np.asarray(a)
      a -= np.mean(a)
      ...
 If someone happens to pass in an integer array, then this is totally
 broken -- np.mean(a) may be non-integral, and in 1.6, numpy silently
 discards the fractional part and performs the subtraction anyway,
 e.g.:
 
 In [4]: a
 Out[4]: array([0, 1, 2, 3])
 
 In [5]: a -= 1.5
 
 In [6]: a
 Out[6]: array([-1,  0,  0,  1])
 
 The bug was discovered when Skipper tried running my code against
 numpy master, and it errored out on the -=. So Mark's changes did
 catch one real bug that would have silently caused completely wrong
 numerical results!
 
 https://github.com/charlton/charlton/commit/d58c72529a5b33d06b49544bc3347c6480dc4512
 
 - Nathaniel


[Numpy-discussion] Bitwise operations and unsigned types

2012-04-05 Thread Chris Laumann
Hi all- 

I've been trying to use numpy arrays of ints as arrays of bit fields and mostly 
this works fine. However, it seems that the bitwise_* ufuncs do not support 
unsigned integer dtypes:

In [142]: np.uint64(5) & 3
---
TypeError Traceback (most recent call last)
/Users/claumann/<ipython-input-142-65e3301d5d07> in <module>()
----> 1 np.uint64(5) & 3

TypeError: ufunc 'bitwise_and' not supported for the input types, and the 
inputs could not be safely coerced to any supported types according to the 
casting rule ''safe''

This seems odd as unsigned ints are the most natural bitfields I can think of 
-- the sign bit is just confusing when doing bit manipulation. Python itself of 
course doesn't make much of a distinction between ints, longs, unsigned, etc.

Is this a bug?

Thanks, Chris 

-- 
Chris Laumann
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
