Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-09 Thread Robert Kern
On Fri, Sep 6, 2013 at 6:50 PM, James Bergstra 
wrote:
>
> Thanks, this is exactly what I was looking for. I'll look into what this
Diophantine equation is.

Let's say we have two arrays with shape tuples `shape0` and `shape1`,
stride tuples `stride0` and `stride1` and memory offsets `offset0` and
`offset1`. Without loss of generality, let's assume that itemsize=1 since
it is trivial to convert the general case to itemsize=1. Now, if you will
permit Einstein summation notation, you can generate the memory address of
every item in the first array like so:

  index0[i]*stride0[i] + offset0
  0 <= i < len(shape0)
  0 <= index0[i] < shape0[i]

There exists an overlap between the two arrays iff there exists two tuples
`index0` and `index1` such that

  index0[i]*stride0[i] + offset0 = index1[j]*stride0[j] + offset1
  0 <= i < len(shape0)
  0 <= index0[i] < shape0[i]
  0 <= j < len(shape1)
  0 <= index1[j] < shape1[j]

This is a bounded linear Diophantine equation. Diophantine because the
variables `index0[i]` and `index1[j]` are integer-valued, linear because
the variables only appear multiplied by a constant, and bounded because
each variable must stay within the size of its corresponding axis.

Unbounded linear Diophantine equations can be solved using an extended
version of the Euclidean GCD algorithm. Bounded linear Diophantine
equations are NP- to solve. With the right heuristics, a
branch-and-bound approach could probably work well.

> Also, relatedly, a few months ago Julian Taylor at least wrote what was
there in C, which made it faster, if not better.

The main benefit of that was to make it available at the C level for other
C-implemented routines, IIRC, not speed.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread James Bergstra
I'm stumped. I can't figure out how to extract from e.g.

view = A[:, 3]

that the view starts at element 3 of A. I was planning to make a
may_share_memory implementation based on the idea of swapping in a buffer
of 0s, and using the shapes, strides, itemsize etc. to increment just the
parts of the 0s buffer touched by the two ndarrays. If there  are any 2s in
the incremented buffer, it's because the two views overlapped. It's not the
best algorithm for comparing tiny views of huge arrays, I was wondering if
in my case it would have been quicker than the built-in method (I don't
know how it works). I actually have another data structure around to pull
out that shape and stride info, but it's a shame to use it, because then I
can't use the algorithm to compare ndarrays in general (or at least ones
that have been created by typical construction methods and slicing).


On Fri, Sep 6, 2013 at 12:48 PM, James Bergstra
wrote:

> Thanks for the tips! FWIW my guess is that since '.data' is dynamically
> generated property rather than an attribute, it is being freed and
> re-allocated in the loop, and once for each of my id() expressions.
>
>
> On Fri, Sep 6, 2013 at 12:32 PM, Charles R Harris <
> charlesr.har...@gmail.com> wrote:
>
>>
>>
>>
>> On Fri, Sep 6, 2013 at 10:19 AM, James Bergstra <
>> bergs...@iro.umontreal.ca> wrote:
>>
>>> Hi, could someone help me understand why this assertion fails?
>>>
>>> def test_is(self):
>>> a = np.empty(1)
>>> b = np.empty(1)
>>> if a.data is not b.data:
>>> assert id(a.data) != id(b.data) # <-- fail
>>>
>>> I'm trying to write an alternate may_share_memory function.
>>>
>>>
>> id seems not as advertised:
>>
>> In [22]: for i in range(10): print id(a.data)
>> 66094640
>> 66095792
>> 66095792
>> 66095792
>> 66095792
>> 66095792
>> 66094640
>> 66094640
>> 66094640
>> 66094640
>>
>> Not sure why.
>>
>> Chuck
>>
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
>
> --
> http://www-etud.iro.umontreal.ca/~bergstrj
>



-- 
http://www-etud.iro.umontreal.ca/~bergstrj
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread James Bergstra
Thanks, this is exactly what I was looking for. I'll look into what this
Diophantine equation is. Also, relatedly, a few months ago Julian Taylor at
least wrote what was there in C, which made it faster, if not better.

- James


On Fri, Sep 6, 2013 at 1:27 PM, Robert Kern  wrote:

> On Fri, Sep 6, 2013 at 5:58 PM, James Bergstra 
> wrote:
> >
> > I'm stumped. I can't figure out how to extract from e.g.
> >
> > view = A[:, 3]
> >
> > that the view starts at element 3 of A. I was planning to make a
> may_share_memory implementation based on the idea of swapping in a buffer
> of 0s, and using the shapes, strides, itemsize etc. to increment just the
> parts of the 0s buffer touched by the two ndarrays. If there  are any 2s in
> the incremented buffer, it's because the two views overlapped. It's not the
> best algorithm for comparing tiny views of huge arrays, I was wondering if
> in my case it would have been quicker than the built-in method (I don't
> know how it works).
>
> It certainly won't be faster. may_share_memory() is very simplistic. It
> just checks if the outermost bounds of the each array overlap. Thus, it can
> give false positives for arrays that are interleaved but do not actually
> intersect. This is why it is named may_share_memory() instead of
> does_share_memory().
>
> > I actually have another data structure around to pull out that shape and
> stride info, but it's a shame to use it, because then I can't use the
> algorithm to compare ndarrays in general (or at least ones that have been
> created by typical construction methods and slicing).
>
> All of the information you need is stored in the __array_interface__
> attribute of an ndarray.
>
> The right way to solve this is to solve a bounded, linear Diophantine
> equation. That's where you should be looking if you want to crack this
> problem.
>
> --
> Robert Kern
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
http://www-etud.iro.umontreal.ca/~bergstrj
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread James Bergstra
Hi, could someone help me understand why this assertion fails?

def test_is(self):
a = np.empty(1)
b = np.empty(1)
if a.data is not b.data:
assert id(a.data) != id(b.data) # <-- fail

I'm trying to write an alternate may_share_memory function.

Thanks,
 - James
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread James Bergstra
Thanks for the tips! FWIW my guess is that since '.data' is dynamically
generated property rather than an attribute, it is being freed and
re-allocated in the loop, and once for each of my id() expressions.


On Fri, Sep 6, 2013 at 12:32 PM, Charles R Harris  wrote:

>
>
>
> On Fri, Sep 6, 2013 at 10:19 AM, James Bergstra  > wrote:
>
>> Hi, could someone help me understand why this assertion fails?
>>
>> def test_is(self):
>> a = np.empty(1)
>> b = np.empty(1)
>> if a.data is not b.data:
>> assert id(a.data) != id(b.data) # <-- fail
>>
>> I'm trying to write an alternate may_share_memory function.
>>
>>
> id seems not as advertised:
>
> In [22]: for i in range(10): print id(a.data)
> 66094640
> 66095792
> 66095792
> 66095792
> 66095792
> 66095792
> 66094640
> 66094640
> 66094640
> 66094640
>
> Not sure why.
>
> Chuck
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 
http://www-etud.iro.umontreal.ca/~bergstrj
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread Robert Kern
On Fri, Sep 6, 2013 at 5:58 PM, James Bergstra 
wrote:
>
> I'm stumped. I can't figure out how to extract from e.g.
>
> view = A[:, 3]
>
> that the view starts at element 3 of A. I was planning to make a
may_share_memory implementation based on the idea of swapping in a buffer
of 0s, and using the shapes, strides, itemsize etc. to increment just the
parts of the 0s buffer touched by the two ndarrays. If there  are any 2s in
the incremented buffer, it's because the two views overlapped. It's not the
best algorithm for comparing tiny views of huge arrays, I was wondering if
in my case it would have been quicker than the built-in method (I don't
know how it works).

It certainly won't be faster. may_share_memory() is very simplistic. It
just checks if the outermost bounds of the each array overlap. Thus, it can
give false positives for arrays that are interleaved but do not actually
intersect. This is why it is named may_share_memory() instead of
does_share_memory().

> I actually have another data structure around to pull out that shape and
stride info, but it's a shame to use it, because then I can't use the
algorithm to compare ndarrays in general (or at least ones that have been
created by typical construction methods and slicing).

All of the information you need is stored in the __array_interface__
attribute of an ndarray.

The right way to solve this is to solve a bounded, linear Diophantine
equation. That's where you should be looking if you want to crack this
problem.

--
Robert Kern
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread Charles R Harris
On Fri, Sep 6, 2013 at 10:19 AM, James Bergstra
wrote:

> Hi, could someone help me understand why this assertion fails?
>
> def test_is(self):
> a = np.empty(1)
> b = np.empty(1)
> if a.data is not b.data:
> assert id(a.data) != id(b.data) # <-- fail
>
> I'm trying to write an alternate may_share_memory function.
>
>
id seems not as advertised:

In [22]: for i in range(10): print id(a.data)
66094640
66095792
66095792
66095792
66095792
66095792
66094640
66094640
66094640
66094640

Not sure why.

Chuck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?

2013-09-06 Thread Chris Barker - NOAA Federal
On Fri, Sep 6, 2013 at 9:19 AM, James Bergstra wrote:

> def test_is(self):
> a = np.empty(1)
> b = np.empty(1)
> if a.data is not b.data:
> assert id(a.data) != id(b.data) # <-- fail
>
>
I'm not familiar with the internals, but:

In [27]: a = np.empty(1)

In [28]: a.data
Out[28]: 

so .data is a buffer object, wrapped around a particular pointer.

In [29]: id(a.data)
Out[29]: 55581376

In [30]: id(a.data)
Out[30]: 55581728

but it looks like the buffer object itself is created on the fly -- so you
are getting a different pyton object, even though it wraps the same data
pointer. So you need to compare the value of the pointer compared, not the
identity of the buffer object.

Not sure how to get that value though, other that parsing __repr__

In [44]: a.data.__repr__().split()[-1][:-1]
Out[44]: '0x3512100'

ugly hack -- there must be a better way!

But:

In [38]: a = np.zeros((100,))

In [39]: b = a[50:]

In [40]: a.data
Out[40]: 

In [41]: b.data
Out[41]: 

a and b share memory, but don't have the same data pointer as b's is offset.

HTH,
   -Chris









> I'm trying to write an alternate may_share_memory function.
>
> Thanks,
>  - James
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>


-- 

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Funny business with 'is' operator?:

2013-09-06 Thread Nathaniel Smith
The .data attribute is generated on the fly when accessed. So it returns an
anonymous temporary that's deallocated as soon as it's no longer needed.

a.data is b.data needs both objects, so both get allocated and then
compared. In the second one though, each object gets allocated one at a
time and then immediately released. So the a.data and b.data objects are
different... but since they have non-overlapping lifetimes, they happen to
be out at the same memory location. id() is only guaranteed to be unique
while the object is alive.

Notice also:
a.data is a.data
-> False

-n
On 6 Sep 2013 17:21, "James Bergstra"  wrote:

> Hi, could someone help me understand why this assertion fails?
>
> def test_is(self):
> a = np.empty(1)
> b = np.empty(1)
> if a.data is not b.data:
> assert id(a.data) != id(b.data) # <-- fail
>
> I'm trying to write an alternate may_share_memory function.
>
> Thanks,
>  - James
>
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion