[Numpy-discussion] Test - Please ignore

2006-11-16 Thread Robert Kern
This should be rejected.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



[Numpy-discussion] New list address

2006-11-16 Thread Jeff Strunk
Good afternoon,

In a few minutes, this list will have a new address.

Please send all posts to [EMAIL PROTECTED].

Thank you,
Jeff



Re: [Numpy-discussion] Defining custom types

2006-11-16 Thread Jonathan Wang

Hi all,

I've gotten to the point where Numpy recognizes the objects (represented as
doubles), but I haven't figured out how to register ufunc loops on the
custom type. It seems like Numpy should be able to check that the scalarkind
variable in the numpy type descriptor is set to float and use the float
ufuncs on the custom object. Barring that, does anyone know if the symbols
for the ufuncs are publicly accessible (and where they are) so that I can
register them with Numpy on the custom type?
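
In the meantime, the nearest Python-level stopgap I know of is np.frompyfunc,
which maps a plain Python function over object arrays element-wise. It won't
give native-speed loops, but the broadcasting behaves like a ufunc. A minimal
sketch (the function and names here are purely illustrative, not part of my
extension):

import numpy as np

# Illustrative operation on a custom date-like scalar; here the "dates"
# are just floats (ticks since the epoch) held in an object array.
def add_days(date, days):
    return date + days * 86400.0

uadd_days = np.frompyfunc(add_days, 2, 1)   # 2 inputs, 1 output

dates = np.array([1163635200.0, 1163721600.0], dtype=object)
shifted = uadd_days(dates, 7)               # returns an object array
print(shifted.astype(float))                # cast back to a plain float array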

As for sharing code, I've been working on this for a project at work. There
is a possibility that it will be released to the Numpy community, but that's
not clear yet.

Thanks,
Jonathan

On 11/16/06, Matt Knox <[EMAIL PROTECTED]> wrote:

> > On Thursday 16 November 2006 11:44, David Douard wrote:
> > > Hi, just to ask you: how is the work going on encapsulating mx.DateTime
> > > as a native numpy type?
> > > And most important: is the code available somewhere? I am also
> > > interested in using DateTime objects in numpy arrays. For now, I've
> > > always used arrays of floats (using gmticks values of dates).
> >
> > And I, as arrays of objects (well, I wrote a subclass to deal with dates,
> > where each element is a datetime object, with methods to translate to
> > floats or strings, but it's far from optimal...). I'd also be quite
> > interested in checking what has been done.
>
> I'm also very interested in the results of this. I need to do something
> very similar and am currently relying on an ugly hack to achieve the
> desired result.
>
> - Matt Knox






Re: [Numpy-discussion] Defining custom types

2006-11-16 Thread Matt Knox
> On Thursday 16 November 2006 11:44, David Douard wrote:
> > Hi, just to ask you: how is the work going on encapsulating mx.DateTime
> > as a native numpy type?
> > And most important: is the code available somewhere? I am also
> > interested in using DateTime objects in numpy arrays. For now, I've
> > always used arrays of floats (using gmticks values of dates).
>
> And I, as arrays of objects (well, I wrote a subclass to deal with dates,
> where each element is a datetime object, with methods to translate to floats
> or strings, but it's far from optimal...). I'd also be quite interested in
> checking what has been done.

I'm also very interested in the results of this. I need to do something very
similar and am currently relying on an ugly hack to achieve the desired
result.

- Matt Knox


Re: [Numpy-discussion] mysql -> record array

2006-11-16 Thread Tim Hochberg
Francesc Altet wrote:
> On Tuesday 14 November 2006 23:08, Erin Sheldon wrote:
>   
>> On 11/14/06, John Hunter <[EMAIL PROTECTED]> wrote:
>> 
>>> Has anyone written any code to facilitate dumping mysql query results
>>> (mainly arrays of floats) into numpy arrays directly at the extension
>>> code layer.  The query results->list->array conversion can be slow.
>>>
>>> Ideally, one could do this semi-automagically with record arrays and
>>> table introspection
>>>   
>> I've been considering this as well.  I use both postgres and Oracle
>> in my work, and I have been using the python interfaces (cx_Oracle
>> and pgdb) to get result lists and convert to numpy arrays.
>>
>> The question I have been asking myself is "what is the advantage
>> of such an approach?".  It would be faster, but by how
>> much?  Presumably the bottleneck for most applications will
>> be data retrieval rather than data copying in memory.
>> 
>
> Well, that largely depends on your access pattern for the data in your
> database. If you are accessing regions of your database that have a
> high degree of spatial locality (i.e. they are located in the same or
> very similar places), the data is most probably already in memory (in
> your filesystem cache or maybe in your database cache) and the
> bottleneck becomes memory access. Of course, if you don't have
> such spatial locality in the access pattern, then the bottleneck
> will be the disk.
>
> Just to see how DB API 2.0 could benefit from adopting record arrays as
> input buffers, I've done a comparison between SQLite3 and PyTables.
> PyTables doesn't support DB API 2.0 as such, but it does use record arrays
> as buffers internally so as to read data in an efficient way (there
> should be other databases that feature this, but I know PyTables best
> ;-).
>
> For this, I've used a modified version of a small benchmarking program
> posted by Tim Hochberg in this same thread (it is listed at the end
> of the message). Here are the results:
>
> setup SQLite took 23.5661110878 seconds
> retrieve SQLite took 3.26717996597 seconds
> setup PyTables took 0.139157056808 seconds
> retrieve PyTables took 0.13444685936 seconds
>
> [SQLite results were obtained using an in-memory database, while
> PyTables used an on-disk one. See the code.]
>
> So, yes, if your access pattern exhibits a high degree of locality,
> you can expect a huge difference in reading speed (more than 20x
> for this example, but as this depends on the dataset size, it can be
> even higher for larger datasets).
One weakness of this benchmark is that it doesn't break out how much of
the sqlite3 overhead is inherent to the sqlite3 engine, which I expect
is somewhat more complicated internally than PyTables, and how much is
due to all the extra layers we go through to get the data into an array
(native [in database] -> Python objects -> native [in record array]). To
try to get at least a little handle on this, I added this test:

def querySQLite(conn):
    c = conn.cursor()
    c.execute('select * from demo where x = 0.0')
    y = np.fromiter(c, dtype=dtype)
    return y

This returns very little data (in the cases I ran, it actually returned
no data). However, it still needs to loop over all the records and
examine them. Here's what the timings looked like:

setup SQLite took 9.7173515 seconds
retrieve SQLite took 0.92131335 seconds
query SQLite took 0.313000202179 seconds

I'm reluctant to conclude that 1/3 of the time is spent in
traversing the database and 2/3 in creating the data, solely
because databases are big voodoo to me. Still, we can probably conclude
that traversing the data itself is pretty expensive and we would be
unlikely to approach PyTables' speed even if we didn't have the extra
overhead. On the other hand, there's a factor of three or so improvement
that could be realized by reducing overhead.

Or maybe not. I think that the database has to return its data a row at
a time, so there's intrinsically a lot of copying that's going to
happen. So, I think it's unclear whether getting the data directly in
native format would be significantly cheaper. I suppose that the way to
definitively test it would be to rewrite one of these tests in C. Any
volunteers?
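
Short of C, one cheap experiment is to amortize the per-row Python overhead
by pulling rows in blocks with fetchmany before handing them to fromiter. A
sketch only (retrieveChunked is a made-up name, untested, and mirrors the
dtype/table from the benchmark below):

import numpy as np

def retrieveChunked(conn, dtype, chunk=10000):
    # Fetch rows in blocks to cut down on per-row call overhead, then
    # concatenate the per-block record arrays at the end.
    c = conn.cursor()
    c.execute('select * from demo')
    pieces = []
    while True:
        rows = c.fetchmany(chunk)
        if not rows:
            break
        pieces.append(np.fromiter(rows, dtype=dtype, count=len(rows)))
    return np.concatenate(pieces) if pieces else np.empty(0, dtype=dtype)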

I think it's probably safe to say that either way PyTables will cream
sqlite3 in those fields where it's applicable. One of these days I
really need to dig into PyTables. I'm sure I could use it for something.

[snip]

-tim





Re: [Numpy-discussion] Defining custom types

2006-11-16 Thread Colin J. Williams
David Douard wrote:
> On Thu, Oct 26, 2006 at 05:26:47PM -0500, Jonathan Wang wrote:
> > I'm trying to write a Numpy extension that will encapsulate mxDateTime
> > as a native Numpy type. I've decided to use a type inherited from
> > Numpy's scalar double. However, I'm running into all sorts of problems.
> > I'm using numpy 1.0b5; I realize this is somewhat out of date.
>
> Hi, just to ask you: how is the work going on encapsulating mx.DateTime
> as a native numpy type?
> And most important: is the code available somewhere? I am also
> interested in using DateTime objects in numpy arrays. For now, I've
> always used arrays of floats (using gmticks values of dates).
>
> Thank you,
> David

It would be nice if dtype were subclassable to handle this sort of thing.

Colin W.





Re: [Numpy-discussion] Defining custom types

2006-11-16 Thread Pierre GM
On Thursday 16 November 2006 11:44, David Douard wrote:
> Hi, just to ask you: how is the work going on encapsulating mx.DateTime
> as a native numpy type?
> And most important: is the code available somewhere? I am also
> interested in using DateTime objects in numpy arrays. For now, I've
> always used arrays of floats (using gmticks values of dates).

And I, as arrays of objects (well, I wrote a subclass to deal with dates,
where each element is a datetime object, with methods to translate to floats
or strings, but it's far from optimal...). I'd also be quite interested in
checking what has been done.
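
Roughly, the kind of thing I mean is an object array of datetime instances
plus explicit conversions; a minimal sketch with illustrative names (my real
subclass wraps this up in methods, but the idea is the same):

import time
from datetime import datetime
import numpy as np

# Object array whose elements are ordinary datetime instances.
dates = np.array([datetime(2006, 11, d) for d in (14, 15, 16)], dtype=object)

# Translate to floats (seconds since the epoch) and to strings.
ticks = np.array([time.mktime(d.timetuple()) for d in dates])
labels = [d.strftime('%Y-%m-%d') for d in dates]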



Re: [Numpy-discussion] Defining custom types

2006-11-16 Thread David Douard
On Thu, Oct 26, 2006 at 05:26:47PM -0500, Jonathan Wang wrote:
> I'm trying to write a Numpy extension that will encapsulate mxDateTime as a
> native Numpy type. I've decided to use a type inherited from Numpy's scalar
> double. However, I'm running into all sorts of problems. I'm using numpy
> 1.0b5; I realize this is somewhat out of date.
> 

Hi, just to ask you: how is the work going on encapsulating mx.DateTime
as a native numpy type?
And most important: is the code available somewhere? I am also
interested in using DateTime objects in numpy arrays. For now, I've
always used arrays of floats (using gmticks values of dates).
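
Concretely, that floats-from-gmticks approach is just the following (a
minimal sketch, assuming mx.DateTime is installed; the names are
illustrative):

import numpy as np
from mx import DateTime

# gmticks() gives seconds since the epoch (UTC) as a float, which
# packs naturally into a plain float array.
dates = [DateTime.DateTime(2006, 11, d) for d in (14, 15, 16)]
ticks = np.array([d.gmticks() for d in dates])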

Thank you,
David



-- 
David Douard LOGILAB, Paris (France)
Python, Zope, Plone, Debian training: http://www.logilab.fr/formations
Custom software development: http://www.logilab.fr/services
Scientific computing: http://www.logilab.fr/science




Re: [Numpy-discussion] mysql -> record array

2006-11-16 Thread Francesc Altet
On Tuesday 14 November 2006 23:08, Erin Sheldon wrote:
> On 11/14/06, John Hunter <[EMAIL PROTECTED]> wrote:
> > Has anyone written any code to facilitate dumping mysql query results
> > (mainly arrays of floats) into numpy arrays directly at the extension
> > code layer.  The query results->list->array conversion can be slow.
> >
> > Ideally, one could do this semi-automagically with record arrays and
> > table introspection
>
> I've been considering this as well.  I use both postgres and Oracle
> in my work, and I have been using the python interfaces (cx_Oracle
> and pgdb) to get result lists and convert to numpy arrays.
>
> The question I have been asking myself is "what is the advantage
> of such an approach?".  It would be faster, but by how
> much?  Presumably the bottleneck for most applications will
> be data retrieval rather than data copying in memory.

Well, that largely depends on your access pattern for the data in your
database. If you are accessing regions of your database that have a
high degree of spatial locality (i.e. they are located in the same or
very similar places), the data is most probably already in memory (in
your filesystem cache or maybe in your database cache) and the
bottleneck becomes memory access. Of course, if you don't have
such spatial locality in the access pattern, then the bottleneck
will be the disk.

Just to see how DB API 2.0 could benefit from adopting record arrays as
input buffers, I've done a comparison between SQLite3 and PyTables.
PyTables doesn't support DB API 2.0 as such, but it does use record arrays
as buffers internally so as to read data in an efficient way (there
should be other databases that feature this, but I know PyTables best
;-).

For this, I've used a modified version of a small benchmarking program
posted by Tim Hochberg in this same thread (it is listed at the end
of the message). Here are the results:

setup SQLite took 23.5661110878 seconds
retrieve SQLite took 3.26717996597 seconds
setup PyTables took 0.139157056808 seconds
retrieve PyTables took 0.13444685936 seconds

[SQLite results were obtained using an in-memory database, while
PyTables used an on-disk one. See the code.]

So, yes, if your access pattern exhibits a high degree of locality,
you can expect a huge difference in reading speed (more than 20x
for this example, but as this depends on the dataset size, it can be
even higher for larger datasets).

> On the other hand, the database access modules for all major
> databases, with DB API 2.0 semi-compliance, have already been written.
> This is not an insignificant amount of work.  Writing our own
> interfaces for each  of our favorite databases would require an
> equivalent amount of work.

That's true, but still feasible. However, before people start doing
this in a general way, it would help to first implement in Python
something like the numpy.ndarray object: this would standardize
a full-fledged heterogeneous buffer for doing intensive I/O tasks.

> I think a set of timing tests would be useful.  I will try some
> using Oracle or postgres over the next few days.  Perhaps
> you could do the same with mysql.

Well, here is my own benchmark (admittedly trivial). Hope it helps
in your comparisons.

--
import sqlite3, numpy as np, time, tables as pt, os, os.path

N = 50
rndata = np.random.rand(2, N)
dtype = np.dtype([('x',float), ('y', float)])
data = np.empty(shape=N, dtype=dtype)
data['x'] = rndata[0]
data['y'] = rndata[1]

def setupSQLite(conn):
    c = conn.cursor()
    c.execute('''create table demo (x real, y real)''')
    c.executemany("""insert into demo values (?, ?)""", data)

def retrieveSQLite(conn):
    c = conn.cursor()
    c.execute('select * from demo')
    y = np.fromiter(c, dtype=dtype)
    return y

def setupPT(fileh):
    fileh.createTable('/', 'table', data)

def retrievePT(fileh):
    y = fileh.root.table[:]
    return y


# if os.path.exists('test.sql3'):
#     os.remove('test.sql3')
# conn = sqlite3.connect('test.sql3')
conn = sqlite3.connect(':memory:')

t0 = time.time()
setupSQLite(conn)
t1 = time.time()
print "setup SQLite took", t1-t0, "seconds"

t0 = time.time()
y1 = retrieveSQLite(conn)
t1 = time.time()
print "retrieve SQLite took", t1-t0, "seconds"

conn.close()

fileh = pt.openFile("test.h5", "w")

t0 = time.time()
setupPT(fileh)
t1 = time.time()
print "setup PyTables took", t1-t0, "seconds"

t0 = time.time()
y2 = retrievePT(fileh)
t1 = time.time()
print "retrieve PyTables took", t1-t0, "seconds"

fileh.close()

assert y1.shape == y2.shape
assert np.alltrue(y1 == y2)


-- 
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

