Re: [Perldl] Loading large data from database into a piddle

2014-11-14 Thread kmx
I have tried pg_getcopydata, however I was not able to make it better than 
my old approach. After many tries it was still 15-20% slower.


My guess is that pg_getcopydata(..) might be significantly faster when 
dumping the whole table (which I was not able to test as the table in 
question was too big). When dumping the result of an SQL query, there seems 
to be no advantage.


I have also slightly updated my "maybe module" at 
https://gist.github.com/kmx/6f1234478828e7960fbd


--
kmx

On 12.11.2014 23:54, kmx wrote:

Thanks, pg_getcopydata sounds very promising.

I'll try to implement an alternative solution based on pg_getcopydata and 
compare it with my current approach.


--
kmx

On 12.11.2014 16:48, Vikas N Kumar wrote:

On 11/12/2014 07:43 AM, kmx wrote:

  my $dbh = DBI->connect($dsn);
  my $pdl = pdl($dbh->selectall_arrayref($sql_query));

But it does not scale well for very large data (millions of rows).


Hi KMX

If you're using PostgreSQL, you should use DBD::Pg's pg_getcopydata method 
together with the "COPY mytable TO STDOUT" functionality for accessing 
millions of rows. You can do this in async or sync mode. This should get you 
there faster than using selectall_arrayref(), and it lets you get the rows 
without having to redesign your DB.
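
For readers following along, here is a minimal sketch of the COPY-based loop described above. It is an illustration under stated assumptions, not code from this thread: the helper name copy_to_rows is made up, $dbh is assumed to be a DBD::Pg handle in sync mode, and every column is assumed to be numeric and non-NULL (in the default text format NULL arrives as \N, which would need extra handling).

```perl
use strict;
use warnings;

# Hypothetical helper: run a "COPY ... TO STDOUT" statement and collect rows.
# $dbh is assumed to be a DBD::Pg database handle, obtained with e.g.
#   DBI->connect("dbi:Pg:dbname=mydb", $user, $pass, { RaiseError => 1 });
sub copy_to_rows {
    my ($dbh, $copy_sql) = @_;
    $dbh->do($copy_sql);        # puts the connection into COPY OUT state
    my @rows;
    my $line = '';              # the buffer should be defined before the call
    # pg_getcopydata stores one row in $line and returns -1 once COPY is done
    while ($dbh->pg_getcopydata($line) >= 0) {
        chomp $line;
        # default COPY text format: one row per line, tab-separated columns
        push @rows, [ split /\t/, $line, -1 ];
    }
    return \@rows;
}

# Usage (needs a live PostgreSQL connection and "use PDL"):
#   my $rows = copy_to_rows($dbh, "COPY mytable TO STDOUT");
#   my $pdl  = pdl($rows);
```

The rows are still collected in a Perl array here, so this only changes how the data leaves the database, not how it is packed into the piddle.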


SQLite has a stream API but I am not familiar with it.

--Vikas




___
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl


Re: [Perldl] Loading large data from database into a piddle

2014-11-14 Thread Doug Hunt
Hi kmx:  What if you make a temporary table by selecting the subset of 
the table you want and then use pg_getcopydata to dump this entire temp 
table?


Just a thought...
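
A rough sketch of what the temporary-table idea could look like, for illustration only. The helper name and the table name pdl_subset are hypothetical, $dbh is assumed to be a DBD::Pg handle, and no timing claim is implied. PostgreSQL also accepts "COPY (SELECT ...) TO STDOUT" directly, which avoids the temporary table.

```perl
use strict;
use warnings;

# Hypothetical helper: materialise a query as a temp table, then COPY it out.
# $dbh is assumed to be a DBD::Pg handle; $select_sql is a trusted SELECT
# statement (it is interpolated into SQL, so never pass user input here).
sub copy_query_via_temp_table {
    my ($dbh, $select_sql) = @_;
    $dbh->do("CREATE TEMP TABLE pdl_subset AS $select_sql");
    $dbh->do("COPY pdl_subset TO STDOUT");
    my @lines;
    my $line = '';
    # collect the raw tab-separated lines until pg_getcopydata returns -1
    push @lines, $line while $dbh->pg_getcopydata($line) >= 0;
    $dbh->do("DROP TABLE pdl_subset");
    return \@lines;
}

# Usage (needs a live PostgreSQL connection):
#   my $lines = copy_query_via_temp_table($dbh, "SELECT x, y FROM big WHERE x > 0");
```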

Regards,

  Doug Hunt

dh...@ucar.edu
Software Engineer
UCAR - COSMIC, Tel. (303) 497-2611



[Perldl] matching vectors inside a PDL

2014-11-14 Thread LYONS, KENNETH B (KENNETH)
I need to be able to match a vector inside a PDL, and can't find a way to do 
it.  The existence of the qsortvec and uniqvec functions implies that such a 
comparison function exists (since you'd need it to sort), but the 
documentation doesn't give any info on it.  More specifically, if I have an 
n x m PDL $P, containing vectors of length n in the first dimension, and an 
n x 1 PDL representing a test vector, $test, I want to be able to get the 
indices along the 2nd dimension where the vector in the PDL matches the test 
one.

I would expect that such a function, which I'll provisionally name findveci, 
would operate as
  $findresult = $P->findveci($test)
where $findresult would be a 1-dimensional PDL giving the set of indices along 
the second dimension of $P that match the vector $test.

I should note that a similar purpose would be served by a function uniqveci 
(which, although an obvious extension of the set that are available, also seems 
not to exist), since you could combine that with qsortvec to do what I'm 
talking about.  At present, I've resorted to pulling the vectors into perl 
lists and doing the matching there.  But that's far slower, and it seems wrong 
to have to do it that way.

Any suggestions?


Re: [Perldl] matching vectors inside a PDL

2014-11-14 Thread Derek Lamb
Hi Kenneth,

I did this.  The last line has what you're looking for in one line, but the 
stuff leading up to it shows my thought process:  

pdl> $P = rint(random(3,10)*5)

pdl> p $P

[
 [4 1 4]
 [5 4 2]
 [1 2 2]
 [0 3 0]
 [1 1 2]
 [2 1 2]
 [4 0 1]
 [4 1 4]
 [0 1 4]
 [4 2 3]
]

pdl> $test = pdl(4,1,4)  # [4 1 4] turns up twice, so I'll just pick that for now

pdl> p $P==$test

[
 [1 1 1]
 [0 0 0]
 [0 0 0]
 [0 0 0]
 [0 1 0]
 [0 1 0]
 [1 0 0]
 [1 1 1]
 [0 1 1]
 [1 0 0]
]

pdl> p sumover($P==$test)
[3 0 0 0 1 1 1 3 2 1]
pdl> p sumover($P==$test)==$P->dim(0)
[1 0 0 0 0 0 0 1 0 0]
pdl> p $findresult = which(sumover($P==$test)==$P->dim(0))
[0 7]

Is that what you're looking for?

Actually, a little cleaner way is to do

pdl> p $findresult = which(andover($P==$test))
[0 7]
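
The one-liner above can be wrapped in a small sub. This is only a sketch: findveci is the provisional name from the question and is not an existing PDL function, and the sample data is made up for the illustration.

```perl
use strict;
use warnings;
use PDL;

# Hypothetical helper (name taken from the question, not part of PDL).
# $P    : 2-D piddle with dims (n, m), i.e. m row vectors of length n
# $test : 1-D piddle of length n
# Returns a 1-D piddle of the indices along dim 1 whose rows equal $test.
sub findveci {
    my ($P, $test) = @_;
    # element-wise ==, then AND along dim 0, then indices of the true entries
    return which(andover($P == $test));
}

my $P    = pdl([ [4,1,4], [5,4,2], [1,2,2], [4,1,4] ]);
my $hits = findveci($P, pdl(4,1,4));
print "matching rows: $hits\n";
```

If no row matches, which() returns an empty piddle, so checking $hits->nelem tells you whether anything was found.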

cheers,
Derek



Re: [Perldl] matching vectors inside a PDL

2014-11-14 Thread Derek Lamb
No problem.  Glad to help.

The documentation for the basic PDL operators is in PDL::Ops, which you can get 
by doing 

pdl> ?ops

That will show you all the basic math operators.  Most of them are available 
both as named functions (e.g., 'divide') and as overloaded operators (e.g., 
'/'), and can modify the original piddle in place if you use the 'inplace' 
syntax.  So as long as your piddles have the same dimensions, or are at least 
thread-compatible (e.g., have the same 0th dimension, as in the example I gave 
you), it will work.  If you play with '<' and '>' you'll see that the test is 
done element-by-element, just like it was for '==' (so it's not a lexical 
comparison).  If you want to know whether all elements of each row of $P are 
less than the corresponding elements of $test, then you'll need to collapse 
along the 0th dimension.  The same goes if you want to know whether only some 
of the elements of a row are less than those of $test.  
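
To make the "collapse along the 0th dimension" step concrete, here is a small sketch with made-up data. It uses andover and orover from PDL::Ufunc for the "all" and "some" cases; the variable names are only for illustration.

```perl
use strict;
use warnings;
use PDL;

my $P    = pdl([ [1,2,3], [4,0,9], [5,6,9] ]);
my $test = pdl(4,3,8);

# Element-wise comparison: same shape as $P, 1 where the test holds.
my $lt = ($P < $test);

# Collapse dim 0 to get one answer per row.
my $all_less = andover($lt);   # every element of the row is less
my $any_less = orover($lt);    # at least one element of the row is less

print "element-wise:\n$lt\n";
print "all less: $all_less\n";
print "any less: $any_less\n";
```

Note that this is still not a lexical (dictionary-order) comparison of the rows; it only summarises the element-wise result.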

pdl> ??sort 

or

pdl> apropos sort

searches the documentation for functions whose name or description matches 
'sort'.

If you're new to PDL, check out the PDL Book, which you can download from 
pdl.perl.org.  That and the "First Steps" document should be enough to give you 
the lay of the land.  As you've noticed, the mailing list is also pretty 
responsive.

cheers,
Derek


On Nov 14, 2014, at 11:37 AM, LYONS, KENNETH B (KENNETH) wrote:

> Hey, thanks.  This looks exactly right.  I hadn’t realized that the == 
> operator would work with *2* PDL operands!  (boy do I wish this stuff had 
> better documentation!).
>  
> I’ll try it out, but from the looks of what you presented here, this looks 
> like just the ticket.
>  
> Do you know if you can do the same with the > operator?  The qsortvec 
> function sorts vectors in lexical order, so I’d guess that if == works then 
> the > and < operators probably work with 2 PDL operands as well (again 
> assuming lexical ordering)?
>  
> Ken
>  
> p.s. Interesting that you’re the one who replied.  I got my physics PhD at 
> Boulder, eons ago.
>  
> Ken


Re: [Perldl] matching vectors inside a PDL

2014-11-14 Thread Derek Lamb
Hi Ken,

Please cc the list on your replies—others may have more insight than I.

== does compare element-by-element.  If you pass in a simple Perl scalar 
( p $P==4 ), you will get a piddle with 1s where $P is 4 and 0s elsewhere.  
If you pass in a 1-element piddle ( p $P==pdl(4) ), you will get the same 
thing.  If you pass in a vector $test whose dimension #0 matches the dimension 
#0 of $P (and any subsequent dimensions are of size 1), then it will do an 
element-by-element comparison of each element of $P with the corresponding 
element in $test.  Notice in the intermediate results I gave, there are some 
rows that have all 1s (where all elements of that row of $P are equal to the 
corresponding element in $test), there are some rows that have all 0s (where 
no elements of that row of $P are equal to the corresponding element in 
$test), but that there are ALSO some rows that have one or two 1s, where that 
element of $P matched the corresponding element of $test, but others did 
not.  The test for VECTOR EQUALITY was done by collapsing that result with 
sumover(...) == 3 or with andover(...).  So in this sense ==, <, <=, >, and >= 
all function exactly the same way.
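
A short sketch of the three cases described above, using a small made-up piddle rather than the random one from the earlier session. The variable names are for illustration only.

```perl
use strict;
use warnings;
use PDL;

my $P = pdl([ [4,1,4], [0,4,2] ]);

my $by_scalar = ($P == 4);            # plain Perl scalar
my $by_piddle = ($P == pdl(4));       # one-element piddle, same result
my $by_vector = ($P == pdl(4,1,4));   # vector along dim 0, applied to every row

print "scalar:\n$by_scalar\npiddle:\n$by_piddle\nvector:\n$by_vector\n";

# Collapsing dim 0 turns the element-wise result into a per-row vector test.
my $via_sum = (sumover($by_vector) == $P->dim(0));
my $via_and = andover($by_vector);
print "sumover test: $via_sum\n";
print "andover test: $via_and\n";
```

Both collapsed results flag only the first row, since that is the only row where every element matched.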

Re: documentation.  There are two commands, help (aliased to ? in the pdl 
shell), and apropos (aliased to ?? in the pdl shell).  help is akin to UNIX 
man, apropos is akin to UNIX apropos.  What are you doing to get several 
hundred entries for a query of 'list'?  In the pdl shell, using apropos I get 
<30 entries, and doing ?list brings me right to the documentation for the PDL 
function "list".

cheers,
Derek


> On Nov 14, 2014, at 1:33 PM, LYONS, KENNETH B (KENNETH) wrote:
> 
> Yes, most of this I knew, but thanks.  It’s because of that behavior of > and 
> <, that you mentioned, that I thought that ‘==’ would compare element by 
> element instead of on the whole vector.
>  
> Have you ever tried, for example, to search the documentation for, say, the 
> function “list”?  It gives you every occurrence of the word “list” in the 
> documents (which, needless to say, is rather voluminous, and the first few 
> hundred entries have nothing to do with the function!).  There should be some 
> analog of the “man” command in Unix that gives you information about the 
> *function* without all the other garbage.  I think it’s just doing something 
> akin to a grep through the documents.
>  
> It’s horribly designed in that regard.  The software itself is great, and I’m 
> very happy with the results, but finding the simplest little thing in the 
> docs can be a total pain!
>  
> Ken