Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-23 Thread Pedro Doria Meunier

Regarding this thread...
I've been away for a while...
But thank you to all who replied! :)

BR,
Pedro

On 11/19/2009 07:03 PM, Sam Mason wrote:

On Thu, Nov 19, 2009 at 10:47:02AM -0800, Scott Bailey wrote:
   

Sam Mason wrote:
 

Is idx really the best name for this?
   

Well I used idx() because there was already a idx(int[], int) function
with the _int contrib module.
 

I don't remember ever using that before, hence my question.

   

In other languages, it is assumed you are looking for the first index.
 

Huh, they seem to don't they, even my old stalwart of pedantry, Haskell,
follows form here.  Not sure why I'd never noticed before, idx is
looking more and more sensible!

   


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Scott Bailey

Pedro Doria Meunier wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

I'm trying to get the array position for a given match as thus:

This gets me the record for a given match:
SELECT *  FROM garmin_units WHERE 'L' = ANY (protocol_tag);

Ok. so far so good...
But what about getting the array position at which 'L' is stored?

Searching the Postgresql docs gives me no answer... :(
Am I on a wild goose chase?

Any insight highly appreciated ;)

BR,
Pedro Doria Meunier.
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAksFP6IACgkQ2FH5GXCfxAuasgCgu/d68fkg16r1OF/2QSLnmwhW
gjYAniyQ1Mn/72323NSznxgakF4dn98k
=tWbI
-END PGP SIGNATURE-


I wrote this a while ago. If you are on 8.4 use unnest instead. And if 
you are searching thru really big arrays, use plpgsql so you can 
terminate when you find a match.


CREATE OR REPLACE FUNCTION idx(text[], text)
RETURNS int AS
$$
SELECT MIN(CASE WHEN $1[i] = $2 THEN i
ELSE NULL END)::int
FROM generate_series(array_lower($1, 1),
array_upper($1, 1)) i;
$$
LANGUAGE 'sql' IMMUTABLE STRICT;

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Pavel Stehule
2009/11/19 Scott Bailey arta...@comcast.net:
 Pedro Doria Meunier wrote:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hi,

 I'm trying to get the array position for a given match as thus:

 This gets me the record for a given match:
 SELECT *  FROM garmin_units WHERE 'L' = ANY (protocol_tag);

 Ok. so far so good...
 But what about getting the array position at which 'L' is stored?

 Searching the Postgresql docs gives me no answer... :(
 Am I on a wild goose chase?

 Any insight highly appreciated ;)

 BR,
 Pedro Doria Meunier.
 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.9 (GNU/Linux)
 Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

 iEYEARECAAYFAksFP6IACgkQ2FH5GXCfxAuasgCgu/d68fkg16r1OF/2QSLnmwhW
 gjYAniyQ1Mn/72323NSznxgakF4dn98k
 =tWbI
 -END PGP SIGNATURE-

 I wrote this a while ago. If you are on 8.4 use unnest instead. And if you
 are searching thru really big arrays, use plpgsql so you can terminate when
 you find a match.

 CREATE OR REPLACE FUNCTION idx(text[], text)
 RETURNS int AS
 $$
 SELECT MIN(CASE WHEN $1[i] = $2 THEN i
 ELSE NULL END)::int
 FROM generate_series(array_lower($1, 1),
 array_upper($1, 1)) i;
 $$
 LANGUAGE 'sql' IMMUTABLE STRICT;


Hello

it should be little bit more effective:

CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS $$
SELECT i
   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)
  WHERE $1[i] = $2
  UNION ALL
  SELECT 0  -- return 0 as not found
  LIMIT 1; -- stop after first match
$$ LANGUAGE sql;

Regards
Pavel Stehule

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Sam Mason
On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:
 it should be little bit more effective:

I'm not sure if it will be much more; when you put a set returning
function into a FROM clause PG will always run the function to
completion---as far as I know, but I've only got 8.3 for testing at the
moment.  I'm also not sure why you want to return zero when you don't
find the element.  The code also exploits an implementation artifact of
PG that the zero (i.e. the RHS of your UNION ALL) will be after the
real index.

This raises a small and interesting optimization for PG, when it does
the plan it could notice that a UNION ALL followed by a LIMIT won't need
to return all rows and hence it may be better to run the quicker one
first.  Or would this end up breaking more code than it helps?

 CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
 RETURNS int AS $$
 SELECT i
FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)

Quality typo :)  ^^^

   WHERE $1[i] = $2
   UNION ALL
   SELECT 0  -- return 0 as not found
   LIMIT 1; -- stop after first match
 $$ LANGUAGE sql;

I'd do something like:

  CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
  RETURNS int AS $$
SELECT i FROM (
  SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
WHERE $1[i] = $2
LIMIT 1;
  $$ LANGUAGE sql IMMUTABLE;

You can replace the call to array_upper with some large number to check
either function's behavior with large arrays.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Scott Bailey

   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)


Pavel,

Don't get me wrong, I enjoy coding, but I think you've taken it too far 
here ;)


Yes, definitely more effective for large arrays. Thanks. Would probably 
be a good snippet for the wiki.


Scott



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Merlin Moncure
On Thu, Nov 19, 2009 at 12:15 PM, Scott Bailey arta...@comcast.net wrote:
   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)

 Pavel,

 Don't get me wrong, I enjoy coding, but I think you've taken it too far here
 ;)

 Yes, definitely more effective for large arrays. Thanks. Would probably be a
 good snippet for the wiki.

this was actually a pretty typical solution to dealing with arrays
until we got 'unnest()'.  See information_schema._pg_expand_array for
example.

I need this functionality quite often...maybe we could use a version
of unnest that works like that  (returns idx, elem)?  It would be a
small efficiency win over generate_series based approaches.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Scott Bailey

Sam Mason wrote:

On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:

it should be little bit more effective:


I'm not sure if it will be much more; when you put a set returning
function into a FROM clause PG will always run the function to
completion---as far as I know, but I've only got 8.3 for testing at the
moment.  I'm also not sure why you want to return zero when you don't
find the element.  The code also exploits an implementation artifact of
PG that the zero (i.e. the RHS of your UNION ALL) will be after the
real index.

This raises a small and interesting optimization for PG, when it does
the plan it could notice that a UNION ALL followed by a LIMIT won't need
to return all rows and hence it may be better to run the quicker one
first.  Or would this end up breaking more code than it helps?


CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS $$
SELECT i
   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)


Quality typo :)  ^^^


  WHERE $1[i] = $2
  UNION ALL
  SELECT 0  -- return 0 as not found
  LIMIT 1; -- stop after first match
$$ LANGUAGE sql;


I'd do something like:

  CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
  RETURNS int AS $$
SELECT i FROM (
  SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
WHERE $1[i] = $2
LIMIT 1;
  $$ LANGUAGE sql IMMUTABLE;

You can replace the call to array_upper with some large number to check
either function's behavior with large arrays.


I agree that it should return null when the item is not found. So I 
tested both and Sam is correct. His function performs the same whether 
there are 500 elements or 50,000.


We had an idx() function in the _int contrib module. I wonder if it 
would be useful to write this in C now that _int is deprecated?


Scott



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Sam Mason
On Thu, Nov 19, 2009 at 09:46:42AM -0800, Scott Bailey wrote:
 We had an idx() function in the _int contrib module. I wonder if it 
 would be useful to write this in C now that _int is deprecated?

Is idx really the best name for this? there could be multiple
occurrences of a value in an array (i.e. it's not a set) and hence why I
used firstidx for the function name.  If it's replacing an existing
function, then compatibility is a good reason.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Scott Bailey



this was actually a pretty typical solution to dealing with arrays
until we got 'unnest()'.  See information_schema._pg_expand_array for
example.


Oh I know. I was just having a laugh at the array_lover function. Now 
that I think about it, we could replace array_agg() with array_orgy() 
and unnest() with oh_crap_her_husband_is_home()... I'll probably get in 
trouble for that.


Scott

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Sam Mason
On Thu, Nov 19, 2009 at 12:43:38PM -0500, Merlin Moncure wrote:
 we could use a version
 of unnest that works like that  (returns idx, elem)?  It would be a
 small efficiency win over generate_series based approaches.

What would idx look like for multidimensional arrays?  I think PG
needs a sensible way of dealing with these before you attempt to modify
unnest.

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Pavel Stehule
2009/11/19 Sam Mason s...@samason.me.uk:
 On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:
 it should be little bit more effective:

 I'm not sure if it will be much more; when you put a set returning
 function into a FROM clause PG will always run the function to
 completion---as far as I know, but I've only got 8.3 for testing at the
 moment.

yes, but generate_series is very cheap, this is protection before not
necessary string equation.

 I'm also not sure why you want to return zero when you don't
 find the element.  The code also exploits an implementation artifact of
 PG that the zero (i.e. the RHS of your UNION ALL) will be after the
 real index.


this is only convention. Somebody like 0 or -1 as result. Somebody can
live with NULL.

 This raises a small and interesting optimization for PG, when it does
 the plan it could notice that a UNION ALL followed by a LIMIT won't need
 to return all rows and hence it may be better to run the quicker one
 first.  Or would this end up breaking more code than it helps?

 CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
 RETURNS int AS $$
 SELECT i
    FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)

 Quality typo :)                  ^^^

   WHERE $1[i] = $2
   UNION ALL
   SELECT 0  -- return 0 as not found
   LIMIT 1; -- stop after first match
 $$ LANGUAGE sql;

 I'd do something like:

  CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
      RETURNS int AS $$
    SELECT i FROM (
      SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
    WHERE $1[i] = $2
    LIMIT 1;
  $$ LANGUAGE sql IMMUTABLE;

 You can replace the call to array_upper with some large number to check
 either function's behavior with large arrays.


your code is very very exactly same as my code. First - there are
flattening stage. So if you don't use offset 0, then subselect is
transformed to select. I am not sure, if offset 0 should help here -
it have to do a materialisation (5ms for 1 items) more. This
function is relative fast:

postgres=# select idx(array(select generate_series(1,1)),1);
  idx
---
 1
(1 row)

Time: 40.070 ms

maybe - I cannot test it - there could be code

CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS $$
SELECT i
   FROM (SELECT generate_subscripts($1) as i, unnest($1) as v) s
  WHERE v = $2
  LIMIT 1;
$$ LANGUAGE sql;

but I am sure so C code should be faster

Regards
Pavel Stehule

 --
  Sam  http://samason.me.uk/

 --
 Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Scott Bailey

Sam Mason wrote:

On Thu, Nov 19, 2009 at 09:46:42AM -0800, Scott Bailey wrote:
We had an idx() function in the _int contrib module. I wonder if it 
would be useful to write this in C now that _int is deprecated?


Is idx really the best name for this? there could be multiple
occurrences of a value in an array (i.e. it's not a set) and hence why I
used firstidx for the function name.  If it's replacing an existing
function, then compatibility is a good reason.


Well I used idx() because there was already a idx(int[], int) function 
with the _int contrib module. Obviously index is out of the question. 
In other languages, it is assumed you are looking for the first index. 
Some allow you to specify an offset to begin searching at. And some 
provide another function to get the last index of element.


Here is what other languages are using for similar concept.
PHP - array_search()
Python - index()
Ruby - index()/rindex()
Java - binarySearch()
JavaScript/ActionScript - indexOf()
MySQL - find_in_set()

Scott



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Sam Mason
On Thu, Nov 19, 2009 at 10:47:02AM -0800, Scott Bailey wrote:
 Sam Mason wrote:
 Is idx really the best name for this?

 Well I used idx() because there was already a idx(int[], int) function 
 with the _int contrib module.

I don't remember ever using that before, hence my question.

 In other languages, it is assumed you are looking for the first index. 

Huh, they seem to don't they, even my old stalwart of pedantry, Haskell,
follows form here.  Not sure why I'd never noticed before, idx is
looking more and more sensible!

-- 
  Sam  http://samason.me.uk/

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] obtaining ARRAY position for a given match

2009-11-19 Thread Pavel Stehule

 CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
 RETURNS int AS $$
 SELECT i
   FROM (SELECT generate_subscripts($1) as i, unnest($1) as v) s
  WHERE v = $2
  LIMIT 1;
 $$ LANGUAGE sql;


there is bug

correct is

create or replace function idx(anyarray, anyelement)
returns int as $$
select i
   from (select generate_subscripts($1,1) i, unnest($1) v) s
  where v = $2
  limit 1
$$ language sql;

and it is 5% faster than older version

on integer array on my eeepc

Regards
Pavel

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general