Re: [GENERAL] N-tile function in postgres

2012-09-24 Thread Rachel Owsley
Thank you, François!! Got it. :)


From: François Beausoleil [mailto:franc...@teksol.info]
Sent: Monday, September 24, 2012 3:37 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] N-tile function in postgres


Le 2012-09-24 à 14:12, Rachel Owsley a écrit :


Thank you, François! This is very helpful! I'll give this query a try. I don't 
know the cross-tab function, but that's exactly what I want to do for the 
column output. Regarding the sample query, I see the min (amount), but how is 
the upper bound defined for each decile?

ntile() splits the output in as even partitions as possible. If you have 13 
rows, and you want 10 output rows, then each row will receive something like 
this:

# select id, ntile(10) over () from generate_series(1, 13) as t1(id);
 id | ntile
+---
  1 | 1
  2 | 1
  3 | 2
  4 | 2
  5 | 3
  6 | 3
  7 | 4
  8 | 5
  9 | 6
 10 | 7
 11 | 8
 12 | 9
 13 |10

The ntile() function isn't tied to the values at all: only to the actual number 
of rows. I used min(amount) to get the minimal value per group, but you can use 
use max(amount) to get the other end as well.

Bye!
François


Re: [GENERAL] N-tile function in postgres

2012-09-24 Thread Rachel Owsley
Thank you, François! This is very helpful! I'll give this query a try. I don't 
know the cross-tab function, but that's exactly what I want to do for the 
column output. Regarding the sample query, I see the min (amount), but how is 
the upper bound defined for each decile?

Thanks,

Rachel

From: François Beausoleil [mailto:franc...@teksol.info]
Sent: Monday, September 24, 2012 12:57 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] N-tile function in postgres


Le 2012-09-24 à 12:32, Rachel Owsley a écrit :


Hi,

Can anyone help me with an aggregate query I am having trouble with?

I want to get the top 5 or top 10 most frequently shopped in merchant 
categories for each account holder at a bank and put each of the 
quintiles/deciles into separate columns. I would also like to put the average 
transaction amount for each of those top 5-10 categories into separate columns, 
and the date of the last transaction in each of those 5 to 10 categories into 
separate columns. I am told that ntile may be an option for doing this, but 
can't find any examples for using it in the documentation.

We use Postgresql 9.1.

Thank you so much for your help.

Rachel

Hi!

Look at the tablefunc extension to do cross tabulation. The crosstab family of 
functions turn a series of rows into columns. Something like this:

a | 1
b | 2

a  |  b
1  |  2

It obviously works with more columns. That would take care of the final part of 
your query.

I've never used ntile() myself, but the docs say it returns 1 to the value. 
Then you may want the min/max amount per decile to extract the values you want. 
Something like this (untested, made up schema):

WITH raw_values(
SELECT

account_id
  , merchant_category_id
  , amount

FROM transactions
  JOIN merchants USING (merchant_id))

, partitioned_sales AS (
SELECT
account_id
  , merchant_category_id
  , ntile(10) over (partition by account_id, merchant_category_id order by 
amount) as "partition"
  , min(amount) over (partition by account_id, merchant_category_id order by 
amount) as amount
FROM raw_values)

SELECT *
FROM partitioned_sales
ORDER BY account_id, merchant_category_id, partition, amount

Hope that helps!
François


[GENERAL] N-tile function in postgres

2012-09-24 Thread Rachel Owsley
Hi,

Can anyone help me with an aggregate query I am having trouble with?

I want to get the top 5 or top 10 most frequently shopped in merchant 
categories for each account holder at a bank and put each of the 
quintiles/deciles into separate columns. I would also like to put the average 
transaction amount for each of those top 5-10 categories into separate columns, 
and the date of the last transaction in each of those 5 to 10 categories into 
separate columns. I am told that ntile may be an option for doing this, but 
can't find any examples for using it in the documentation.

We use Postgresql 9.1.

Thank you so much for your help.

Rachel


Re: [GENERAL] General guidance: Levenshtein distance versus other similarity algorithms

2012-07-25 Thread Rachel Owsley
Thanks, Merlin. I will give that one a try. 

-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Wednesday, July 25, 2012 1:32 PM
To: Rachel Owsley
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] General guidance: Levenshtein distance versus other 
similarity algorithms

On Mon, Jul 23, 2012 at 11:55 AM, Rachel Owsley 
 wrote:
> Hi,
>
> I am hoping you can give me some guidance here. I'm using postgresql 9.1.
>
> Basically, I'm trying to create a query on a table of businesses that 
> will return all similar matches to a business name. This is a huge 
> table, and there is a lot of variation in names. The length of the 
> string can be up to 255. I've used regex, but there are always some 
> variations of the name that are missed when I do a regex. So I decided to 
> look at distance measures.
>
> Has anyone compared the fuzzstrmatch package to pgsimilarity?
>
> Would the levenshtein function in postgresql be the best way to go 
> here? If so, should I use levenshtein in the contribution package or 
> install the pgsimilarity package? Has anyone tried both implementations?

Another option that works with 9.1 is the pg_trgm module
(http://www.postgresql.org/docs/9.1/static/pgtrgm.html).   It works
very well for 9.1 and has the advantage of having built-in gist and gin 
operator support.

Can't speak on pg_similarity, haven't used it.

merlin

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] General guidance: Levenshtein distance versus other similarity algorithms

2012-07-23 Thread Rachel Owsley
Hi,

I am hoping you can give me some guidance here. I'm using postgresql 9.1.

Basically, I'm trying to create a query on a table of businesses that will 
return all similar matches to a business name. This is a huge table, and there 
is a lot of variation in names. The length of the string can be up to 255. I've 
used regex, but there are always some variations of the name that are missed 
when I do a regex. So I decided to look at distance measures.

Has anyone compared the fuzzstrmatch package to pgsimilarity?


Would the levenshtein function in postgresql be the best way to go here? If so, 
should I use levenshtein in the contribution package or install the 
pgsimilarity package? Has anyone tried both implementations?

This would be my query:

Select * from table
WHERE levenshtein (column_name,'Name of the business')  <= 3
ORDER BY levenshtein (column_name, 'Name of the business')
Limit 10;

Thank you so much for your help.

Rachel