[SQL] Need help building this query

2012-06-21 Thread Rihad
Hi, folks. I currently need to join two tables that lack primary keys, 
and columns used to distinguish each record can be duplicated. I need to 
build statistics over the data in those tables. Consider this:



TableA:
row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
row 2: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
row 3: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date

TableB:
row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date


Columns foo + bar + baz are used to distinguish a performed "operation":
TableA.date_of_op isn't, because it can lag behind TableB.

Not all different "operations" are in table B.
Table B is just there so we know which "operations" are complete, so to 
speak (happening under external means and not under any of my control).


Now, for each operation (foo+bar+baz) in table A, only *one* row should 
be matched in table B, because it only has one matching row there.

The other two in TableA should be considered unmatched.

Now the query should be able to get count(*) and sum(amount) every day 
for that day, considering that matched and unmatched operations should 
be counted separately. The report would look something like this:


TableA.date_of_op  TableB.date_of_op
2012-06-21[empty]  [count(*) and sum(amount) 
of all data in TableA for this day unmatched in TableB]
2012-06-212012-06-20[count(*) and sum(amount) of 
all data in TableA matched in TableB for the 20-th]
2012-06-212012-06-19[count(*) and sum(amount) of 
all data in TableA matched in TableB for the 19-th]



Can this awkward thing be done in pure SQL, or I'd be better off using 
programming for this?


Thanks, I hope I could explain this.

--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Need help building this query

2012-06-21 Thread rihad
> For matching triples (foo, bar, baz) the date in table B shouldnt always
> be
> after any date in table A, as table B contains complete operations?
>
Operations in Table B can usually be obtained a day after, when Table A
gets the updates. Table B contains the physical date of operation. The
date in Table A are today's dates because this is how the reports feeding
them is generated. Those are only dates, no time parts. Date_of_op is
actually a settlement date (banking term) and TableA is more recent than
TableB for reasons irrelevant to the problem (and which suck).



>
> Best,
> Oliver
>
> - Original Message -
> From: "Rihad" 
> To: 
> Sent: Thursday, June 21, 2012 6:48 PM
> Subject: [SQL] Need help building this query
>
>
>> Hi, folks. I currently need to join two tables that lack primary keys,
>> and
>> columns used to distinguish each record can be duplicated. I need to
>> build
>> statistics over the data in those tables. Consider this:
>>
>>
>> TableA:
>> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
>> row 2: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
>> row 3: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
>>
>> TableB:
>> row 1: foo: 123, bar: 456, baz: 789, amount: 10.99, date_of_op: date
>>
>>
>> Columns foo + bar + baz are used to distinguish a performed "operation":
>> TableA.date_of_op isn't, because it can lag behind TableB.
>>
>> Not all different "operations" are in table B.
>> Table B is just there so we know which "operations" are complete, so to
>> speak (happening under external means and not under any of my control).
>>
>> Now, for each operation (foo+bar+baz) in table A, only *one* row should
>> be
>> matched in table B, because it only has one matching row there.
>> The other two in TableA should be considered unmatched.
>>
>> Now the query should be able to get count(*) and sum(amount) every day
>> for
>> that day, considering that matched and unmatched operations should be
>> counted separately. The report would look something like this:
>>
>> TableA.date_of_op  TableB.date_of_op
>> 2012-06-21[empty]  [count(*) and sum(amount)
>> of all data in TableA for this day unmatched in TableB]
>> 2012-06-212012-06-20[count(*) and sum(amount) of
>> all data in TableA matched in TableB for the 20-th]
>> 2012-06-212012-06-19[count(*) and sum(amount) of
>> all data in TableA matched in TableB for the 19-th]
>>
>>
>> Can this awkward thing be done in pure SQL, or I'd be better off using
>> programming for this?
>>
>> Thanks, I hope I could explain this.
>>
>> --
>> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-sql
>>
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>



-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Need help building this query

2012-06-21 Thread rihad
> You seem to be describing a straight reconciliation between two tables.
> My
> current means of doing this are programmatically but for the simple case
> pure SQL should be doable.  The main thing is that you have to distinguish
> between "duplicate" records first and then match them up:
>
> TableA Keys:
>
> AA
> AA
> AA
> AB
> AB
> AC
>
> TableB Keys:
> AA
> AA
> AB
>
> First you use "ROW_NUMBER() OVER (PARTITION BY key)" to assign an integer
> "sub-id" to every set of possible keys in both tables:
>
> TableA-Sub:
> AA-1
> AA-2
> AA-3
> AB-1
> AB-2
> AC-1
>
> TableB-Sub:
> AA-1
> AA-2
> AB-1
>
> Now, with these newly constructed key+sub-key values in place, you can
> perform a simple LEFT (or possibly FULL) JOIN between tables A & B.
>
> This makes no allowances for any of kind of desired date restriction on
> the
> matching nor does it consider the eventual report that you wish to
> generate.
> What this gives you is a listing of ALL rows in both tables with matched
> records joined together into a single (NULL-less) row while unmatched
> records will have one of the two resultant columns NULLed
>
> SELECT tableA.subid_a, tableB.subid_b
> FROM tableA FULL OUTER JOIN tableB ON (tableA.subid_a = tableB.subid_b)
>
> Requires at least version 8.4
>

This looks really promising, thanks. I'll read more about it (WINDOWING in
general) and see how it fits our model. This might allow me to distinguish
"operation" rows that exist in both TableA and TableB, and whether TableA
has more of them, in which case they would be marked unmatched. Using
sub-selects this seems doable.


-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Need help building this query

2012-06-21 Thread rihad
> You seem to be describing a straight reconciliation between two tables.
> My
> current means of doing this are programmatically but for the simple case
> pure SQL should be doable.  The main thing is that you have to distinguish
> between "duplicate" records first and then match them up:
>
> TableA Keys:
>
> AA
> AA
> AA
> AB
> AB
> AC
>
> TableB Keys:
> AA
> AA
> AB
>
> First you use "ROW_NUMBER() OVER (PARTITION BY key)" to assign an integer
> "sub-id" to every set of possible keys in both tables:
>
> TableA-Sub:
> AA-1
> AA-2
> AA-3
> AB-1
> AB-2
> AC-1
>
> TableB-Sub:
> AA-1
> AA-2
> AB-1
>
> Now, with these newly constructed key+sub-key values in place, you can
> perform a simple LEFT (or possibly FULL) JOIN between tables A & B.
>
> This makes no allowances for any of kind of desired date restriction on
> the
> matching nor does it consider the eventual report that you wish to
> generate.
> What this gives you is a listing of ALL rows in both tables with matched
> records joined together into a single (NULL-less) row while unmatched
> records will have one of the two resultant columns NULLed
>
> SELECT tableA.subid_a, tableB.subid_b
> FROM tableA FULL OUTER JOIN tableB ON (tableA.subid_a = tableB.subid_b)
>
> Requires at least version 8.4

This looks really promising, thanks. I'll read more about it (WINDOWING in
general) and see how it fits our model. This might allow me to distinguish
"operation" rows that exist in both TableA and TableB, and whether TableA
has more of them, in which case they would be marked unmatched. Using
sub-selects this seems doable.



-- 
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql