Re: [sqlite] Break on cumulative sum

2013-03-03 Thread James K. Lowden
On Sun, 3 Mar 2013 07:19:11 -0600
"Michael Black"  wrote:

> One thing I'll point out...the trigger solution is almost linear
> performance and constant memory where the single SQL statement will
> suffer notably as time goes on in both those measures.  Also the
> single SQL statement doesn't seem like a good idea if the data is
> coming in real time. He never mentioned how big his table is going to
> be.

If TextField 'A' is deleted, doesn't the entire table have to be
updated?  Sure, that's O(n), but where n is the size of the table, not
the size of the update.  

Acknowledged, sometimes there are reasons to store derived results.
But that choice bears costs, both in updating the database and in
engineering it, costs that can be avoided if they can be avoided.
Knuth said premature optimization is the root of all evil, and I
agree.  :-)  

Normalized data minimize the cost of manipulation.  That is why we
normalize our databases.  As a side-effect, normalized data permit more
efficient use of memory, benefiting SELECT, too.  

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-03 Thread James K. Lowden
On Sun, 3 Mar 2013 18:50:35 +0100
Petite Abeille  wrote:

> Ah? if only? SQLite had analytical functions? oh, well..

What do you have in mind?  I've benn adding some user defined functions
and am thinking of creating a repository for them.  

--jkl

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-03 Thread Petite Abeille

On Mar 3, 2013, at 2:10 PM, "James K. Lowden"  wrote:

> There's some cruft, too.  I was only demonstrating that it could be
> done.  If you find a way to simplify it, you'll know you understand
> it.  

Ah… if only… SQLite had analytical functions… oh, well..

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-03 Thread Michael Black
One thing I'll point out...the trigger solution is almost linear performance
and constant memory where the single SQL statement will suffer notably as
time goes on in both those measures.  Also the single SQL statement doesn't
seem like a good idea if the data is coming in real time.
He never mentioned how big his table is going to be.

-Original Message-
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of James K. Lowden
Sent: Sunday, March 03, 2013 7:10 AM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] Break on cumulative sum

On Sun, 3 Mar 2013 11:46:26 +0100
Gert Van Assche  wrote:

> My SQL knowledge is not yet good enough to understand what you
> proposed, but I hope one day that I will be able to understand and
> replicate myself the code you wrote. I started of with a join on the
> table itself as well, but I got stuck. I'll study your code and I'll
> grow.

Happy to be service, Gert.  

The trick is always the same: get your ducks in a row and shoot.  Figure
out the formula that computes the answer you want, and put the
constituent parts on the row, then compute.  

The first order of business in your case was a running sum.  That's
done by adding up everything "before" the current row, where "before"
means "less than" something, however ordered.   

select   a.*, sum(b.ValueField) as total
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint

For each row in "a", get all the rows in "b" whose TextField is less
than (or equal to, here) to a's.  Here it is, live and in color: 

$ sqlite3 -echo db < self.sql 
select   a.TextField
, a.ValueField
, a.BreakPoint
 , sum(b.ValueField) as total
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint
;
TextField   ValueField  BreakPoint  total 
--  --  --  --
A   2   10  2 
B   3   10  5 
C   2   10  7 
D   4   10  11
E   5   10  16
F   1   10  17
G   1   10  18
H   5   10  23
I   11  10  34
J   8   10  42
K   2   10  44

That formed my building block.  

Your formula is almost

GN = 1 + total / Breakpoint

By using integer arithmetic, I could divide "total" by BreakPoint to
get a multiple, which is almost what you call a GroupName, except for
that "one past" rule I remarked on, where for example 11 is part of
the first group.  But by joining my block to itself, offset by one, we
can slide the multiplier down a row.  

For CumulativeValue, your formula is 

CV = total - [max total of previous GN]

That requires computing the GN, grouping on it, and finding the maximum
total for it, and joining that result back to the main one, where it
can be subtracted from the main total.  

So, yes, there are a lot of joins.  But I didn't write the query
top-down, and you shouldn't read it that way.  Read it inside-out.
Each piece can be separately evaluated, and each operand used in the
calculation can be exposed in the SELECT clause.  

There's some cruft, too.  I was only demonstrating that it could be
done.  If you find a way to simplify it, you'll know you understand
it.  

Regards, 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-03 Thread James K. Lowden
On Sun, 3 Mar 2013 11:46:26 +0100
Gert Van Assche  wrote:

> My SQL knowledge is not yet good enough to understand what you
> proposed, but I hope one day that I will be able to understand and
> replicate myself the code you wrote. I started of with a join on the
> table itself as well, but I got stuck. I'll study your code and I'll
> grow.

Happy to be service, Gert.  

The trick is always the same: get your ducks in a row and shoot.  Figure
out the formula that computes the answer you want, and put the
constituent parts on the row, then compute.  

The first order of business in your case was a running sum.  That's
done by adding up everything "before" the current row, where "before"
means "less than" something, however ordered.   

select   a.*, sum(b.ValueField) as total
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint

For each row in "a", get all the rows in "b" whose TextField is less
than (or equal to, here) to a's.  Here it is, live and in color: 

$ sqlite3 -echo db < self.sql 
select   a.TextField
, a.ValueField
, a.BreakPoint
 , sum(b.ValueField) as total
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint
;
TextField   ValueField  BreakPoint  total 
--  --  --  --
A   2   10  2 
B   3   10  5 
C   2   10  7 
D   4   10  11
E   5   10  16
F   1   10  17
G   1   10  18
H   5   10  23
I   11  10  34
J   8   10  42
K   2   10  44

That formed my building block.  

Your formula is almost

GN = 1 + total / Breakpoint

By using integer arithmetic, I could divide "total" by BreakPoint to
get a multiple, which is almost what you call a GroupName, except for
that "one past" rule I remarked on, where for example 11 is part of
the first group.  But by joining my block to itself, offset by one, we
can slide the multiplier down a row.  

For CumulativeValue, your formula is 

CV = total - [max total of previous GN]

That requires computing the GN, grouping on it, and finding the maximum
total for it, and joining that result back to the main one, where it
can be subtracted from the main total.  

So, yes, there are a lot of joins.  But I didn't write the query
top-down, and you shouldn't read it that way.  Read it inside-out.
Each piece can be separately evaluated, and each operand used in the
calculation can be exposed in the SELECT clause.  

There's some cruft, too.  I was only demonstrating that it could be
done.  If you find a way to simplify it, you'll know you understand
it.  

Regards, 

--jkl
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Break on cumulative sum

2013-03-03 Thread Gert Van Assche
James,

I'm very impressed by your solution. My SQL knowledge is not yet good
enough to understand what you proposed, but I hope one day that I will be
able to understand and replicate myself the code you wrote. I started of
with a join on the table itself as well, but I got stuck. I'll study your
code and I'll grow.
That is the beauty of groups like this: hobby developers like me get advice
from experts on problems we're facing. Very helpful, not just the code, but
also the on-problem training!
Thanks for your work, all!

gert

2013/3/3 James K. Lowden 

> On Sat, 2 Mar 2013 14:46:40 +0100
> Gert Van Assche  wrote:
>
> > All, I don't know how to achieve this: I need to put the cumulative
> > sum in a field, and create a group as soon as that cumulative sum is
> > over a breakpoint value (10).
>
> This was a bit of a challenge because the group definition is IMO a
> little strange: The first value *after* the breakpoint is included in
> the group.  That made the SQL more convoluted than it would be
> otherwise.
>
> That said, the attached SQL does the job in one statement.  No triggers
> are needed.
>
> $ sqlite3 db < running_sum.sql
> TextField   ValueField  CumulativeValue  BreakPoint  GroupName
> --  --  ---  --  --
> A   2   210  1
> B   3   510  1
> C   2   710  1
> D   4   11   10  1
> E   5   510  2
> F   1   610  2
> G   1   710  2
> H   5   12   10  2
> I   11  11   10  3
> J   8   810  4
> K   2   10   10  4
>
> This lets you define the results in terms of the base data instead of
> keeping a table of derived values.  No updates required, just a view.
> Unless you have data-warehouse rowcounts or a very limited machine, it
> should perform just fine, despite what one might guess, because the I/O
> is limited to one table.  It should also outperform any per-row
> solution.
>
> Two bits of advice if I may.
>
> Be careful of using triggers to maintain data consistency.  Triggers
> are very good for enforcing referential integrity in situations that
> DRI can't handle. Beyond that they get very complex and error prone.
>
> Using triggers to compute derived values signals redundancy in the
> database.  That redundancy often is not as desirable as the
> denormalize-for-performance crowd thinks.  Better to compute derived
> values on demand (as in a view).  If that's demonstrated to be too
> slow, a periodic update in a cron job or similar should provide better
> thoughput than recomputing on every insert/update/delete.  You'll also
> do yourself a favor by segregating derived values in a different table
> that can be regenerated at will.
>
> My other suggestion is to eschew abstract nouns in column names.  Words
> like data, field, and value add no meaning.  After all, it's a
> database.  If it's not a value, what is it?
>
> This table,
>
> CREATE TABLE [Test]
> ( Name
> , Value
> , RunningSum
> , Breakpoint /* (one word, small 'p') */ DEFAULT 10
> , GroupName
> , primary key (Name)
> );
>
> is at least  as clear, wouldn't you say?
>
> HTH.
>
> --jkl
>
> P.S. In case the listserv strips attachments, here it is in plain
> text.
>
> [SQL]
> select   A.TextField, A.ValueField
>, A.total - coalesce(C.GroupTotal, 0) as CumulativeValue
>, A.Breakpoint
> --   , coalesce(C.GroupTotal, 0) as GroupTotal
>, coalesce(B.GroupName, A.GroupName) as GroupName
> from (
> select   a.*, sum(b.ValueField) as total
>, 1 + sum(b.ValueField) / (1+a.Breakpoint) as GroupName
> from Test as a join Test as b
> on a.TextField >= b.TextField
> group by a.TextField
> , a.ValueField
> , a.CumulativeValue
> , a.BreakPoint
> ) as A
> left join (
> select   a.*, sum(b.ValueField) as total
>, 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName
> from Test as a join Test as b
> on a.TextField >= b.TextField
> group by a.TextField
> , a.ValueField
> , a.CumulativeValue
> , a.BreakPoint
> ) as B
> on B.TextField = (select max(TextField) from Test where TextField <
> A.TextField) left join (
> select1 + T.GroupName as GroupName
> , max(T.total) as GroupTotal
> from (
> selectA.TextField
> , A.ValueField
> , A.total
> , 

Re: [sqlite] Break on cumulative sum

2013-03-03 Thread James K. Lowden
On Sat, 2 Mar 2013 14:46:40 +0100
Gert Van Assche  wrote:

> All, I don't know how to achieve this: I need to put the cumulative
> sum in a field, and create a group as soon as that cumulative sum is
> over a breakpoint value (10).

This was a bit of a challenge because the group definition is IMO a
little strange: The first value *after* the breakpoint is included in
the group.  That made the SQL more convoluted than it would be
otherwise.  

That said, the attached SQL does the job in one statement.  No triggers
are needed.  

$ sqlite3 db < running_sum.sql
TextField   ValueField  CumulativeValue  BreakPoint  GroupName 
--  --  ---  --  --
A   2   210  1 
B   3   510  1 
C   2   710  1 
D   4   11   10  1 
E   5   510  2 
F   1   610  2 
G   1   710  2 
H   5   12   10  2 
I   11  11   10  3 
J   8   810  4 
K   2   10   10  4 

This lets you define the results in terms of the base data instead of
keeping a table of derived values.  No updates required, just a view.
Unless you have data-warehouse rowcounts or a very limited machine, it
should perform just fine, despite what one might guess, because the I/O
is limited to one table.  It should also outperform any per-row
solution.  

Two bits of advice if I may.  

Be careful of using triggers to maintain data consistency.  Triggers
are very good for enforcing referential integrity in situations that
DRI can't handle. Beyond that they get very complex and error prone.  

Using triggers to compute derived values signals redundancy in the
database.  That redundancy often is not as desirable as the
denormalize-for-performance crowd thinks.  Better to compute derived
values on demand (as in a view).  If that's demonstrated to be too
slow, a periodic update in a cron job or similar should provide better
thoughput than recomputing on every insert/update/delete.  You'll also
do yourself a favor by segregating derived values in a different table
that can be regenerated at will.  

My other suggestion is to eschew abstract nouns in column names.  Words
like data, field, and value add no meaning.  After all, it's a
database.  If it's not a value, what is it?  

This table, 

CREATE TABLE [Test] 
( Name
, Value
, RunningSum
, Breakpoint /* (one word, small 'p') */ DEFAULT 10
, GroupName
, primary key (Name)
); 

is at least  as clear, wouldn't you say?  

HTH.  

--jkl

P.S. In case the listserv strips attachments, here it is in plain
text.  

[SQL]
select   A.TextField, A.ValueField
   , A.total - coalesce(C.GroupTotal, 0) as CumulativeValue
   , A.Breakpoint
--   , coalesce(C.GroupTotal, 0) as GroupTotal
   , coalesce(B.GroupName, A.GroupName) as GroupName
from (
select   a.*, sum(b.ValueField) as total
   , 1 + sum(b.ValueField) / (1+a.Breakpoint) as GroupName
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint
) as A
left join (
select   a.*, sum(b.ValueField) as total
   , 1 + sum(b.ValueField) / (1 + a.Breakpoint) as GroupName
from Test as a join Test as b
on a.TextField >= b.TextField
group by a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint
) as B
on B.TextField = (select max(TextField) from Test where TextField <
A.TextField) left join (
select1 + T.GroupName as GroupName
, max(T.total) as GroupTotal
from (
selectA.TextField
, A.ValueField
, A.total
, coalesce(B.GroupName, 1) as GroupName
from (
select   a.*, sum(b.ValueField) as total
   , 1 + sum(b.ValueField) / (1 +
a.Breakpoint) as GroupName from Test as a join Test as b
on a.TextField >= b.TextField
group by  a.TextField
, a.ValueField
, a.CumulativeValue
, a.BreakPoint
) as A
left join (
select   a.*, sum(b.ValueField) as total
   , 1 + sum(b.ValueField) / (1 +