Garry,

> > > CREATE TABLE breeding (
> > >   rec SMALLINT(4) UNSIGNED AUTO_INCREMENT,
> > >   sire SMALLINT(4) UNSIGNED,
> > >   dam SMALLINT(4) UNSIGNED,
> > >   paired DATE,
> > >   split DATE,
> > >   num_offspring TINYINT(1) ZEROFILL,
> > >   PRIMARY KEY (rec)
> > > );
> > >
> > > First, we define if a pair was breeding for a given date by seeing if that
> > > date lies between the paired date and the split date. In mysql:
> > > SELECT FROM breeding WHERE TO_DAYS(paired) <= TO_DAYS('$given_date') AND
> > > TO_DAYS('$given_date') <= TO_DAYS(split);

=This seems part of the problem (as well as part of the solution). Procedural logic 
suggests that we should
iterate through the days of the month, taking note of the facts, and then perform the 
statistical calculation at
the end. Relational logic doesn't work like this, it requires us to process through 
the data in the table(s).
This table is not organised by date, indeed it notes periods of time, which are 
notoriously problematic to sort
into a 'date order'! However all is not lost...

> > There is no need to use TO_DAYS() because paired and split are both
> > dates so you can operate on them directly. I recommend that you take a
> > look at the manual's section on Time and Date Functions.
> You are right. I works without the TO_DAYS(). But I did get them from the
> manual :-)

=thanks for the note. It's difficult to know whether to risk insulting someone by 
saying "RTFM"!

=ironically I think you're going to end up right back there, but other functions of 
interest may include Month,
MonthName, Extract, Date_Sub, and even Timestamp.

=general advice would be to define the dates in PHP (with MySQL in mind), but keep all 
the date processing in
MySQL (if at all possible).

> > > Where it gets messy is this has to be put into a loop, between the given
> > > start and end dates (year and month only). For each day in the loop, I
> > > have to do a query and count the number returned to php. Then average that
> > > figure for the month.
> > So if I understand correctly, you are thinking in terms of a 'result
> > table' which would (ideally) have the days 1-28/29/30/31 in the
> > left-hand column, and the number of pairs who were put together on
> > that day (paired <= day/date <= split). Then at the bottom you want to
> > sum the right-hand column and divide it by the number of days in the
> > month to give some sort of average mating possibilities per day. Does
> > that sound right?
> Basically for statistical reasons I need average pairs per month, just a
> single number. But the months/years to be queried on would be input via a
> form. I was thinking of generating the data on-the-fly, but as you suggest
> above and from something Frank Flynn suggested, I would be better off
> having a separate table with monthly results. They could all be
> pre-calculated except the current month.

=I noted Frank's suggestion. Obviously no one wants to duplicate 'the calendar' 
unnecessarily, but by having a
calendar table, it does open up the possibility of processing through that table (from 
start-date to end-date)
and using relational logic to extract (date) 'related' rows from the breeding tbl. In 
many ways, when I have
finished, you will recognise that there is some attraction to that approach!

=You may have missed one of the points that I was making: what is the math behind this 
calculation? I'm going to
set up some sample data/an approach, but only talk about a 'week' rather than a 
'month' for brevity/ease of
comprehension... So if we have breeding pairs A through E, the breeding tbl 
(representative) looks like:

sire   dam   paired   split
Am   Af      Mon     Fri
Bm   Bf      Tue      Thu
Cm   Cf      Tue      Fri
Dm   Df      history  Sun
Em   Ef      Sat      future

where m=male, f=female, and we're only interested in the period Mon-Fri, so rows D and 
E are included only to
show exclusion (yes, that last clause does make logical sense...I think - they're 
included only so that they can
be excluded!)

=Our first thought is to think in terms of say "Monday", and to scan the breeding tbl 
to count how many pairs
were breeding that day, and make a note of the count; then move to "Tuesday" and do 
the same; finally summing
and averaging.

=Thus 'temporal logic' (working one day at a time) says:
Mon = 1 pair
Tue = 3 pairs
Wed = 3 pairs
Thu = 3 pairs
Fri = 2 pairs
Sum = 12
nDays = 5
Avg = 2.5

=Now, crucial question: have I understood the logic of the computation correctly? I 
shall blythly carry on
regardless...

=Ok, so now we're in a quandry because constructing a SQL statement to look at Monday, 
the Tuesday, ... is not
possible - enter Frank (trumpet calls offstage, left).

=However there's more than one way to skin a cat, and according to my (admittedly not 
fantastic) math ability,
we could arrive at the same numbers by coming at things the other way around! Let's 
count the number of days for
which a pair were breeding:

A = 5
B = 3
C = 4
D = ?
E = ?
Sum = 12
nDays = 5
Avg = 2.5

Proceed directly to Go!

=Let's ignore the ? rows for a minute (they're not relevant/inside the Mon-Fri period 
under consideration).
Using SUM() from SQL, getting the number of days in the month from Frank or PHP, and a 
function we/you will
define later to count days in the period, this is what the SQL statistical calculation 
might look like:

SUM( nrDays( split - paired + 1 ) ) / nDays

=but wait a minute, there's no reference to the month (from which we have already used 
nDays)!!! Quite right,
and here's where we start to build a headache...

=The split/paired dates may be completely within the required period, completely 
outside of it, or may straddle
the period. I've assumed the first, and asked you to ignore the second, so now let's 
go straight to the 'great
kahuna' - the only days we are interested in counting are those within the period. 
Please let me introduce:

SOM = start of month date, ie 1st of month.
EOM = end of month date (hah! what's this? 28/29/30/31st of month!?)

=If SOM < paired, then we only count from the beginning of the breeding period, ie 
paired. However if the
breeding period began prior to SOM then we only count from the first of the month. In 
SQL:

IF( paired < SOM , paired, SOM )

=The relevant period is from the start of the month, or from the beginning of the 
pairing, whichever came later
(oh boy, it sounds like a tax calculation form doesn't it?) The same (logic) applies 
to the relationship between
EOM and split. Which means that the statistical calculation now looks like:

SUM(
  nrDays(
    IF( split > EOM , split, EOM ) -
    IF( paired < SOM , paired, SOM ) + 1
  )
) / nDays

=Now let's step back to those three 'period' considerations above. This complicated 
SUM() takes care of
situations one and three, but what about situation two (breeding period is irrelevant 
to the calculation)? You
had already solved that - SQL uses the WHERE clause to include/exclude relevant rows:

WHERE paired <= start_date AND end_date <= split

=BTW in the case of the period being a calendar month, and further to my comments 
about the last day of the
month being anything from the 28th through the 31st, it might be easier to calculate 
the first of the next month
and change the last condition to be:

end_date(plus one, ie first of month+1) < split

=If you've followed all that, you're doing well (and exercising considerable 
patience). I don't have your data
(which I'll ask you to send privately - NOT to the list) if we have to play around 
with this problem much more).
So the time has come for you to 'give it a go, Joe', and make sure that my theory can 
actually be applied to
your practical situation. You will find it easier to work in native SQL or use a 
'management package', and
substitute suitable values for my 'convenient' variables. Does it work?

=BUT this only works for a single nominated month! I got the impression that you 
wanted more than that.
Unfortunately (and this applies to Frank's suggestion too) your data is not actually 
'organised' by dates: it is
in fact two dates representing a period. Logically breeding periods and months are not 
identical. In order to
repeat an SQL statistical function on groups of data, eg Jan, then Feb; the two groups 
need to be mutually
exclusive (AFAIK), eg return a count of all the males, and then all the females, in 
the table. Breeding pairs
for Jan, and breeding pairs for Feb, are not mutually exclusive if (using my example 
periods from above) Mon =
28Jan and Fri = 1Feb!

> > > Normally start and end dates would be input via a form.
> > I don't understand, at first you talked of "$given_date and now we
> > have two dates! Are you inputting a month - asking for the average for
> > January for example. Or are you putting in two dates representing a
> > period of arbitrary length and asking for an average of that, eg 42
> > days? Please clarify.
> Input is two dates, granularity one month. For example start Jan 2000,
> finish Dec 2001. Then for each month in that period, I want the avarage
> for each month. For example:
> 2000  January   56.2
> 2000  February  34.4
>  ....
> 2001  December  48.6

=Accordingly you might need to use PHP to break a wider calculation into individual 
months, or to store
historical monthly data (if it is called for frequently enough to be worth the 
hassle/storage space - or the
volume of data is significant (you mention other stats, below)) cf the response time 
to repeatedly call this SQL
query. [perhaps a wiser mind than mine might come up with a solution to this!?]

> There will be a whole lot of other statistics as well, such as number of
> eggs laid, number hatched, number of surviving chicks etc. But I can
> calculate _them_ quite easily.

=I've also left you with the task of counting nrDays (the number of relevant days 
within the breeding period),
and working our nDays (the number of days in the month) for the 'average' calculation. 
These will depend upon
your preferences. We're back to the TO_DAYS() idea or even using Timestamp data types. 
Also don't forget what I
said about the 'add one month and subtract one day' idea above which might fit with 
the DATE_SUB() - different
minds prefer to approach this sort of logic in different ways, so I'll leave that to 
your preference - you
already think my mind is warped, without me giving you yet more good reason to confirm!

> > Trouble is it's after midnight here. Apologies. Please would you
> > respond to the questions above, and if no one else steps in meantime,
> > I'll endeavor to return with a more lucid analysis in the morning...
> Sorry to keep you awake :-)

=I had replied during the morning yesterday (Z-time), as today. It wasn't the "keeping 
awake", no one was going
to achieve that, it was me trying to get the logical cogs into the right order - 
without stripping too many
gears in the process... Re-reading, I see that I had the ambition of being "lucid". I 
trust that at least this
is comprehensible...

=Let me know how you get on,
=dn



-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
To contact the list administrators, e-mail: [EMAIL PROTECTED]

Reply via email to