[SLUG] SQL Brain teaser...

2005-06-01 Thread Grant Parnell - EverythingLinux
I'm trying to do some metrics on how long it takes to process an order in 
our system based on time between printing the picking slip and bagging the 
goods for dispatch. The aim is to determine the maximum performance if we 
were to have one guy do nothing but pick & pack orders.

At first glance you might think this would do:
select 
  round(avg((unix_timestamp(docketdate) 
  -unix_timestamp(pickslip_printed))/60)) 
  as packmins from orders
  where pickslip_printed is not null;

But alas that's not the true picture, because in reality the events overlap 
each other due to operators batching up the pick slips or putting some 
aside while issues are dealt with.

Example data set..
select 
  ordno,pickslip_printed,docketdate,
  round((unix_timestamp(docketdate) 
  -unix_timestamp(pickslip_printed))/60) 
  as packmins from orders where pickslip_printed is not null 
  order by pickslip_printed;
+-------+---------------------+---------------------+----------+
| ordno | pickslip_printed    | docketdate          | packmins |
+-------+---------------------+---------------------+----------+
| ordno | 2005-06-01 14:32:16 | 2005-06-01 14:34:47 |        3 |
| ordno | 2005-06-01 15:12:27 | 2005-06-01 15:27:26 |       15 |
| ordno | 2005-06-01 15:12:28 | 2005-06-01 15:30:25 |       18 |
| ordno | 2005-06-01 15:12:29 | 2005-06-01 15:21:53 |        9 |
| ordno | 2005-06-01 15:41:29 | 2005-06-01 16:12:07 |       31 |
| ordno | 2005-06-01 15:41:32 | 2005-06-01 16:11:45 |       30 |
| ordno | 2005-06-01 15:41:33 | 2005-06-01 15:52:17 |       11 |
| ordno | 2005-06-01 15:41:33 | 2005-06-01 15:49:15 |        8 |
| ordno | 2005-06-01 15:41:34 | 2005-06-01 15:48:30 |        7 |
| ordno | 2005-06-01 15:41:34 | 2005-06-01 15:45:56 |        4 |
| ordno | 2005-06-01 15:53:00 | 2005-06-01 15:57:57 |        5 |
| ordno | 2005-06-01 15:53:00 | 2005-06-01 16:01:33 |        9 |
| ordno | 2005-06-01 15:53:00 | 2005-06-01 16:00:24 |        7 |
| ordno | 2005-06-01 16:02:25 | 2005-06-01 16:04:00 |        2 |
| ordno | 2005-06-01 16:02:26 | 2005-06-01 16:08:09 |        6 |
| ordno | 2005-06-01 16:22:40 | 2005-06-01 16:34:49 |       12 |
| ordno | 2005-06-01 16:22:41 | 2005-06-01 16:36:26 |       14 |
| ordno | 2005-06-01 16:22:42 | 2005-06-01 16:37:52 |       15 |
| ordno | 2005-06-01 16:22:42 | 2005-06-01 16:25:59 |        3 |
+-------+---------------------+---------------------+----------+

See what I mean? Between 15:30 and 16:00 there were 9 orders started, but
the previous query gives an average of 12.4 minutes per order, when in
reality it's 9 orders in about 30 minutes of elapsed time due to batching.

Or to put it into perspective, if you add up the packing minutes the total 
comes to 1.7 times the actual elapsed time :-(

Anybody got any smart ideas for reporting purposes? I don't mind if 
performance sucks because I will do it on a replicated database.

Essentially there needs to be a way of grouping orders with a similar 
pickslip_printed time. Maybe we could look at pickslip_printed in terms of 
5-minute chunks. That doesn't quite work because more orders are started 
before all the others are finished.

select 
  from_unixtime(round(unix_timestamp(pickslip_printed)/300)*300)
    as fudgestart,
  avg(round((unix_timestamp(docketdate)
    -unix_timestamp(pickslip_printed))/60)) as packmins
from orders
where pickslip_printed is not null 
group by fudgestart
order by fudgestart;

For now the most practical thing I guess is to take the lesser of 
(max(docketdate)-min(pickslip_printed))/order_count
or the first query I mentioned.
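
A rough sketch of that fallback, done per day so the two figures can be
compared side by side (this assumes the same orders table and MySQL
functions as above; date() and the column aliases are only illustrative):

select
  date(pickslip_printed) as packday,
  count(*) as order_count,
  -- wall-clock span for the day divided by the number of orders
  round((unix_timestamp(max(docketdate))
    -unix_timestamp(min(pickslip_printed)))/60/count(*)) as elapsed_mins_per_order,
  -- the naive per-order average, same formula as the first query
  round(avg((unix_timestamp(docketdate)
    -unix_timestamp(pickslip_printed))/60)) as naive_mins_per_order
from orders
where pickslip_printed is not null
group by packday;

Taking the lesser of the last two columns for each day (LEAST() in MySQL)
then gives the conservative figure.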

-- 
Grant Parnell - senior consultant
EverythingLinux services - the consultant's backup & tech support.
Web: http://www.everythinglinux.com.au/support.php
We're also busybits.com.au and linuxhelp.com.au and elx.com.au.
Phone 02 8756 3522 to book service or discuss your needs.

ELX or its employees participate in the following:-
OSIA (Open Source Industry Australia) - http://www.osia.net.au
AUUG (Australian Unix Users Group) - http://www.auug.org.au
SLUG (Sydney Linux Users Group) - http://www.slug.org.au 
LA (Linux Australia) - http://www.linux.org.au

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] SQL Brain teaser...

2005-06-01 Thread Steven O'Reilly
Grant

I'm not sure that what you have is really a SQL syntax problem.  

It looks like an "Operations research queuing" problem.  Googling on
this should help - or confuse.

hope this is relevant

Steven O'Reilly


On 6/1/05, Grant Parnell - EverythingLinux <[EMAIL PROTECTED]> wrote:
> I'm trying to do some metrics on how long it takes to process an order in
> our system based on time between printing the picking slip and bagging the
> goods for dispatch. The aim is to determine the maximum performance if we
> were to have one guy do nothing but pick & pack orders.
>
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] SQL Brain teaser...

2005-06-01 Thread Robert Collins
On Thu, 2005-06-02 at 09:46 +1000, Steven O'Reilly wrote:
> Grant
> 
> I'm not sure that what you have is really a SQL syntax problem.  
> 
> It looks like an "Operations research queuing" problem.  Googling on
> this should help - or confuse.

Or talk to Andrew Cowie, who I believe has some expertise in this arena.

Rob
-- 
GPG key available at: .


-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html

Re: [SLUG] SQL Brain teaser...

2005-06-01 Thread Phil Scarratt

Grant Parnell - EverythingLinux wrote:
I'm trying to do some metrics on how long it takes to process an order in 
our system based on time between printing the picking slip and bagging the 
goods for dispatch. The aim is to determine the maximum performance if we 
were to have one guy do nothing but pick & pack orders.

See what I mean, between 15:30 and 16:00 there were 9 orders started but
the average will be 12.4 minutes per order by the previous query when in
reality it's 9 orders in 30 minutes due to batching.

Or to put it into perspective, if you add up the packing minutes it's 1.7 
times the actual elapsed time :-(


Anybody got any smart ideas for reporting purposes? I don't mind if 
performance sucks because I will do it on a replicated database.

Could you simply take the time between the first printed picking slip 
and the last docketdate, count the number of orders in between, and 
average? This may or may not be as accurate as you would like.

Fil
--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html


Re: [SLUG] SQL Brain teaser...

2005-06-02 Thread James Gregory
On Wed, 2005-06-01 at 19:59 +1000, Grant Parnell - EverythingLinux
wrote:
> I'm trying to do some metrics on how long it takes to process an order in 
> our system based on time between printing the picking slip and bagging the 
> goods for dispatch. The aim is to determine the maximum performance if we 
> were to have one guy do nothing but pick & pack orders.

> +-------+---------------------+---------------------+----------+
> | ordno | pickslip_printed    | docketdate          | packmins |
> +-------+---------------------+---------------------+----------+
> | ordno | 2005-06-01 14:32:16 | 2005-06-01 14:34:47 |        3 |

So the issue is cases like this one where three orders are processed in
parallel:

> | ordno | 2005-06-01 15:12:27 | 2005-06-01 15:27:26 |       15 |
> | ordno | 2005-06-01 15:12:28 | 2005-06-01 15:30:25 |       18 |
> | ordno | 2005-06-01 15:12:29 | 2005-06-01 15:21:53 |        9 |

So, if we want to consider the time each order actually takes to
process, and we're allowing parallel packing we need to make some
assumptions about what's going on with that parallel processing. Let's
make this assumption: in overlapped time periods, it is acceptable to
use the docketdate to demarcate time spent on different orders.

So the analysis you're looking for then is not how much time can
directly be attributed to each order, but rather how much time is spent
on each order, without necessarily knowing which order took that time
(you don't have enough data there to do direct attributions even if you
wanted to).

If we can assume that then you can make some headway on the problem --
you can flatten your data-structure to a time-line of events and measure
the time between events. Further, you can also ask the database to
ignore time when no order processing is occurring.

(excuse my SQL here, it's been a few months)

select eventdate, eventname, eventdiff
from (
    select pickslip_printed as eventdate, 'printed' as eventname, 1 as eventdiff
    from orders
  union all
    select docketdate, 'docketed', -1
    from orders
) as events
order by eventdate asc;

Which discards the relationship of times to specific orders, and allows
analysis of just the time elapsed between events. In addition, the
eventdiff column will allow you to work out how many orders are
outstanding at any point in the timeline.

I'd make that into a view, and then build another view that evaluates
the sum of the eventdiff column up to that row for every row. Wherever
that sum is > 0, there are orders being worked on. Once you get to that
point, the elapsed time on an order will be the difference between some
of the adjacent rows, which is similarly easily calculated.

I say "some of", because of course the one second that elapsed between
those orders being "printed" does not represent packing time. If you
think about it, you can solve that problem but that gets a bit fiddly
(you'd need to differentiate between print-print, print-docket and
docket-docket scenarios. The latter two would represent order processing
time, which is doable, but you need to keep inter-row state, which is
slightly tricky in SQL), so you're probably better off employing some
kind of heuristic given what you know about the data (eg that you don't
believe an order could be packed in < 30 seconds or similar).
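
For the running-total view itself, something along these lines might work
(a sketch only, assuming a database with views and subqueries such as
MySQL 5.0 or PostgreSQL, and faking the running total with a correlated
subquery; the order_events name is just for illustration):

create view order_events as
    select pickslip_printed as eventdate, 'printed' as eventname, 1 as eventdiff
    from orders
    where pickslip_printed is not null
  union all
    select docketdate, 'docketed', -1
    from orders
    where docketdate is not null;

-- outstanding = how many orders are open as of each event;
-- wherever it drops to 0, nothing is being packed at that moment.
select e.eventdate, e.eventname,
       ( select sum(e2.eventdiff)
           from order_events as e2
          where e2.eventdate <= e.eventdate ) as outstanding
  from order_events as e
 order by e.eventdate;

Events that share a timestamp all get the same running total, which is good
enough for spotting the idle gaps.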

Not bullet-proof, but it's a start.

HTH,

James.
(recovering SQL abuser)

-- 
My love burns for you
A Thermonuclear rose
Now lets go make out
  -- http://www.solardeathray.com/rose.html


-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html

Re: [SLUG] SQL Brain teaser...

2005-06-07 Thread Grant Parnell - slug
My god... there are some good ideas there. For now I opted for processing 
outside of SQL by basically breaking the day into 1-minute time slots and 
looking at how many orders were being processed in each time slot. Next I 
iterated again and counted up the minutes for each order, such that for each 
real minute the time attributed to each order in progress was 1/n minutes, 
where n was the number of orders being processed during that minute. Then I 
simply averaged the attributed times for each order.
I can get away with this approach because the data set is pretty small and 
it's only two passes through the data.

Essentially we agree on only looking at times of the day where packing 
activity is occurring. The jury is out on whether you can assume the 
orders were processed sequentially based on docketdate.

We'll never know, based on this data, how long each specific order took 
but then I don't care - barring exceptional cases (e.g. 2 hours).

Still, I'll try implementing your approach for comparison's sake. It'll 
certainly be interesting.

On Thu, 2 Jun 2005, James Gregory wrote:

> On Wed, 2005-06-01 at 19:59 +1000, Grant Parnell - EverythingLinux
> wrote:
> > I'm trying to do some metrics on how long it takes to process an order in 
> > our system based on time between printing the picking slip and bagging the 
> > goods for dispatch. The aim is to determine the maximum performance if we 
> > were to have one guy do nothing but pick & pack orders.
> 
> > +-------+---------------------+---------------------+----------+
> > | ordno | pickslip_printed    | docketdate          | packmins |
> > +-------+---------------------+---------------------+----------+
> > | ordno | 2005-06-01 14:32:16 | 2005-06-01 14:34:47 |        3 |
> 
> So the issue is cases like this one where three orders are processed in
> parallel:
> 
> > | ordno | 2005-06-01 15:12:27 | 2005-06-01 15:27:26 |       15 |
> > | ordno | 2005-06-01 15:12:28 | 2005-06-01 15:30:25 |       18 |
> > | ordno | 2005-06-01 15:12:29 | 2005-06-01 15:21:53 |        9 |
> 
> So, if we want to consider the time each order actually takes to
> process, and we're allowing parallel packing we need to make some
> assumptions about what's going on with that parallel processing. Let's
> make this assumption: in overlapped time periods, it is acceptable to
> use the docketdate to demarcate time spent on different orders.
> 
> So the analysis you're looking for then is not how much time can
> directly be attributed to each order, but rather how much time is spent
> on each order, without necessarily knowing which order took that time
> (you don't have enough data there to do direct attributions even if you
> wanted to).
> 
> If we can assume that then you can make some headway on the problem --
> you can flatten your data-structure to a time-line of events and measure
> the time between events. Further, you can also ask the database to
> ignore time when no order processing is occurring.
> 
> (excuse my SQL here, it's been a few months)
> 
> select eventdate, eventname, eventdiff
> from (
>     select pickslip_printed as eventdate, 'printed' as eventname, 1 as eventdiff
>     from orders
>   union all
>     select docketdate, 'docketed', -1
>     from orders
> ) as events
> order by eventdate asc;
> 
> Which discards the relationship of times to specific orders, and allows
> analysis of just the time elapsed between events. In addition, the
> eventdiff column will allow you to work out how many orders are
> outstanding at any point in the timeline.
> 
> I'd make that into a view, and then build another view that evaluates
> the sum of the eventdiff column up to that row for every row. Wherever
> that sum is > 0, there are orders being worked on. Once you get to that
> point, the elapsed time on an order will be the difference between some
> of the adjacent rows, which is similarly easily calculated.
> 
> I say "some of", because of course the one second that elapsed between
> those orders being "printed" does not represent packing time. If you
> think about it, you can solve that problem but that gets a bit fiddly
> (you'd need to differentiate between print-print, print-docket and
> docket-docket scenarios. The latter two would represent order processing
> time, which is doable, but you need to keep inter-row state, which is
> slightly tricky in SQL), so you're probably better off employing some
> kind of heuristic given what you know about the data (eg that you don't
> believe an order could be packed in < 30 seconds or similar).
> 
> Not bullet-proof, but it's a start.
> 
> HTH,
> 
> James.
> (recovering SQL abuser)
> 
> 

-- 
Grant Parnell - SLUG President
EverythingLinux services - the consultant's backup & tech support.
Web: http://www.elx.com.au/support.php
We're also busybits.com.au and linuxhelp.com.au and everythinglinux.com.au.
Phone 02 8756 3522 to book service or discuss your needs 
or email us at paidsupport at elx.com.au

ELX or its employees participate in the following:-
OSIA (Open Source Industry Australia) - http://www.osia.net.au

Re: [SLUG] SQL Brain teaser...

2005-06-08 Thread telford

On Wed, Jun 08, 2005 at 10:02:34AM +1000, Grant Parnell - slug wrote:

> Essentially we agree on only looking at times of the day where packing 
> activity is occurring. The jury is out on whether you can assume the 
> orders were processed sequentially based on docketdate.

Due to the nature of SQL not being a Turing-complete language
(i.e. you can be SURE that a pure-SQL query will always complete,
possibly after a long time has elapsed) it turns out that it is much
easier to detect the gaps between activity than to detect the
overlap region during activity.

For example:

junk=> select * from orders;

 ordno |  pickslip_printed   |     docketdate
-------+---------------------+---------------------
     1 | 2005-06-01 14:32:16 | 2005-06-01 14:34:47
     2 | 2005-06-01 15:12:27 | 2005-06-01 15:27:26
     3 | 2005-06-01 15:12:28 | 2005-06-01 15:30:25
     4 | 2005-06-01 15:12:29 | 2005-06-01 15:21:53
     5 | 2005-06-01 15:41:29 | 2005-06-01 16:12:07
     6 | 2005-06-01 15:41:32 | 2005-06-01 16:11:45
     7 | 2005-06-01 15:41:33 | 2005-06-01 15:52:17
     8 | 2005-06-01 15:41:33 | 2005-06-01 15:49:15
     9 | 2005-06-01 15:41:34 | 2005-06-01 15:48:30
    10 | 2005-06-01 15:41:34 | 2005-06-01 15:45:56
    11 | 2005-06-01 15:53:00 | 2005-06-01 15:57:57
    12 | 2005-06-01 15:53:00 | 2005-06-01 16:01:33
    13 | 2005-06-01 15:53:00 | 2005-06-01 16:00:24
    14 | 2005-06-01 16:02:25 | 2005-06-01 16:04:00
    15 | 2005-06-01 16:02:26 | 2005-06-01 16:08:09
    16 | 2005-06-01 16:22:40 | 2005-06-01 16:34:49
    17 | 2005-06-01 16:22:41 | 2005-06-01 16:36:26
    18 | 2005-06-01 16:22:42 | 2005-06-01 16:37:52
    19 | 2005-06-01 16:22:42 | 2005-06-01 16:25:59
(19 rows)

junk=> select o1.ordno, o2.ordno, o2.pickslip_printed - o1.docketdate as gap
         from orders as o1, orders as o2
        where o1.docketdate < o2.pickslip_printed
          and not exists
              (
                select * from orders as o3
                 where ( o3.docketdate > o1.docketdate
                         and o3.docketdate < o2.pickslip_printed )
                    or ( o3.pickslip_printed > o1.docketdate
                         and o3.pickslip_printed < o2.pickslip_printed )
                    or ( o3.pickslip_printed = o2.pickslip_printed
                         and o3.ordno < o2.ordno )
              );

 ordno | ordno |   gap
-------+-------+----------
     1 |     2 | 00:37:40
     3 |     5 | 00:11:04
     5 |    16 | 00:10:33
     7 |    11 | 00:00:43
    12 |    14 | 00:00:52
(5 rows)

junk=> select max(docketdate) - min(pickslip_printed) as whole_day,
              count(*) as work_done
         from orders;

 whole_day | work_done 
-----------+-----------
 02:05:36  |        19
(1 row)


junk=> select '02:05:36'::interval - '01:00:52'::interval as neat_time;

 neat_time 
-----------
 01:04:44
(1 row)


On the basis that the total time less the idle time must be the time spent
working, you can say that over the day you got a usable output of 19 orders
processed at an input cost of 1 hour and 5 minutes spent working. That's
an average of 3 minutes and 24 seconds per order.

It's a backwards way of doing things but at least it can all be done in
one single query (give or take a translation from my PostgreSQL code into
your MySQL database). Note that the query that finds the gaps still joins
the table to itself and then to itself again, so in the simple case its
processing time will be of cubic order. However, you can safely work with
one day at a time, so if you build a view (or temporary table in MySQL) that
has just one day's orders then the cubic time probably won't hurt you too
badly because the entire temp table will fit into memory cache. I'm sure that
some clever index structures will speed it up.
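
As a sketch of folding the whole thing into a single statement (same
PostgreSQL syntax, reusing the gap query above as an uncorrelated
subquery):

junk=> select max(docketdate) - min(pickslip_printed)
              - coalesce(
                  ( select sum(o2.pickslip_printed - o1.docketdate)
                      from orders as o1, orders as o2
                     where o1.docketdate < o2.pickslip_printed
                       and not exists
                           ( select * from orders as o3
                              where ( o3.docketdate > o1.docketdate
                                      and o3.docketdate < o2.pickslip_printed )
                                 or ( o3.pickslip_printed > o1.docketdate
                                      and o3.pickslip_printed < o2.pickslip_printed )
                                 or ( o3.pickslip_printed = o2.pickslip_printed
                                      and o3.ordno < o2.ordno ) )
                  ), interval '0') as busy_time,   -- whole day minus the idle gaps
              count(*) as work_done
         from orders;

Dividing busy_time by work_done (PostgreSQL can divide an interval by a
number) then gives the per-order average in the same statement.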

Of course, if SQL supported a "running total" aggregate function then it
could be processed as a sequential event list, but then the "order by"
clause would have an effect on the data in the output lines, and I think
there is a rule that makes such effects illegal. It just goes to show that
while SQL is a pretty handy database language, it does have its limitations.
Which is another way of saying that procedural languages will never die,
because ultimately some problems are only easily solved using a procedural
language.

> We'll never know, based on this data, how long each specific order took 
> but then I don't care - barring exceptional cases (e.g. 2 hours).

Looking at the average over a day is probably the most meaningful
figure you can get. You can average by number of orders processed or
by number of products picked, or by some fudge-factor algorithm that
combines both.

- Tel  ( http://bespoke.homelinux.net/ )



Re: [SLUG] SQL Brain teaser...

2005-06-08 Thread james
> Looking at the average over a day is probably the most meaningful
> figure you can get.

Almost true. By itself, it gets you a ballpark figure, but you really want the
standard deviation to tell you how big the park is, and (ideally) the type of
distribution to tell you what kind of ballpark you're playing in.
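
A minimal sketch of both figures over the naive packmins from Grant's first
query (MySQL syntax; stddev() exists in both MySQL and PostgreSQL):

select round(avg(packmins), 1)    as mean_mins,
       round(stddev(packmins), 1) as stddev_mins
  from ( select round((unix_timestamp(docketdate)
           -unix_timestamp(pickslip_printed))/60) as packmins
           from orders
          where pickslip_printed is not null ) as t;

It only characterises the naive per-order figure, which the rest of the
thread shows is inflated by batching, but it does tell you how big the park
is.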

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html