Efficient counting of results

Israel Brewster Thu, 19 Oct 2017 10:44:32 -0700

I am working on developing a report that groups data into a two-dimensional 
array based on date and time. More specifically, date is grouped into 
categories:


day, week-to-date, month-to-date, and year-to-date

Then, for each of those categories, I need to get a count of records that fall 
into the following categories:

0 minutes late, 1-5 minutes late, and 6-15 minutes late

where minutes late will be calculated based on a known scheduled time and the 
time in the record. To further complicate things, there are actually two times 
in each record, so under the day, week-to-date, month-to-date etc groups, there 
will be two sets of "late" bins, one for each time. In table form it would look 
 something like this:

                    | day  |  week-to-date | month-to-date |  year-to-date  |
----------------------------------------------------------------------------------------
t1 0min        | <counts for each time group>
t1 1-5 min    | ...
t1 6-15 min  | ...
t2 0min        | ...
t2 1-5 min    | ...
t2 6-15 min  | ...

So in the extreme scenario of a record that is for the current day, it will be 
counted into 8 bins: once each for day, week-to-date, month-to-date and 
year-to-date under the proper "late" bin for the first time in the record, and 
once each into each of the time groups under the proper "late" bin for the 
second time in the record. An older record may only be counted twice, under the 
year-to-date group. A record with no matching schedule is discarded, as is any 
record that is "late" by more than 15 minutes (those are gathered into a 
separate report)

My initial approach was to simply make dictionaries for each "row" in the 
table, like so:

t10 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t15 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
.
.
t25 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}
t215 = {'daily': 0, 'WTD': 0, 'MTD': 0, 'YTD': 0,}

then loop through the records, find the schedule for that record (if any, if 
not move on as mentioned earlier), compare t1 and t2 against the schedule, and 
increment the appropriate bin counts using a bunch of if statements. 
Functional, if ugly. But then I got to thinking: I keep hearing about all these 
fancy numerical analysis tools for python like pandas and numpy - could 
something like that help? Might there be a way to simply set up a table with 
"rules" for the columns and rows, and drop my records into the table, having 
them automatically counted into the proper bins or something? Or am I over 
thinking this, and the "simple", if ugly approach is best?

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------




-- 
https://mail.python.org/mailman/listinfo/python-list

Efficient counting of results

Reply via email to