Hello Mimko,

Sorry for the late reply. I had a bunch of work to take care of before vacation, then there was the vacation itself. :)

On 11/13/2014 2:34 PM, Mimiko wrote:
Hello. I have this table:

 > show create table cc_agents_tier_status_log:
CREATE TABLE "cc_agents_tier_status_log" (
   "id" int(10) unsigned NOT NULL AUTO_INCREMENT,
   "date_log" timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   "cc_agent" varchar(45) NOT NULL,
   "cc_agent_tier_status_id" tinyint(3) unsigned NOT NULL,
   "cc_queue_id" tinyint(3) unsigned NOT NULL,
   "cc_agent_id" int(10) unsigned NOT NULL,
   "cc_agent_phone" smallint(5) unsigned NOT NULL,
   "cc_agent_domain" varchar(45) NOT NULL DEFAULT 'pbx01.apa-canal.md',
   PRIMARY KEY ("id"),
   KEY "IDX_cc_agents_tier_status_log_2" ("cc_agent") USING HASH,
   KEY "IDX_cc_agents_tier_status_log_3" ("date_log"),
   KEY "FK_cc_agents_tier_status_log_2" ("cc_agent_id"),
   KEY "FK_cc_agents_tier_status_log_3" ("cc_queue_id"),
   KEY "FK_cc_agents_tier_status_log_1" ("cc_agent_tier_status_id")
USING BTREE,
   KEY "IDX_cc_agents_tier_status_log_7" ("id","date_log"),
   CONSTRAINT "FK_cc_agents_tier_status_log_1" FOREIGN KEY
("cc_agent_tier_status_id") REFERENCES "cc_agent_tier_status_chart"
("id") ON UPDATE CASCADE,
   CONSTRAINT "FK_cc_agents_tier_status_log_2" FOREIGN KEY
("cc_agent_id") REFERENCES "apacanal"."employee" ("id") ON UPDATE CASCADE,
   CONSTRAINT "FK_cc_agents_tier_status_log_3" FOREIGN KEY
("cc_queue_id") REFERENCES "cc_queues" ("id") ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=23799 DEFAULT CHARSET=ascii

 > show index from cc_agents_tier_status_log:
Table    Non_unique    Key_name    Seq_in_index    Column_name
Collation Cardinality    Sub_part    Packed    Null    Index_type
Comment    Index_comment
cc_agents_tier_status_log    0    PRIMARY    1    id    A
23999            (null)    BTREE    (null) (null)
cc_agents_tier_status_log    1    IDX_cc_agents_tier_status_log_2    1
cc_agent    A    260            (null)    BTREE    (null)    (null)
cc_agents_tier_status_log    1    IDX_cc_agents_tier_status_log_3    1
date_log    A    23999            (null)    BTREE    (null)    (null)
cc_agents_tier_status_log    1    FK_cc_agents_tier_status_log_2    1
cc_agent_id    A    2            (null)    BTREE    (null)    (null)
cc_agents_tier_status_log    1    FK_cc_agents_tier_status_log_3    1
cc_queue_id    A    14            (null)    BTREE    (null)    (null)
cc_agents_tier_status_log    1    FK_cc_agents_tier_status_log_1    1
cc_agent_tier_status_id    A    2            (null)    BTREE
(null)    (null)
cc_agents_tier_status_log    1 IDX_cc_agents_tier_status_log_7    1
id    A    23999            (null)    BTREE    (null)    (null)
cc_agents_tier_status_log    1    IDX_cc_agents_tier_status_log_7    2
date_log    A    23999            (null)    BTREE    (null)    (null)

And the query is:
 >set @enddate:=now();
 >set @startdate:='2014-11-01';
 >set @que_id:=-1;
select s.theHour as theHour,avg(s.nrAgents) as nrAgents from
(select date(a.theDateHour) as theDate,extract(hour from a.theDateHour)
as theHour,count(c.cc_agent_tier_status_id) as nrAgents
from (

select dh.theDateHour as theDateHour, max(c.date_log) as maxdatelog,c.*
FROM
( select concat(d.thedate,' ',h.theHour,':0:0') as theDateHour
from
( select DATE(DATE_ADD(date(@startdate), INTERVAL @i:=@i+1 DAY) ) as
theDate from (select @i:=-1) as t1
inner join cc_member_queue_end_log b on 1=1 and
b.id<=datediff(@enddate,@startdate)+1 ) as d
left outer join
(SELECT 0 AS theHour UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION
ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10
UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL
SELECT 14 UNION ALL SELECT 15 UNION ALL SELECT 16 UNION ALL SELECT 17
UNION ALL SELECT 18 UNION ALL SELECT 19 UNION ALL SELECT 20 UNION ALL
SELECT 21 UNION ALL SELECT 22 UNION ALL SELECT 23) as h
on 1=1 ) AS dh
left outer join
cc_agents_tier_status_log as c
on c.date_log<=dh.theDateHour where (if(@queue_id<0,1,0) or
if(@queue_id=c.cc_queue_id,1,0))
group by dh.theDateHour,c.cc_queue_id,c.cc_agent_id,c.cc_agent_phone


) as a
left outer join cc_agents_tier_status_log as c
on c.date_log=a.maxdatelog and c.cc_queue_id=a.cc_queue_id and
c.cc_agent_id=a.cc_agent_id and c.cc_agent_phone=a.cc_agent_phone and
c.cc_agent_tier_status_id=2
group by a.theDateHour
order by date(a.theDateHour),extract(hour from a.theDateHour))
as s
group by s.theHour
order by s.theHour;


This query takes 20 seconds to populate.

Table cc_agents_tier_status_log contains log entries of agent_id
login/logout per queue per phone. status_id can have value 1 (logged
out) and 2 (login) at date_log datetime.

The resulting table must contain average number of agents logged in at
every hour per startdate to enddate.

Hope for some hints. Thank you.

The first problem is that you are generating a lot of extra rows before you actually need them. The only place where you should be faking the missing hours is at the very end of the entire process on the very off-chance that for some hour during the period covered by the query that you did not have any agents logged in for that particular hour of the day.

The second problems is all that nesting. Look at the description of what you want: "... average number of agents logged in at every hour per startdate to enddate."

I guess you want to know how many agents, each day, had at least one entry where their status_id was 2 for every hour? If the agent logged in, logged out, then logged back in during the same hour, that would only count as a single login?

Then, you want to take the average for each day.

Start dates count as >= midnight (00:00:00) of that day, the End date counts as < midnight of the next day.

First, you need to get a list of all the Agents that had a login status of 2 for every hour of the target range. The fastest way to generate a de-duplicated list of data tuples is with a table with a PRIMARY KEY and the INSERT IGNORE command.

CREATE TEMPORARY TABLE tmpAgentList (
  cc_agent_id int not null,
  log_day date not null,
  log_hour int not null,
  PRIMARY KEY (cc_agent, log_day, log_hour)
) ENGINE=MYISAM;

INSERT IGNORE tmpAgentList(cc_agent_id, log_day,log_hour)
SELECT cc_agent_id, date(date_log), hour(date_log)
FROM cc_agents_tier_status_log
WHERE
  cc_agent_tier_status_id = 2
  and date_log >= Date(@startdate)
  AND date_log < Date(@enddate) + interval 1 day;


Notes:
* You said nothing about the role that the cc_member_queue_end_log had to the query so I left it out. * I wrapped the @startdate and @enddate variables with the DATE() function to normalize them to midnight. It makes the math easier.
* Changing this index
  KEY "IDX_cc_agents_tier_status_log_3" ("date_log"),
to this
  KEY "IDX_cc_agents_tier_status_log_3" ("date_log","cc_agent_id"),
will make this query very fast.

From this point, it is academic to compute the count of agents who logged in during that hour for each day. This will be made faster by indexing the source table by date and hour.

ALTER TABLE tmpAgentList ADD KEY(log_day, log_hour);

create temporary table tmpAgentCount (key log_hour) engine=MEMORY
SELECT log_day,log_hour, count(cc_agent_id) as login_count
FROM tmpAgentList
GROUP BY log_day, log_hour;

Note: at this point, tmpAgentCount will have at most 24 rows for every day in our query range. This is a very small table so storing it in memory should pose no problem.

Last, you average over the dates by each hour then use the LEFT JOIN (SELECT 0 as hr UNION SELECT 1 ...) trick of your original query to fill in any missing hours. You save this for LAST because it creates rows where none existed before. Processing the fewest rows necessary is one of the keys to good query performance.

SELECT
  hourlist.hr as theHour
, avg(login_count) as nrAgents
FROM tmpAgentCount ac
RIGHT JOIN (SELECT 0 as hr UNION ALL SELECT 1 ...) as hourlist
  on hourlist.hr = ac.log_hour
GROUP BY hourlist.hr;

==

Another way to do this is to skip the totals-per-day steps by counting up how many days each hour had, count up how many logins there were in that hour, then divide to get the average.

SELECT
  log_hour as theHour
, count(cc_agent_id)/count(log_day) as nrAgents
FROM tmpAgentList
GROUP BY log_hour;

I'll leave the step of integrating the 0,1,2,3,4,... list of hours into the results up to you.

Please let us know which technique works better for you.

Yours,
--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Hardware and Software, Engineered to Work Together.
Office: Blountville, TN

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/mysql

Reply via email to