#240: Data Characterization queries for phase 1 CDM V1 compliance
-------------------------------------------------+-------------------------
 Reporter:  campbell                             |       Owner:  huhickman
     Type:  task                                 |      Status:  assigned
 Priority:  major                                |   Milestone:  drn-
Component:  data-stds                            |  query-1
 Keywords:  cdm-etl CDM Compliance data-quality  |  Resolution:
 Blocking:                                       |  Blocked By:
-------------------------------------------------+-------------------------

Comment (by huhickman):

 We've ran through all of the 70 queries except for two; these two as
 written will never complete - I've let them run for days just to see what
 will happen.  These two are

 The SQL is generally quite inefficient  - with exclusive use of subqueries
 for correlation of data (no WITH clause to factor out common query
 clauses).


 1.       Repeatedly converting the ISO-8601 character string dates (per
 CDM spec) into native date datatypes on the fly.
 2.       No use of with clause to factor out subqueries (so that they get
 evaluated only once).
 3.       No recommended indexes in CDM V1,2, or 3.

 One of the samples that never completes :

 {{{
 #!sql
 set echo off
 set feedback off
 SET HEAD OFF
 SET PAGES 0
 SET VERIFY OFF
 SET FEED OFF
 SET SHOW OFF
 SET TERMOUT OFF
 ALTER SESSION SET PLSQL_WARNINGS='DISABLE:ALL';
 --
 ---- TotalValidHt Function - variable
 create or replace function fnTotalValidHt return int is
   TotalValidHt number;
   begin
   SELECT COUNT(ht) INTO TotalValidHt FROM Vital WHERE ht IS NOT NULL;

 return TotalValidHt;
 end;
 /
 --vit_l3_ht_dist
 SET TERMOUT ON
 SELECT vit_l3_ht_dist FROM
 (SELECT 0 AS position, '"Statistic","HT"' AS vit_l3_ht_dist FROM dual
 UNION
         SELECT 1 AS position,  '"' || Statistic || '",' || CAST(ht AS
 VARCHAR(15)) FROM
         (SELECT 'MIN' AS Statistic, MIN(ht) AS ht FROM Vital WHERE ht IS
 NOT NULL
                         UNION
                         SELECT 'MAX' AS Statistic, MAX(ht) AS ht FROM
 Vital WHERE ht IS NOT NULL
                         UNION
                         SELECT 'MEAN' AS Statistic, SUM(ht)/COUNT(ht) AS
 ht FROM Vital WHERE ht IS NOT NULL
                         UNION
                                 -- Median is calculated by sorting all
 records by ht AND getting the ht value of the midpoint of all records
                         SELECT 'MEDIAN' AS Statistic, d.ht AS ht FROM
                         (SELECT ROW_NUMBER() OVER (ORDER BY ht desc) AS
 rowno, ht FROM Vital WHERE ht IS NOT NULL
                         ) d
                         WHERE d.rowno = CAST(fnTotalValidHt / 2 AS INT)
 AND ht IS NOT NULL
                         UNION
       -- Records with missing patid are counted here for completeness.
                         SELECT 'NULL or missing' AS Statistic, COUNT(*) AS
 ht FROM Vital WHERE ht IS NULL
       UNION
                         SELECT 'record_n' AS Statistic, COUNT(*) AS ht
 FROM Vital WHERE ht IS NOT NULL
   )csv
   ORDER BY position,vit_l3_ht_dist
 )m;
 exit

 }}}

 The other one that does not complete is the same query except for weight.

 In the above example using Oracle's MEDIAN function will help cure some of
 the ills along with factoring out some of the subselects.  I will rewrite
 and send feedback for these two in particular.

--
Ticket URL: 
<http://informatics.gpcnetwork.org/trac/Project/ticket/240#comment:20>
gpc-informatics <http://informatics.gpcnetwork.org/>
Greater Plains Network - Informatics
_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

Reply via email to