#240: Data Characterization queries for phase 1 CDM V1 compliance -------------------------------------------------+------------------------- Reporter: campbell | Owner: huhickman Type: task | Status: assigned Priority: major | Milestone: drn- Component: data-stds | query-1 Keywords: cdm-etl CDM Compliance data-quality | Resolution: Blocking: | Blocked By: -------------------------------------------------+-------------------------
Comment (by huhickman): We've ran through all of the 70 queries except for two; these two as written will never complete - I've let them run for days just to see what will happen. These two are The SQL is generally quite inefficient - with exclusive use of subqueries for correlation of data (no WITH clause to factor out common query clauses). 1. Repeatedly converting the ISO-8601 character string dates (per CDM spec) into native date datatypes on the fly. 2. No use of with clause to factor out subqueries (so that they get evaluated only once). 3. No recommended indexes in CDM V1,2, or 3. One of the samples that never completes : {{{ #!sql set echo off set feedback off SET HEAD OFF SET PAGES 0 SET VERIFY OFF SET FEED OFF SET SHOW OFF SET TERMOUT OFF ALTER SESSION SET PLSQL_WARNINGS='DISABLE:ALL'; -- ---- TotalValidHt Function - variable create or replace function fnTotalValidHt return int is TotalValidHt number; begin SELECT COUNT(ht) INTO TotalValidHt FROM Vital WHERE ht IS NOT NULL; return TotalValidHt; end; / --vit_l3_ht_dist SET TERMOUT ON SELECT vit_l3_ht_dist FROM (SELECT 0 AS position, '"Statistic","HT"' AS vit_l3_ht_dist FROM dual UNION SELECT 1 AS position, '"' || Statistic || '",' || CAST(ht AS VARCHAR(15)) FROM (SELECT 'MIN' AS Statistic, MIN(ht) AS ht FROM Vital WHERE ht IS NOT NULL UNION SELECT 'MAX' AS Statistic, MAX(ht) AS ht FROM Vital WHERE ht IS NOT NULL UNION SELECT 'MEAN' AS Statistic, SUM(ht)/COUNT(ht) AS ht FROM Vital WHERE ht IS NOT NULL UNION -- Median is calculated by sorting all records by ht AND getting the ht value of the midpoint of all records SELECT 'MEDIAN' AS Statistic, d.ht AS ht FROM (SELECT ROW_NUMBER() OVER (ORDER BY ht desc) AS rowno, ht FROM Vital WHERE ht IS NOT NULL ) d WHERE d.rowno = CAST(fnTotalValidHt / 2 AS INT) AND ht IS NOT NULL UNION -- Records with missing patid are counted here for completeness. SELECT 'NULL or missing' AS Statistic, COUNT(*) AS ht FROM Vital WHERE ht IS NULL UNION SELECT 'record_n' AS Statistic, COUNT(*) AS ht FROM Vital WHERE ht IS NOT NULL )csv ORDER BY position,vit_l3_ht_dist )m; exit }}} The other one that does not complete is the same query except for weight. In the above example using Oracle's MEDIAN function will help cure some of the ills along with factoring out some of the subselects. I will rewrite and send feedback for these two in particular. -- Ticket URL: <http://informatics.gpcnetwork.org/trac/Project/ticket/240#comment:20> gpc-informatics <http://informatics.gpcnetwork.org/> Greater Plains Network - Informatics _______________________________________________ Gpc-dev mailing list Gpc-dev@listserv.kumc.edu http://listserv.kumc.edu/mailman/listinfo/gpc-dev