RE: Gather 2015 4th quarterly QA results

Green, Timothy A. Mon, 15 Feb 2016 13:21:39 -0800

Thanks for sharing these. I am interested in an implementation agnostic QA 
process.  These scripts and the qa val list make a decent starting point to 
build some dynamic i2b2 queries that leverage the metadata layer.  As probably 
the only site with a vested interest in altering how the QA scripts work, I 
think it's fair for MU to take the lead on looking into it.

In the interim, we are going to focus on finishing our NAACCR load and then do 
something quick and dirty for the 2015/Q4 and 2016/Q1 QA results.

From: Dan Connolly [mailto:dconno...@kumc.edu]
Sent: Monday, February 15, 2016 3:10 PM
To: Green, Timothy A.; Thomas Mish; Mosa, Abu S.
Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu>
Subject: RE: Gather 2015 4th quarterly QA results

Silly me... didn't give a URL in the first place:

regression test driver:
https://informatics.kumc.edu/work/browser/heron_load/test_heron_query.py

performance test driver:
https://informatics.kumc.edu/work/browser/heron_load/query_performance.py

The queries are captured this way:
https://informatics.kumc.edu/work/browser/heron_load/capture_query.py

The code is kinda... well... I'm not sure I'd write it that way today.

The gist of it is: we use capture_query.py to grab matching rows from 
qt_query_master and save (among other things) the I2B2_REQUEST_XML; then we 
make HTTP requests just like the web client would to log-in and replay the 
request.

--
Dan
________________________________
From: Green, Timothy A. [green...@health.missouri.edu]
Sent: Monday, February 15, 2016 3:03 PM
To: Dan Connolly; Thomas Mish; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results
Indeed, I'm all for the forward looking approach.  What driver are you using 
for batch mode i2b2 queries?

From: Dan Connolly [mailto:dconno...@kumc.edu]
Sent: Monday, February 15, 2016 3:00 PM
To: Green, Timothy A.; Thomas Mish; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results

My first thought was: what does this have to do with the patient_dimension? It 
seems to only care what paths you use at your site.

But I think my tiny brain has finally absorbed the issue...

Even though our demographics ontology design is based on query paths, those 
query paths normally get looked up in the metadata table, which may say to look 
the path up in the concept_dimension, but may also say: we're done with the 
path; go look in some column of another dimension table.

This makes me wish we used more i2b2 queries for our QA stuff. I suppose we're 
headed that way with SHRINE/SNOW... Meanwhile, I wonder how many sites have the 
capacity to run a suite of i2b2 queries in lights-out fashon. We do this in the 
form performance and regression testing for HERON, but we should be able to 
straightforwardly adapt it for QA work.

That is to say: "some hard coded queries against patient_dimension" is one 
reasonable approach, but perhaps a more forward-looking approach would be some 
work-alike i2b2 queries.

--
Dan
________________________________
From: Green, Timothy A. [green...@health.missouri.edu]
Sent: Monday, February 15, 2016 2:43 PM
To: Dan Connolly; Thomas Mish; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results
Yes, onboard with the readme, that is the other work we have to do, which is 
finding our paths that match the concept under QA.

Here is the same concept from the shared code, slightly different syntax.  I 
sent you a version we had already converted to mssql ourselves last December, 
before the code below was shared.  This can be found on line 95 here 
(https://bitbucket.org/gpcnetwork/gpc-qa-quarterly/src/e6d33c16e8691529f37025c36163bd3377d7e9e8/qa_mssql.sql?at=default&fileviewer=file-view-default).

select
v.VARIABLE_NAME
,count(distinct o.PATIENT_NUM) as RESULT
FROM $(UtilitiesDB).[dbo].[Q3_2015_GPC_QA_VAL_LIST] v
JOIN $(DataDB).[dbo].[CONCEPT_DIMENSION] c on (c.CONCEPT_PATH like 
v.CONCEPT_PATH_OR_CODE or c.concept_cd like v.CONCEPT_PATH_OR_CODE)
JOIN $(DataDB).[dbo].[OBSERVATION_FACT] o on o.concept_cd = c.concept_cd
JOIN  $(UtilitiesDB).[dbo].[Q3_2015_GPC_QA_ACTIVE_PATIENT] p on p.patient_num = 
o.patient_num
where v.POPULATION = 'Active'
and v.QUERY_TYPE = 'COUNT PATIENT'
group by v.variable_name

From: Dan Connolly [mailto:dconno...@kumc.edu]
Sent: Monday, February 15, 2016 2:19 PM
To: Green, Timothy A.; Thomas Mish; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results

Thanks for elaborating, but  I'm having trouble finding that statement in our 
shared code. "JOIN  #Q4_2015_GPC_QA_ACTIVE_PATIENT" doesn't occur in 
qa_mssql.sql<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly/src/e6d33c16e8691529f37025c36163bd3377d7e9e8/qa_mssql.sql?at=default>
 of 2016-01-20 e6d33c16e869. nor does "QA_ACTIVE_TMP".

Are you referring to some previous version of the code? If so, which version?

Regarding editing val_list, I'm not entirely clear on how it works. I sort of 
naively hope that practice diminishes over time. The instructions in the 
REAME<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly> say:
For the script to return values, the columns titled "CONCEPT_PATH_OR_CODE" need 
to be updated with your organization's paths or codes. By default this column 
contains the paths or codes used in Babel. Do not edit the values in the 
"BABEL_PATH_OR_CODE" columns. These are used for reference to identify if your 
organization's path matches the babel path.
--
Dan
________________________________
From: Green, Timothy A. [green...@health.missouri.edu]
Sent: Monday, February 15, 2016 1:54 PM
To: Thomas Mish; Dan Connolly; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results
Tom, thanks.  We've not yet started doing anything related to making it neutral 
with regards to location of patient attributes.  It's a tall order...I've taken 
a look at the way the i2b2 service layer builds it's queries.  It's fairly 
complex object oriented code that will be hard to adapt to straight SQL I think.

I think we will approach the first round with some hard coded queries against 
patient_dimension.

Dan:

This statement returns no results:
select v.VARIABLE_NAME ,count(distinct o.PATIENT_NUM) as RESULT
  into #Q4_2015_GPC_QA_ACTIVE_TMP
  FROM I2B2Data.Q4_2015_GPC_QA_VAL_LIST v
        JOIN I2B2Data.CONCEPT_DIMENSION c on (c.CONCEPT_PATH like 
v.CONCEPT_PATH_OR_CODE or c.CONCEPT_CD like v.CONCEPT_PATH_OR_CODE)
        JOIN I2B2Data.OBSERVATION_FACT o on o.concept_cd = c.concept_cd
        JOIN  #Q4_2015_GPC_QA_ACTIVE_PATIENT p on p.patient_num = o.patient_num
        where v.POPULATION = 'Active'
        and v.QUERY_TYPE = 'COUNT PATIENT'
        group by v.variable_name;

We of course do have patients in our database, but this query assumes the 
information of interest is in observation_fact when it is in fact in 
patient_dimension in our database.

Regarding the ontology mapping, what I got from the instructions was to map our 
concept paths into the Q4_2015_GPC_QA_VAL_LIST file so it would return results. 
 Am I to understand that we should not be mapping our ontology paths to the 
Q4_2015_GPC_QA_VAL_LIST, but instead alter our metadata in i2b2 to match that 
which is in the base Q4_2015_GPC_QA_VAL_LIST file?

From: Thomas Mish [mailto:m...@wisc.edu]
Sent: Monday, February 15, 2016 12:17 PM
To: Dan Connolly; Green, Timothy A.; Mosa, Abu S.
Cc: McNeeley, Todd A.; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results

I think that very early on in the GPC conversations we covered a lot of ground 
in the fact vs patient dimension conversation. I know that here at Madison we 
altered the way we organized our i2b2 data at the time to that we were more 
like the rest of the GPC. I understand the angst over changing. I'd love to see 
what you doing/have done to make the script more neutral on this issues. We'll 
incorporate them as best as possible for next time to minimize the churn.

-TM

From: 
gpc-dev-boun...@listserv.kumc.edu<mailto:gpc-dev-boun...@listserv.kumc.edu> 
[mailto:gpc-dev-boun...@listserv.kumc.edu] On Behalf Of Dan Connolly
Sent: Monday, February 15, 2016 11:50 AM
To: Green, Timothy A. 
<green...@health.missouri.edu<mailto:green...@health.missouri.edu>>; Mosa, Abu 
S. <mo...@health.missouri.edu<mailto:mo...@health.missouri.edu>>
Cc: McNeeley, Todd A. 
<mcneele...@health.missouri.edu<mailto:mcneele...@health.missouri.edu>>; 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results

To the extent that I did any of the writing of the QA scripts, I made no 
assumption about fact table vs. patient dimension. All our ontology design 
decisions regard paths. I haven't looked at all the code in detail, though. If 
there is such an assumption, it should be straightforward to refer us to one or 
more lines or statements that exhibit the assumption. Would you please give 
such details? Or just run the script and give us one or more diagnostic 
messages that show the assumption in action?

And as to "the relatively simple task of mapping..." that's the point of the QA 
scripts: to measure progress on that mapping. If you run the scripts and get 
some zeros, the script is still working correctly. It correctly shows that 
you're not finished. (You're not alone. KUMC has plenty of work to do in this 
area, for example.)

--
Dan
________________________________
From: Green, Timothy A. [green...@health.missouri.edu]
Sent: Monday, February 15, 2016 10:38 AM
To: Mosa, Abu S.; Dan Connolly
Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>; McNeeley, 
Todd A.
Subject: RE: Gather 2015 4th quarterly QA results
The patient demographics are stored in our i2b2 in the person dimension.  The 
QA scripts are written with the assumption that all data is stored in the fact 
table.  There is no provision for the script to leverage the ontology 
architecture to determine the storage location of the various pieces of data.  
I'm sure it was done that way to simplify the script, but it does ignore a 
pretty major feature of i2b2: the ontology mapping functionality.  This makes 
it less portable to sites that have a different implementation of the ontology. 
 I don't characterize a difference in implementation a problem in the data 
warehouse.

Keep in mind that MU had an i2b2 instance back in 2014, and the MU architecture 
was established before joining the GPC.  It would be work to either shift our 
demographics storage (and additional storage capacity), or to rewrite the QA 
scripts.  We will opt to rewrite the QA scripts, but to do it in a portable way 
(i.e. we could share our changes back to the GPC) will require some complexity 
of coding to leverage the ontology framework in i2b2.  We may have to opt for 
hardcoding our version against the dimension table.

Aside from that, we still have the relatively simple task of mapping our 
concept codes/paths to get the remainder (non-demographic part) of the script 
to work correctly.

Tim

From: Mosa, Abu S.
Sent: Monday, February 15, 2016 10:20 AM
To: Dan Connolly
Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>; Green, 
Timothy A.; McNeeley, Todd A.
Subject: RE: Gather 2015 4th quarterly QA results

I would defer this to Tim and Todd to share their thoughts on this. Thanks. 
--Mosa

From: Dan Connolly [mailto:dconno...@kumc.edu]
Sent: Monday, February 15, 2016 10:03 AM
To: Mosa, Abu S. <mo...@health.missouri.edu<mailto:mo...@health.missouri.edu>>
Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> 
<gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>
Subject: RE: Gather 2015 4th quarterly QA results

What are the symptoms when you try to run the QA scripts, Mosa? What 
diagnostics do you get?

The premise of the QA scripts is that if they don't run, that's a problem with 
your data warehouse. Perhaps there are actually problems with the scripts, but 
if so, we need details on what those problems are. Also, we have a shared 
source code repository in 
gpc-qa-quarterly<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly>. So if you 
make changes, please strive to make them portable to other sites.

p.s. I presume it's OK to share this thread with gpc-dev, as it regards group 
technical work.

--
Dan
________________________________
From: Mosa, Abu S. [mo...@health.missouri.edu]
Sent: Friday, February 12, 2016 4:41 PM
To: Dan Connolly
Subject: Gather 2015 4th quarterly QA results
Hi Dan,

I talked with our technical team about generating the 2015 4th quarterly 
report. They looked into the MSSQL code that you referred to me on BitBucket. 
They advised that major re-writing effort is needed in order to localize the 
scripts to be able to run on our i2b2 data model. They are currently focusing 
on the Phase I implementation (specifically tumor registry data load) which 
they target to complete by end of this month. So, a viable timeline for 
generating the 2015 4th quarterly report for us would be Mid-march. Let me know 
if you have any questions. If needed, we can discuss this during the next Dev 
call.

Regards,
Abu Saleh Mohammad Mosa, PhD
Director, Research Informatics
Institute for Clinical and Translational Science
University of Missouri

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev

RE: Gather 2015 4th quarterly QA results

Reply via email to