Thanks for sharing these. I am interested in an implementation agnostic QA process. These scripts and the qa val list make a decent starting point to build some dynamic i2b2 queries that leverage the metadata layer. As probably the only site with a vested interest in altering how the QA scripts work, I think it's fair for MU to take the lead on looking into it.
In the interim, we are going to focus on finishing our NAACCR load and then do something quick and dirty for the 2015/Q4 and 2016/Q1 QA results. From: Dan Connolly [mailto:dconno...@kumc.edu] Sent: Monday, February 15, 2016 3:10 PM To: Green, Timothy A.; Thomas Mish; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu> Subject: RE: Gather 2015 4th quarterly QA results Silly me... didn't give a URL in the first place: regression test driver: https://informatics.kumc.edu/work/browser/heron_load/test_heron_query.py performance test driver: https://informatics.kumc.edu/work/browser/heron_load/query_performance.py The queries are captured this way: https://informatics.kumc.edu/work/browser/heron_load/capture_query.py The code is kinda... well... I'm not sure I'd write it that way today. The gist of it is: we use capture_query.py to grab matching rows from qt_query_master and save (among other things) the I2B2_REQUEST_XML; then we make HTTP requests just like the web client would to log-in and replay the request. -- Dan ________________________________ From: Green, Timothy A. [green...@health.missouri.edu] Sent: Monday, February 15, 2016 3:03 PM To: Dan Connolly; Thomas Mish; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results Indeed, I'm all for the forward looking approach. What driver are you using for batch mode i2b2 queries? From: Dan Connolly [mailto:dconno...@kumc.edu] Sent: Monday, February 15, 2016 3:00 PM To: Green, Timothy A.; Thomas Mish; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results My first thought was: what does this have to do with the patient_dimension? It seems to only care what paths you use at your site. But I think my tiny brain has finally absorbed the issue... Even though our demographics ontology design is based on query paths, those query paths normally get looked up in the metadata table, which may say to look the path up in the concept_dimension, but may also say: we're done with the path; go look in some column of another dimension table. This makes me wish we used more i2b2 queries for our QA stuff. I suppose we're headed that way with SHRINE/SNOW... Meanwhile, I wonder how many sites have the capacity to run a suite of i2b2 queries in lights-out fashon. We do this in the form performance and regression testing for HERON, but we should be able to straightforwardly adapt it for QA work. That is to say: "some hard coded queries against patient_dimension" is one reasonable approach, but perhaps a more forward-looking approach would be some work-alike i2b2 queries. -- Dan ________________________________ From: Green, Timothy A. [green...@health.missouri.edu] Sent: Monday, February 15, 2016 2:43 PM To: Dan Connolly; Thomas Mish; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results Yes, onboard with the readme, that is the other work we have to do, which is finding our paths that match the concept under QA. Here is the same concept from the shared code, slightly different syntax. I sent you a version we had already converted to mssql ourselves last December, before the code below was shared. This can be found on line 95 here (https://bitbucket.org/gpcnetwork/gpc-qa-quarterly/src/e6d33c16e8691529f37025c36163bd3377d7e9e8/qa_mssql.sql?at=default&fileviewer=file-view-default). select v.VARIABLE_NAME ,count(distinct o.PATIENT_NUM) as RESULT FROM $(UtilitiesDB).[dbo].[Q3_2015_GPC_QA_VAL_LIST] v JOIN $(DataDB).[dbo].[CONCEPT_DIMENSION] c on (c.CONCEPT_PATH like v.CONCEPT_PATH_OR_CODE or c.concept_cd like v.CONCEPT_PATH_OR_CODE) JOIN $(DataDB).[dbo].[OBSERVATION_FACT] o on o.concept_cd = c.concept_cd JOIN $(UtilitiesDB).[dbo].[Q3_2015_GPC_QA_ACTIVE_PATIENT] p on p.patient_num = o.patient_num where v.POPULATION = 'Active' and v.QUERY_TYPE = 'COUNT PATIENT' group by v.variable_name From: Dan Connolly [mailto:dconno...@kumc.edu] Sent: Monday, February 15, 2016 2:19 PM To: Green, Timothy A.; Thomas Mish; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results Thanks for elaborating, but I'm having trouble finding that statement in our shared code. "JOIN #Q4_2015_GPC_QA_ACTIVE_PATIENT" doesn't occur in qa_mssql.sql<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly/src/e6d33c16e8691529f37025c36163bd3377d7e9e8/qa_mssql.sql?at=default> of 2016-01-20 e6d33c16e869. nor does "QA_ACTIVE_TMP". Are you referring to some previous version of the code? If so, which version? Regarding editing val_list, I'm not entirely clear on how it works. I sort of naively hope that practice diminishes over time. The instructions in the REAME<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly> say: For the script to return values, the columns titled "CONCEPT_PATH_OR_CODE" need to be updated with your organization's paths or codes. By default this column contains the paths or codes used in Babel. Do not edit the values in the "BABEL_PATH_OR_CODE" columns. These are used for reference to identify if your organization's path matches the babel path. -- Dan ________________________________ From: Green, Timothy A. [green...@health.missouri.edu] Sent: Monday, February 15, 2016 1:54 PM To: Thomas Mish; Dan Connolly; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results Tom, thanks. We've not yet started doing anything related to making it neutral with regards to location of patient attributes. It's a tall order...I've taken a look at the way the i2b2 service layer builds it's queries. It's fairly complex object oriented code that will be hard to adapt to straight SQL I think. I think we will approach the first round with some hard coded queries against patient_dimension. Dan: This statement returns no results: select v.VARIABLE_NAME ,count(distinct o.PATIENT_NUM) as RESULT into #Q4_2015_GPC_QA_ACTIVE_TMP FROM I2B2Data.Q4_2015_GPC_QA_VAL_LIST v JOIN I2B2Data.CONCEPT_DIMENSION c on (c.CONCEPT_PATH like v.CONCEPT_PATH_OR_CODE or c.CONCEPT_CD like v.CONCEPT_PATH_OR_CODE) JOIN I2B2Data.OBSERVATION_FACT o on o.concept_cd = c.concept_cd JOIN #Q4_2015_GPC_QA_ACTIVE_PATIENT p on p.patient_num = o.patient_num where v.POPULATION = 'Active' and v.QUERY_TYPE = 'COUNT PATIENT' group by v.variable_name; We of course do have patients in our database, but this query assumes the information of interest is in observation_fact when it is in fact in patient_dimension in our database. Regarding the ontology mapping, what I got from the instructions was to map our concept paths into the Q4_2015_GPC_QA_VAL_LIST file so it would return results. Am I to understand that we should not be mapping our ontology paths to the Q4_2015_GPC_QA_VAL_LIST, but instead alter our metadata in i2b2 to match that which is in the base Q4_2015_GPC_QA_VAL_LIST file? From: Thomas Mish [mailto:m...@wisc.edu] Sent: Monday, February 15, 2016 12:17 PM To: Dan Connolly; Green, Timothy A.; Mosa, Abu S. Cc: McNeeley, Todd A.; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results I think that very early on in the GPC conversations we covered a lot of ground in the fact vs patient dimension conversation. I know that here at Madison we altered the way we organized our i2b2 data at the time to that we were more like the rest of the GPC. I understand the angst over changing. I'd love to see what you doing/have done to make the script more neutral on this issues. We'll incorporate them as best as possible for next time to minimize the churn. -TM From: gpc-dev-boun...@listserv.kumc.edu<mailto:gpc-dev-boun...@listserv.kumc.edu> [mailto:gpc-dev-boun...@listserv.kumc.edu] On Behalf Of Dan Connolly Sent: Monday, February 15, 2016 11:50 AM To: Green, Timothy A. <green...@health.missouri.edu<mailto:green...@health.missouri.edu>>; Mosa, Abu S. <mo...@health.missouri.edu<mailto:mo...@health.missouri.edu>> Cc: McNeeley, Todd A. <mcneele...@health.missouri.edu<mailto:mcneele...@health.missouri.edu>>; <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results To the extent that I did any of the writing of the QA scripts, I made no assumption about fact table vs. patient dimension. All our ontology design decisions regard paths. I haven't looked at all the code in detail, though. If there is such an assumption, it should be straightforward to refer us to one or more lines or statements that exhibit the assumption. Would you please give such details? Or just run the script and give us one or more diagnostic messages that show the assumption in action? And as to "the relatively simple task of mapping..." that's the point of the QA scripts: to measure progress on that mapping. If you run the scripts and get some zeros, the script is still working correctly. It correctly shows that you're not finished. (You're not alone. KUMC has plenty of work to do in this area, for example.) -- Dan ________________________________ From: Green, Timothy A. [green...@health.missouri.edu] Sent: Monday, February 15, 2016 10:38 AM To: Mosa, Abu S.; Dan Connolly Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>; McNeeley, Todd A. Subject: RE: Gather 2015 4th quarterly QA results The patient demographics are stored in our i2b2 in the person dimension. The QA scripts are written with the assumption that all data is stored in the fact table. There is no provision for the script to leverage the ontology architecture to determine the storage location of the various pieces of data. I'm sure it was done that way to simplify the script, but it does ignore a pretty major feature of i2b2: the ontology mapping functionality. This makes it less portable to sites that have a different implementation of the ontology. I don't characterize a difference in implementation a problem in the data warehouse. Keep in mind that MU had an i2b2 instance back in 2014, and the MU architecture was established before joining the GPC. It would be work to either shift our demographics storage (and additional storage capacity), or to rewrite the QA scripts. We will opt to rewrite the QA scripts, but to do it in a portable way (i.e. we could share our changes back to the GPC) will require some complexity of coding to leverage the ontology framework in i2b2. We may have to opt for hardcoding our version against the dimension table. Aside from that, we still have the relatively simple task of mapping our concept codes/paths to get the remainder (non-demographic part) of the script to work correctly. Tim From: Mosa, Abu S. Sent: Monday, February 15, 2016 10:20 AM To: Dan Connolly Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>>; Green, Timothy A.; McNeeley, Todd A. Subject: RE: Gather 2015 4th quarterly QA results I would defer this to Tim and Todd to share their thoughts on this. Thanks. --Mosa From: Dan Connolly [mailto:dconno...@kumc.edu] Sent: Monday, February 15, 2016 10:03 AM To: Mosa, Abu S. <mo...@health.missouri.edu<mailto:mo...@health.missouri.edu>> Cc: <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> <gpc-dev@listserv.kumc.edu<mailto:gpc-dev@listserv.kumc.edu>> Subject: RE: Gather 2015 4th quarterly QA results What are the symptoms when you try to run the QA scripts, Mosa? What diagnostics do you get? The premise of the QA scripts is that if they don't run, that's a problem with your data warehouse. Perhaps there are actually problems with the scripts, but if so, we need details on what those problems are. Also, we have a shared source code repository in gpc-qa-quarterly<https://bitbucket.org/gpcnetwork/gpc-qa-quarterly>. So if you make changes, please strive to make them portable to other sites. p.s. I presume it's OK to share this thread with gpc-dev, as it regards group technical work. -- Dan ________________________________ From: Mosa, Abu S. [mo...@health.missouri.edu] Sent: Friday, February 12, 2016 4:41 PM To: Dan Connolly Subject: Gather 2015 4th quarterly QA results Hi Dan, I talked with our technical team about generating the 2015 4th quarterly report. They looked into the MSSQL code that you referred to me on BitBucket. They advised that major re-writing effort is needed in order to localize the scripts to be able to run on our i2b2 data model. They are currently focusing on the Phase I implementation (specifically tumor registry data load) which they target to complete by end of this month. So, a viable timeline for generating the 2015 4th quarterly report for us would be Mid-march. Let me know if you have any questions. If needed, we can discuss this during the next Dev call. Regards, Abu Saleh Mohammad Mosa, PhD Director, Research Informatics Institute for Clinical and Translational Science University of Missouri
_______________________________________________ Gpc-dev mailing list Gpc-dev@listserv.kumc.edu http://listserv.kumc.edu/mailman/listinfo/gpc-dev