WISC does have sequence number 380 (central). The sites appear to be split on
whether they have values for 380 or 560. As long as you have values for one of
the two, you're consistent with what everyone else has sent. FYI, gpc_seq_no in
the datamart is populated from both variables: it takes the 380 value if one is
present, otherwise it takes the 560 value.
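The gpc_seq_no rule is effectively a coalesce. Here is a minimal Python sketch,
assuming each tumor record is a dict keyed by the NAACCR item numbers "380"
(Sequence Number--Central) and "560" (Sequence Number--Hospital), with None
standing for a missing value. The function name and record layout are
hypothetical, not the datamart's actual code.

```python
from typing import Optional


def derive_gpc_seq_no(record: dict) -> Optional[str]:
    """Hypothetical sketch: prefer item 380 (central), else item 560 (hospital)."""
    central = record.get("380")
    if central is not None:
        return central
    # Fall back to the hospital sequence number; None if neither is present.
    return record.get("560")
```

For example, a record with both items keeps its 380 value, while a record that
only has 560 gets the 560 value.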

From: Debbie Yoshihara [mailto:dlyos...@wisc.edu]
Sent: Monday, February 22, 2016 11:26 AM
To: McDowell, Bradley D; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

Brad,

Ok, thanks for that clarification.
I'm just trying to figure out if we need to pull additional values from the 
tumor registry feed.
It looks like we have to figure out the 560 SEQNO HOSPITAL values.
Thanks.

--- Debbie

From: McDowell, Bradley D [mailto:bradley-mcdow...@uiowa.edu]
Sent: Monday, February 22, 2016 11:22 AM
To: Debbie Yoshihara; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

Sorry, now that I think of it, I believe "tr" means "tumor registry". So I'd
bet gpc_vital_tr was copied over from NAACCR 1760. gpc_vital would then be the
variable that takes values from gpc_vital_tr and/or gpc_vital_ehr.

From: McDowell, Bradley D
Sent: Monday, February 22, 2016 11:14 AM
To: 'Debbie Yoshihara'; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

I think gpc_vital_tr uses values from both gpc_vital_ehr and NAACCR 1760. If
one of the original values isn't present in the dataset, the other value is
used to create gpc_vital_tr. That's my best guess. All of the GPC variables were
created by Vince when he built the datamart. There may be documentation at KUMC
that confirms whether this is true.

From: Debbie Yoshihara [mailto:dlyos...@wisc.edu]
Sent: Monday, February 22, 2016 11:05 AM
To: McDowell, Bradley D; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

Hi Brad,

OK, I'm confused: what is the difference between gpc_vital and gpc_vital_ehr,
then?

--- Debbie


From: McDowell, Bradley D [mailto:bradley-mcdow...@uiowa.edu]
Sent: Monday, February 22, 2016 10:59 AM
To: Debbie Yoshihara; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

Hi Debbie,

My understanding is that gpc_vital_ehr provides the vital status from the
electronic health record, whereas the value from NAACCR 1760 comes from the
tumor registry. So we do have vital status from WISC; it's just that one of
these two (redundant) variables wasn't populated.

This should probably be part of a larger conversation: It might not make sense 
to have the EHR version of this variable for these kinds of GPC data sets. 
Sometimes two versions of vital status are included in non-GPC data sets 
because one source might be more recently updated. I get the impression that 
these two variables might be perfectly redundant for some GPC sites; for at 
least one site, the NAACCR version was calculated for the purpose of this data 
set using the EHR. If that's a common approach, then perhaps we should just 
drop the EHR version in favor of the NAACCR-standardized version.

Hope this helps...

Brad

From: Debbie Yoshihara [mailto:dlyos...@wisc.edu]
Sent: Monday, February 22, 2016 10:37 AM
To: McDowell, Bradley D; gpc-dev@listserv.kumc.edu
Subject: RE: data issues

Hi Bradley,

I wanted some clarification about gpc_vital_ehr.
The NAACCR codes NAACCR|1760:0 and NAACCR|1760:1 appear to hold these values,
and WISC has these values, so I'm not sure why gpc_vital_ehr shows 100% missing
for WISC.

--- Debbie Yoshihara


From: gpc-dev-boun...@listserv.kumc.edu On Behalf Of McDowell, Bradley D
Sent: Monday, February 01, 2016 11:01 AM
To: gpc-dev@listserv.kumc.edu
Subject: FW: data issues

Dan asked me to forward this message to this group:

From: McDowell, Bradley D
Sent: Tuesday, January 26, 2016 11:00 AM
To: Dan Connolly (dconno...@kumc.edu)
Cc: Chrischilles, Elizabeth A; Gryzlak, Brian M
Subject: data issues

Hi Dan,

Betsy asked me to provide a list of the issues uncovered so far with respect to
the oncology registry data. The hope is that we can create a ticket for each
problem. Some problems are easy to describe, and some are not so easy. Some of
the easy ones:

* Missing MCW patient
  (https://informatics.gpcnetwork.org/trac/Project/ticket/453)

* Duplicate records (indicated by the same sequence number for the same Study
  ID), some with updated surgery and class-of-case variables

* Some duplicate records have dx dates that appear to have been copied from
  the last contact date (so the dx date differs across duplicates)

* UMN switched the | and : characters in NAACCR variables (but not in the
  "*.Descriptor" variables)
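One way to repair the UMN delimiter swap is sketched below in Python. It
assumes the expected code shape is "NAACCR|<item>:<value>" (as in
NAACCR|1760:0) and that the swapped form is "NAACCR:<item>|<value>"; both
patterns and the function name fix_naaccr_code are assumptions for
illustration, and the check should be confirmed against the actual UMN
extract before use.

```python
import re

# Expected shape, e.g. "NAACCR|1760:0": item number after "|", value after ":".
EXPECTED = re.compile(r"^NAACCR\|\d+:.+$")
# Assumed shape of a swapped code, e.g. "NAACCR:1760|0".
SWAPPED = re.compile(r"^NAACCR:(\d+)\|(.+)$")


def fix_naaccr_code(code: str) -> str:
    """Hypothetical repair: swap | and : back when a code has the swapped shape."""
    if EXPECTED.match(code):
        return code
    match = SWAPPED.match(code)
    if match:
        item, value = match.groups()
        return f"NAACCR|{item}:{value}"
    # Anything unrecognized (e.g. descriptor text) is left untouched.
    return code
```

Leaving unrecognized strings unchanged keeps the "*.Descriptor" variables,
which were not affected, out of the repair.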

For the duplicated records, I have put together a spreadsheet that illustrates
the problem nicely, and I'm happy to share it. We'll have to transfer it via
REDCap or some other secure means, since it contains patient-level data.
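A duplicate check along these lines could be sketched in Python as follows. It
assumes each tumor record is a dict with hypothetical keys study_id, seq_no,
dx_date and last_contact_date; it groups records by (Study ID, sequence
number) and then flags duplicate groups whose dx dates disagree and where a dx
date matches the last contact date, the copied-date pattern noted in the issue
list.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]


def find_duplicates(records: List[dict]) -> Dict[Key, List[dict]]:
    """Group records by (study_id, seq_no); keep only groups with >1 record."""
    groups: Dict[Key, List[dict]] = defaultdict(list)
    for rec in records:
        groups[(rec["study_id"], rec["seq_no"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


def suspicious_dx_dates(dups: Dict[Key, List[dict]]) -> List[Key]:
    """Flag duplicate groups with differing dx dates where some dx date
    equals that record's last contact date (hypothetical heuristic)."""
    flagged = []
    for key, recs in dups.items():
        dx_dates = {r["dx_date"] for r in recs}
        copied = any(r["dx_date"] == r["last_contact_date"] for r in recs)
        if len(dx_dates) > 1 and copied:
            flagged.append(key)
    return flagged
```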

Regarding the not so easy issues:

* One problem concerns inconsistencies in coded values. For example,
  gpc_language has four different values for "English". In general, UIOWA is
  not using the same descriptor values as the other sites, and that accounts
  for most of these. It is not the only offender, however: MCW (like UIOWA)
  uses a different convention for seer_site_breast, and the Race descriptors
  differ for UIOWA and WISC. These inconsistencies have percolated through to
  the derived GPC variables. I am writing a mapping program to handle this for
  the registry data we have received so far, and I'm willing to share what I
  have if it would help you.

* Another big problem concerns missing values. I have attached a report that
  gives the percentage of missing values by site and variable. It shows, for
  example, that UIOWA has no data for the Race 5 variable (i.e., 100% of the
  values equal "NA"; this does NOT include cases where the value assigned is
  the NAACCR code for 'missing'). It also illustrates some other things we
  have discussed; for example, sites that reported data for central sequence
  number did not report data for hospital sequence number (and vice versa).

  o UIOWA and MCRF appear to have the biggest problems with missing data.

* We also need to figure out why so many patients in our database do not
  appear to have tumors diagnosed between 01JAN2013 and 01MAY2014.
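The three checks in this list could be sketched in Python roughly as below.
Everything here is hypothetical: the LANGUAGE_MAP spellings are illustrative
rather than the values sites actually sent, records are assumed to be dicts
with a "site" key, "NA" or an absent value counts as missing (a NAACCR code
meaning 'missing' is treated as a real value, as in the report), and the
diagnosis window is assumed to be inclusive at both ends.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical mapping from site-specific spellings to one canonical descriptor.
LANGUAGE_MAP: Dict[str, str] = {
    "english": "English",
    "eng": "English",
    "english language": "English",
}


def normalize(value: Optional[str], mapping: Dict[str, str]) -> Optional[str]:
    """Map a descriptor to its canonical form; pass unknown values through."""
    if value is None:
        return None
    return mapping.get(value.strip().lower(), value)


def missing_report(records: List[dict],
                   variables: List[str]) -> Dict[Tuple[str, str], float]:
    """Percent of records per (site, variable) whose value is absent or "NA"."""
    totals: Dict[str, int] = defaultdict(int)
    missing: Dict[Tuple[str, str], int] = defaultdict(int)
    for rec in records:
        site = rec["site"]
        totals[site] += 1
        for var in variables:
            if rec.get(var) in (None, "NA"):
                missing[(site, var)] += 1
    return {
        (site, var): 100.0 * missing[(site, var)] / totals[site]
        for site in totals
        for var in variables
    }


def patients_without_dx_in_window(patient_ids: Iterable[str],
                                  tumors: List[dict],
                                  start: date = date(2013, 1, 1),
                                  end: date = date(2014, 5, 1)) -> List[str]:
    """Patient IDs with no tumor dx date in [start, end] (inclusive, assumed)."""
    in_window = {
        t["study_id"]
        for t in tumors
        if t.get("dx_date") is not None and start <= t["dx_date"] <= end
    }
    return sorted(set(patient_ids) - in_window)
```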

(General observation: You'll notice that each NAACCR concept corresponds to
two variables (e.g., N0670_Surg_Prim_Site and N0670_Surg_Prim_Site_D). Vince
and I settled on this arrangement for the datamart. Since then, though, I've
come to believe that the redundancy makes the database difficult to use.
Perhaps we could keep that in mind for future data cuts.)

I'm very happy to work on these problems with you. Would you like to schedule a 
phone call to plan out how to approach these issues?

Thanks,

Brad

------------------------------------------------------------
Bradley D. McDowell, Ph.D.
Director, Population Research Core
Holden Comprehensive Cancer Center

5240 MERF | The University of Iowa | Iowa City, IA | 52242
Office: 319-384-1768

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
