Bradley,

When you say there are “Duplicate records (indicated with equivalent sequence number)”, what field are you basing this on?
http://www.naaccr.org/Applications/ContentReader/default.aspx?c=9 shows only
two sequence numbers, and neither has enough room to make each tumor unique.

George Kowalski
414.805.7318 (office) / gkowal...@mcw.edu

From: gpc-dev-boun...@listserv.kumc.edu on behalf of "McDowell, Bradley D"
<bradley-mcdow...@uiowa.edu>
Date: Monday, February 1, 2016 at 11:00 AM
To: gpc-dev@listserv.kumc.edu
Subject: FW: data issues

Dan asked me to forward this message to this group:

From: McDowell, Bradley D
Sent: Tuesday, January 26, 2016 11:00 AM
To: Dan Connolly (dconno...@kumc.edu)
Cc: Chrischilles, Elizabeth A; Gryzlak, Brian M
Subject: data issues

Hi Dan,

Betsy asked me to provide a list of the issues that have been uncovered so far 
with respect to the oncology registry data. The hope is that we can establish 
tickets for each problem. There are some problems that are easy to describe, 
and some that are not so easy. Some of the easy ones:


·         Missing MCW patient 
(https://informatics.gpcnetwork.org/trac/Project/ticket/453)

·         Duplicate records (indicated with equivalent sequence number for same
Study ID), some with updated surgery and class-of-case variables

·         Some duplicate records have dx dates that appear to have been copied
from the last contact date (dx date not the same across duplicates)

·         UMN swapped | and : in the NAACCR variables (but not in the
“*.Descriptor” variables)
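
A minimal sketch of the duplicate check described in the list, assuming each record is a dict with hypothetical field names study_id, sequence_number, and dx_date (the real datamart column names may differ, and which sequence number field to key on is exactly the open question raised at the top of this thread). It groups records by (Study ID, sequence number), keeps the groups with more than one row, and flags groups whose diagnosis dates disagree:

```python
from collections import defaultdict

def find_duplicates(records, id_field="study_id", seq_field="sequence_number"):
    """Group records by (study ID, sequence number); keep groups with >1 record.

    Field names are hypothetical placeholders, not actual datamart columns.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[id_field], rec[seq_field])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}

def dx_date_conflicts(duplicates, date_field="dx_date"):
    """Return the keys of duplicate groups whose records disagree on dx date."""
    return sorted(
        key for key, recs in duplicates.items()
        if len({rec[date_field] for rec in recs}) > 1
    )

# Illustrative rows only; not real patient data.
records = [
    {"study_id": "A1", "sequence_number": "00", "dx_date": "2013-03-01"},
    {"study_id": "A1", "sequence_number": "00", "dx_date": "2014-02-10"},
    {"study_id": "B2", "sequence_number": "01", "dx_date": "2013-07-15"},
    {"study_id": "B2", "sequence_number": "02", "dx_date": "2013-09-30"},
]
dups = find_duplicates(records)
print(sorted(dups))             # [('A1', '00')]
print(dx_date_conflicts(dups))  # [('A1', '00')]
```

Keying on a tuple keeps the check independent of which sequence number field is chosen; swapping seq_field is enough to compare central and hospital sequence numbers.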

For the duplicated records, I have put together a spreadsheet that nicely
illustrates the problem, and I’m happy to share that. We’ll have to transfer it
via REDCap or some other secure means since it contains patient-level data.

Regarding the not so easy issues:

·         One problem concerns inconsistencies in coded values. For example, 
gpc_language has four different values for “English”. In general, UIOWA is not 
using the same descriptor values as other sites, and that accounts for most of 
these. It is not the only offender, however. MCW uses a different convention 
for seer_site_breast (as does UIOWA), and Race descriptors are different for
UIOWA and WISC. These inconsistencies have percolated through to the derived 
GPC variables. I am writing a mapping program to handle this with the registry 
data we have received so far. I’m certainly willing to share what I have if it 
would help you.

·         Another big problem concerns missing values. I have attached a report 
that provides the percentage of missing values, organized by site and variable. 
This illustrates, for example, that UIOWA has no data for the Race 5 variable 
(i.e., 100% of the values equal “NA”; this does NOT reflect cases where a value 
is assigned for the NAACCR code for ‘missing’). It also illustrates some other 
things that we have discussed; for example, sites that reported data for 
central sequence number did not report data for hospital sequence number (and 
vice versa).

o   UIOWA and MCRF appear to have the biggest problems with missing data.

·         We also need to figure out why so many patients in our database do 
not appear to have tumors diagnosed between 01JAN2013 and 01MAY2014.
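
A minimal sketch of two of these checks, assuming records are dicts keyed by hypothetical column names (site, race_5, gpc_language). The first helper normalizes descriptor values through a lookup table; the second computes a per-site, per-variable missing-value percentage that counts only an absent value or the literal "NA" placeholder, not NAACCR codes for unknown. The mapping entries are illustrative; a real map would be built from the values each site actually sent.

```python
from collections import defaultdict

# Hypothetical mapping of site-specific spellings to one canonical descriptor.
# These variants are illustrative, not the actual values found in the data.
LANGUAGE_MAP = {
    "english": "English",
    "eng": "English",
    "english language": "English",
}

def normalize(value, mapping):
    """Return the canonical descriptor, or the original value if unmapped."""
    if value is None:
        return None
    return mapping.get(value.strip().lower(), value)

def percent_missing(records, variables, site_field="site", missing="NA"):
    """Percent of rows per (site, variable) whose value is absent or `missing`.

    Only the literal placeholder counts; NAACCR 'unknown' codes count as data.
    """
    totals = defaultdict(int)
    missing_counts = defaultdict(int)
    for rec in records:
        site = rec[site_field]
        for var in variables:
            totals[(site, var)] += 1
            if rec.get(var) in (None, missing):
                missing_counts[(site, var)] += 1
    return {key: 100.0 * missing_counts[key] / totals[key] for key in totals}

# Illustrative rows only; not real registry data.
rows = [
    {"site": "UIOWA", "race_5": "NA", "gpc_language": "ENGLISH"},
    {"site": "UIOWA", "race_5": "NA", "gpc_language": "Eng"},
    {"site": "WISC", "race_5": "01", "gpc_language": "English "},
    {"site": "WISC", "race_5": "NA", "gpc_language": "Spanish"},
]
print(percent_missing(rows, ["race_5"]))
# {('UIOWA', 'race_5'): 100.0, ('WISC', 'race_5'): 50.0}
print([normalize(r["gpc_language"], LANGUAGE_MAP) for r in rows])
# ['English', 'English', 'English', 'Spanish']
```

Unmapped values pass through unchanged, so new variants show up in the output rather than being silently dropped.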

(General observation: You’ll notice that each NAACCR concept corresponds
to two variables (e.g., N0670_Surg_Prim_Site and N0670_Surg_Prim_Site_D). Vince
and I settled on this arrangement for the datamart. Since then, though, I’ve 
come to believe that the redundancy makes the database difficult to use. 
Perhaps we could keep that in mind for future data cuts.)

I’m very happy to work on these problems with you. Would you like to schedule a 
phone call to plan out how to approach these issues?

Thanks,

Brad

------------------------------------------------------------
Bradley D. McDowell, Ph.D.
Director, Population Research Core
Holden Comprehensive Cancer Center

5240 MERF | The University of Iowa | Iowa City, IA | 52242
Office: 319-384-1768

_______________________________________________
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev
