RE: Duplicates or synonyms in NAACCR ICD_O_MORPH?

2015-05-04 Thread Dan Connolly
You say that like it's a bad thing. ;-)

We didn't do anything about this situation; never noticed it, and I don't see 
how it's a problem.

If the ICDO gods say there are two ways to spell tumour, who are we to say 
otherwise? You might try reporting the situation to them and see what they say.

Multiple terms with the same c_fullname do turn into a problem when we 
populate the concept_dimension from the metadata table. In that case, we pick 
arbitrarily (using min). See concepts_activate.sql around line 70.

This is not specific to NAACCR. We ran into it somewhere else, I think, though 
I don't recall where.
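
For readers who want to see what "pick arbitrarily (using min)" can look like, here is a minimal sketch using Python's sqlite3. It is not the code in concepts_activate.sql. The cut-down schema, the choice to apply MIN to both c_basecode and c_name, and the sample path and code are assumptions made for illustration; only the two term names come from this thread.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns this sketch needs.
SCHEMA = """
CREATE TABLE heron_terms (c_fullname TEXT, c_basecode TEXT, c_name TEXT);
CREATE TABLE concept_dimension (
    concept_path TEXT PRIMARY KEY, concept_cd TEXT, name_char TEXT);
"""

# One row per c_fullname; MIN() is an arbitrary but repeatable tie-breaker.
# Which columns the real script aggregates is an assumption here.
POPULATE = """
INSERT INTO concept_dimension (concept_path, concept_cd, name_char)
SELECT c_fullname, MIN(c_basecode), MIN(c_name)
  FROM heron_terms
 GROUP BY c_fullname
"""


def populate_concept_dimension(terms):
    """Load terms into an in-memory database and collapse duplicates."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO heron_terms VALUES (?, ?, ?)", terms)
    conn.execute(POPULATE)
    return conn.execute(
        "SELECT concept_path, concept_cd, name_char FROM concept_dimension"
        " ORDER BY concept_path"
    ).fetchall()


# Made-up path and code; the two names are the ones quoted in this thread.
path = "\\i2b2\\naaccr\\morph\\80100\\"
terms = [
    (path, "ICDO:8010/0", "8010/0 Epithelial tumour, benign"),
    (path, "ICDO:8010/0", "8010/0 Epithelial tumor, benign"),
]
rows = populate_concept_dimension(terms)
print(rows)
```

Without the GROUP BY, the second insert would violate the primary key on concept_path, which is the kind of problem duplicate c_fullname values cause.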

p.s. I happen to not have ready access to our production database so I used 
this query on babel to see the situation:

SELECT *
  FROM i2b2metadata.heron_terms ht
  where ht.c_basecode like '%80100%'
   and ht.c_fullname like '\\i2b2\\naaccr\\%'
order by ht.c_basecode
limit 100

--
Dan


From: gpc-dev-boun...@listserv.kumc.edu [gpc-dev-boun...@listserv.kumc.edu] on 
behalf of Lenon Patrick [ple...@uwhealth.org]
Sent: Monday, May 04, 2015 2:41 PM
To: gpc-dev@listserv.kumc.edu
Cc: Yoshihara Deborah L
Subject: Duplicates or synonyms in NAACCR ICD_O_MORPH?

In QA-ing our NAACCR data, we found some apparent duplicates in the NAACCR 
ontology as produced by the heron code.  “Duplicates” being defined as two 
metadata records with the same c_fullname (not synonyms).

These appear to be caused by minor differences in spelling and punctuation in 
ICD_O_MORPH, like “tumor” vs. “tumour.”

For example, this query:

SELECT *
  FROM I2B2_DEV_ETL..ICD_O_MORPH icdo
  where icdo.CONCEPT_NAME like '%8010%'
  order by concept_cd;

yields records with concept_name of '8010/0 Epithelial tumour, benign' and 
'8010/0 Epithelial tumor, benign' with the only difference being the English 
spelling of “tumour.”

There are other minor differences like
'8010/6 Carcinoma, metastatic NOS'
'8010/6 Carcinoma, metastatic, NOS'  /* extra comma */

This in turn was caused by slight differences between MORPH2 and MORPH3, aka 
ICD-O-2 and ICD-O-3.

So what if anything did you folks do with this?  Essentially they’re synonyms 
(if I understand I2B2 synonyms correctly).  Are they useful as such?  Or did 
you just wind up nuking the extras on more or less random criteria (like 
getting rid of all the “tumour” entries or some such)?




Patrick Lenon
HIMC Informatics Specialist
608 890 5671

___
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev


Duplicates or synonyms in NAACCR ICD_O_MORPH?

2015-05-04 Thread Lenon Patrick
In QA-ing our NAACCR data, we found some apparent duplicates in the NAACCR 
ontology as produced by the heron code.  "Duplicates" being defined as two 
metadata records with the same c_fullname (not synonyms).
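
The definition above can be checked mechanically. The following is a rough sketch, assuming an i2b2-style metadata table named heron_terms with c_fullname, c_name and c_synonym_cd columns ('N' marking non-synonym rows); the sample paths are invented for the example and the query is not taken from the heron code.

```python
import sqlite3

# Paths that carry more than one non-synonym metadata record.
DUPLICATE_FULLNAMES = """
SELECT c_fullname, COUNT(*) AS n_terms
  FROM heron_terms
 WHERE c_synonym_cd = 'N'
 GROUP BY c_fullname
HAVING COUNT(*) > 1
 ORDER BY c_fullname
"""


def find_duplicate_fullnames(terms):
    """Return (c_fullname, count) for each path with duplicate records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE heron_terms"
        " (c_fullname TEXT, c_name TEXT, c_synonym_cd TEXT)"
    )
    conn.executemany("INSERT INTO heron_terms VALUES (?, ?, ?)", terms)
    return conn.execute(DUPLICATE_FULLNAMES).fetchall()


# Hypothetical paths; names follow the examples in this message.
benign = "\\i2b2\\naaccr\\morph\\80100\\"
metastatic = "\\i2b2\\naaccr\\morph\\80106\\"
terms = [
    (benign, "8010/0 Epithelial tumour, benign", "N"),
    (benign, "8010/0 Epithelial tumor, benign", "N"),
    # A declared synonym is not counted as a duplicate.
    (metastatic, "8010/6 Carcinoma, metastatic, NOS", "N"),
    (metastatic, "8010/6 Carcinoma, metastatic NOS", "Y"),
]
duplicates = find_duplicate_fullnames(terms)
print(duplicates)
```
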

These appear to be caused by minor differences in spelling and punctuation in 
ICD_O_MORPH, like "tumor" vs. "tumour."

For example, this query:

SELECT *
  FROM I2B2_DEV_ETL..ICD_O_MORPH icdo
  where icdo.CONCEPT_NAME like '%8010%'
  order by concept_cd;

yields records with concept_name of '8010/0 Epithelial tumour, benign' and 
'8010/0 Epithelial tumor, benign' with the only difference being the English 
spelling of "tumour."

There are other minor differences like
'8010/6 Carcinoma, metastatic NOS'
'8010/6 Carcinoma, metastatic, NOS'  /* extra comma */

This in turn was caused by slight differences between MORPH2 and MORPH3, aka 
ICD-O-2 and ICD-O-3.

So what if anything did you folks do with this?  Essentially they're synonyms 
(if I understand I2B2 synonyms correctly).  Are they useful as such?  Or did 
you just wind up nuking the extras on more or less random criteria (like 
getting rid of all the "tumour" entries or some such)?




Patrick Lenon
HIMC Informatics Specialist
608 890 5671



gpc-dev 5 May agenda and meeting notes

2015-05-04 Thread Dan Connolly
Proposed agenda is below (also available in the 5 May gpc-dev meeting notes
Google doc).




  1.  Convene, take roll, review records and plan next meeting.

      *   Meeting ID and access code: 817-393-381; call +1 (571) 317-3131
      *   roll: all 10(12) DevTeams represented? KUMC, CMH, UIOWA, WISC, MCW,
          MCRF, UMN, UNMC, UTHSCSA, UTSW, (MU), (IU)
          Reminder - put institution after your name in GoToMeeting
          preferences
      *   today's scribe: George from MCW
      *   comments on the agenda? on last week’s notes (#12)?
      *   recent tickets opened/closed FYI (note also recent ticket comments
          report):
          *   #210 (Query by BMI percentile among children.) closed fixed
          *   #119 (federated breast cancer query with manual term
              alignment) closed invalid: Overtaken by #204 etc.
      *   Next Meeting May 12; scribe?
          *   UNMC demo: standing up an identified i2b2 instance on
              Postgresql 12 or 19 May?
          *   UTHSCSA UMLS walk-through: 12 May? today?

  2.  PCORnet Obesity Studies

      *   Bariatric: #278 WISC?
      *   #277 Antibiotic
      *   #283 (determine GPC funding status for bariatric study) created

  3.  milestone:data-sec-check

      *   #174 (federated login for GPC data store) closed
      *   #159 (GPC REDCap Service for data sharing) closed
      *   anybody blocked on access to GPC redcap? else let’s make #269
          minor or close it

  4.  milestone:data-domains3:

      *   #158 usable view of LOINC lab terms
          *   Nate A. shared file of metadata via CDT
          *   update from KUMC HERON work (awkward timing)
          *   UTHSCSA UMLS stuff? (also re #243 procedures)

  5.  milestone:data-quality3

      *   #232 - write-up progress?

  6.  Q2 data quality ideas (time permitting; currently minor on data-quality3)

      *   #240 Data Characterization queries for phase 1 CDM V1 compliance -
          update since April ETA?
      *   #282 (check correlation of GPC enrollment data with CDM ENR_BASIS)
          created
          *   re LV’s questions about #229: Enrollment terms based on
              Catchment Area

  7.  #88 HERON ETL SQL data check thresholds are not portable and not always
      applicable - demo test report



--
Dan



Re: [gpc-informatics] #277: PCORnet antibiotic study: collect prep-to-research materials

2015-05-04 Thread GPC Informatics
#277: PCORnet antibiotic study: collect prep-to-research materials
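--------------------------------------+------------------------------------
 Reporter:  jmclay                    |       Owner:  dconnolly
     Type:  task                      |      Status:  accepted
 Priority:  major                     |   Milestone:  bariatric-study-data
Component:  data-sharing              |  Resolution:
 Keywords:  PCORnet antibiotic-study  |  Blocked By:
 Blocking:                            |
--------------------------------------+------------------------------------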
Changes (by dconnolly):

 * owner:  prakashnadkarni => dconnolly
 * status:  reopened => accepted


--
Ticket URL: 

gpc-informatics 
Greater Plains Network - Informatics


Re: [gpc-informatics] #277: PCORnet antibiotic study: collect prep-to-research materials (was: SQL code for PCORnet antibiotic study)

2015-05-04 Thread GPC Informatics
#277: PCORnet antibiotic study: collect prep-to-research materials
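--------------------------------------+------------------------------------
 Reporter:  jmclay                    |       Owner:  prakashnadkarni
     Type:  task                      |      Status:  reopened
 Priority:  major                     |   Milestone:  bariatric-study-data
Component:  data-sharing              |  Resolution:
 Keywords:  PCORnet antibiotic-study  |  Blocked By:
 Blocking:                            |
--------------------------------------+------------------------------------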
Changes (by dconnolly):

 * status:  closed => reopened
 * reporter:  huhickman => jmclay
 * cc: dconnolly, mish, lv (removed)
 * cc: gpc-dev@…, bzschoche (added)
 * priority:  minor => major
 * milestone:   => bariatric-study-data
 * keywords:  PCORnet, antibiotic => PCORnet antibiotic-study
 * resolution:  fixed =>


Comment:

 re-opening and expanding this in light of:

  - [http://listserv.kumc.edu/pipermail/gpc-dev/2015q2/001582.html PCORnet
 Obesity Studies: Please complete prep-to-research surveys and tables ...]
\\Brittany Zschoche cc gpc-dev Thu Apr 30 10:02:28 CDT 2015



Re: [gpc-informatics] #278: CDRN Survey_Bariatric Prep to Research Table

2015-05-04 Thread GPC Informatics
#278: CDRN Survey_Bariatric Prep to Research Table
-----------------------------+------------------------
 Reporter:  jmcclay          |       Owner:  mish
     Type:  task             |      Status:  assigned
 Priority:  major            |   Milestone:
Component:  data-sharing     |  Resolution:
 Keywords:  bariatric-study  |  Blocked By:
 Blocking:                   |
-----------------------------+------------------------
Changes (by dconnolly):

 * owner:  jmcclay => mish


Comment:

 Brittany tells me she had a call with Jim M:
   - KU, UIOWA, MCRF, UNMC, UTHSCSA have sent numbers
   - UMN opted out; CMH opted out
   - no survey from WISC; plan to participate?



Re: [gpc-informatics] #227: Data QA for Breast Cancer Finder File

2015-05-04 Thread GPC Informatics
#227: Data QA for Breast Cancer Finder File
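----------------------------------+---------------------------------------
 Reporter:  tmcmahon              |       Owner:  vleonardo
     Type:  task                  |      Status:  assigned
 Priority:  major                 |   Milestone:  bc-survey-cohort-def
Component:  data-sharing          |  Resolution:
 Keywords:  data-quality          |  Blocked By:  217, 222, 223, 224,
            breast-cancer-cohort  |               230, 231, 234, 235, 244
 Blocking:  265                   |
----------------------------------+---------------------------------------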
Changes (by dconnolly):

 * cc: bzschoche (added)


Comment:

 Still no word on the IRB (#271), so we're still at least two weeks away
 from done. I'm guessing more like three; milestone due date moves from May
 18 to June 1.



Re: [gpc-informatics] #119: federated breast cancer query with manual term alignment

2015-05-04 Thread GPC Informatics
#119: federated breast cancer query with manual term alignment
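----------------------------------+------------------------------------
 Reporter:  rwaitman              |       Owner:  jdale
     Type:  problem               |      Status:  closed
 Priority:  minor                 |   Milestone:  bc-survey-cohort-def
Component:  data-stds             |  Resolution:  invalid
 Keywords:  breast-cancer-cohort  |  Blocked By:  87
            methods-core          |
 Blocking:                        |
----------------------------------+------------------------------------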
Changes (by dconnolly):

 * cc: gpc-dev@… (removed)
 * status:  assigned => closed
 * resolution:   => invalid
 * blockedby:  17, 32, 44, 87, 126 => 87


Comment:

 Overtaken by #204 etc.



Re: [gpc-informatics] #252: collect test results of Weight and Height cohort selection queries from all sites

2015-05-04 Thread GPC Informatics
#252: collect test results of Weight and Height cohort selection queries from
all sites
----------------------------+----------------------------------
 Reporter:  bokov           |       Owner:  tmcmahon
     Type:  task            |      Status:  assigned
 Priority:  major           |   Milestone:  obesity-survey-def
Component:  data-sharing    |  Resolution:
 Keywords:  obesity-cohort  |  Blocked By:  210, 250, 251
 Blocking:  254             |
----------------------------+----------------------------------
Changes (by dconnolly):

 * keywords:   => obesity-cohort
 * owner:  bokov => tmcmahon


Comment:

 Tamara,

 Have you managed to try out the obesity query at KUMC?

 Ref:
  - [http://listserv.kumc.edu/pipermail/gpc-honest-brokers/2015-April/55.html
 FW: Obesity query v0.2]
\\Dan Connolly to gpc-honest-brokers 1 Apr 2015



Re: [gpc-informatics] #210: Query by BMI percentile among children.

2015-05-04 Thread GPC Informatics
#210: Query by BMI percentile among children.
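----------------------------------+----------------------------------
 Reporter:  bokov                 |       Owner:  huhickman
     Type:  enhancement           |      Status:  closed
 Priority:  major                 |   Milestone:  obesity-survey-def
Component:  data-stds             |  Resolution:  fixed
 Keywords:  BMI pediatric-cohort  |  Blocked By:
            obesity-cohort        |
 Blocking:  252                   |
----------------------------------+----------------------------------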
Changes (by dconnolly):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 If there's anything left to do on this, (a) I don't know what it is, and
 (b) we'll find out while collecting test results in #252.



Re: [gpc-informatics] #158: usable view of LOINC lab terms

2015-05-04 Thread GPC Informatics
#158: usable view of LOINC lab terms
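-------------------------+-----------------------------
 Reporter:  rwaitman     |       Owner:  dconnolly
     Type:  enhancement  |      Status:  accepted
 Priority:  major        |   Milestone:  data-domains3
Component:  data-stds    |  Resolution:
 Keywords:               |  Blocked By:
 Blocking:  241          |
-------------------------+-----------------------------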

Comment (by nateapathy):

 I added the raw ontology table in pipe-delimited format to the Central
 Desktop [https://pcornet.centraldesktop.com/c4gpc/file/39991117/ here].

 This is the raw unedited LOINC hierarchy that serves as the base for CMH
 and MU. It is in a pipe-delimited format and should include over 71,000
 rows of LOINC metadata. All C_VISUALATTRIBUTES have been set to "%A" for
 active rather than hidden, and all C_TOTALNUM values have been set to 0.
 Our implementation counts the unique patients for each level of the
 hierarchy (folder and leaf), and adds that number to the C_TOTALNUM field.
 If that number is 0, the C_VISUALATTRIBUTES are changed to "%H" to hide
 the term. This way, we can support queries with terms that have no data in
 our database rather than returning any sort of error.
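
 The counting step described above could be sketched roughly as follows, using Python's sqlite3. This is not the CMH/MU implementation. The table name loinc_terms, the prefix match between c_fullname and concept_path, the three-character attribute values, and all sample paths and facts are assumptions for illustration; observation_fact and concept_dimension follow the usual i2b2 naming.

```python
import sqlite3

SCHEMA = """
CREATE TABLE loinc_terms (
    c_fullname TEXT PRIMARY KEY, c_visualattributes TEXT, c_totalnum INTEGER);
CREATE TABLE concept_dimension (concept_path TEXT, concept_cd TEXT);
CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT);
"""

# Count distinct patients with a fact at or below each folder or leaf.
COUNT_PATIENTS = """
UPDATE loinc_terms
   SET c_totalnum = (
       SELECT COUNT(DISTINCT f.patient_num)
         FROM observation_fact f
         JOIN concept_dimension cd ON cd.concept_cd = f.concept_cd
        WHERE substr(cd.concept_path, 1, length(loinc_terms.c_fullname))
              = loinc_terms.c_fullname)
"""

# Flip the second character to 'H' (hidden) for terms with no patients.
HIDE_EMPTY = """
UPDATE loinc_terms
   SET c_visualattributes =
       substr(c_visualattributes, 1, 1) || 'H' || substr(c_visualattributes, 3)
 WHERE c_totalnum = 0
"""


def annotate_counts(terms, concepts, facts):
    """Return (c_fullname, c_visualattributes, c_totalnum) after counting."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO loinc_terms VALUES (?, ?, ?)", terms)
    conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)", concepts)
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?)", facts)
    conn.execute(COUNT_PATIENTS)
    conn.execute(HIDE_EMPTY)
    return conn.execute(
        "SELECT c_fullname, c_visualattributes, c_totalnum"
        " FROM loinc_terms ORDER BY c_fullname"
    ).fetchall()


# Made-up hierarchy: every term starts active with a count of 0.
terms = [
    ("\\LOINC\\", "FA ", 0),
    ("\\LOINC\\chem\\", "FA ", 0),
    ("\\LOINC\\chem\\2345-7\\", "LA ", 0),
    ("\\LOINC\\hem\\", "FA ", 0),
    ("\\LOINC\\hem\\718-7\\", "LA ", 0),
]
concepts = [
    ("\\LOINC\\chem\\2345-7\\", "LOINC:2345-7"),
    ("\\LOINC\\hem\\718-7\\", "LOINC:718-7"),
]
# Patient 1 has two results for the same lab; patient 2 has one.
facts = [(1, "LOINC:2345-7"), (1, "LOINC:2345-7"), (2, "LOINC:2345-7")]

result = annotate_counts(terms, concepts, facts)
for row in result:
    print(row)
```

 Comparing with substr rather than LIKE avoids treating "%" or "_" inside a path as a wildcard.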
