Re: [gpc-informatics] #145: transform (ETL) GPC i2b2 data to PCORNet CDM

2015-03-09 Thread GPC Informatics
#145: transform (ETL) GPC i2b2 data to PCORNet CDM
--+--
 Reporter:  dconnolly |   Owner:  ngraham
 Type:  enhancement   |  Status:  assigned
 Priority:  major |   Milestone:  drn-basic-query
Component:  data-sharing  |  Resolution:
 Keywords:|  Blocked By:  109, 146, 225
 Blocking:  160   |
--+--

Comment (by jdale):

 Per Dan's request during 3.3.15 GPC Dev call regarding GPC->CDM experience
 report
 ---
 At Minnesota, instead of creating and populating database tables to match
 the PCORI CDM, we chose to develop PCORI CDM-specific views on top of our
 existing production data warehouse model.

 We chose this option for two reasons:

 1.  Database storage – We currently have data for 2.2 million patients,
 which equates to approximately 5.5 billion rows of data. This takes up a
 significant amount of storage space. Instead of creating more tables and
 table spaces to hold duplicate data, we decided it would be more efficient
 to abstract the data into views that represent the PCORI CDM.
 2.  Standards – Views have become somewhat of a standard for how we
 provide access to subsets of data from our data warehouse. This helps
 with limiting exposure to all of our production tables/fields, as well as
 simplifying many of the complex joins that may be required. Since this is
 our standard, we were able to develop the PCORI CDM views rather quickly.
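 A minimal sketch of the view-over-warehouse approach, using SQLite through
 Python's `sqlite3` module. The warehouse table `dw_patient` and its columns
 are hypothetical stand-ins (UMN's actual model isn't described here), and the
 view's columns are only loosely modeled on the CDM DEMOGRAPHIC table, with a
 simplified sex mapping. The point it illustrates: the view stores no rows of
 its own, so new warehouse rows show up without a second load.

```python
import sqlite3

# Hypothetical warehouse table standing in for an existing production model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dw_patient (
    patient_key INTEGER PRIMARY KEY,
    birth_dt    TEXT,
    gender_cd   TEXT
);

-- CDM-shaped view (columns loosely modeled on CDM DEMOGRAPHIC).
-- No rows are copied: every query against the view reads dw_patient.
CREATE VIEW demographic AS
SELECT CAST(patient_key AS TEXT) AS patid,
       birth_dt AS birth_date,
       CASE gender_cd WHEN 'M' THEN 'M' WHEN 'F' THEN 'F' ELSE 'UN' END AS sex
FROM dw_patient;
""")

conn.executemany(
    "INSERT INTO dw_patient VALUES (?, ?, ?)",
    [(1, "1970-01-01", "F"), (2, "1985-06-15", "X")],
)
rows = conn.execute(
    "SELECT patid, birth_date, sex FROM demographic ORDER BY patid"
).fetchall()
print(rows)  # [('1', '1970-01-01', 'F'), ('2', '1985-06-15', 'UN')]

# A row added to the warehouse is visible through the view immediately,
# with no duplicate copy to load or store.
conn.execute("INSERT INTO dw_patient VALUES (3, '2001-02-03', 'M')")
count = conn.execute("SELECT COUNT(*) FROM demographic").fetchone()[0]
print(count)  # 3
```

 Under these assumptions, the trade-off is that each CDM query pays for the
 view's mappings and joins at query time rather than at load time.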

--
Ticket URL: 

gpc-informatics 
Greater Plains Network - Informatics
___
Gpc-dev mailing list
Gpc-dev@listserv.kumc.edu
http://listserv.kumc.edu/mailman/listinfo/gpc-dev


gpc-dev 10 March agenda

2015-03-09 Thread Dan Connolly
from gpc-dev 10 March meeting notes google doc.

This looks like more than an hour's worth, so speak up if stuff that you want 
to be sure we get to is late in the agenda.


  1.  Convene, take roll, review records and plan next meeting.

 *   Meeting ID and access code: 817-393-381; call +1 (571) 317-3131

 *   meeting notes (#12): previous notes OK? today's scribe: UTHSCSA.

 *   roll: all 10 DevTeams represented? comments on the agenda?

*   KUMC, CMH, UIOWA, WISC, MCW, MCRF, UMN, UNMC, UTHSCSA, UTSW, MU

*   #245 (living situation, occupation, and years of education for ALS analysis) created FYI

*   #206 (breast cancer survey tracker: coordinating center requirements and review) closed

 *   Next meeting: 17 Mar. Scribe volunteer?

  2.  milestone:bc-survey-cohort-def - #227 dependencies

 *   #224 too few patients  UNMC - 2015-01-16 11:22:46 Hubert: to resubmit 
based on new file from the registrar (update Monday, March 09, 2015 1:41 PM 
acknowledged)

 *   #235 prior diagnosis, seq no. UIOWA

 *   #223 (variability in breast cancer data set terms) closed

 *   #244 - random sample - VL and co. plan to finish Weds.

  3.  milestone:drn-basic-query (recap of status for March 19 site visit)

 *   We have experience reports from KUMC (HackathonTwo presentation), MCRF
(LaRose 30 Jan), and UTHSCSA (HackathonTwo discussion; ticket:145#comment:20)
of using the shared approach for GPC i2b2->CDM.

 *   Several other sites have done CDM ETL one way or another (e.g. UMN,
UNMC).

 *   #210 encounter terms - Dan struggling to find time to do “the obvious
thing”; i.e. copy CDM terms to GPC ontology

 *   #109 CDM as i2b2 - we have a lingering @ vs T thingy

 *   #145 GPC to CDM ETL - what’s left? close it and see what happens in
the March QA run?

  4.  milestone:cohort-char1 #210 BMI percentile - poll by site. (what was the 
23 Feb due date about? I forget)

 *   KUMC - done, modulo new path in vitals design (#23)

 *   CMH - ETA was 4 Mar

 *   UIOWA

 *   WISC

 *   MCW

 *   MCRF - was on track for 23 Feb

 *   UMN

 *   UNMC

 *   UTHSCSA

 *   UTSW - done

 *   MU

  5.  milestone:data-sec-check

 *   #80 assessment - update?

  6.  milestone:data-quality3 - Tom, Jim, are we on track? How do we get there?

  7.  milestone:data-domains2

 *   #23 vitals - MCRF had a March 6 ETA

 *   #67 (Demographics ontology and value sets, e.g. age) closed

 *   #229 (Enrollment terms based on Catchment Area, encounters, etc.) closed

 *   #186 (Querying age by numerical constraints) closed

 *   #243 (GPC Procedure Ontology: CPT, HCPCS) created

--
Dan



Re: [gpc-informatics] #240: Data Characterization ETL ADD for phase 1 CDM V1 compliance

2015-03-09 Thread GPC Informatics
#240: Data Characterization ETL ADD for phase 1 CDM V1 compliance
+
 Reporter:  campbell|   Owner:  campbell
 Type:  task|  Status:  assigned
 Priority:  major   |   Milestone:  data-quality3
Component:  data-stds   |  Resolution:
 Keywords:  ETL CDM Compliance Quality  |  Blocked By:
 Blocking:  |
+
Changes (by dconnolly):

 * milestone:  data-domains2 => data-quality3




Re: [gpc-informatics] #186: Querying age by numerical constraints

2015-03-09 Thread GPC Informatics
#186: Querying age by numerical constraints
-+
 Reporter:  huhickman|   Owner:  preeder
 Type:  problem  |  Status:  closed
 Priority:  major|   Milestone:  data-domains2
Component:  data-stds|  Resolution:  fixed
 Keywords:  age valueset obesity-cohort  |  Blocked By:
 Blocking:  67   |
-+
Changes (by dconnolly):

 * status:  assigned => closed
 * resolution:   => fixed


Comment:

 On babel, we have `\\GPC_TERMS\i2b2\Demographics\Age by Value\`

 This seems different from the one site where it's actually deployed, UNMC,
 for no reason I can find. They use `\i2b2\Demographics\Age\Age Range\`.
 Hubert, if it's more than a trivial fix to migrate, let us know.

 Modulo the initial `\\GPC_TERMS\` bit, which we track separately as #201,
 it seems pretty stable. Based on recent discussion with Russ, I'm once
 again inclined to try the optimistic approach to closing tickets.
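 To compare the two paths mechanically, here is a small sketch that separates
 the leading `\\GPC_TERMS\` part from the rest of the key. It assumes the usual
 i2b2 item-key convention, where the name between the two leading backslashes
 and the next backslash identifies the ontology table and the remainder is the
 term path. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def split_item_key(key: str) -> Tuple[Optional[str], str]:
    """Split an i2b2 item key into (table_cd, path).

    Keys that start with two backslashes carry a table code before the
    path; a bare path has no table code.
    """
    if key.startswith("\\\\"):
        table_cd, sep, rest = key[2:].partition("\\")
        return table_cd, sep + rest
    return None, key


def segments(path: str) -> List[str]:
    """Non-empty path components, handy for comparing two term hierarchies."""
    return [part for part in path.split("\\") if part]


babel_key = "\\\\GPC_TERMS\\i2b2\\Demographics\\Age by Value\\"
unmc_path = "\\i2b2\\Demographics\\Age\\Age Range\\"

table_cd, babel_path = split_item_key(babel_key)
print(table_cd)              # GPC_TERMS
print(segments(babel_path))  # ['i2b2', 'Demographics', 'Age by Value']
print(segments(unmc_path))   # ['i2b2', 'Demographics', 'Age', 'Age Range']
```

 With the table prefix set aside (the #201 question), the remaining difference
 is only in the last one or two path segments.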



[gpc-informatics] Batch modify: #117, #146, #148, #154, #162, #164, #165, #166, #182, #229

2015-03-09 Thread GPC Informatics
Batch modification to #117, #146, #148, #154, #162, #164, #165, #166, #182, 
#229 by dconnolly:
priority to minor

Comment:
narrowing focus for March 19 site visit goals



Re: [gpc-informatics] #235: insufficient data to exclude prior cancer diagnoses

2015-03-09 Thread GPC Informatics
#235: insufficient data to exclude prior cancer diagnoses
-+-
 Reporter:  bchrischilles                      |       Owner:  prakashnadkarni
     Type:  problem                            |      Status:  assigned
 Priority:  major                              |   Milestone:  bc-survey-cohort-def
Component:  data-stds                          |  Resolution:
 Keywords:  data-quality breast-cancer-cohort  |  Blocked By:
            bc-research-team                   |
 Blocking:  227                                |
-+-
Changes (by dconnolly):

 * owner:  vleonardo => prakashnadkarni


Comment:

 Prakash, do you see any sequence number info when you run the
 BuilderSaga#DataSummary query?

 As Vince noted in ticket:168#comment:16, we don't; not for 0380 central
 nor 560 hospital.
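 A hypothetical stand-in for that check (the BuilderSaga#DataSummary query
 itself isn't shown in this thread): count facts per NAACCR sequence-number
 item, assuming concept codes follow the `NAACCR|<item>:<value>` shape seen in
 the HERON-style codes later in this thread. Item numbers are parsed as
 integers so that `0380` and `380` count as the same item; the sample codes are
 made up.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

# NAACCR sequence-number items mentioned in the comment above.
SEQUENCE_ITEMS = {380: "Sequence Number--Central", 560: "Sequence Number--Hospital"}


def naaccr_item(concept_cd: str) -> Optional[int]:
    """Item number from a code shaped like 'NAACCR|560:00'; None for other codes."""
    prefix = "NAACCR|"
    if not concept_cd.startswith(prefix):
        return None
    item, sep, _value = concept_cd[len(prefix):].partition(":")
    if not sep or not item.isdigit():
        return None
    return int(item)  # int() also normalizes zero-padded numbers such as '0380'


def sequence_number_counts(concept_cds: Iterable[str]) -> Dict[int, int]:
    """Count facts per sequence-number item; 0 means none were submitted."""
    counts = Counter(naaccr_item(cd) for cd in concept_cds)
    return {item: counts[item] for item in SEQUENCE_ITEMS}


# Hypothetical concept codes from one site's submission.
submission = ["NAACCR|400:C504", "NAACCR|560:00", "NAACCR|0560:01", "ICD9:174.9"]
print(sequence_number_counts(submission))  # {380: 0, 560: 2}
```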



Re: [gpc-informatics] #210: Query by BMI percentile among children.

2015-03-09 Thread Susan Morrison
This is complete for UTSW.

Susan Morrison
Database Analyst
Office of Academic Information Systems (AIS)
UT Southwestern Medical Center
6303 Forest Park.| Suite BP4.100 | Dallas, TX  75390-9106
214-648-4293
susan.morri...@utsouthwestern.edu 









Re: [gpc-informatics] #210: Query by BMI percentile among children.

2015-03-09 Thread GPC Informatics
#210: Query by BMI percentile among children.
-+-
 Reporter:  bokov                                |       Owner:  gkowalski
     Type:  enhancement                          |      Status:  assigned
 Priority:  major                                |   Milestone:  cohort-char1
Component:  data-stds                            |  Resolution:
 Keywords:  BMI pediatric-cohort obesity-cohort  |  Blocked By:
 Blocking:  33                                   |
-+-

Comment (by dconnolly):

 Replying to [comment:14 bokov]:
 > And, Dan, Phillip, would you recommend putting this fact into the
 `\\GPC\LP29694-4\LP30604-0\LP29703-3\LP7775-2\59576-9\` path?

 I suppose recent vitals discussion (#23) suggests `\\GPC_TERMS\GPC\Vital
 Signs\LOINC:39156-5\` with tooltip ** \ Vital Signs \ 39156-5: Body mass
 index (bmi) [ratio]**.
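 A sketch of the suggested naming pattern, assuming every LOINC-coded vital
 sign would follow the same `\\GPC_TERMS\GPC\Vital Signs\LOINC:<code>\` key and
 ` \ Vital Signs \ <code>: <name>` tooltip. `vitals_term` is a hypothetical
 helper, not part of any GPC code.

```python
def vitals_term(loinc_code: str, name: str, table_cd: str = "GPC_TERMS"):
    """Build (item key, tooltip) for a LOINC-coded vital sign (hypothetical pattern)."""
    key = f"\\\\{table_cd}\\GPC\\Vital Signs\\LOINC:{loinc_code}\\"
    tooltip = f" \\ Vital Signs \\ {loinc_code}: {name}"
    return key, tooltip


key, tooltip = vitals_term("39156-5", "Body mass index (bmi) [ratio]")
print(key)  # \\GPC_TERMS\GPC\Vital Signs\LOINC:39156-5\
print(tooltip)
```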



Re: [gpc-informatics] #210: Query by BMI percentile among children.

2015-03-09 Thread GPC Informatics
#210: Query by BMI percentile among children.
-+-
 Reporter:  bokov                                |       Owner:  gkowalski
     Type:  enhancement                          |      Status:  assigned
 Priority:  major                                |   Milestone:  cohort-char1
Component:  data-stds                            |  Resolution:
 Keywords:  BMI pediatric-cohort obesity-cohort  |  Blocked By:
 Blocking:  33                                   |
-+-

Comment (by dconnolly):

 I'd like to update KeyGoalTracking; what news since the Feb 3 poll? (ref
 #12)

 poll: status at each site
  - KUMC: done
  - CMH Rita: no particular plans yet; will coordinate with N. Apathy
    - ''Nate: We have this slated for our current iteration. Will be
      released with package release Mar. 4 2015.''
  - IOWA: yes, can do; not started
  - WISC: Started
  - MCW: Started
  - MCRF: Looking good for 23rd
  - UMN: Looking at it
  - UNMC: will run the calculation on what data they have.
  - UTHSCSA: They are done!
  - UTSW: They will run it



RE: NAACCR - Free text fields

2015-03-09 Thread Dan Connolly
Thanks for the background.

Here's hoping we manage to refactor our code -- KUMC's and WISC's -- so that, 
for the parts of the logic that are shared, we can use exactly the same source 
files. I also hope to study the code you just sent more closely.

(todo: add pointer to this discussion from the portable NAACCR ETL ticket.)

--
Dan



Re: [gpc-informatics] #243: GPC Procedure Ontology: CPT, HCPCS

2015-03-09 Thread GPC Informatics
#243: GPC Procedure Ontology: CPT, HCPCS
--+
 Reporter:  dconnolly |   Owner:  gkowalski
 Type:  design-issue  |  Status:  new
 Priority:  major |   Milestone:  data-domains2
Component:  data-stds |  Resolution:
 Keywords:|  Blocked By:
 Blocking:|
--+

Comment (by mish):

 To understand the current process at WISC, you have to take a step back
 and look at the entire process. In general, there are a series of
 transformations that take place on the data as it moves from Clarity
 through our Electronic Data Warehouse (EDW) into the i2b2 production
 database:

 Clarity -> Staging -> EDW -> i2b2 Staging -> i2b2 candidate build -> i2b2
 QA Build -> i2b2 Production

 All of these transformations (and the SQL that performs them) are under
 the control of a tool called Informatica. Informatica
 controls/schedules/maintains/executes/tracks the various transformations
 within our data warehouse.

 Our Clarity database resides on a fairly vanilla Oracle database. Staging
 through i2b2 candidate build all reside within a Netezza MPP database
 server. The i2b2 QA Build and i2b2 Production databases each have their
 own dedicated instance of MS SQL.

 One of the services that Informatica provides for us is a lineage tool
 (within something called Metadata Manager). This allows me to hop into
 Metadata Manager and run the lineage tool, which tells me that we
 ultimately pull procedure_cd from clarity_eap.proc_code in the Clarity
 instance.
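 The lineage lookup can be pictured as following source links from a target
 column back through each stage of the pipeline above. This is a toy model,
 not how Informatica Metadata Manager works internally; apart from
 `procedure_cd` and `clarity_eap.proc_code`, the `stage.column` names are
 invented for illustration.

```python
from typing import Dict, List

# Hypothetical column-level lineage: each target column maps to the column it
# is loaded from, one hop per stage (Clarity -> ... -> i2b2 Production).
LINEAGE: Dict[str, str] = {
    "i2b2_production.procedure_cd": "i2b2_qa.procedure_cd",
    "i2b2_qa.procedure_cd": "i2b2_candidate.procedure_cd",
    "i2b2_candidate.procedure_cd": "i2b2_staging.procedure_cd",
    "i2b2_staging.procedure_cd": "edw.proc_code",
    "edw.proc_code": "staging.proc_code",
    "staging.proc_code": "clarity.clarity_eap.proc_code",
}


def trace_upstream(column: str, lineage: Dict[str, str]) -> List[str]:
    """Follow source links from a target column back to its origin."""
    path = [column]
    seen = {column}
    while path[-1] in lineage:
        source = lineage[path[-1]]
        if source in seen:
            raise ValueError(f"lineage cycle at {source}")
        seen.add(source)
        path.append(source)
    return path


chain = trace_upstream("i2b2_production.procedure_cd", LINEAGE)
print(chain[-1])   # clarity.clarity_eap.proc_code
print(len(chain))  # 7, one column per stage
```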



Re: [gpc-informatics] #31: data elements for ALS cohort characterization

2015-03-09 Thread GPC Informatics
#31: data elements for ALS cohort characterization
-+
 Reporter:  dconnolly|   Owner:  jsteinmetz
 Type:  design-issue |  Status:  assigned
 Priority:  major|   Milestone:  survey-redcap-als
Component:  data-stds|  Resolution:
 Keywords:  als-cohort methods-core  |  Blocked By:  63, 245
 Blocking:   |
-+
Changes (by dconnolly):

 * cc: gpc-dev@… (removed)
 * cc: jsteinmetz (added)
 * owner:  badagarla => jsteinmetz


Comment:

 ''I promoted the issue with living situation etc. (comment:31) to its own
 ticket: #245.''



Re: [gpc-informatics] #235: insufficient data to exclude prior cancer diagnoses

2015-03-09 Thread GPC Informatics
#235: insufficient data to exclude prior cancer diagnoses
-+-
 Reporter:  bchrischilles                      |       Owner:  vleonardo
     Type:  problem                            |      Status:  assigned
 Priority:  major                              |   Milestone:  bc-survey-cohort-def
Component:  data-stds                          |  Resolution:
 Keywords:  data-quality breast-cancer-cohort  |  Blocked By:
            bc-research-team                   |
 Blocking:  227                                |
-+-
Changes (by dconnolly):

 * owner:  huhickman => vleonardo


Comment:

 Vince is looking into why we're not immediately seeing the seq.no info in
 the submission from UIOWA of 2015-03-05 11:31:21.

 Meanwhile, Hubert, UNMC is the only site where we don't yet have data as
 expected based on discussion last Tuesday (#12).



gpc-dev 10 March agenda in progress

2015-03-09 Thread Dan Connolly
I'm starting to put together the agenda:

  *   10 Mar gpc-dev meeting notes

Anything for discussion tomorrow should have been shared by now, but better 
late than never...

--
Dan



RE: NAACCR - Free text fields

2015-03-09 Thread Lenon Patrick
OK, I got the answer to my question (“Do we exclude free text fields?”), which 
is “Yes.”  Thx.

Now, your questions:  Our basic ETL to get UW NAACCR data into the database is 
in Informatica, and mostly resolves differences between UW field names and 
official NAACCR field names.  I also do the de-identification step in 
Informatica.  From then on I largely follow the naaccr_txform script.  I used 
your code to include/exclude NAACCR sections and items based on numerous 
criteria, but I don’t believe I see item format of “free text” as one of the 
criteria.  So my fields included 310 and 320, which I can easily fix.

I did vary somewhat in an attempt to build in some flexibility.  Results are 
essentially the same as your system, so this may be largely academic, but you 
asked, so:

Where I varied was an attempt to set up my NAACCR metadata in such a way that I 
wouldn’t have to hard code which NAACCR sections and items were included in the 
EAV.   To do so I added a logical “ONTOLOGY” field to the section and item 
tables like so (apologies for formatting):

DROP TABLE naaccr_tblSection ;
CREATE TABLE naaccr_tblSection
(
    Section_ID        integer not null primary key
  , Section           varchar(46)
  , SectionPathName   varchar(212)
  , SECTION_ONTOLOGY  varchar(1)
  , SectionBaseCode   varchar(46)
);


DROP TABLE naaccr_tblItem ;
CREATE TABLE naaccr_tblItem
(
    ItemID         integer not null primary key,
    ItemNbr        integer,
    ItemName       varchar(512),
    AllowValue     varchar(204),
    ItemPathName   varchar(212),
    FieldLength    integer,
    SectionID      integer,
    ItemFormat     varchar(125),
    Item_Ontology  varchar(1),
    ItemBaseCode   varchar(212)
);


DROP TABLE naaccr_tblCode ;
CREATE TABLE naaccr_tblCode
(
    CodeID    integer not null primary key,
    ItemID    integer,
    CodeNbr   varchar(212),
    CodePath  varchar(212),
    CodeDcrp  varchar(198)
);


/* To see what NAACCR items are included in our extract, run:  */
create or replace view vw_i2b2_naaccr_active_items as
(select s.section_id, s.section, i.itemid, i.itemnbr, i.itemname,
        i.itempathname, i.ITEMFORMAT
   from naaccr_tblitem i
   join naaccr_tblsection s
     on i.SECTIONID = s.SECTION_ID
  where i.ITEM_ONTOLOGY = 'Y'
    AND s.SECTION_ONTOLOGY = 'Y') ;

After I load the NAACCR tables, I set the initial value for the ONTOLOGY fields 
according to the criteria in naaccr_txform.sql.

So when creating my extract_eav equivalent, instead of the hard coded section 
selection

“and ns.SectionID in (
  1 -- Cancer Identification
, 2 -- Demographic
, etc…”

I select sections and items like so:

“ from
 naaccr_tblitem i,
 naaccr_tblsection s
where i.sectionID = s.section_ID
  and s.section_id is not null
  and s.section_ontology = 'Y'
  and i.item_ontology = 'Y'
  and i.FIELDLENGTH is not null
  and i.ITEMBASECODE is not null   -- another way of excluding a field/item
  order by s.section_id, i.ITEMNBR “

So, the main advantage is I can select a full section while excluding selected 
items in the section.
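A minimal, runnable sketch of the flag-driven selection, using SQLite through 
Python's `sqlite3`. The tables are trimmed versions of the ones above (path 
columns dropped), the rows are hypothetical, and it assumes free-text items can 
be recognized by an `ItemFormat` value of 'Free text'. Both sections stay 
switched on while the free-text items 310 and 320 are switched off individually.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE naaccr_tblSection (
    Section_ID INTEGER NOT NULL PRIMARY KEY,
    Section VARCHAR(46),
    SECTION_ONTOLOGY VARCHAR(1)
);
CREATE TABLE naaccr_tblItem (
    ItemID INTEGER NOT NULL PRIMARY KEY,
    ItemNbr INTEGER,
    ItemName VARCHAR(512),
    FieldLength INTEGER,
    SectionID INTEGER,
    ItemFormat VARCHAR(125),
    Item_Ontology VARCHAR(1),
    ItemBaseCode VARCHAR(212)
);
-- Hypothetical metadata rows: both sections, and all items, start switched on.
INSERT INTO naaccr_tblSection VALUES
    (1, 'Cancer Identification', 'Y'),
    (2, 'Demographic', 'Y');
INSERT INTO naaccr_tblItem VALUES
    (1, 400, 'Primary Site', 4, 1, 'Code', 'Y', 'NAACCR|400:'),
    (2, 220, 'Sex', 1, 2, 'Code', 'Y', 'NAACCR|220:'),
    (3, 310, 'Text--Usual Occupation', 100, 2, 'Free text', 'Y', 'NAACCR|310:'),
    (4, 320, 'Text--Usual Industry', 100, 2, 'Free text', 'Y', 'NAACCR|320:');
-- Keep whole sections, but switch off individual items (here: free text).
UPDATE naaccr_tblItem SET Item_Ontology = 'N' WHERE ItemFormat = 'Free text';
""")

# Same filters as the extract query above, written with an explicit JOIN.
active = conn.execute("""
    SELECT s.Section_ID, i.ItemNbr
    FROM naaccr_tblItem i
    JOIN naaccr_tblSection s ON i.SectionID = s.Section_ID
    WHERE s.SECTION_ONTOLOGY = 'Y'
      AND i.Item_Ontology = 'Y'
      AND i.FieldLength IS NOT NULL
      AND i.ItemBaseCode IS NOT NULL
    ORDER BY s.Section_ID, i.ItemNbr
""").fetchall()
print(active)  # [(1, 400), (2, 220)]
```

Turning a single item back on is then a one-row UPDATE of its flag rather than 
an edit to a hard-coded section list.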



RE: NAACCR - Free text fields

2015-03-09 Thread Dan Connolly
Perhaps you could back up and explain your overall approach to the NAACCR ETL? 
What code are you using? Where can I look at it? Perhaps you've discussed this 
before, but I don't see any pointers to context in this message. Did you try 
the HERON code? If not, why not? If so, what happened when you tried?

As to this specific question, it's documented on the TumorRegistry page:

We reviewed the data we get by section to eliminate potentially sensitive data, 
including free-text; the sections with a -- below are not loaded into HERON:

followed by the relevant code excerpt from 
source:heron_load/naaccr_txform.sql#L67

--
Dan




NAACCR - Free text fields

2015-03-09 Thread Lenon Patrick
In trying to ensure all my Tumor Registry fact table items have corresponding 
concept codes in the Concept Dimension, I found NAACCR item 310 (Text-Usual 
Occupation) which has a format of "Free text."  As you'd expect there are no 
entries in the NAACCR metadata for that.  However, following the Heron fact 
load code, I created a whole bunch of facts with concept codes like 
"NAACCR|310:(n)TH GRADE TEACHER - (small Wisconsin town) SCHOOL DISTRICT"

To my semi-trained eye this looks like it would be pretty useless to I2B2.  So 
I'm wondering what other sites do in similar situations.  Possibilities that 
have occurred to me already are:

1)  Exclude all "free text" format fields from the fact load.

2)  Leave them in, hoping for codification someday

Is there any reason NOT to exclude free text fields?  Or some criteria to 
include some and exclude others?


Patrick Lenon
HIMC Informatics Specialist
608 890 5671
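A sketch of option 1, assuming concept codes are built as NAACCR|<item>:<value> 
as in the example above and that each item's format is known from metadata. 
`ITEM_FORMAT`, `fact_concepts`, and the sample record are hypothetical, not 
HERON's load code; passing exclude_free_text=False corresponds to option 2.

```python
from typing import Dict, List

# Hypothetical item formats; a real load would read these from NAACCR
# metadata (e.g. an ItemFormat column) rather than a hard-coded dict.
ITEM_FORMAT: Dict[int, str] = {220: "Code", 310: "Free text", 400: "Code"}


def concept_code(item_nbr: int, value: str) -> str:
    """Concept code in the NAACCR|<item>:<value> shape."""
    return f"NAACCR|{item_nbr}:{value}"


def fact_concepts(record: Dict[int, str], exclude_free_text: bool = True) -> List[str]:
    """Concept codes for one tumor record, optionally skipping free-text items."""
    codes = []
    for item_nbr, value in sorted(record.items()):
        if not value:
            continue  # nothing to load for blank items
        if exclude_free_text and ITEM_FORMAT.get(item_nbr) == "Free text":
            continue  # option 1: never turn free text into a concept code
        codes.append(concept_code(item_nbr, value))
    return codes


record = {400: "C504", 310: "SCHOOL TEACHER", 220: "2"}  # hypothetical record
print(fact_concepts(record))
# ['NAACCR|220:2', 'NAACCR|400:C504']
print(fact_concepts(record, exclude_free_text=False))
# ['NAACCR|220:2', 'NAACCR|310:SCHOOL TEACHER', 'NAACCR|400:C504']
```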



Re: [gpc-informatics] #235: insufficient data to exclude prior cancer diagnoses

2015-03-09 Thread GPC Informatics
#235: insufficient data to exclude prior cancer diagnoses
-+-
 Reporter:  bchrischilles                      |       Owner:  huhickman
     Type:  problem                            |      Status:  assigned
 Priority:  major                              |   Milestone:  bc-survey-cohort-def
Component:  data-stds                          |  Resolution:
 Keywords:  data-quality breast-cancer-cohort  |  Blocked By:
            bc-research-team                   |
 Blocking:  227                                |
-+-

Comment (by bos):

 Uploaded new UTHSCSA file with 560 facts



Re: [gpc-informatics] #235: insufficient data to exclude prior cancer diagnoses

2015-03-09 Thread GPC Informatics
#235: insufficient data to exclude prior cancer diagnoses
-+-
 Reporter:  bchrischilles                      |       Owner:  huhickman
     Type:  problem                            |      Status:  assigned
 Priority:  major                              |   Milestone:  bc-survey-cohort-def
Component:  data-stds                          |  Resolution:
 Keywords:  data-quality breast-cancer-cohort  |  Blocked By:
            bc-research-team                   |
 Blocking:  227                                |
-+-
Changes (by bos):

 * owner:  bos => huhickman

