Please point me to another listserv or forum if this question is more appropriately addressed elsewhere.
I am trying to come up with a structure to store employment data by NAICS (North American Industry Classification System). The data uses a hierarchical encoding scheme with codes of 2 to 5 digits. That is, each 2-digit code includes all industries whose codes begin with the same two digits: 61 includes 611, which includes 6111, 6112, 6113, etc. A portion of the hierarchy is shown after the sig.

A standard way to store hierarchical data is the adjacency list model, where each node's parent appears as an attribute (table column). So 6111 would list 611 as its parent. Since NAICS uses a hierarchical encoding scheme, the node's name is the same as the node's id, and the parent can always be derived from the node's id by dropping the last digit. Storing the parent id separately would seem to violate a normal form (because of the redundancy).

One way to store this data would be to store it at the most granular level (5-digit NAICS) and then aggregate up whenever I want employment at the 4-, 3-, or 2-digit level. The problem is that, because of nondisclosure rules, the data is sometimes censored at the more specific levels. I might, for example, have data for 6114 but not for 61141, 61142, or 61143. For a different branch of the tree I might have data at the 5-digit level, while for yet another branch I might have data only down to the 3-digit level (not 4 or 5). I think that means I have to store data at multiple levels, even if some of the higher-level data could be reconstructed from lower-level data.

Specifically, I'd like to know:

- Should this be a single table, or should there be a separate table for each level of the hierarchy (four in all)?
- If one table, should the digits be broken into separate columns?
- Should parent ids be stored in each node?

More generally, what questions should I be asking to help decide what structure makes the most sense? Are there any websites, forums, or books that cover this kind of problem?
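For concreteness, the single-table option might look something like the sketch below. It assumes SQLite through Python's sqlite3 module; the table name, column names, and helper functions are hypothetical, and the employment figures are made up purely for illustration. The parent is derived from the code rather than stored, and a suppressed figure is represented as NULL.

```python
import sqlite3
from typing import Optional, Tuple


def parent_code(code: str) -> Optional[str]:
    """Derive the parent code by dropping the last digit (None for 2-digit codes)."""
    return code[:-1] if len(code) > 2 else None


def child_summary(conn: sqlite3.Connection, code: str) -> Tuple[int, int]:
    """Return (sum of disclosed child employment, number of suppressed children)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(employment), 0), SUM(employment IS NULL) "
        "FROM naics_employment WHERE code LIKE ?",
        (code + "_",),  # '_' matches exactly one more digit: the direct children
    ).fetchone()
    return row[0], row[1] or 0


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE naics_employment (
        code       TEXT PRIMARY KEY CHECK (length(code) BETWEEN 2 AND 5),
        title      TEXT NOT NULL,
        employment INTEGER          -- NULL where the figure is suppressed
    )
    """
)

# Illustrative rows only: the employment values are invented, not real data.
conn.executemany(
    "INSERT INTO naics_employment VALUES (?, ?, ?)",
    [
        ("6111", "Elementary and Secondary Schools", 500),
        ("61111", "Elementary and Secondary Schools", 500),
        ("6114", "Business Schools and Computer and Management Training", 300),
        ("61141", "Business and Secretarial Schools", None),
        ("61142", "Computer Training", None),
        ("61143", "Professional and Management Development Training", None),
    ],
)

print(parent_code("6111"))          # parent derived from the code itself
print(child_summary(conn, "6114"))  # children exist but are all suppressed
print(child_summary(conn, "6111"))  # the single child is disclosed
```

With this layout, a query can compare a stored higher-level figure against the sum of its disclosed children and see how many children are suppressed, which is the situation the nondisclosure rules create.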
Regards,
--Lee

--
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

A Portion of the NAICS scheme

61 Educational Services
  611 Educational Services
    6111 Elementary and Secondary Schools
      61111 Elementary and Secondary Schools
    6112 Junior Colleges
      61121 Junior Colleges
    6113 Colleges, Universities, and Professional Schools
      61131 Colleges, Universities, and Professional Schools
    6114 Business Schools and Computer and Management Training
      61141 Business and Secretarial Schools
      61142 Computer Training
      61143 Professional and Management Development Training
etc…