Please point me to another listserv or forum if this question is more
appropriately addressed elsewhere.

I am trying to come up with a structure to store employment data by NAICS
(North American Industrial Classification System). The data uses a
hierarchical encoding scheme ranging between 2 and 5 digits. That is, each
2-digit code includes all industries beginning with the same two digits. 61
includes 611 which includes 6111, 6112, 6113, etc. A portion of the
hierarchy is shown after the sig.

A standard way to store hierarchical data is the adjacency list model, where
each node's parent appears as an attribute (table column). So 6111 would
list 611 as its parent. Since NAICS uses a hierarchical encoding scheme, the
node's name is the same as the node's id, and the parent can always be
derived from the node's id. Storing the parent id separately would seem to
violate a normal form (because of the redundancy).

One way to store this data would be to store at the most granular level
(5-digit NAICS) and then aggregate up if I wanted employment at the 4-, 3-,
or 2-digit level. The problem is that because of nondisclosure rules, the
data is sometimes censored at the more specific level. I might, for example,
have data for 6114, but not 61141, 61142, 61143. For a different branch of
the tree, I might have data at the 5-digit level while for yet another
branch I might have data only to the 3-digit level (not 4 or 5). I think
that means I have to store all data at multiple levels, even if some of the
higher-level data could be reconstructed from other, lower-level data.

Specifically I'd like to know if this should be a single table or should
there be a separate table for each level of the hierarchy (four in all)? If
one table, should the digits be broken into separate columns? Should parent
ids be stored in each node?

More generally, what questions should I be asking to help decide what
structure makes the most sense? Are there any websites, forums, or books
that cover this kind of problem?

Regards,
--Lee

-- 
Lee Hachadoorian
PhD Student, Geography
Program in Earth & Environmental Sciences
CUNY Graduate Center

A Portion of the NAICS scheme

61    Educational Services
611    Educational Services
6111    Elementary and Secondary Schools
61111    Elementary and Secondary Schools
6112    Junior Colleges
61121    Junior Colleges
6113    Colleges, Universities, and Professional Schools
61131    Colleges, Universities, and Professional Schools
6114    Business Schools and Computer and Management Training
61141    Business and Secretarial Schools
61142    Computer Training
61143    Professional and Management Development Training
etc…

Reply via email to