Thanks Alan and Dmitriy for your thoughts. I think we have two different approaches now.
In one approach, if we encounter a null in the dimension values, we label it "unknown" and use the string "NULL" to represent rollups. In the other approach, if we encounter a null in the dimension values, we keep the null value as is and use "*" or some other string for rollups.

Both approaches look good to me. Please let me know which one I should go ahead with.

Thanks
-- Prasanth

On Jun 8, 2012, at 12:22 PM, Alan Gates wrote:

> Option 1 (throwing an error) is bad. It violates "Pigs eat anything" (see
> http://pig.apache.org/philosophy.html).
>
> Do we need to give users an ability to name this unknown column? Why not
> just label it "unknown" and be done?
>
> Alan.
>
> On Jun 6, 2012, at 2:24 PM, Prasanth J wrote:
>
>> Hello everyone
>>
>> I would like to bring up this discussion about the ways for handling NULL
>> values in dimensions specified for cubing. For example, if we have a
>> dimension color with following values
>>
>> red
>> blue
>> null
>> green
>>
>> how do we differentiate if the null value represent rollup of all colors
>> values or actual null value?
>>
>> SQL way:
>> There are 2 ways in which SQL server analysis services handles null values
>> in dimensions
>> 1) Throw error when it encounters null values in dimension values
>> 2) Ignore error by adding the null values to UnknownMembers. By default
>> UnknownMembers will be named as "Unknown". The name for UnknownMembers can
>> also be specified by the user.
>>
>> Do we need to handle both ways in Pig? I think the first way (throwing
>> error) is pretty straightforward.
>> For the second way (ignoring error), what is the best way to provide support
>> for user specified name for UnknownMembers?
>>
>> Please share your thoughts about how we can handle this scenario for
>> different datatypes in Pig.
>>
>> Thanks
>> -- Prasanth
>>
>
