Thanks, Jon, for your thoughts. I have a patch that relabels null values in dimension columns as "unknown" and uses null for rollups. For a sample input tuple

  red, null, 12

the query

  a = cube inp by ($0, $1);

will emit the following combinations:

  red, unknown, 12
  , unknown, 12
  red, , 12
  , , 12

Please let me know if anyone has a different opinion on this. If the above choice looks good, I can go ahead and submit the patch to JIRA.

Thanks
-- Prasanth

On Jun 8, 2012, at 11:06 PM, Jonathan Coveney wrote:

> You could always make the value pluggable, going with Unknown for now, and
> then down the line, if we want, we could add an "ONNULL" value to the parser
> that sets it.
>
> 2012/6/8 Prasanth J <[email protected]>
>
>> Thanks Alan and Dmitriy for your thoughts.
>>
>> I think we have two different approaches now.
>>
>> In one approach, if we encounter a null in dimension values, we can just
>> label it as "unknown" and use the "NULL" string to represent rollups. In
>> the other approach, if we encounter a null in dimension values, we keep the
>> null value as is but use "*" or some other string for rollups.
>>
>> Both approaches look good to me. Please let me know which one I should go
>> ahead with.
>>
>> Thanks
>> -- Prasanth
>>
>> On Jun 8, 2012, at 12:22 PM, Alan Gates wrote:
>>
>>> Option 1 (throwing an error) is bad. It violates "Pigs eat anything"
>>> (see http://pig.apache.org/philosophy.html).
>>>
>>> Do we need to give users the ability to name this unknown column? Why
>>> not just label it "unknown" and be done?
>>>
>>> Alan.
>>>
>>> On Jun 6, 2012, at 2:24 PM, Prasanth J wrote:
>>>
>>>> Hello everyone,
>>>>
>>>> I would like to start a discussion about how to handle NULL values in
>>>> the dimensions specified for cubing. For example, if we have a
>>>> dimension color with the following values
>>>>
>>>> red
>>>> blue
>>>> null
>>>> green
>>>>
>>>> how do we tell whether a null value represents the rollup of all
>>>> color values or an actual null value?
>>>>
>>>> SQL way:
>>>> SQL Server Analysis Services handles null values in dimensions in
>>>> two ways:
>>>> 1) Throw an error when it encounters null values in dimension values.
>>>> 2) Ignore the error by adding the null values to UnknownMembers. By
>>>> default, UnknownMembers is named "Unknown". The name for UnknownMembers
>>>> can also be specified by the user.
>>>>
>>>> Do we need to handle both ways in Pig? I think the first way (throwing
>>>> an error) is pretty straightforward.
>>>> For the second way (ignoring the error), what is the best way to
>>>> support a user-specified name for UnknownMembers?
>>>>
>>>> Please share your thoughts about how we can handle this scenario for
>>>> different data types in Pig.
>>>>
>>>> Thanks
>>>> -- Prasanth
>>>>
>>>
>>
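A minimal Python sketch of the null-handling semantics discussed in this thread, assuming a tuple is a plain Python sequence and None stands in for Pig's null. `cube_rows`, `unknown_label`, and `rollup_marker` are hypothetical names chosen for illustration; this is not Pig's API or the code in the patch. `unknown_label` plays the role of the pluggable value Jonathan suggests, and `rollup_marker` selects how rolled-up dimensions are shown.

```python
from itertools import product
from typing import Any, Iterator, Optional, Sequence, Tuple


def cube_rows(
    row: Sequence[Any],
    dims: Sequence[int],
    unknown_label: Optional[str] = "unknown",  # hypothetical pluggable label
    rollup_marker: Optional[str] = None,       # None models Pig's null
) -> Iterator[Tuple[Any, ...]]:
    """Yield one output row per grouping combination of the `dims` columns.

    A null (None) in a dimension column is first relabeled `unknown_label`;
    a rolled-up dimension is then replaced by `rollup_marker`. Non-dimension
    columns (measures) pass through unchanged.
    """
    dim_set = set(dims)
    # Relabel genuine nulls so they cannot be confused with rollup markers.
    base = [
        unknown_label if (i in dim_set and v is None) else v
        for i, v in enumerate(row)
    ]
    # 2**len(dims) combinations: each dimension is either kept or rolled up.
    for keep_flags in product((True, False), repeat=len(dims)):
        out = list(base)
        for dim, keep in zip(dims, keep_flags):
            if not keep:
                out[dim] = rollup_marker
        yield tuple(out)


# Proposed behavior: real nulls become "unknown", rollups are null (None).
for r in cube_rows(("red", None, 12), dims=(0, 1)):
    print(r)
# ('red', 'unknown', 12)
# ('red', None, 12)
# (None, 'unknown', 12)
# (None, None, 12)
```

This yields the same four combinations listed for the sample tuple red, null, 12, in a different order. Calling it with `unknown_label=None, rollup_marker="*"` instead models the other approach from the thread: genuine nulls stay null and rollups are marked with "*".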
