Thanks Alan and Dmitriy for your thoughts.

I think we have two different approaches now.

In one approach, if we encounter a null in the dimension values, we label 
it as "unknown" and use the string "NULL" to represent rollups. In the other 
approach, if we encounter a null in the dimension values, we keep the null 
value as is and use "*" or some other string for rollups.

Both approaches look good to me. Please let me know which one I should go 
ahead with.

Thanks
-- Prasanth

On Jun 8, 2012, at 12:22 PM, Alan Gates wrote:

> Option 1 (throwing an error) is bad.  It violates "Pigs eat anything" (see 
> http://pig.apache.org/philosophy.html).  
> 
> Do we need to give users an ability to name this unknown column?  Why not 
> just label it "unknown" and be done?
> 
> Alan.
> 
> On Jun 6, 2012, at 2:24 PM, Prasanth J wrote:
> 
>> Hello everyone
>> 
>> I would like to bring up this discussion about ways of handling NULL 
>> values in dimensions specified for cubing. For example, if we have a 
>> dimension color with the following values
>> 
>> red
>> blue
>> null
>> green
>> 
>> how do we tell whether a null value represents the rollup of all color 
>> values or an actual null value?
>> 
>> SQL way: 
>> There are 2 ways in which SQL Server Analysis Services handles null values 
>> in dimensions:
>> 1) Throw error when it encounters null values in dimension values
>> 2) Ignore error by adding the null values to UnknownMembers. By default 
>> UnknownMembers will be named as "Unknown". The name for UnknownMembers can 
>> also be specified by the user.
>> 
>> Do we need to handle both ways in Pig? I think the first way (throwing 
>> error) is pretty straightforward.
>> For the second way (ignoring error), what is the best way to provide support 
>> for user specified name for UnknownMembers? 
>> 
>> Please share your thoughts about how we can handle this scenario for 
>> different datatypes in Pig. 
>> 
>> Thanks
>> -- Prasanth
>> 
> 
