Attached is the data file, just in case, below is the data and the script,
this should give you all you want and to your last question, I am using Mac.

7001    Test0    0    1    \N
7002    Test1    0    1    \N
7003    Test3    1    2    \N
7004    Test4    2    2    \N

grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name:
chararray, active: int, has_city: int, gender: chararray);
grunt> dump data;
(7001L,Test0,0,1,\N)
(7002L,Test1,0,1,\N)
(7003L,Test3,1,2,\N)
(7004L,Test4,2,2,\N)

grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender)
AS gender;
grunt> dump data1;
(7001L,Test0,\N)
(7002L,Test1,\N)
(7003L,Test3,\N)
(7004L,Test4,\N)

grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS
gender;  
grunt> dump data1;
(7001L,Test0,\N)
(7002L,Test1,\N)
(7003L,Test3,\N)
(7004L,Test4,\N)



On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:

> There must be some noise in your input that is getting interpreted
> differently by Hive and Pig. Loading a bunch of newlines does generate
> nulls, so I am not sure what's happening there. Are you loading using
> PigStorage? Default delimiters? Can you upload a sample file and script that
> reproduces the problem somewhere? Are you running this on Windows with its
> weird newline delimiters?
> 
> grunt> cat tmp/nulltest
> 1
> 2
> 
> 3
> grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
> grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX' :
> num) as num;
> grunt> dump processed;
> (1)
> (2)
> (XXX)
> (XXX)
> 
> -Dmitriy
> 
> On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <mdwa...@hotmail.com> wrote:
> 
>> Have tried both ways foo is null OR foo == '\n', doesn't work in pig.
>> Why would null values be saved as \N in a file ? Is there a reason, is this
>> hive or hadoop way which pig cant understand ?
>> 
>> 
>> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>> 
>>> Arguably, that's a Hive bug. What does hive do if you *want* to have a \n
>> as
>>> a value?
>>> 
>>> For your case, I think it's as simple as foreach rel generate ( foo is
>> null
>>> OR foo == '\n' ? 'U' : foo);
>>> 
>>> -D
>>> 
>>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <mdwa...@hotmail.com>
>> wrote:
>>> 
>>>> Well Dmitriy, my bad, I was looking at the data through a hive query and
>> it
>>>> shows as NULL, but when I looked into the flat file all the NULL values
>> are
>>>> are seen as \N.
>>>> Hive is able to understand \N as NULL but pig is not... How can I
>> resolve
>>>> this ?
>>>> 
>>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>>>> 
>>>>> In that case, maybe it's the data, and what you think is null is
>> actually
>>>>> '\n' ?
>>>>> 
>>>>> -D
>>>>> 
>>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <mdwa...@hotmail.com>
>> wrote:
>>>>> 
>>>>>> Doing absolutely the same thing and I am using pig 6 too.
>>>>>> Tried with the fake data on both local and mapreduce modes, works
>> fine.
>>>>>> But on my script against actual data in mapreduce mode, it fails to do
>>>> the
>>>>>> same thing, places \N instead of U.
>>>>>> 
>>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as
>> (num);
>>>>>> grunt> dump rel1;
>>>>>> (1)
>>>>>> (2)
>>>>>> (3)
>>>>>> ()
>>>>>> (5)
>>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num);
>>>>>> grunt> dump find_null;
>>>>>> (1)
>>>>>> (2)
>>>>>> (3)
>>>>>> (U)
>>>>>> (5)
>>>>>> 
>>>>>> 
>>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>>>>>> 
>>>>>>> So what I am saying is, check that you are not inserting some weird
>>>>>>> non-ascii quotes in your actual script.
>>>>>>> I just ran this on Pig 6, it worked:
>>>>>>> 
>>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num);
>>>>>>> grunt> dump data;
>>>>>>> (1)
>>>>>>> (2)
>>>>>>> ()
>>>>>>> (3)
>>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num
>> );
>>>>>>> grunt> dump find_nulls;
>>>>>>> (1)
>>>>>>> (2)
>>>>>>> (U)
>>>>>>> (3)
>>>>>>> 
>>>>>>> I double-checked just in case, and it works in both local and
>> mapreduce
>>>>>>> modes.
>>>>>>> 
>>>>>>> -Dmitriy
>>>>>>> 
>>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <mdwa...@hotmail.com>
>>>> wrote:
>>>>>>> 
>>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite,
>>>>>>>> SQL(U is within single quotes): NVL(city,U) city
>>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Syed,
>>>>>>>>> The samples you pasted include all kinds of extraneous characters.
>>>> Are
>>>>>>>> you
>>>>>>>>> sure your script is properly encoded?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <mdwa...@hotmail.com>
>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the
>>>>>> bincond
>>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of
>> chararray
>>>>>>>> type,
>>>>>>>>>> the result file shows Œ\N¹ instead of U.  Any ideas ?
>>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>>>> 
>> 
>> 
>> 

Reply via email to