Attached is the data file, just in case, below is the data and the script, this should give you all you want and to your last question, I am using Mac.
7001 Test0 0 1 \N 7002 Test1 0 1 \N 7003 Test3 1 2 \N 7004 Test4 2 2 \N grunt> data = LOAD 'data' USING PigStorage('\t') AS (id: long, name: chararray, active: int, has_city: int, gender: chararray); grunt> dump data; (7001L,Test0,0,1,\N) (7002L,Test1,0,1,\N) (7003L,Test3,1,2,\N) (7004L,Test4,2,2,\N) grunt> data1 = FOREACH data GENERATE id, name, (gender is null?'U':gender) AS gender; grunt> dump data1; (7001L,Test0,\N) (7002L,Test1,\N) (7003L,Test3,\N) (7004L,Test4,\N) grunt> data1 = FOREACH data GENERATE id, name, (gender=='\N'?'U':gender) AS gender; grunt> dump data1; (7001L,Test0,\N) (7002L,Test1,\N) (7003L,Test3,\N) (7004L,Test4,\N) On 5/17/10 1:20 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote: > There must be some noise in your input that is getting interpreted > differently by Hive and Pig. Loading a bunch of newlines does generate > nulls, so I am not sure what's happening there. Are you loading using > PigStorage? Default delimiters? Can you upload a sample file and script that > reproduces the problem somewhere? Are you running this on Windows with its > weird newline delimiters? > > grunt> cat tmp/nulltest > 1 > 2 > > 3 > grunt> data = load 'tmp/nulltest' using PigStorage() as (num); > grunt> processed = foreach data generate (num is null OR num == 3 ? 'XXX' : > num) as num; > grunt> dump processed; > (1) > (2) > (XXX) > (XXX) > > -Dmitriy > > On Mon, May 17, 2010 at 12:44 PM, Syed Wasti <mdwa...@hotmail.com> wrote: > >> Have tried both ways foo is null OR foo == '\n', doesn't work in pig. >> Why would null values be saved as \N in a file ? Is there a reason, is this >> hive or hadoop way which pig cant understand ? >> >> >> On 5/17/10 11:53 AM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote: >> >>> Arguably, that's a Hive bug. What does hive do if you *want* to have a \n >> as >>> a value? >>> >>> For your case, I think it's as simple as foreach rel generate ( foo is >> null >>> OR foo == '\n' ? 'U' : foo); >>> >>> -D >>> >>> On Mon, May 17, 2010 at 11:42 AM, Syed Wasti <mdwa...@hotmail.com> >> wrote: >>> >>>> Well Dmitriy, my bad, I was looking at the data through a hive query and >> it >>>> shows as NULL, but when I looked into the flat file all the NULL values >> are >>>> are seen as \N. >>>> Hive is able to understand \N as NULL but pig is not... How can I >> resolve >>>> this ? >>>> >>>> On 5/16/10 4:33 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote: >>>> >>>>> In that case, maybe it's the data, and what you think is null is >> actually >>>>> '\n' ? >>>>> >>>>> -D >>>>> >>>>> On Sun, May 16, 2010 at 4:07 PM, Syed Wasti <mdwa...@hotmail.com> >> wrote: >>>>> >>>>>> Doing absolutely the same thing and I am using pig 6 too. >>>>>> Tried with the fake data on both local and mapreduce modes, works >> fine. >>>>>> But on my script against actual data in mapreduce mode, it fails to do >>>> the >>>>>> same thing, places \N instead of U. >>>>>> >>>>>> grunt> rel1 = LOAD '/user/swasti/data' USING PigStorage('\t') as >> (num); >>>>>> grunt> dump rel1; >>>>>> (1) >>>>>> (2) >>>>>> (3) >>>>>> () >>>>>> (5) >>>>>> grunt> find_null = FOREACH rel1 GENERATE (num is null?'U':num); >>>>>> grunt> dump find_null; >>>>>> (1) >>>>>> (2) >>>>>> (3) >>>>>> (U) >>>>>> (5) >>>>>> >>>>>> >>>>>> On 5/16/10 2:23 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote: >>>>>> >>>>>>> So what I am saying is, check that you are not inserting some weird >>>>>>> non-ascii quotes in your actual script. >>>>>>> I just ran this on Pig 6, it worked: >>>>>>> >>>>>>> grunt> data = load 'tmp/nulltest' using PigStorage() as (num); >>>>>>> grunt> dump data; >>>>>>> (1) >>>>>>> (2) >>>>>>> () >>>>>>> (3) >>>>>>> grunt> find_nulls = foreach data generate ( num is null ? 'U' : num >> ); >>>>>>> grunt> dump find_nulls; >>>>>>> (1) >>>>>>> (2) >>>>>>> (U) >>>>>>> (3) >>>>>>> >>>>>>> I double-checked just in case, and it works in both local and >> mapreduce >>>>>>> modes. >>>>>>> >>>>>>> -Dmitriy >>>>>>> >>>>>>> On Sun, May 16, 2010 at 1:49 PM, Syed Wasti <mdwa...@hotmail.com> >>>> wrote: >>>>>>> >>>>>>>> Hmm not sure why, I used quotes in this mail, let me rewrite, >>>>>>>> SQL(U is within single quotes): NVL(city,U) city >>>>>>>> Pig(U is within single quotes): (city is null?U:city) AS city >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On 5/16/10 1:36 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote: >>>>>>>> >>>>>>>>> Syed, >>>>>>>>> The samples you pasted include all kinds of extraneous characters. >>>> Are >>>>>>>> you >>>>>>>>> sure your script is properly encoded? >>>>>>>>> >>>>>>>>> >>>>>>>>> On Sun, May 16, 2010 at 1:16 PM, Syed Wasti <mdwa...@hotmail.com> >>>>>> wrote: >>>>>>>>> >>>>>>>>>> I am trying the SQL ³NVL(city, ŒU¹) city² in pig I am using the >>>>>> bincond >>>>>>>>>> operator, ³(city is null?'U': city) AS city², which is of >> chararray >>>>>>>> type, >>>>>>>>>> the result file shows Œ\N¹ instead of U. Any ideas ? >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>>> >>>> >> >> >>