Hi,

I have figured out that this only happens in cluster mode. It works properly in local[32].

From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Thursday, October 08, 2015 10:23 AM
To: dev@spark.apache.org
Subject: RowNumber in HiveContext returns null, negative numbers or huge

Hi all, would this be a bug?

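        import org.apache.spark.sql.expressions.Window
        import org.apache.spark.sql.functions.rowNumber

        // Number the rows within each clrty_id, ordered by filemonth_dtt
        val ws = Window.
            partitionBy("clrty_id").
            orderBy("filemonth_dtt")

        val nm = "repeatMe"
        // Assumption: stacked_data is the result of this select
        val stacked_data = df.select(df.col("*"), rowNumber().over(ws).cast("int").as(nm))

        stacked_data.filter(stacked_data("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_))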

--->

Long, DateType, Int
[200000000003,2006-06-01,-1863462909]
[200000000003,2006-09-01,-1863462909]
[200000000003,2007-01-01,-1863462909]
[200000000003,2007-08-01,-1863462909]
[200000000003,2007-07-01,-1863462909]
[200000000138,2007-07-01,-1863462774]
[200000000138,2007-02-01,-1863462774]
[200000000138,2006-11-01,-1863462774]
[200000000138,2006-08-01,-1863462774]
[200000000138,2007-08-01,-1863462774]
[200000000138,2006-09-01,-1863462774]
[200000000138,2007-03-01,-1863462774]
[200000000138,2006-10-01,-1863462774]
[200000000138,2007-05-01,-1863462774]
[200000000138,2006-06-01,-1863462774]
[200000000138,2006-12-01,-1863462774]
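
For reference, here is a small plain-Scala check (no Spark needed) of two things about the output above. First, every value in the third column equals clrty_id narrowed from Long to Int, i.e. its low 32 bits. Second, the row numbers one would expect are 1, 2, 3, ... within each clrty_id. The names RowNumberCheck, observed, matchesTruncatedId and expectedRowNumbers are illustrative and not part of the original code. This only describes the symptom; it is not a diagnosis of the cause.

```scala
object RowNumberCheck {
  // A few rows copied from the output above: (clrty_id, filemonth_dtt, repeatMe)
  val observed: Seq[(Long, String, Int)] = Seq(
    (200000000003L, "2006-06-01", -1863462909),
    (200000000003L, "2006-09-01", -1863462909),
    (200000000138L, "2007-07-01", -1863462774),
    (200000000138L, "2007-02-01", -1863462774)
  )

  // Long.toInt keeps only the low 32 bits of the value.
  // Check whether each observed "row number" is just the truncated clrty_id.
  val matchesTruncatedId: Boolean =
    observed.forall { case (id, _, rn) => id.toInt == rn }

  // What a row number per clrty_id, ordered by filemonth_dtt, should look like.
  // ISO-formatted dates sort correctly as plain strings.
  val expectedRowNumbers: Seq[(Long, String, Int)] =
    observed
      .groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .flatMap { case (id, rows) =>
        rows.map(_._2).sorted.zipWithIndex.map { case (date, i) => (id, date, i + 1) }
      }

  def main(args: Array[String]): Unit = {
    println(s"observed values equal clrty_id.toInt: $matchesTruncatedId")
    expectedRowNumbers.foreach(println)
  }
}
```

If the check holds for the full result as well, the column is carrying the partition key instead of a counter, which may help narrow down where the cluster-mode path goes wrong.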


Thanks,
Saif
