RowNumber in HiveContext returns null or negative values
Hi all, would this be a bug?

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val ws = Window.
      partitionBy("clrty_id").
      orderBy("filemonth_dtt")

    val nm = "repeatMe"
    df.select(df.col("*"), rowNumber().over(ws).cast("int").as(nm))

    stacked_data.filter(stacked_data("repeatMe").isNotNull).orderBy("repeatMe").take(50).foreach(println(_))

    --->

    Long, DateType, Int
    [2003,2006-06-01,-1863462909]
    [2003,2006-09-01,-1863462909]
    [2003,2007-01-01,-1863462909]
    [2003,2007-08-01,-1863462909]
    [2003,2007-07-01,-1863462909]
    [2138,2007-07-01,-1863462774]
    [2138,2007-02-01,-1863462774]
    [2138,2006-11-01,-1863462774]
    [2138,2006-08-01,-1863462774]
    [2138,2007-08-01,-1863462774]
    [2138,2006-09-01,-1863462774]
    [2138,2007-03-01,-1863462774]
    [2138,2006-10-01,-1863462774]
    [2138,2007-05-01,-1863462774]
    [2138,2006-06-01,-1863462774]
    [2138,2006-12-01,-1863462774]

Thanks,
Saif
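For comparison, `rowNumber()` over a window partitioned by `clrty_id` and ordered by `filemonth_dtt` should assign 1, 2, 3, ... within each `clrty_id`, never a null or negative value. The sketch below models that expected behavior with plain Scala collections (no Spark involved), so it is not Spark's implementation. The `Rec` case class and the `rowNumbers` and `isValidRowNumbering` helpers are hypothetical names, `LocalDate` stands in for Spark's `DateType`, and only a handful of the reported rows are used.

```scala
import java.time.LocalDate

// Hypothetical record mirroring the (Long, DateType) columns in the report.
case class Rec(clrtyId: Long, fileMonth: LocalDate)

// Plain-collections model of rowNumber().over(Window.partitionBy(clrty_id).orderBy(filemonth_dtt)):
// group by the partition key, sort each group by the order key, then number from 1.
def rowNumbers(rows: Seq[Rec]): Seq[(Rec, Int)] =
  rows.groupBy(_.clrtyId).toSeq.sortBy(_._1).flatMap { case (_, group) =>
    group.sortBy(_.fileMonth.toEpochDay).zipWithIndex.map { case (r, i) => (r, i + 1) }
  }

// Sanity check for (partition key, row number) pairs: each partition must hold exactly 1..n.
def isValidRowNumbering(rows: Seq[(Long, Int)]): Boolean =
  rows.groupBy(_._1).values.forall { grp =>
    grp.map(_._2).sorted == (1 to grp.size)
  }

val sample = Seq(
  Rec(2003L, LocalDate.parse("2006-06-01")),
  Rec(2003L, LocalDate.parse("2006-09-01")),
  Rec(2003L, LocalDate.parse("2007-01-01")),
  Rec(2138L, LocalDate.parse("2007-07-01")),
  Rec(2138L, LocalDate.parse("2007-02-01"))
)

val expected = rowNumbers(sample)
// Prints:
// [2003,2006-06-01,1]
// [2003,2006-09-01,2]
// [2003,2007-01-01,3]
// [2138,2007-02-01,1]
// [2138,2007-07-01,2]
expected.foreach { case (r, n) => println(s"[${r.clrtyId},${r.fileMonth},$n]") }

// (clrty_id, repeatMe) pairs copied from the output reported above.
val reported = Seq(
  (2003L, -1863462909), (2003L, -1863462909), (2003L, -1863462909),
  (2138L, -1863462774), (2138L, -1863462774)
)
```

Running a check like `isValidRowNumbering` on `(clrty_id, repeatMe)` pairs collected from the cluster is one possible way to flag the problem automatically. On the rows reported above it returns false, because every row in a partition carries the same negative value instead of a distinct position.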
Re: RowNumber in HiveContext returns null or negative values
Which version of Spark?

On Thu, Oct 8, 2015 at 7:25 AM, <saif.a.ell...@wellsfargo.com> wrote:
> Hi all, would this be a bug?? [...]
RE: RowNumber in HiveContext returns null or negative values
Hi, thanks for looking into it. I'm on v1.5.1, and I am really worried. I don't have a real Hive/Hadoop installation in the environment.

Saif

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Thursday, October 08, 2015 2:57 PM
To: Ellafi, Saif A.
Cc: user
Subject: Re: RowNumber in HiveContext returns null or negative values

Which version of Spark?
RE: RowNumber in HiveContext returns null or negative values
It turns out this does not happen in local[32] mode; it only happens when submitting to a standalone cluster. I don't have YARN/Mesos to compare against. I will keep diagnosing.

Saif

From: saif.a.ell...@wellsfargo.com [mailto:saif.a.ell...@wellsfargo.com]
Sent: Thursday, October 08, 2015 3:01 PM
To: mich...@databricks.com
Cc: user@spark.apache.org
Subject: RE: RowNumber in HiveContext returns null or negative values

Hi, thanks for looking into. v1.5.1. [...]
RE: RowNumber in HiveContext returns null or negative values
Repartitioning to 1 and setting the default parallelism to 1 does not help: in cluster mode the results are still broken. So the problem is not the parallelism but cluster mode itself. Something is wrong with HiveContext in cluster mode.

Saif
Re: RowNumber in HiveContext returns null or negative values
Can you open a JIRA?

On Thu, Oct 8, 2015 at 11:24 AM, <saif.a.ell...@wellsfargo.com> wrote:
> Repartition and default parallelism to 1, in cluster mode, is still
> *broken*.
> [...]