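Vishnu,

rowNumber (deprecated, replaced with row_number) is a window function, so it has to be applied over a window specification with .over(...) rather than passed to withColumn on its own.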
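To make "window function" a bit more concrete, here is a rough plain-Python model of what a row number over a window (partition by one key, order by another) works out to. It is only an illustration, not Spark code: the helper name `row_number`, its parameters, and the sample rows are made up for this sketch.

```python
from collections import defaultdict


def row_number(rows, partition_key, order_key):
    """Return a copy of each row with an extra 'rn' field that counts
    1, 2, 3, ... inside each partition, following the given ordering."""
    # Group the rows by partition key (like partitionBy).
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    numbered = []
    for key in partitions:
        # Order the rows inside the partition (like orderBy) ...
        ordered = sorted(partitions[key], key=order_key)
        # ... and number them starting at 1; the count restarts per partition.
        for rn, row in enumerate(ordered, start=1):
            numbered.append(dict(row, rn=rn))
    return numbered


# Hypothetical sample data with the same column names as the usage below.
rows = [
    {"A": 0, "B": 2.0},
    {"A": 1, "B": 0.5},
    {"A": 0, "B": 1.0},
    {"A": 1, "B": 1.5},
    {"A": 0, "B": 3.0},
]

result = row_number(rows, partition_key=lambda r: r["A"], order_key=lambda r: r["B"])
for r in result:
    print(r)
```

The point is that the number only has a meaning relative to a partition and an ordering, which is why the function needs a window to run over.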
/**
 * Window function: returns a sequential number starting at 1 within a window partition.
 *
 * @group window_funcs
 * @since 1.6.0
 */
def row_number(): Column = withExpr { UnresolvedWindowFunction("row_number", Nil) }

Sample usage (PySpark):

from pyspark.sql.window import Window
from pyspark.sql.functions import rowNumber

df = sqlContext.range(1<<20)
df2 = df.select((df.id % 1000).alias("A"), (df.id / 1000).alias("B"))
# number the rows of each A partition in B order
ws = Window.partitionBy(df2.A).orderBy(df2.B)
# keep the first 3 rows of each partition
df3 = df2.select("A", "B", rowNumber().over(ws).alias("rn")).filter("rn <= 3")

Cheers

On Wed, Nov 25, 2015 at 5:08 PM, Vishnu Viswanath <vishnu.viswanat...@gmail.com> wrote:

> Thanks Jeff,
>
> rowNumber is a function in org.apache.spark.sql.functions link
> <https://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$>
>
> I will try to use monotonicallyIncreasingId and see if it works.
>
> "You'd better use join to correlate 2 data frames": Yes, that's why I
> thought of adding a row number to both DataFrames and joining them on
> the row number. Is there any better way of doing this? Both DataFrames will
> always have the same number of rows, but they are not related by any column
> that could be used for a join.
>
> Thanks and Regards,
> Vishnu Viswanath
>
>
> On Wed, Nov 25, 2015 at 6:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> I tried to use df.withColumn but I am getting below exception.
>>
>> What is rowNumber here? A UDF? You can use monotonicallyIncreasingId
>> for generating an id.
>>
>> >>> Also, is it possible to add a column from one dataframe to another?
>>
>> You can't, because how can you add one dataframe to another if they have
>> a different number of rows? You'd better use join to correlate 2 data
>> frames.
>>
>> On Thu, Nov 26, 2015 at 6:39 AM, Vishnu Viswanath <
>> vishnu.viswanat...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to add the row number to a spark dataframe.
>>> This is my dataframe:
>>>
>>> scala> df.printSchema
>>> root
>>>  |-- line: string (nullable = true)
>>>
>>> I tried to use df.withColumn but I am getting the exception below.
>>>
>>> scala> df.withColumn("row",rowNumber)
>>> org.apache.spark.sql.AnalysisException: unresolved operator 'Project [line#2326,'row_number() AS row#2327];
>>>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
>>>   at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
>>>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:174)
>>>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)
>>>
>>> Also, is it possible to add a column from one dataframe to another?
>>> Something like
>>>
>>> scala> df.withColumn("line2",df2("line"))
>>>
>>> org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330
>>> missing from line#2326 in operator !Project [line#2326,line#2330 AS
>>> line2#2331];
>>>
>>> Thanks and Regards,
>>> Vishnu Viswanath
>>> *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>
> --
> Thanks and Regards,
> Vishnu Viswanath
> *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*