Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
Hi, I am trying to add the row number to a Spark DataFrame. This is my DataFrame:

scala> df.printSchema
root
 |-- line: string (nullable = true)

I tried to use df.withColumn, but I am getting the exception below.

scala> df.withColumn("row", rowNumber)
org.apache.spark.sql.AnalysisException:

Re: Adding new column to Dataframe

2015-11-25 Thread Jeff Zhang
>>> I tried to use df.withColumn, but I am getting the exception below.

What is rowNumber here? A UDF? You can use monotonicallyIncreasingId to generate an id.

>>> Also, is it possible to add a column from one dataframe to another?

You can't, because how can you add one dataframe to another if they
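A minimal sketch of Jeff's suggestion, assuming Spark 2.x or later, where the function is spelled `monotonically_increasing_id` (the camel-case `monotonicallyIncreasingId` was deprecated in favour of it). The object name `MonotonicIdExample` and the sample rows are hypothetical. The generated ids are unique and increasing but not consecutive, so they are not a drop-in replacement for row numbers 1, 2, 3, ...

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

object MonotonicIdExample {
  // Adds a 64-bit "id" column. Ids are unique and increasing, but not consecutive:
  // the upper 31 bits hold the partition index, the lower 33 bits the record
  // number within that partition.
  def withUniqueId(df: DataFrame): DataFrame =
    df.withColumn("id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("unique-ids").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the schema from the question: one string column "line".
    val df = Seq("first", "second", "third").toDF("line")
    withUniqueId(df).show()

    spark.stop()
  }
}
```

The ids are cheap to generate because no shuffle is needed, which is the trade-off against consecutive numbering.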

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
Thanks Jeff, rowNumber is a function in org.apache.spark.sql.functions. I will try to use monotonicallyIncreasingId and see if it works.

> You'd better use a join to correlate two data frames.

Yes,
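The suggestion quoted above, correlating two DataFrames with a join, can be sketched as follows. This assumes Spark 2.x or later and that both DataFrames already share a key column; the names `JoinExample`, `addColumnFrom`, `id`, and `score` are hypothetical. A left outer join keeps every row of the left DataFrame and fills in null where the right side has no match.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinExample {
  // Copies `extraCol` from `right` onto `left` by matching rows on the shared `key` column.
  // Left outer join: every row of `left` is kept; unmatched rows get null in `extraCol`.
  // If `key` is not unique in `right`, matching rows of `left` are duplicated.
  def addColumnFrom(left: DataFrame, right: DataFrame, key: String, extraCol: String): DataFrame =
    left.join(right.select(key, extraCol), Seq(key), "left_outer")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-example").getOrCreate()
    import spark.implicits._

    // Hypothetical data: both DataFrames carry an "id" key.
    val lines  = Seq((1L, "first"), (2L, "second")).toDF("id", "line")
    val scores = Seq((1L, 0.5)).toDF("id", "score")

    addColumnFrom(lines, scores, "id", "score").show()
    spark.stop()
  }
}
```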

Re: Adding new column to Dataframe

2015-11-25 Thread Vishnu Viswanath
Thanks Ted, it looks like I cannot use row_number then. I tried to run a sample window function and got the error below:

org.apache.spark.sql.AnalysisException: Could not resolve window function 'avg'. Note that, using window functions currently requires a HiveContext;
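In Spark 1.x this error is raised when a window function is used through a plain `SQLContext`; the usual workaround at the time was to create a `HiveContext` (`new org.apache.spark.sql.hive.HiveContext(sc)`). The sketch below assumes Spark 2.x or later, where a plain `SparkSession` evaluates window functions without Hive support. It applies `avg` over a window partition; the column names `group` and `value` and the object name are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, col}

object WindowAvgExample {
  // Adds, to every row, the average of "value" over all rows with the same "group".
  // With no orderBy on the window, the frame is the whole partition.
  def withGroupAvg(df: DataFrame): DataFrame =
    df.withColumn("group_avg", avg(col("value")).over(Window.partitionBy("group")))

  def main(args: Array[String]): Unit = {
    // No enableHiveSupport() call: assumes Spark 2.x+ native window functions.
    val spark = SparkSession.builder().master("local[*]").appName("window-avg").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data.
    val df = Seq(("a", 1.0), ("a", 3.0), ("b", 5.0)).toDF("group", "value")
    withGroupAvg(df).show()

    spark.stop()
  }
}
```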

Re: Adding new column to Dataframe

2015-11-25 Thread Ted Yu
Vishnu: rowNumber (deprecated, replaced with row_number) is a window function.

/**
 * Window function: returns a sequential number starting at 1 within a window partition.
 *
 * @group window_funcs
 * @since 1.6.0
 */
def row_number(): Column = withExpr {
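Because `row_number` is a window function, it has to be applied with `.over(...)` to a window specification that has an ordering; passing it to `withColumn` on its own, as in the original question, is rejected by the analyzer. A minimal sketch, assuming Spark 2.x or later and ordering by the `line` column (an assumption; any column that defines the desired order would do). The object name is hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object RowNumberExample {
  // Numbers rows 1, 2, 3, ... in ascending order of "line".
  // The window has no partitionBy, so Spark moves all rows into a single
  // partition to compute it: fine for small data, slow for very large DataFrames.
  def withRowNumber(df: DataFrame): DataFrame =
    df.withColumn("row", row_number().over(Window.orderBy(col("line"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("row-number").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data with the schema from the question.
    val df = Seq("c", "a", "b").toDF("line")
    withRowNumber(df).show()

    spark.stop()
  }
}
```

Adding `Window.partitionBy(...)` before `orderBy` restarts the numbering at 1 in each partition, which matches the "within a window partition" wording in the doc comment above.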