Re: Adding new column to Dataframe

Vishnu Viswanath Wed, 25 Nov 2015 17:09:47 -0800

Thanks Jeff,

rowNumber is a function in org.apache.spark.sql.functions:
<https://spark.apache.org/docs/1.4.0/api/scala/index.html#org.apache.spark.sql.functions$>
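
For reference, the AnalysisException in the quoted message is most likely
raised because rowNumber is a window function and was used without a window
specification. Below is a minimal sketch of the windowed form. It assumes
Spark 2.x or later, where the function is named row_number (rowNumber is the
1.4/1.5 name) and SparkSession is available. The object name, the helper
addRowNumber, the sample data and the choice of ordering by "line" are
illustrative assumptions, not something from this thread. As far as I know,
window functions on Spark 1.4/1.5 also required a HiveContext.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

object RowNumberSketch {
  // Hypothetical helper: numbers rows 1..n in the order of the "line" column.
  // row_number is a window function, so it must be applied with over(...).
  def addRowNumber(df: DataFrame): DataFrame =
    df.withColumn("row", row_number().over(Window.orderBy("line")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("row-number-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data with the same single-column schema as in the thread.
    val df = Seq("b", "a", "c").toDF("line")
    addRowNumber(df).show()

    spark.stop()
  }
}
```

Note that the numbering follows the sort order of "line", not the original
row order, and that a window without partitionBy moves all rows into a single
partition, so this may not scale to large DataFrames.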


I will try to use monotonicallyIncreasingId and see if it works.
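
A minimal sketch of what that could look like, assuming Spark 2.x or later,
where the function is named monotonically_increasing_id
(monotonicallyIncreasingId is the 1.4/1.5 name). The object name, the helper
withId and the sample data are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.monotonically_increasing_id

object IdSketch {
  // Hypothetical helper: adds a 64-bit id that is unique and increasing,
  // but not consecutive (the partition id is encoded in the upper bits).
  def withId(df: DataFrame): DataFrame =
    df.withColumn("id", monotonically_increasing_id())

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("id-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative data with the same single-column schema as in the thread.
    withId(Seq("a", "b", "c").toDF("line")).show()

    spark.stop()
  }
}
```

Because the ids depend on how each DataFrame is partitioned, two DataFrames
with the same number of rows are not guaranteed to receive matching ids, so
this column is not a reliable join key between them.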

"You’d better use a join to correlate 2 data frames": Yes, that's why I
thought of adding a row number to both DataFrames and joining them on the
row number. Is there a better way of doing this? Both DataFrames will
always have the same number of rows, but they share no column that could
serve as a join key.
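
One possible way to do the row-number join described above is to go through
the RDD API and use zipWithIndex, which assigns consecutive 0-based indices.
This is only a sketch, assuming Spark 2.x or later; the object name, the
helpers withIndex and joinByPosition, the column name "idx" and the sample
data are all hypothetical. It also assumes the row order of both DataFrames
is deterministic, which Spark does not guarantee in general.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{LongType, StructField, StructType}

object PositionalJoinSketch {
  // Hypothetical helper: appends a consecutive 0-based index column.
  def withIndex(spark: SparkSession, df: DataFrame, name: String = "idx"): DataFrame = {
    // Pair every row with its position, then append the position to the row.
    val indexed = df.rdd.zipWithIndex().map { case (row, i) =>
      Row.fromSeq(row.toSeq :+ i)
    }
    val schema = StructType(df.schema.fields :+ StructField(name, LongType, nullable = false))
    spark.createDataFrame(indexed, schema)
  }

  // Hypothetical helper: joins two DataFrames row by row on the index.
  // The two inputs must not share column names.
  def joinByPosition(spark: SparkSession, left: DataFrame, right: DataFrame): DataFrame =
    withIndex(spark, left).join(withIndex(spark, right), "idx").drop("idx")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("positional-join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Illustrative stand-ins for df and df2 from the thread.
    val df = Seq("a", "b", "c").toDF("line")
    val df2 = Seq("x", "y", "z").toDF("line")

    // Rename the second column first so the result has "line" and "line2".
    joinByPosition(spark, df, df2.withColumnRenamed("line", "line2")).show()

    spark.stop()
  }
}
```

Unlike monotonicallyIncreasingId, the indices from zipWithIndex are
consecutive, so both DataFrames get the same keys as long as they have the
same number of rows. The output order of the join itself is not guaranteed.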

Thanks and Regards,
Vishnu Viswanath


On Wed, Nov 25, 2015 at 6:43 PM, Jeff Zhang <zjf...@gmail.com> wrote:

> >>> I tried to use df.withColumn but I am getting the exception below.
>
> What is rowNumber here? A UDF? You can use monotonicallyIncreasingId
> to generate an id.
>
> >>> Also, is it possible to add a column from one dataframe to another?
>
> You can't: how would you add a column from one dataframe to another if
> they have a different number of rows? You'd better use a join to
> correlate 2 data frames.
>
> On Thu, Nov 26, 2015 at 6:39 AM, Vishnu Viswanath <
> vishnu.viswanat...@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying to add the row number to a spark dataframe.
>> This is my dataframe:
>>
>> scala> df.printSchema
>> root
>> |-- line: string (nullable = true)
>>
>> I tried to use df.withColumn but I am getting the exception below.
>>
>> scala> df.withColumn("row",rowNumber)
>> org.apache.spark.sql.AnalysisException: unresolved operator 'Project [line#2326,'row_number() AS row#2327];
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:37)
>> at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:174)
>> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:49)
>>
>> Also, is it possible to add a column from one dataframe to another?
>> Something like:
>>
>> scala> df.withColumn("line2",df2("line"))
>>
>> org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330 missing from line#2326 in operator !Project [line#2326,line#2330 AS line2#2331];
>>
>> 
>>
>> Thanks and Regards,
>> Vishnu Viswanath
>> *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



