Thank you, Rui Sun! It is working now!!

On Thu, Feb 4, 2016 at 9:21 AM, Sun, Rui <rui....@intel.com> wrote:
> Devesh,
>
> Note that DataFrame is immutable. withColumn returns a new DataFrame
> instead of adding a column in place to the DataFrame being operated on.
>
> So, you can modify the for loop like:
>
> for (j in 1:lev)
> {
>   dummy.df.new<-withColumn(df,
>     paste0(colnames(cat.column),j),
>     ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0) )
>
>   df<-dummy.df.new
> }
>
> As you can see, withColumn supports adding only one column. It may be more
> convenient if withColumn supported adding multiple columns at once. There is
> a JIRA requesting such a feature
> (https://issues.apache.org/jira/browse/SPARK-12225) which is still under
> discussion. If you desire this feature, you could comment on it.
>
> *From:* Franc Carter [mailto:franc.car...@gmail.com]
> *Sent:* Wednesday, February 3, 2016 7:40 PM
> *To:* Devesh Raj Singh
> *Cc:* user@spark.apache.org
> *Subject:* Re: sparkR not able to create /append new columns
>
> Yes, I didn't work out how to solve that - sorry
>
> On 3 February 2016 at 22:37, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
> Hi,
>
> But "withColumn" will only add once. If I want to add columns to the same
> dataframe in a loop, it keeps overwriting the added column, and in the
> end only the last column added in the loop remains, as in my code.
>
> On Wed, Feb 3, 2016 at 5:05 PM, Franc Carter <franc.car...@gmail.com>
> wrote:
>
> I had problems doing this as well - I ended up using 'withColumn', it's
> not particularly graceful but it worked (1.5.2 on AWS EMR)
>
> cheers
>
> On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
> Hi,
>
> I am trying to create dummy variables in sparkR by creating new columns
> for categorical variables. But it is not appending the columns.
>
> df <- createDataFrame(sqlContext, iris)
> class(dtypes(df))
>
> cat.column<-vector(mode="character",length=nrow(df))
> cat.column<-collect(select(df,df$Species))
> lev<-length(levels(as.factor(unlist(cat.column))))
> varb.names<-vector(mode="character",length=lev)
> for (i in 1:lev){
>   varb.names[i]<-paste0(colnames(cat.column),i)
> }
>
> for (j in 1:lev)
> {
>   dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
>     ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0) )
> }
>
> I am getting the below output for
>
> head(dummy.df.new)
>
> output:
>
>   Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
> 1          5.1         3.5          1.4         0.2  setosa        1
> 2          4.9         3.0          1.4         0.2  setosa        1
> 3          4.7         3.2          1.3         0.2  setosa        1
> 4          4.6         3.1          1.5         0.2  setosa        1
> 5          5.0         3.6          1.4         0.2  setosa        1
> 6          5.4         3.9          1.7         0.4  setosa        1
>
> Problem: the Species2 and Species3 columns are not getting added to the
> dataframe
>
> --
> Warm regards,
> Devesh.
>
> --
> Franc
>
> --
> Warm regards,
> Devesh.
>
> --
> Franc

--
Warm regards,
Devesh.
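
For anyone finding this thread later, here is a minimal sketch of the working loop, assuming the Spark 1.x SparkR API used in this thread (sparkR.init, sparkRSQL.init, and createDataFrame taking a sqlContext) and a local master. The variable name "species" and the "Species" column-name prefix are my own choices for illustration. It collects only the distinct Species values instead of the whole column, which I believe is equivalent for this purpose but have not tested against every Spark version.

```r
library(SparkR)

# Assumption: Spark 1.x style initialisation, running locally.
sc <- sparkR.init(master = "local")
sqlContext <- sparkRSQL.init(sc)

# Dots in the iris column names become underscores (e.g. Sepal_Length).
df <- createDataFrame(sqlContext, iris)

# Collect only the distinct category values, not the full column.
species <- sort(as.character(collect(distinct(select(df, "Species")))$Species))

# DataFrames are immutable: withColumn returns a new DataFrame,
# so assign the result back to df on every iteration.
for (j in seq_along(species)) {
  df <- withColumn(df,
                   paste0("Species", j),
                   ifelse(df$Species == species[j], 1, 0))
}

head(df)
```

Assigning the result straight back to df avoids the intermediate dummy.df.new variable, but the idea is the same as in Rui's reply: each pass through the loop has to start from the DataFrame produced by the previous pass.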