Thank you, Rui Sun! It is working now!

On Thu, Feb 4, 2016 at 9:21 AM, Sun, Rui <rui....@intel.com> wrote:

> Devesh,
>
> Note that DataFrame is immutable. withColumn returns a new DataFrame
> instead of adding a column in place to the DataFrame being operated on.
>
> So you can modify the for loop like this:
>
> for (j in 1:lev) {
>    dummy.df.new <- withColumn(df,
>                               paste0(colnames(cat.column), j),
>                               ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))
>    # Reassign so the next iteration adds its column to this result
>    df <- dummy.df.new
> }
>
> As you can see, withColumn supports adding only one column at a time; it
> would be more convenient if withColumn supported adding multiple columns at
> once. There is a JIRA requesting such a feature
> (https://issues.apache.org/jira/browse/SPARK-12225), which is still under
> discussion. If you want this feature, you could comment on it.
>
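
An alternative to reassigning df inside the loop is to fold withColumn over the level indices with base R's Reduce, so that each call receives the DataFrame returned by the previous one. The following is a minimal sketch, assuming the Spark 1.x SparkR entry points (sparkR.init, sparkRSQL.init) current at the time of this thread, and assuming the dummy columns should be named Species1, Species2, ... in alphabetical order of the levels, as in the loop above. The names lvls and df.dummies are illustrative only.

```r
library(SparkR)

# Assumption: Spark 1.x entry points, matching the API used in this thread
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Collect only the distinct category values to the driver and sort them
# alphabetically, the order levels(as.factor(...)) would give
lvls <- sort(collect(distinct(select(df, "Species")))$Species)

# Fold withColumn over the level indices: each step receives the
# DataFrame returned by the previous step, so every new column is kept
df.dummies <- Reduce(function(acc, j) {
  withColumn(acc, paste0("Species", j),
             ifelse(acc$Species == lvls[j], 1, 0))
}, seq_along(lvls), df)

head(df.dummies)
```

Unlike the original code, this collects only the distinct Species values rather than the whole column, which matters when the DataFrame is large.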
> *From:* Franc Carter [mailto:franc.car...@gmail.com]
> *Sent:* Wednesday, February 3, 2016 7:40 PM
> *To:* Devesh Raj Singh
> *Cc:* user@spark.apache.org
> *Subject:* Re: sparkR not able to create /append new columns
>
> Yes, I didn't work out how to solve that, sorry.
>
> On 3 February 2016 at 22:37, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
> Hi,
>
> But withColumn only adds one column. If I want to add columns to the same
> DataFrame in a loop, it keeps overwriting the added column, and in the end
> only the column added in the last iteration is there, as in my code above.
>
> On Wed, Feb 3, 2016 at 5:05 PM, Franc Carter <franc.car...@gmail.com>
> wrote:
>
> I had problems doing this as well. I ended up using withColumn; it's not
> particularly graceful, but it worked (Spark 1.5.2 on AWS EMR).
>
> Cheers
>
> On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com>
> wrote:
>
> Hi,
>
>
>
> i am trying to create dummy variables in sparkR by creating new columns
> for categorical variables. But it is not appending the columns
>
>
>
>
>
> df <- createDataFrame(sqlContext, iris)
> class(dtypes(df))
>
> cat.column <- vector(mode = "character", length = nrow(df))
> cat.column <- collect(select(df, df$Species))
> lev <- length(levels(as.factor(unlist(cat.column))))
> varb.names <- vector(mode = "character", length = lev)
> for (i in 1:lev) {
>   varb.names[i] <- paste0(colnames(cat.column), i)
> }
>
> for (j in 1:lev) {
>    dummy.df.new <- withColumn(df, paste0(colnames(cat.column), j),
>                               ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))
> }
>
> I am getting the following output for head(dummy.df.new):
>
>   Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
> 1          5.1         3.5          1.4         0.2  setosa        1
> 2          4.9         3.0          1.4         0.2  setosa        1
> 3          4.7         3.2          1.3         0.2  setosa        1
> 4          4.6         3.1          1.5         0.2  setosa        1
> 5          5.0         3.6          1.4         0.2  setosa        1
> 6          5.4         3.9          1.7         0.4  setosa        1
>
> Problem: the Species2 and Species3 columns are not getting added to the
> DataFrame.
>
> --
> Warm regards,
> Devesh.
>
> --
> Franc
>
> --
> Warm regards,
> Devesh.
>
> --
> Franc
>



-- 
Warm regards,
Devesh.
