Re: sparkR not able to create /append new columns

2016-02-04 Thread Devesh Raj Singh
Thank you, Rui Sun! It is working now!


-- 
Warm regards,
Devesh.


sparkR not able to create /append new columns

2016-02-03 Thread Devesh Raj Singh
Hi,

I am trying to create dummy variables in SparkR by creating new columns for
categorical variables, but the columns are not being appended.


df <- createDataFrame(sqlContext, iris)
class(dtypes(df))

cat.column<-vector(mode="character",length=nrow(df))
cat.column<-collect(select(df,df$Species))
lev<-length(levels(as.factor(unlist(cat.column))))
varb.names<-vector(mode="character",length=lev)
for (i in 1:lev){

  varb.names[i]<-paste0(colnames(cat.column),i)

}

for (j in 1:lev)

{

   dummy.df.new<-withColumn(df,paste0(colnames(cat.column),j),
   ifelse(df$Species==levels(as.factor(unlist(cat.column)))[j],1,0) )

}

I am getting the below output for

head(dummy.df.new)

output:

  Sepal_Length Sepal_Width Petal_Length Petal_Width Species Species1
1          5.1         3.5          1.4         0.2  setosa        1
2          4.9         3.0          1.4         0.2  setosa        1
3          4.7         3.2          1.3         0.2  setosa        1
4          4.6         3.1          1.5         0.2  setosa        1
5          5.0         3.6          1.4         0.2  setosa        1
6          5.4         3.9          1.7         0.4  setosa        1

Problem: the Species2 and Species3 columns are not getting added to the DataFrame.

-- 
Warm regards,
Devesh.


Re: sparkR not able to create /append new columns

2016-02-03 Thread Franc Carter
Yes, I didn't work out how to solve that - sorry



-- 
Franc


Re: sparkR not able to create /append new columns

2016-02-03 Thread Franc Carter
I had problems doing this as well. I ended up using 'withColumn'; it's not
particularly graceful, but it worked (1.5.2 on AWS EMR).

Cheers


-- 
Franc


Re: sparkR not able to create /append new columns

2016-02-03 Thread Devesh Raj Singh
Hi,

But "withColumn" only adds one column. If I add columns to the same DataFrame
in a loop, each iteration overwrites the previously added column, so in the
end only the last column added in the loop is present, as in the code from my
first message.


-- 
Warm regards,
Devesh.


RE: sparkR not able to create /append new columns

2016-02-03 Thread Sun, Rui
Devesh,

Note that a DataFrame is immutable. withColumn returns a new DataFrame instead
of adding a column in place to the DataFrame being operated on.

So you can modify the for loop like this:

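for (j in 1:lev) {
   # withColumn returns a new DataFrame; df itself is not modified.
   dummy.df.new <- withColumn(df,
      paste0(colnames(cat.column), j),
      ifelse(df$Species == levels(as.factor(unlist(cat.column)))[j], 1, 0))

   # Carry the result forward so the next iteration builds on it.
   df <- dummy.df.new
}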

As you can see, withColumn supports adding only one column at a time; it might
be more convenient if it supported adding multiple columns at once. There is a
JIRA issue requesting such a feature
(https://issues.apache.org/jira/browse/SPARK-12225), which is still under
discussion. If you would like this feature, you could comment on it.
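
The reassign-the-result pattern can also be tried without a Spark cluster. The following is only a minimal sketch in base R, not SparkR code: add_col is a hypothetical stand-in for withColumn (it returns a modified copy and leaves its argument untouched), and the local iris data.frame stands in for the SparkR DataFrame, so the column names keep their dots here.

```r
# Hypothetical stand-in for SparkR's withColumn: returns a modified copy;
# the data.frame passed in by the caller is left unchanged.
add_col <- function(d, name, values) {
  d[[name]] <- values
  d
}

lv <- levels(iris$Species)  # "setosa" "versicolor" "virginica"

# Pitfall: every iteration starts again from the unchanged 'base',
# so only the column from the last iteration survives.
base <- iris
for (j in seq_along(lv)) {
  lost <- add_col(base, paste0("Species", j),
                  ifelse(base$Species == lv[j], 1, 0))
}

# Fix: feed each result into the next iteration.
kept <- iris
for (j in seq_along(lv)) {
  kept <- add_col(kept, paste0("Species", j),
                  ifelse(kept$Species == lv[j], 1, 0))
}

print(names(lost))  # the 5 original columns plus Species3 only
print(names(kept))  # the 5 original columns plus Species1, Species2, Species3
```

In the sketch, 'lost' ends up with six columns and 'kept' with eight, which mirrors the difference between the original loop and the modified one.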
