overwriting a spark output using pyspark

2016-03-07 Thread Devesh Raj Singh
I am trying to overwrite a Spark dataframe output using the following option, but I am not successful: spark_df.write.format('com.databricks.spark.csv').option("header", "true",mode='overwrite').save(self.output_file_path). The mode='overwrite' argument is not taking effect. -- Warm regards, Devesh.
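The save mode belongs on the DataFrameWriter itself, via .mode() or the mode= argument of save(); option() only sets a single spark-csv key/value pair, so the extra keyword is ignored or rejected. A minimal sketch of the fix against the 1.5-era spark-csv writer (the output path is illustrative):

    # set the save mode with .mode(), not inside option()
    (spark_df.write
        .format('com.databricks.spark.csv')
        .option('header', 'true')
        .mode('overwrite')            # or: .save(path, mode='overwrite')
        .save('/tmp/output_dir'))     # illustrative path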

Re: pandas dataframe to spark csv

2016-02-23 Thread Devesh Raj Singh
ricks/spark-csv > > Using the above solution you can read CSV directly into a dataframe as > well. > > Regards, > Gourav > > On Tue, Feb 23, 2016 at 12:03 PM, Devesh Raj Singh <raj.deves...@gmail.com> wro

pandas dataframe to spark csv

2016-02-23 Thread Devesh Raj Singh
Hi, I have imported a spark-csv dataframe in Python, read the Spark data, and converted the dataframe to a pandas dataframe using toPandas(). I want to convert the pandas dataframe back to a Spark dataframe and write it out as CSV to a location. Please suggest. -- Warm regards, Devesh.
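A minimal sketch of the pandas-to-Spark leg, assuming a pyspark shell where sc is already defined and the spark-csv package is on the classpath (the toy data and output path are illustrative):

    import pandas as pd
    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)                          # sc from the pyspark shell
    pdf = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})   # stand-in for the toPandas() result
    sdf = sqlContext.createDataFrame(pdf)                # pandas -> Spark
    (sdf.write
        .format('com.databricks.spark.csv')
        .option('header', 'true')
        .save('/tmp/out_csv'))                           # illustrative path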

Reading CSV file using pyspark

2016-02-18 Thread Devesh Raj Singh
Hi, I want to read a CSV file in pyspark. I am running pyspark in PyCharm and am trying to load a csv: import os import sys os.environ['SPARK_HOME']="/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6" sys.path.append("/Users/devesh/Downloads/spark-1.5.1-bin-hadoop2.6/python/") # Now
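Once SPARK_HOME and sys.path are set up as above, the read itself goes through the spark-csv data source. A minimal sketch, assuming the package was put on the classpath (e.g. pyspark --packages com.databricks:spark-csv_2.10:1.3.0); the file path is illustrative:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext('local', 'csv-demo')
    sqlContext = SQLContext(sc)
    df = (sqlContext.read
          .format('com.databricks.spark.csv')
          .option('header', 'true')
          .option('inferSchema', 'true')            # optional type inference
          .load('/Users/devesh/data/sample.csv'))   # illustrative path
    df.show()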

reading spark dataframe in python

2016-02-16 Thread Devesh Raj Singh
Hi, I want to read a Spark dataframe using Python, convert the Spark dataframe to a pandas dataframe, and then (after doing some data analysis) convert the pandas dataframe back to a Spark dataframe. Please suggest. -- Warm regards, Devesh.
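Worth noting that toPandas() collects the entire DataFrame onto the driver, so the round trip only suits data that fits in driver memory. A minimal sketch, assuming a pyspark shell with sqlContext available (toy data is illustrative):

    df = sqlContext.createDataFrame([(1, 'a'), (2, None)], ['id', 'val'])
    pdf = df.toPandas()                    # Spark -> pandas; collects to the driver
    pdf = pdf.dropna()                     # some pandas-side analysis (illustrative)
    df2 = sqlContext.createDataFrame(pdf)  # pandas -> Spark again
    df2.show()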

Re: different behavior while using createDataFrame and read.df in SparkR

2016-02-06 Thread Devesh Raj Singh
> When calling createDataFrame on iris, the “.” character in column names > will be replaced with “_”. > It seems that when you create a DataFrame from the CSV file, the “.” > character in column names is still there. > > *From:* Devesh Raj Singh [mailt
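The same mismatch matters on the pyspark side, where dotted column names need backtick quoting in expressions; one workaround is to normalize the names right after the read. A sketch of that idea (column names are illustrative):

    df = sqlContext.createDataFrame([(5.1, 'setosa')], ['Sepal.Length', 'Species'])
    # replace '.' with '_' in every column name, mirroring what createDataFrame does
    for old in df.columns:
        df = df.withColumnRenamed(old, old.replace('.', '_'))
    print(df.columns)  # ['Sepal_Length', 'Species']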

problem in creating function in sparkR for dummy handling

2016-02-04 Thread Devesh Raj Singh
Hi, I have written code to create dummy variables in sparkR: df <- createDataFrame(sqlContext, iris) class(dtypes(df)) cat.column<-vector(mode="character",length=nrow(df)) cat.column<-collect(select(df,df$Species)) lev<-length(levels(as.factor(unlist(cat.column)))) for (j in 1:lev){
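The same idea can be written without collecting the column: take the distinct levels, then append one 0/1 column per level. A pyspark sketch of the equivalent loop (SparkR's withColumn follows the same pattern); the toy data is illustrative:

    from pyspark.sql.functions import col, when

    df = sqlContext.createDataFrame(
        [(5.1, 'setosa'), (6.2, 'virginica')], ['Sepal_Length', 'Species'])
    levels = [r[0] for r in df.select('Species').distinct().collect()]
    for lev in levels:
        # reassigning df is what makes each new column stick
        df = df.withColumn('Species_' + lev,
                           when(col('Species') == lev, 1).otherwise(0))
    df.show()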

different behavior while using createDataFrame and read.df in SparkR

2016-02-04 Thread Devesh Raj Singh
Hi, I am using Spark 1.5.1. When I do this: df <- createDataFrame(sqlContext, iris) # creating a new column for category "Setosa" df$Species1<-ifelse((df)[[5]]=="setosa",1,0) head(df) output: new column created Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1

Re: sparkR not able to create /append new columns

2016-02-04 Thread Devesh Raj Singh
browse/SPARK-12225) which is still under > discussion. If you desire this feature, you could comment on it. > > > > *From:* Franc Carter [mailto:franc.car...@gmail.com] > *Sent:* Wednesday, February 3, 2016 7:40 PM > *To:* Devesh Raj Singh > *Cc:* user@spark.apache.org > *Subj

sparkR not able to create /append new columns

2016-02-03 Thread Devesh Raj Singh
Hi, I am trying to create dummy variables in sparkR by creating new columns for categorical variables, but it is not appending the columns. df <- createDataFrame(sqlContext, iris) class(dtypes(df)) cat.column<-vector(mode="character",length=nrow(df)) cat.column<-collect(select(df,df$Species))

Re: sparkR not able to create /append new columns

2016-02-03 Thread Devesh Raj Singh
<franc.car...@gmail.com> wrote: > > I had problems doing this as well - I ended up using 'withColumn', it's > not particularly graceful but it worked (1.5.2 on AWS EMR) > > cheers > > On 3 February 2016 at 22:06, Devesh Raj Singh <raj.deves...@gmail.com> > wro
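DataFrames are immutable, so withColumn returns a new DataFrame rather than modifying the one it was called on; the appended column only survives if the result is captured, which is the usual cause of columns "not appending". A pyspark sketch of that point:

    from pyspark.sql.functions import lit

    df = sqlContext.createDataFrame([(1,)], ['a'])
    df.withColumn('b', lit(0))        # result discarded: df still has one column
    df = df.withColumn('b', lit(0))   # reassign to keep the appended column
    print(df.columns)                 # ['a', 'b']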

can we do column bind of 2 dataframes in spark R? similar to cbind in R?

2016-02-01 Thread Devesh Raj Singh
Hi, I want to merge 2 dataframes in sparkR column-wise, similar to cbind in R. We have "unionAll" for rbind but I could not find anything for cbind in sparkR. -- Warm regards, Devesh.
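There is no direct cbind because Spark rows carry no intrinsic order; the usual workaround is to attach an explicit row index to both sides and join on it. A pyspark sketch of that approach, assuming sqlContext exists and both frames have the same row count and ordering (the _idx helper column is made up here):

    from pyspark.sql import Row

    def add_index(df):
        # consecutive row index via zipWithIndex, in the RDD's current order
        rdd = df.rdd.zipWithIndex().map(
            lambda pair: Row(_idx=pair[1], **pair[0].asDict()))
        return sqlContext.createDataFrame(rdd)

    left = sqlContext.createDataFrame([(1,), (2,)], ['a'])
    right = sqlContext.createDataFrame([('x',), ('y',)], ['b'])
    left_right = add_index(left).join(add_index(right), '_idx').drop('_idx')
    left_right.show()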

Re: NA value handling in sparkR

2016-01-27 Thread Devesh Raj Singh
rs into null types, like createDataFrame does for <NA>, and > then one would be able to use dropna() etc. > > > > On Mon, Jan 25, 2016 at 3:24 AM, Devesh Raj Singh <raj.deves...@gmail.com> > wrote: > >> Hi, >> >> Yes you are right. >> >> I think the

Re: NA value handling in sparkR

2016-01-26 Thread Devesh Raj Singh
ve an option for read.df to convert any >> "NA" it encounters into null types, like createDataFrame does for <NA>, and >> then one would be able to use dropna() etc. >> >> >> >> On Mon, Jan 25, 2016 at 3:24 AM, Devesh Raj Singh <raj.deves...@gmail.com

Re: NA value handling in sparkR

2016-01-25 Thread Devesh Raj Singh
suppose it's > possible that createDataFrame converts R's <NA> values to null, so dropna() > works with that. But perhaps read.df() does not convert R <NA>s to null, as > those are most likely interpreted as strings when they come in from the > csv. Just a guess, can anyone confirm? > > D

NA value handling in sparkR

2016-01-24 Thread Devesh Raj Singh
Hi, I have applied the following code on the airquality dataset available in R, which has some missing values. I want to omit the rows that have NAs. library(SparkR) Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.2.0" "sparkr-shell"') sc <-
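Since spark-csv reads "NA" in as the literal string rather than a null, dropna() finds nothing to drop; one workaround is to turn the "NA" strings into real nulls first. A pyspark sketch of that idea (toy columns stand in for the airquality data):

    from pyspark.sql.functions import col, when

    df = sqlContext.createDataFrame([('41', '190'), ('NA', '118')],
                                    ['Ozone', 'Solar_R'])
    for c in df.columns:
        # map the string 'NA' to a real null so dropna() can see it
        df = df.withColumn(c, when(col(c) == 'NA', None).otherwise(col(c)))
    df.dropna().show()  # the row containing 'NA' is gone

Later spark-csv releases also expose a nullValue option that does this conversion at read time.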

avg(df$column) not returning a value but just the text "Column avg"

2016-01-21 Thread Devesh Raj Singh
Hi, I want to compute the average of the numerical columns in the iris dataset using sparkR. Sys.setenv('SPARKR_SUBMIT_ARGS'='"--packages" "com.databricks:spark-csv_2.10:1.3.0" "sparkr-shell"') library(SparkR) sc=sparkR.init(master="local",sparkHome =
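avg(df$column) only builds a Column expression, which is why it prints as the text "Column avg"; the expression has to be run through an aggregation and collected before a number comes back (in SparkR, roughly collect(agg(df, avg(df$Sepal_Length)))). The pyspark equivalent of the fix:

    from pyspark.sql.functions import avg

    df = sqlContext.createDataFrame([(5.1,), (4.9,)], ['Sepal_Length'])
    expr = avg(df.Sepal_Length)            # just an expression: prints as Column<...>
    result = df.agg(expr).collect()[0][0]  # run the aggregation to get the value
    print(result)                          # 5.0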

can we create dummy variables from categorical variables, using sparkR

2016-01-19 Thread Devesh Raj Singh
Hi, Can we create dummy variables for categorical variables in sparkR, like we do using the "dummies" package in R? -- Warm regards, Devesh.
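SparkR in the 1.5/1.6 line has no analogue of the dummies package, so the column-per-level loop shown earlier in this thread list is the usual route; on the pyspark side the ml feature transformers cover the same need. A sketch with StringIndexer plus OneHotEncoder (note the result is a single vector column rather than separate 0/1 columns):

    from pyspark.ml.feature import OneHotEncoder, StringIndexer

    df = sqlContext.createDataFrame(
        [('setosa',), ('virginica',), ('setosa',)], ['Species'])
    indexed = (StringIndexer(inputCol='Species', outputCol='Species_idx')
               .fit(df).transform(df))
    encoded = (OneHotEncoder(inputCol='Species_idx', outputCol='Species_vec')
               .transform(indexed))
    encoded.show()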