Hi Jeff,

Thanks for the reply, but could you tell me why it is taking so much time? What could be wrong? Also, when I remove the DataFrame from memory using rm(), the object is deleted but the memory is not cleared.
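(For context on the rm() behaviour: in base R, rm() only removes the binding; the memory is reclaimed later by the garbage collector, and a cached SparkR DataFrame additionally holds blocks on the executors until it is unpersisted. A minimal sketch against the SparkR 1.5-era API, assuming the `sales` DataFrame from the thread below was cached:

```r
# Base R: rm() removes the binding only; gc() triggers collection and
# reports how much memory the R process is actually holding.
rm(some_local_object)
gc()

# SparkR: a cached DataFrame keeps executor memory until unpersisted.
unpersist(sales)                      # release cached blocks on the cluster
dropTempTable(sqlContext, "Sales")    # remove the temp-table registration
```

Whether this helps depends on whether the DataFrame was ever cached; an uncached DataFrame holds little driver memory to begin with.)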
Also, what about the R functions that are not supported in SparkR, like ddply? How do I access the nth row of a SparkR DataFrame?

Regards,
Vipul

On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:

> >>> Do I need to create a new DataFrame for every update to the DataFrame
> like addition of a new column, or do I need to update the original sales
> DataFrame?
>
> Yes, DataFrame is immutable, and every mutation of a DataFrame will produce
> a new DataFrame.
>
> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>
>> Hello Rui,
>>
>> Sorry, what I meant was that the original DataFrame to which a new column
>> was added gives a new DataFrame as the result.
>>
>> Please check this for more:
>>
>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>
>> Check for withColumn.
>>
>> Thanks,
>> Vipul
>>
>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>
>>> Vipul,
>>>
>>> Not sure if I understand your question. DataFrame is immutable. You
>>> can't update a DataFrame.
>>>
>>> Could you paste some log info for the OOM error?
>>>
>>> -----Original Message-----
>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>> Sent: Friday, November 20, 2015 12:11 PM
>>> To: user@spark.apache.org
>>> Subject: SparkR DataFrame, Out of memory exception for very small file.
>>>
>>> Hi Users,
>>>
>>> I have a general doubt regarding DataFrames in SparkR.
>>>
>>> I am trying to read a file from Hive and it gets created as a DataFrame:
>>>
>>> sqlContext <- sparkRHive.init(sc)
>>>
>>> # DF
>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>                  source = "com.databricks.spark.csv", inferSchema = 'true')
>>>
>>> registerTempTable(sales, "Sales")
>>>
>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>> like addition of a new column, or do I need to update the original sales
>>> DataFrame?
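(To illustrate the immutability point above: each transformation returns a new DataFrame and leaves the input untouched, so adding a constant column does not require a SQL round-trip through the temp table. A sketch against the SparkR 1.5 API, reusing the `sales` DataFrame from the quoted message; `lit` wraps a literal value as a Column:

```r
# withColumn returns a NEW DataFrame; `sales` itself is unchanged.
sales1 <- withColumn(sales, "C1", lit(607))

# The original is still intact and can keep being used:
printSchema(sales)
printSchema(sales1)   # same schema plus the added C1 column
```

Because this is a pure DataFrame-to-DataFrame transformation, there is no need to re-register or drop the temp table between such updates.)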
>>> sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales AS a")
>>>
>>> Please help me with this, as the original file is only 20MB but it
>>> throws an out-of-memory exception on a cluster with a 4GB master and
>>> two workers of 4GB each.
>>>
>>> Also, what is the logic with DataFrames: do I need to register and drop
>>> the tempTable after every update?
>>>
>>> Thanks,
>>> Vipul
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>> --
>> Regards,
>> Vipul Rai
>> www.vipulrai.me
>> +91-8892598819
>> <http://in.linkedin.com/in/vipulrai/>
>
> --
> Best Regards
>
> Jeff Zhang

--
Regards,
Vipul Rai
www.vipulrai.me
+91-8892598819
<http://in.linkedin.com/in/vipulrai/>
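(On the two open questions in the thread, ddply and the nth row, a hedged sketch against the SparkR 1.5 API. The column names `region` and `amount` are hypothetical, chosen only for illustration; they are not from the original data:

```r
# ddply-style split-apply-combine: groupBy() + agg() is the SparkR
# equivalent of plyr::ddply for per-group summaries.
byRegion <- agg(groupBy(sales, sales$region),
                total = sum(sales$amount),
                n     = n(sales$amount))
showDF(byRegion)

# "nth row": a distributed DataFrame has no inherent row order, so there
# is no direct nth-row accessor. One workaround is to collect the first
# n rows to a local data.frame with take() and index locally:
n <- 5
row_n <- take(sales, n)[n, ]
```

Note that `take()` pulls rows to the driver, so this is only sensible for small n; for a meaningful "nth row" the data should first be sorted with `arrange()` so the order is well defined.)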