Hi Jeff,

Thanks for the reply, but could you tell me why it is taking so much time? What could be wrong? Also, when I remove the DataFrame from memory using rm(), the object is deleted but the memory is not cleared.
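(For context on the rm() behaviour: in base R, rm() only removes the binding; the memory is reclaimed later by the garbage collector, and a cached SparkR DataFrame additionally holds blocks on the executors until it is unpersisted. A minimal sketch against the SparkR 1.5-era API, assuming the `sales` DataFrame from the thread below was cached:

```r
# Base R: rm() removes the binding only; gc() triggers collection and
# reports how much memory the R process is actually holding.
rm(some_local_object)
gc()

# SparkR: a cached DataFrame keeps executor memory until unpersisted.
unpersist(sales)                      # release cached blocks on the cluster
dropTempTable(sqlContext, "Sales")    # remove the temp-table registration
```

Whether this helps depends on whether the DataFrame was ever cached; an uncached DataFrame holds little driver memory to begin with.)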
Also, what about the R functions that are not supported in SparkR, like ddply? How do I access the nth row of a SparkR DataFrame?

Regards,
Vipul

On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:

> >>> Do I need to create a new DataFrame for every update to the DataFrame
> like addition of a new column, or do I need to update the original sales
> DataFrame?
>
> Yes, DataFrame is immutable, and every mutation of a DataFrame will produce
> a new DataFrame.
>
> On Mon, Nov 23, 2015 at 4:44 PM, Vipul Rai <vipulrai8...@gmail.com> wrote:
>
>> Hello Rui,
>>
>> Sorry, what I meant was that the original DataFrame to which a new column
>> was added gives a new DataFrame as the result.
>>
>> Please check this for more:
>>
>> https://spark.apache.org/docs/1.5.1/api/R/index.html
>>
>> Check for withColumn.
>>
>> Thanks,
>> Vipul
>>
>> On 23 November 2015 at 12:42, Sun, Rui <rui....@intel.com> wrote:
>>
>>> Vipul,
>>>
>>> Not sure if I understand your question. DataFrame is immutable. You
>>> can't update a DataFrame.
>>>
>>> Could you paste some log info for the OOM error?
>>>
>>> -----Original Message-----
>>> From: vipulrai [mailto:vipulrai8...@gmail.com]
>>> Sent: Friday, November 20, 2015 12:11 PM
>>> To: user@spark.apache.org
>>> Subject: SparkR DataFrame, Out of memory exception for very small file.
>>>
>>> Hi Users,
>>>
>>> I have a general doubt regarding DataFrames in SparkR.
>>>
>>> I am trying to read a file from Hive and it gets created as a DataFrame:
>>>
>>> sqlContext <- sparkRHive.init(sc)
>>>
>>> # DF
>>> sales <- read.df(sqlContext, "hdfs://sample.csv", header = 'true',
>>>                  source = "com.databricks.spark.csv", inferSchema = 'true')
>>>
>>> registerTempTable(sales, "Sales")
>>>
>>> Do I need to create a new DataFrame for every update to the DataFrame,
>>> like addition of a new column, or do I need to update the original sales
>>> DataFrame?
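(To illustrate the immutability point above: each transformation returns a new DataFrame and leaves the input untouched, so adding a constant column does not require a SQL round-trip through the temp table. A sketch against the SparkR 1.5 API, reusing the `sales` DataFrame from the quoted message; `lit` wraps a literal value as a Column:

```r
# withColumn returns a NEW DataFrame; `sales` itself is unchanged.
sales1 <- withColumn(sales, "C1", lit(607))

# The original is still intact and can keep being used:
printSchema(sales)
printSchema(sales1)   # same schema plus the added C1 column
```

Because this is a pure DataFrame-to-DataFrame transformation, there is no need to re-register or drop the temp table between such updates.)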
>>> sales1 <- SparkR::sql(sqlContext, "SELECT a.*, 607 AS C1 FROM Sales AS a")
>>>
>>> Please help me with this, as the original file is only 20MB but it
>>> throws an out-of-memory exception on a cluster with a 4GB master and
>>> two workers of 4GB each.
>>>
>>> Also, what is the logic with DataFrames: do I need to register and drop
>>> the tempTable after every update?
>>>
>>> Thanks,
>>> Vipul
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>> --
>> Regards,
>> Vipul Rai
>> www.vipulrai.me
>> +91-8892598819
>> <http://in.linkedin.com/in/vipulrai/>
>
> --
> Best Regards
>
> Jeff Zhang

--
Regards,
Vipul Rai
www.vipulrai.me
+91-8892598819
<http://in.linkedin.com/in/vipulrai/>
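(On the two open questions in the thread, ddply and the nth row, a hedged sketch against the SparkR 1.5 API. The column names `region` and `amount` are hypothetical, chosen only for illustration; they are not from the original data:

```r
# ddply-style split-apply-combine: groupBy() + agg() is the SparkR
# equivalent of plyr::ddply for per-group summaries.
byRegion <- agg(groupBy(sales, sales$region),
                total = sum(sales$amount),
                n     = n(sales$amount))
showDF(byRegion)

# "nth row": a distributed DataFrame has no inherent row order, so there
# is no direct nth-row accessor. One workaround is to collect the first
# n rows to a local data.frame with take() and index locally:
n <- 5
row_n <- take(sales, n)[n, ]
```

Note that `take()` pulls rows to the driver, so this is only sensible for small n; for a meaningful "nth row" the data should first be sorted with `arrange()` so the order is well defined.)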