That's weird: DataFrame.count() should not require much memory on the
driver. Could you provide a way to reproduce it (generating a fake
dataset is fine)?
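
On the original question of estimating the DataFrame's size, here is a
rough back-of-the-envelope sketch in plain Python. It assumes only
fixed-width columns and takes the (name, type) pairs that df.dtypes
reports in PySpark. The helper name estimate_raw_bytes, the width table,
and the row count are made up for illustration. The result is the raw
data size only; Spark's in-memory and shuffle representations add
overhead on top of it.

```python
# Byte widths of some fixed-width Spark SQL types, keyed by the type
# strings that DataFrame.dtypes reports (e.g. "int", "bigint").
FIXED_WIDTH_BYTES = {
    "tinyint": 1,
    "boolean": 1,
    "smallint": 2,
    "int": 4,
    "float": 4,
    "bigint": 8,
    "double": 8,
}


def estimate_raw_bytes(dtypes, num_rows):
    """Lower-bound size estimate: per-row width times the row count.

    dtypes   -- list of (column_name, type_string) pairs, as in df.dtypes
    num_rows -- row count, supplied by the caller (hypothetical here)
    """
    row_bytes = 0
    for column_name, type_name in dtypes:
        if type_name not in FIXED_WIDTH_BYTES:
            # Strings, arrays, etc. have no fixed width; refuse to guess.
            raise ValueError(
                "no fixed width known for column %r of type %r"
                % (column_name, type_name)
            )
        row_bytes += FIXED_WIDTH_BYTES[type_name]
    return row_bytes * num_rows


# Three INT columns, as in this thread; the row count is an assumption.
dtypes = [("a", "int"), ("b", "int"), ("c", "int")]
estimate = estimate_raw_bytes(dtypes, 1000000000)
print("%.1f GB of raw column data" % (estimate / 1e9))
```

With a real DataFrame you would pass df.dtypes as the first argument and
a row count obtained elsewhere (for example from the source system),
since df.count() is the call that fails here.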

On Sat, Apr 9, 2016 at 4:33 PM, Buntu Dev <buntu...@gmail.com> wrote:
> I've allocated about 4g for the driver. For the count stage, I notice the
> Shuffle Write to be 13.9 GB.
>
> On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndj...@gmail.com> wrote:
>>
>> What's the size of your driver?
>> On Sat, 9 Apr 2016 at 20:33, Buntu Dev <buntu...@gmail.com> wrote:
>>>
>>> Actually, df.show() works and displays 20 rows; it is df.count() that
>>> causes the driver to run out of memory. There are just 3 INT
>>> columns.
>>>
>>> Any idea what could be the reason?
>>>
>>> On Sat, Apr 9, 2016 at 10:47 AM, <ndj...@gmail.com> wrote:
>>>>
>>>> You seem to have a lot of columns :-) !
>>>> df.count() returns the number of rows in your data frame.
>>>> len(df.columns) gives the number of columns.
>>>>
>>>> Finally, I suggest you check the size of your driver and adjust it
>>>> accordingly.
>>>>
>>>> Cheers,
>>>>
>>>> Ardo
>>>>
>>>> > On 09 Apr 2016, at 19:37, bdev <buntu...@gmail.com> wrote:
>>>> >
>>>> > I keep running out of memory on the driver when I attempt to do
>>>> > df.show().
>>>> > Can anyone let me know how to estimate the size of the dataframe?
>>>> >
>>>> > Thanks!
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-estimate-the-size-of-dataframe-using-pyspark-tp26729.html
>>>> > Sent from the Apache Spark User List mailing list archive at
>>>> > Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> > For additional commands, e-mail: user-h...@spark.apache.org
>>>> >
>>>
>>>
>
