That's weird: DataFrame.count() should not require much memory on the driver. Could you provide a way to reproduce it (for example, with a generated fake dataset)?
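One possible starting point for such a reproduction is the minimal sketch below. It is plain Python and needs no Spark to run. It writes a synthetic headerless CSV with three INT columns, matching the schema described further down the thread, and adds a back-of-envelope size estimate. The helper names `write_fake_int_csv` and `estimate_raw_bytes`, the seed, and the 4-bytes-per-INT figure are assumptions made for illustration. The estimate ignores Spark's row format, serialization, and compression, so treat it as a rough floor, not what Spark actually allocates.

```python
import csv
import random


def write_fake_int_csv(path, num_rows, num_cols=3, seed=42):
    """Hypothetical helper: write num_rows rows of num_cols random 32-bit
    signed ints to a headerless CSV file for a reproduction attempt."""
    rng = random.Random(seed)  # fixed seed so the fake dataset is reproducible
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for _ in range(num_rows):
            # Values span the range of a 32-bit signed INT column.
            writer.writerow(rng.randint(-2**31, 2**31 - 1) for _ in range(num_cols))


def estimate_raw_bytes(num_rows, num_cols=3, bytes_per_value=4):
    """Hypothetical helper: rows x columns x bytes per value.
    Ignores per-row overhead, serialization, and compression, so the
    result is a rough floor rather than Spark's real memory footprint."""
    return num_rows * num_cols * bytes_per_value
```

For example, `estimate_raw_bytes(1_000_000)` gives 12,000,000 bytes (about 12 MB) of raw INT payload. The generated file can then be loaded into a DataFrame in whatever way your Spark version supports, and `df.count()` can be run against it to see whether the driver memory problem recurs.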
On Sat, Apr 9, 2016 at 4:33 PM, Buntu Dev <buntu...@gmail.com> wrote:
> I've allocated about 4g for the driver. For the count stage, I notice the
> Shuffle Write is 13.9 GB.
>
> On Sat, Apr 9, 2016 at 11:43 AM, Ndjido Ardo BAR <ndj...@gmail.com> wrote:
>>
>> What's the size of your driver?
>>
>> On Sat, 9 Apr 2016 at 20:33, Buntu Dev <buntu...@gmail.com> wrote:
>>>
>>> Actually, df.show() works, displaying 20 rows, but df.count() is the one
>>> that is causing the driver to run out of memory. There are just 3 INT
>>> columns.
>>>
>>> Any idea what could be the reason?
>>>
>>> On Sat, Apr 9, 2016 at 10:47 AM, <ndj...@gmail.com> wrote:
>>>>
>>>> You seem to have a lot of columns :-) !
>>>> df.count() returns the number of rows in your DataFrame.
>>>> len(df.columns) gives the number of columns.
>>>>
>>>> Finally, I suggest you check the size of your driver and customize it
>>>> accordingly.
>>>>
>>>> Cheers,
>>>>
>>>> Ardo
>>>>
>>>> > On 09 Apr 2016, at 19:37, bdev <buntu...@gmail.com> wrote:
>>>> >
>>>> > I keep running out of memory on the driver when I attempt to do
>>>> > df.show().
>>>> > Can anyone let me know how to estimate the size of the DataFrame?
>>>> >
>>>> > Thanks!
>>>> >
>>>> > --
>>>> > View this message in context:
>>>> > http://apache-spark-user-list.1001560.n3.nabble.com/How-to-estimate-the-size-of-dataframe-using-pyspark-tp26729.html
>>>> > Sent from the Apache Spark User List mailing list archive at
>>>> > Nabble.com.
>>>> >
>>>> > ---------------------------------------------------------------------
>>>> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> > For additional commands, e-mail: user-h...@spark.apache.org