Why not attach a bigger hard disk to the machines and point your
SPARK_LOCAL_DIRS to it?
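For example (a sketch; the path is a placeholder for wherever the bigger
disk is mounted), either export SPARK_LOCAL_DIRS in conf/spark-env.sh or
set spark.local.dir before the context is created:

    from pyspark import SparkConf, SparkContext

    # Placeholder path pointing at the larger disk. Note that a
    # SPARK_LOCAL_DIRS environment variable set by spark-env.sh or the
    # cluster manager takes precedence over spark.local.dir.
    conf = SparkConf().set("spark.local.dir", "/mnt/bigdisk/spark-tmp")
    sc = SparkContext(conf=conf)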

Thanks
Best Regards

On Sat, Aug 29, 2015 at 1:13 AM, fsacerdoti <fsacerd...@jumptrading.com>
wrote:

> Hello,
>
> Similar to the thread below [1], when I tried to create an RDD from a 4GB
> pandas dataframe, I encountered the following error:
>
>     TypeError: cannot create an RDD from type: <type 'list'>
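>
> The failing call is roughly the following (a sketch of our setup, not the
> exact code; the random dataframe stands in for our real data, and
> sqlContext is the shell's SQLContext):
>
>     import numpy as np
>     import pandas as pd
>
>     # ~4GB of float64: 50 million rows x 10 columns x 8 bytes each
>     pdf = pd.DataFrame(np.random.randn(50 * 1000 * 1000, 10))
>     df = sqlContext.createDataFrame(pdf)  # raises the TypeError above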
>
> However, looking into the code shows that the TypeError is raised from a
> generic "except Exception:" clause (pyspark/sql/context.py:238 in
> spark-1.4.1). A debugging session reveals that the real error is that
> SPARK_LOCAL_DIRS ran out of space:
>
>     -> rdd = self._sc.parallelize(data)
>     (Pdb)
>     IOError: (28, 'No space left on device')
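>
> For reference, the masking pattern is roughly the following (a sketch of
> the idea, not the actual pyspark source; create_rdd is a made-up name):
>
>     def create_rdd(sc, data):
>         try:
>             return sc.parallelize(data)
>         except Exception:
>             # Swallows IOError(28, 'No space left on device') and every
>             # other failure, re-raising it as the misleading TypeError.
>             raise TypeError("cannot create an RDD from type: %s"
>                             % type(data))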
>
> In this case, creating an RDD from a large matrix (~50 million rows) is
> required for us. I'm a bit concerned about Spark's process here:
>
>    a. turning the dataframe into records (data.to_records)
>    b. writing them to a temporary file under SPARK_LOCAL_DIRS
>    c. reading them back again in Scala.
>
> Is there a better way? The intention would be to operate on slices of this
> large dataframe using numpy operations via Spark's transformations and
> actions.
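>
> For illustration, roughly what we have in mind (a sketch, assuming the
> shell's sc; load_rows and the np.mean call are placeholders for our real
> slice loader and numpy routine):
>
>     import numpy as np
>
>     def load_rows(start, stop):
>         # Placeholder loader: read rows [start, stop) of the big dataframe
>         # from shared storage, e.g. pandas.read_hdf with start=/stop=.
>         return np.zeros((stop - start, 10))
>
>     def process_slice(bounds):
>         start, stop = bounds
>         chunk = load_rows(start, stop)
>         return np.mean(chunk, axis=0)  # placeholder numpy operation
>
>     n_rows = 50 * 1000 * 1000
>     step = 1000 * 1000
>     ranges = [(i, min(i + step, n_rows)) for i in range(0, n_rows, step)]
>     rdd = sc.parallelize(ranges, len(ranges))
>     results = rdd.map(process_slice).collect()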
>
> Thanks,
> FDS
>
> 1. https://www.mail-archive.com/user@spark.apache.org/msg35139.html