Peter Aberline created SPARK-8510:
-------------------------------------

             Summary: Store and read NumPy arrays and matrices as values in sequence files
                 Key: SPARK-8510
                 URL: https://issues.apache.org/jira/browse/SPARK-8510
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Peter Aberline
            Priority: Minor

I have extended the provided DoubleArrayWritable example code to store NumPy arrays and matrices of type double as arrays of doubles and nested arrays of doubles, respectively.

Pandas DataFrames can easily be converted to NumPy matrices, so I've also added the ability to store the schema-less data from DataFrames and Series that contain double data.

Beyond my own use, there seems to be demand for this functionality:
http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I'll be issuing a PR for this shortly.
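
For illustration only, here is a minimal sketch of the data shapes described above: a 1-D array becomes a flat list of doubles, and a 2-D matrix becomes a nested list of doubles. This is not the code from the forthcoming PR. The helper name `to_doubles` is hypothetical, and plain Python lists stand in for NumPy data so that the snippet runs without NumPy or Spark installed.

```python
def to_doubles(values):
    """Convert a 1-D or 2-D sequence of numbers to (nested) lists of floats.

    Hypothetical helper for illustration only; it is not part of Spark.
    With NumPy, `arr.astype("float64").tolist()` gives the same shape of
    result, and `numpy.asarray(rows, dtype="float64")` rebuilds the array.
    """
    out = []
    for item in values:
        if isinstance(item, (list, tuple)):
            # A row of a matrix: becomes a nested array of doubles.
            out.append([float(x) for x in item])
        else:
            # An element of a 1-D array: becomes a single double.
            out.append(float(item))
    return out


# Plain lists stand in for np.array([...]).tolist() here.
vector = to_doubles([1, 2, 3])
matrix = to_doubles([[1, 2], [3, 4]])
```

A DataFrame or Series holding only double data could be reduced to the same nested-list form through its underlying NumPy values, which is what makes the storage schema-less: column names and the index are not kept.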