Peter Aberline created SPARK-8510:
-------------------------------------

             Summary: Store and read NumPy arrays and matrices as values in sequence files
                 Key: SPARK-8510
                 URL: https://issues.apache.org/jira/browse/SPARK-8510
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Peter Aberline
            Priority: Minor


I have extended the provided DoubleArrayWritable example code to store NumPy 
arrays and matrices of type double as arrays of doubles and nested arrays of 
doubles, respectively.
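
To illustrate the mapping described above, here is a minimal sketch of how a 
NumPy vector or matrix could be turned into a flat or nested list of doubles 
and rebuilt afterwards. It uses only NumPy; the helper names to_double_lists 
and from_double_lists are hypothetical and are not taken from the patch, and 
the Spark-side Writable conversion is deliberately left out.

```python
import numpy as np


def to_double_lists(arr):
    """Convert a 1-D or 2-D array into a (nested) list of Python floats.

    Hypothetical helper for illustration only; not part of the patch.
    """
    a = np.asarray(arr, dtype=np.float64)
    if a.ndim not in (1, 2):
        raise ValueError("this sketch handles only 1-D arrays and 2-D matrices")
    # tolist() yields a flat list for 1-D input and a list of row lists for 2-D.
    return a.tolist()


def from_double_lists(values):
    """Rebuild a float64 NumPy array from a (nested) list of doubles."""
    return np.array(values, dtype=np.float64)


vector = np.array([1.0, 2.5, 3.0])
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])

vector_payload = to_double_lists(vector)   # array of doubles
matrix_payload = to_double_lists(matrix)   # nested arrays of doubles

# Round trip back to NumPy.
vector_back = from_double_lists(vector_payload)
matrix_back = from_double_lists(matrix_payload)
```

A list of doubles (or a list of such lists) is the shape of value that a 
double-array Writable can hold, which is why the conversion goes through plain 
Python lists rather than keeping the ndarray itself.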

Pandas DataFrames can be easily converted to NumPy matrices, so I've also added 
the ability to store the schema-less data from DataFrames and Series that 
contain double data. 
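
As a rough sketch of what "schema-less" means here, the snippet below converts 
a DataFrame and a Series of doubles into nested and flat lists of doubles. 
Column names and the index are dropped in the process, so anyone reading the 
data back has to supply them again. The variable names and sample values are 
illustrative assumptions, not taken from the patch.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
series = pd.Series([0.5, 1.5], name="z")

# Only the values survive; column names, the Series name, and the index do not.
df_rows = df.to_numpy(dtype=np.float64).tolist()            # one inner list per row
series_values = series.to_numpy(dtype=np.float64).tolist()  # flat list of doubles

# Reading back: the caller has to re-attach the column names.
restored = pd.DataFrame(np.array(df_rows, dtype=np.float64), columns=["x", "y"])
```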

Beyond my own use, there seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I'll be issuing a PR for this shortly.




