[ https://issues.apache.org/jira/browse/SPARK-20353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-20353.
-------------------------------
    Resolution: Won't Fix

> Implement Tensorflow TFRecords file format
> ------------------------------------------
>
>                 Key: SPARK-20353
>                 URL: https://issues.apache.org/jira/browse/SPARK-20353
>             Project: Spark
>          Issue Type: Improvement
>          Components: Input/Output, SQL
>    Affects Versions: 2.1.0
>            Reporter: Mathew Wicks
>            Priority: Minor
>
> Spark is a very good preprocessing engine for tools like TensorFlow. However,
> we lack native support for TensorFlow's core file format, TFRecords.
> There is a project which implements this functionality as an external JAR.
> (However, it is not user-friendly or robust enough for production use.)
> https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-connector
> Here is some discussion around the above.
> https://github.com/tensorflow/ecosystem/issues/32
> If we were to implement "tfrecords" as a data-frame writable/readable format,
> we would have to account for the various datatypes that can be present in
> Spark columns, and which ones are actually useful in TensorFlow.
> Note: The `spark-tensorflow-connector` described above does not properly
> support the vector data type.
> Further discussion of whether this is within the scope of Spark SQL is
> strongly welcomed.
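
The datatype question raised in the description can be made concrete with a small sketch. A `tf.train.Feature` holds one of three value kinds (`bytes_list`, `float_list`, `int64_list`), so any "tfrecords" format would need some mapping from Spark column types onto those three. The mapping below is only an assumption for illustration: the function name `feature_kind`, the choice of type-name strings (as reported by PySpark's `df.dtypes`), and the decision to treat a vector column as a float list are hypothetical, and are not taken from Spark or from the `spark-tensorflow-connector`.

```python
from typing import Optional

# Hypothetical mapping from Spark SQL type names to the three value kinds
# that a tf.train.Feature can hold. This is an illustrative assumption,
# not the behavior of any existing connector.
_SCALAR_KINDS = {
    "boolean": "int64_list",
    "tinyint": "int64_list",
    "smallint": "int64_list",
    "int": "int64_list",
    "bigint": "int64_list",
    "float": "float_list",
    "double": "float_list",  # FloatList stores 32-bit floats, so doubles lose precision
    "string": "bytes_list",
    "binary": "bytes_list",
}


def feature_kind(spark_type: str) -> Optional[str]:
    """Return the feature kind for a Spark type name, or None if unmapped."""
    t = spark_type.strip().lower()
    if t in _SCALAR_KINDS:
        return _SCALAR_KINDS[t]
    if t == "vector":
        # Assumption: an ML vector column is flattened into a float list.
        return "float_list"
    if t.startswith("array<") and t.endswith(">"):
        # One level of array maps to the kind of its element type;
        # nested arrays and other element types are left unmapped.
        inner = t[len("array<"):-1]
        return _SCALAR_KINDS.get(inner)
    return None  # maps, structs, timestamps, etc. are not covered here


if __name__ == "__main__":
    for name in ["bigint", "array<double>", "vector", "map<string,int>"]:
        print(name, "->", feature_kind(name))
```

Types that return `None` here (maps, structs, nested arrays) are the cases where a design decision would be needed, which is the kind of scoping the description asks about.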