Joseph K. Bradley created SPARK-20528:
-----------------------------------------

             Summary: Add BinaryFileReader and Writer for DataFrames
                 Key: SPARK-20528
                 URL: https://issues.apache.org/jira/browse/SPARK-20528
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Joseph K. Bradley


It would be very useful to have a binary data reader/writer for DataFrames, presumably called via {{spark.read.binaryFiles}}, etc.

Currently, going through RDDs is annoying since it requires different code paths for Scala vs Python:

Scala:
{code}
val binaryFilesRDD = sc.binaryFiles("mypath")
val binaryFilesDF = spark.createDataFrame(binaryFilesRDD)
{code}

Python:
{code}
binaryFilesRDD = sc.binaryFiles("mypath")
binaryFilesRDD_recast = binaryFilesRDD.map(lambda x: (x[0], bytearray(x[1])))
binaryFilesDF = spark.createDataFrame(binaryFilesRDD_recast)
{code}

This is because Scala and Python {{sc.binaryFiles}} return different types, which makes sense in RDD land but not DataFrame land.
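
Until such a reader exists, the Python workaround could be wrapped in a small helper. The following is only a sketch under stated assumptions: the names recast_pair and binary_files_df, the column names "path" and "content", and the explicit schema are all hypothetical and not part of Spark or of this issue. It keeps the bytearray recast from the snippet above and uses only the existing sc.binaryFiles and spark.createDataFrame calls.

```python
def recast_pair(pair):
    """Turn a (path, bytes) pair from sc.binaryFiles into (path, bytearray)."""
    path, content = pair
    return (path, bytearray(content))


def binary_files_df(spark, path):
    """Hypothetical helper: load the files under `path` as a two-column DataFrame."""
    # Imported lazily so that recast_pair can be used without PySpark installed.
    from pyspark.sql.types import BinaryType, StringType, StructField, StructType

    # Assumed column names; an explicit schema avoids relying on type inference.
    schema = StructType([
        StructField("path", StringType(), False),
        StructField("content", BinaryType(), False),
    ])
    rdd = spark.sparkContext.binaryFiles(path).map(recast_pair)
    return spark.createDataFrame(rdd, schema)
```

With an active SparkSession named spark, this would be called as binary_files_df(spark, "mypath"). It hides the Python-specific recast from the caller, but it still goes through the RDD API, so it does not replace the proposed reader.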