We have a huge binary file in a custom serialization format (for example, a header gives the length of the record, followed by a variable number of items for that record). It is produced by an old C++ application. What would be the best approach to deserialize it into a Hive table or a Spark RDD? The format is known and well documented.
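To make the described layout concrete, here is a minimal Python sketch of a streaming reader for length-prefixed records. It assumes a hypothetical byte layout: a little-endian `uint32` giving the record body length in bytes, followed by little-endian `int32` items. The layout and the names `read_records` and `encode` are illustrative placeholders; the real, documented format would replace them.

```python
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List

# Hypothetical layout (an assumption, not the documented format):
#   uint32 little-endian: length of the record body in bytes
#   body: that many bytes, made up of little-endian int32 items
HEADER = struct.Struct("<I")
ITEM = struct.Struct("<i")


def read_records(stream: BinaryIO) -> Iterator[List[int]]:
    """Yield each record as a list of items, reading one record at a time."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of file
        if len(header) < HEADER.size:
            raise ValueError("truncated record header")
        (length,) = HEADER.unpack(header)
        body = stream.read(length)
        if len(body) < length:
            raise ValueError("truncated record body")
        if length % ITEM.size:
            raise ValueError("record length is not a multiple of the item size")
        # The number of items varies per record: body length / item size.
        yield [item for (item,) in ITEM.iter_unpack(body)]


def encode(records: Iterable[List[int]]) -> bytes:
    """Hypothetical writer producing the same layout, used here for testing."""
    out = bytearray()
    for items in records:
        body = b"".join(ITEM.pack(i) for i in items)
        out += HEADER.pack(len(body)) + body
    return bytes(out)


if __name__ == "__main__":
    data = encode([[1, 2, 3], [], [42]])
    print(list(read_records(io.BytesIO(data))))  # [[1, 2, 3], [], [42]]
```

Because it pulls one record at a time from any file-like object, a reader along these lines never holds the whole file in memory. In PySpark, `SparkContext.binaryFiles` yields `(path, content)` pairs that such a function could parse (by wrapping `content` in `io.BytesIO`), but it loads each file's content whole, so it suits many moderate-size files better than a single huge one.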
-- Ruslan Dautkhanov