Thanks to both of you, this should get me started.
It should be easy to start with a custom Hadoop InputFormat that reads the
file and creates an `RDD[Row]`. Since you know the record size, it should
be pretty easy to make the InputFormat produce splits, so you can then
read the file in parallel.
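To illustrate the split logic behind that suggestion, here is a minimal, hedged sketch of how a fixed-record-size file can be carved into byte-range splits aligned to record boundaries. This is plain Scala, not the actual Hadoop `InputFormat` API; the object and method names are illustrative only, but the same arithmetic would back a real `getSplits` implementation:

```scala
// Sketch only: computing record-aligned (offset, length) splits for a file
// of fixed-size records. Names are illustrative, not Spark/Hadoop API.
object FixedRecordSplits {
  // Each split starts and ends on a record boundary, so a reader for one
  // split never sees a partial record. Records are distributed as evenly
  // as possible across the requested number of splits.
  def splits(fileLength: Long, recordSize: Int, numSplits: Int): Seq[(Long, Long)] = {
    val totalRecords = fileLength / recordSize
    val base = totalRecords / numSplits   // records every split gets
    val extra = totalRecords % numSplits  // first `extra` splits get one more
    var offset = 0L
    (0 until numSplits).flatMap { i =>
      val records = base + (if (i < extra) 1 else 0)
      if (records == 0) None
      else {
        val start = offset
        val len = records * recordSize
        offset += len
        Some((start, len))
      }
    }
  }
}
```

For example, a 100-byte file of 10-byte records divided into 3 splits yields ranges (0, 40), (40, 30), (70, 30). In a real `InputFormat`, each such range would become a `FileSplit`, and `createRecordReader` would seek to the offset and read `length / recordSize` whole records.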
On Mon, Jun 12, 2017 at 6:01 AM, OBones
Try
https://mapr.com/blog/spark-data-source-api-extending-our-spark-sql-query-engine/
Thanks,
Assaf.
-Original Message-
From: OBones [mailto:obo...@free.fr]
Sent: Monday, June 12, 2017 1:01 PM
To: user@spark.apache.org
Subject: [How-To] Custom file format as source