No need to do that. Having Spark read the header automatically is trivial.

On Wed, Mar 24, 2021 at 5:25 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> If it is a csv then it is a flat file somewhere in a directory I guess.
>
> Get the header out by doing
>
> */usr/bin/zcat csvfile.gz |head -n 1*
> Title Number,Tenure,Property Address,District,County,Region,Postcode,Multiple Address Indicator,Price Paid,Proprietor Name (1),Company Registration No. (1),Proprietorship Category (1),Country Incorporated (1),Proprietor (1) Address (1),Proprietor (1) Address (2),Proprietor (1) Address (3),Proprietor Name (2),Company Registration No. (2),Proprietorship Category (2),Country Incorporated (2),Proprietor (2) Address (1),Proprietor (2) Address (2),Proprietor (2) Address (3),Proprietor Name (3),Company Registration No. (3),Proprietorship Category (3),Country Incorporated (3),Proprietor (3) Address (1),Proprietor (3) Address (2),Proprietor (3) Address (3),Proprietor Name (4),Company Registration No. (4),Proprietorship Category (4),Country Incorporated (4),Proprietor (4) Address (1),Proprietor (4) Address (2),Proprietor (4) Address (3),Date Proprietor Added,Additional Proprietor Indicator
>
> 10GB is not much of a big CSV file
>
> that will resolve the header anyway.
>
> Also how are you running the spark, in a local mode (single jvm) or other distributed modes (yarn, standalone) ?
>
> HTH