Hi!
  I'm new to Spark. I have a case study where the data is stored in CSV
files. These files have headers with more than 1,000 columns. I would like
to know the best practices for parsing them, in particular the following
points:
1. Getting and parsing all the files from a folder
2. What CSV parser do you use?
3. I would like to select just the columns whose names match a pattern,
pass the selected columns' values (plus the column names) to my
processing, and save the output to a CSV (preserving the selected
columns). A rough sketch of what I have in mind follows this list.
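
To make the question concrete, here is a rough sketch of what I have in
mind, assuming the DataFrame API and Spark's built-in CSV data source are
the right tools; the paths and the "col_" name pattern are just
placeholders for my real ones:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("csv-column-selection")
  .getOrCreate()

// (1) Read every CSV file in the folder in one call; the built-in
// CSV source parses them and takes the column names from the header.
val df = spark.read
  .option("header", "true")
  .csv("/path/to/input-folder")

// (3) Keep only the columns whose names match a pattern
// ("col_.*" stands in for my real pattern).
val wanted = df.columns.filter(_.matches("col_.*"))
val selected = df.select(wanted.map(df.col): _*)

// ... per-column processing would go here ...

// Write the result back to CSV, preserving the selected
// columns and their header names.
selected.write
  .option("header", "true")
  .csv("/path/to/output-folder")

Is something like this idiomatic, or does selecting from a DataFrame with
1,000+ columns this way cause problems?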

If you have any experience with the points above, it would be really
helpful (for me and for others who run into the same cases) if you could
share your thoughts.
Thanks.
  Regards,
 Florin
