I have some large .txt files (about 100GB) containing a dataset in
fixed-width format. The files contain some errors:
- non-numeric characters in columns that are supposed to be numeric,
- invalid characters,
- rows with too many characters, possibly due to invalid characters or a
missing end-of-line character (so two rows in the original data become one
row in the .txt file).

The errors are not very frequent, but they stop me from importing the data
with readr::read_fwf().


Is there a package, or workflow, in R to pre-process the files, separating
the valid rows from the invalid ones into different files? This can be done
with point-and-click ETL tools such as Pentaho PDI. Is there equivalent
code in R to do this?
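
To illustrate the kind of row filter I have in mind, here is a minimal
sketch. The file names, the total record width of 12, and the two numeric
fields of widths 8 and 4 are made up for the example; the real layout
differs. It reads over a connection in chunks, so the file never has to
fit in memory:

    ## connections for input, valid rows, and invalid rows
    con_in   <- file("data.txt",         open = "r")
    con_good <- file("data_valid.txt",   open = "w")
    con_bad  <- file("data_invalid.txt", open = "w")

    expected_width <- 12              # assumed total record width
    pattern <- "^[0-9 ]{8}[0-9 ]{4}$" # assumed layout: two numeric fields

    ## process one million lines at a time
    while (length(lines <- readLines(con_in, n = 1e6)) > 0) {
      ok <- nchar(lines) == expected_width & grepl(pattern, lines)
      writeLines(lines[ok],  con_good)
      writeLines(lines[!ok], con_bad)
    }

    close(con_in); close(con_good); close(con_bad)

The valid file could then be read with read_fwf(), and the invalid file
inspected or repaired separately. But perhaps there is a package that
handles this more robustly or faster.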

I googled it and could not find a solution. I also asked this on
StackOverflow and got no answer:
http://stackoverflow.com/questions/39414886/fix-errors-in-csv-and-fwf-files-corrupted-characters-when-importing-to-r

regards
Lucas Mation
IPEA - Brasil

