Hi all,

I am reading a large csv file (8.5 million lines, 216 columns) with genfromtxt. I am not interested in all 216 columns, so I filter them out with the "usecols" and "converters" parameters.
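Something like this, where the column indices, the file name and the converter are just placeholders for the real ones:

    import numpy as np

    # Placeholder columns and converter, only to show the pattern;
    # the real call picks my actual columns out of the 216.
    data = np.genfromtxt(
        'big.csv',
        delimiter=',',
        usecols=(0, 3, 17),
        # empty fields become nan (assuming the converter receives str)
        converters={17: lambda s: float(s) if s.strip() else np.nan},
    )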
That works very well, but in my original large file, not all of the columns I extract are filled with values. As expected, genfromtxt replaces the missing values with nan, so the final array contains rows with nans in them. I'd like to know whether there is a way to filter out, at the genfromtxt level, the lines that contain these nans, so that they never appear in the final array. What I have in mind is something like: genfromtxt extracts a line using the parameters I give it; if the extracted line contains a NaN, it is skipped and the next line is processed; if it contains no NaNs, it is added to the output array as usual.

I could of course remove all the rows containing nans from the array genfromtxt() creates (and x[~np.isnan(x).any(axis=1)] does it nicely), but I'd like to be able to ask for a given size of output array. The idea is that I can get, for instance, the first 10000 (or any other number) lines of the input file that contain all the columns I need, rather than just the first 10000 lines.

I've found a few examples on SO that do some filtering, but the ones I've found do not process the extracted lines.

Any help appreciated.

Éric.

-- 
Un clavier azerty en vaut deux
Éric Depagne
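P.S. A rough, untested sketch of the kind of pre-filtering I have in mind (column indices, file name and delimiter are placeholders; it assumes missing values show up as empty fields):

    import itertools
    import numpy as np

    WANTED = (0, 3, 17)   # placeholder column indices I actually keep
    N_ROWS = 10000        # how many complete rows I want in the output

    def complete_lines(path, wanted, delimiter=','):
        """Yield only the lines whose wanted columns are all non-empty."""
        with open(path) as f:
            for line in f:
                fields = line.split(delimiter)
                # skip short lines and lines with an empty wanted field
                if len(fields) > max(wanted) and all(
                        fields[i].strip() for i in wanted):
                    yield line

    # genfromtxt also accepts an iterable of lines, so feed it the
    # pre-filtered generator, capped at the first N_ROWS complete lines.
    data = np.genfromtxt(
        itertools.islice(complete_lines('big.csv', WANTED), N_ROWS),
        delimiter=',',
        usecols=WANTED,
    )

Is there a cleaner way to get the same effect from genfromtxt itself?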