Correction: that should read "if you need a cast". t.column(i).cast(...) uses the Arrow cast.

BR,

Jacek

Mon, 1 Mar 2021 at 17:04 Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>
> Use np.column_stack and a list comprehension:
>
> import numpy as np
> import pyarrow.parquet as pq
> t = pq.read_table('a.pq')
> matrix = np.column_stack([t.column(i) for i in range(t.num_columns)])
>
> If you need case - use pyarrow or numpy one - depending on your case.
>
> BR,
>
> Jacek
>
> Mon, 1 Mar 2021 at 14:07 jonathan mercier <jonathan.merc...@cnrgh.fr> wrote:
> >
> > Thanks for the hint.
> > I did not see a to_numpy method on the Table object, so I think I have
> > to do it manually in Python,
> >
> > something like:
> >
> > #### python3
> >
> > import pyarrow.parquet as pq
> > import numpy as np
> > data = pq.read_table(dataset_path)
> > matrix = np.zeros((data.num_rows, data.num_columns), dtype=np.bool_)
> > for i, col in enumerate(data.columns):
> >     matrix[:, i] = col
> >
> > On Monday, 1 March 2021 at 11:31 +0100, Jacek Pliszka wrote:
> > > Others will probably give you better hints, but
> > >
> > > you do not need to convert to Pandas. Read in Arrow and convert to
> > > numpy directly if numpy is what you want.
> > >
> > > BR,
> > >
> > > Jacek
> > >
> > > Mon, 1 Mar 2021 at 11:24 jonathan mercier <jonathan.merc...@cnrgh.fr> wrote:
> > > >
> > > > Dear,
> > > >
> > > > I am trying to study 300 000 samples of SARS-CoV-2 with parquet/pyarrow,
> > > > so I have a table with 300 000 columns and around 45 000 rows of
> > > > presence/absence (0/1). It is a file of ~150 MB.
> > > >
> > > > I read this file like this:
> > > >
> > > > import numpy
> > > > import pyarrow.parquet as pq
> > > > data = pq.read_table(dataset_path).to_pandas().to_numpy().astype(numpy.bool_)
> > > >
> > > > And this statement takes 1 hour …
> > > > So is there a trick to speed up loading this data into memory?
> > > > Is it possible to distribute the loading with a library such as ray?
> > > >
> > > > thanks
> > > >
> > > > Best regards
> > > >
> > > > --
> > > > Researcher computational biology
> > > > PhD, Jonathan MERCIER
> > > >
> > > > Bioinformatics (LBI)
> > > > 2, rue Gaston Crémieux
> > > > 91057 Evry Cedex
> > > >
> > > > Tel :(+33)1 60 87 83 44
> > > > Email :jonathan.merc...@cnrgh.fr