Correction: that should read "if you need a cast".

t.column(i).cast(...) uses the Arrow cast.

BR,

Jacek

Mon, 1 Mar 2021 at 17:04 Jacek Pliszka <jacek.plis...@gmail.com> wrote:
>
> Use np.column_stack and list comprehension:
>
> t = pq.read_table('a.pq')
> matrix = np.column_stack([t.column(i) for i in range(t.num_columns)])
>
> If you need case - use pyarrow or numpy one - depending on your case.
>
> BR,
>
> Jacek
>
> Mon, 1 Mar 2021 at 14:07 jonathan mercier <jonathan.merc...@cnrgh.fr>
> wrote:
> >
> > Thanks for the hint.
> > I did not see a to_numpy method on the Table object, so I think I have to do
> > it manually in Python,
> >
> > something like:
> >
> > #### python3
> >
> > import pyarrow.parquet as pq
> > import numpy as np
> > data = pq.read_table(dataset_path)
> > matrix = np.zeros((data.num_rows,data.num_columns),dtype=np.bool_)
> > for i,col in enumerate(data.columns):
> >     matrix[:,i] = col
> >
> >
> >
> >
> > On Monday, 01 March 2021 at 11:31 +0100, Jacek Pliszka wrote:
> > > Others will probably give you better hints, but:
> > >
> > > You do not need to convert to Pandas. Read the file with Arrow and
> > > convert to numpy directly if numpy is what you want.
> > >
> > > BR,
> > >
> > > Jacek
> > >
> > > Mon, 1 Mar 2021 at 11:24 jonathan mercier <jonathan.merc...@cnrgh.fr>
> > > wrote:
> > > >
> > > > Dear,
> > > >
> > > > I am trying to study 300 000 samples of SARS-CoV-2 with parquet/pyarrow,
> > > > so I have a table with 300 000 columns and around 45 000 rows of
> > > > presence/absence (0/1). The file is ~150 MB.
> > > >
> > > > I read this file like this:
> > > >
> > > > import numpy
> > > > import pyarrow.parquet as pq
> > > > data = pq.read_table(dataset_path).to_pandas().to_numpy().astype(numpy.bool_)
> > > >
> > > > This statement takes 1 hour to run.
> > > > Is there a trick to speed up loading this data into memory?
> > > > Is it possible to distribute the loading with a library such as ray?
> > > >
> > > > thanks
> > > >
> > > > Best regards
> > > >
> > > >
> > > > --
> > > >                 Researcher computational biology
> > > >                 PhD, Jonathan MERCIER
> > > >
> > > >                 Bioinformatics (LBI)
> > > >                 2, rue Gaston
> > > >                 Crémieux
> > > >                 91057 Evry Cedex
> > > >
> > > >
> > > >                 Tel :(+33)1 60 87 83 44
> > > >                 Email :jonathan.merc...@cnrgh.fr
> > > >
> > > >
> > > >
> >
