Hi Guillaume, you cannot use our MDS solver at this scale. Even if you fit it in RAM it will be slow.
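
To put rough numbers on the memory side: a dense 70000x70000 distance matrix holds about 4.9 billion entries, which is roughly 36 GiB as 64-bit floats for a single copy. The sketch below is plain Python and only does the arithmetic. The helper name `dense_matrix_gib` is made up for illustration, and it assumes 8 bytes per entry and no overhead. A solver will generally hold more than one array of this size at once, so treat the figures as a lower bound rather than a measurement.

```python
def dense_matrix_gib(n_samples, bytes_per_entry=8):
    """Size in GiB of ONE dense n_samples x n_samples matrix.

    Assumes 8-byte (float64) entries and ignores any container overhead.
    """
    return n_samples * n_samples * bytes_per_entry / 2**30


# Sample counts mentioned in the thread: the small test runs,
# dragon.csv (69827 rows), and a typical large dataset (300k rows).
for n in (100, 1000, 10000, 69827, 300000):
    print(f"{n:>7} samples -> {dense_matrix_gib(n):10.3f} GiB per matrix")
```

The cost grows with the square of the number of samples, so the runs with 100 to 10000 samples stay well under 1 GiB while 200-300k rows would need hundreds of GiB per matrix.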
I would play with https://github.com/lmcinnes/umap unless you really want a
classic MDS.

Alex

On Thu, Oct 11, 2018 at 10:31 AM Guillaume Favelier
<guillaume.favel...@lip6.fr> wrote:
>
> Hello J.B,
>
> Thank you for your quick reply.
>
> > If you try with a very small (e.g., 100 sample) data file, does your code
> > employing MDS work?
> > As you increase the number of samples, does the script continue to work?
> So I tried the same script while increasing the number of samples (100,
> 1000 and 10000) and it works indeed without swapping on my workstation.
>
> > That is 49,000,000 entries, plus overhead for a data structure.
> I thought that even 49M entries of doubles would be able to be processed
> with 64G of RAM. Is there something to configure to allow this computation?
>
> The typical datasets I use can have around 200-300k rows with a few columns
> (usually up to 3).
>
> Best regards,
>
> Guillaume
>
> Quoting "Brown J.B. via scikit-learn" <scikit-learn@python.org>:
>
> > Hello Guillaume,
> >
> > You are computing a distance matrix of shape 70000x70000 to generate MDS
> > coordinates.
> > That is 49,000,000 entries, plus overhead for a data structure.
> >
> > If you try with a very small (e.g., 100 sample) data file, does your code
> > employing MDS work?
> > As you increase the number of samples, does the script continue to work?
> >
> > Hope this helps you get started.
> > J.B.
> >
> > On Tue, Oct 9, 2018 at 18:22, Guillaume Favelier
> > <guillaume.favel...@lip6.fr> wrote:
> >
> >> Hi everyone,
> >>
> >> I'm trying to use some dimension reduction algorithm [1] on my dataset
> >> [2] in a python script [3] but for some reason, Python seems to consume
> >> a lot of my main memory and even swap on my configuration [4] so I
> >> don't have the expected result but a memory error instead.
> >>
> >> I have the impression that this behaviour is not intended so can you
> >> help me know what I did wrong or miss somewhere please?
> >>
> >> [1]: MDS -
> >> http://scikit-learn.org/stable/modules/generated/sklearn.manifold.MDS.html
> >> [2]: dragon.csv - 69827 rows, 3 columns (x,y,z)
> >> [3]: dragon.py - 10 lines
> >> [4]: dragon_swap.png - htop on my workstation
> >>
> >> TAR archive:
> >> https://drive.google.com/open?id=1d1S99XeI7wNEq131wkBUCBrctPQRgpxn
> >>
> >> Best regards,
> >>
> >> Guillaume Favelier
> >>
> >> _______________________________________________
> >> scikit-learn mailing list
> >> scikit-learn@python.org
> >> https://mail.python.org/mailman/listinfo/scikit-learn