I would love to, but to travel to Lille from my country I would need a visa, which is not that easy to obtain. So maybe I will come to PharoDays 2018. And I will definitely try to come to the ESUG Conference in September.
Oleks

On Tue, May 16, 2017 at 7:26 PM, <[email protected]> wrote:

>
> Sent from my iPhone
>
> On 11 May 2017 at 11:43, "[email protected]" <[email protected]> wrote:
>
> ---------- Forwarded message ----------
> From: "[email protected]" <[email protected]>
> Date: 11 May 2017 10:54
> Subject: Re: 11/05/17 - Tabular Data Structures for Data Analysis - Oleksandr Zaytsev
> To: "Nick Papoylias" <[email protected]>
> Cc:
>
> On Thu, May 11, 2017 at 10:20 AM, Nick Papoylias <[email protected]> wrote:
>
>> On Thu, May 11, 2017 at 5:24 AM, Oleksandr Zaytsev <[email protected]> wrote:
>>
>>> *A. Work done*
>>>
>>>    - Downloaded the threaded VM, as suggested by Esteban Lorenzano, to make Iceberg work. And it does! I have successfully pushed my NeuralNetwork code to GitHub: https://github.com/olekscode/MLNeuralNetwork
>>>    - Joined the PolyMath organization on GitHub
>>>    - Created a repository for the TabularDataset project, https://github.com/PolyMathOrg/TabularDataset, as part of the PolyMath organization on GitHub
>>>    - Fixed PolyMath issue #25 and made a PR
>>>    - Read an article from the Wolfram Mathematica documentation regarding Dataset. It was one of the reading suggestions sent to me by Nick Papoylias
>>>
>>> *B. Next steps*
>>>
>>>    - Fix more PolyMath issues, using Iceberg. I have to get used to it by the time the coding phase starts
>>>    - Read the rest of Nick Papoylias's suggestions
>>>
>>> *C. Help needed*
>>>
>>>    - The Dataset in Wolfram, as well as Pandas in Python, has a very advanced indexing system. Smalltalk has its own special conventions for indexing, so I think it would be great if I got familiar with them. Could you suggest some reading on this topic (what are the indexing conventions in Smalltalk?).
>>>    For example, in Wolfram, I can write *dataset[[-1]]* to extract the last row.
>>>    But in Pharo indexes cannot be negative. In Pharo I would say *dataset last*. But how about *dataset[[-5]]*?
>>>
>> This would be a good exercise for you ;) In Pharo you can easily add negative indexing yourself.
>>
>> *Hint:* You know the index of the last element, since this is the size of the collection, so... ;)
>>
> No need for changes, this exists already.
>
> Use atWrap: index put: value and atWrap: with negative indexes.
> 'hello' atWrap: -2
>
> There is a specific version for Array using a primitive.
> #[ 10 20 30 40 ] atWrap: -1
>
> atWrap: 0 gives you the last item.
> atWrap: -1 gives 30
>
> This is different from 0-based index languages.
>
> The interesting thing about atWrap: is that it uses modulo internally, so you do not need to care about that.
>
> ($/ split: 'abc/def/ghi/jkl') atWrap: -1
> --> 'ghi'
>
> The Matrix class has a bunch of things API-wise, but the class is highly inefficient, doing copies all the time, etc. It would be nice to have some kind of futures/copy-on-write style things in there.
>
> I miss cbind and rbind. These are useful. I have some half-baked, super inefficient implementations of these things for Matrix.
>
> https://stat.ethz.ch/R-manual/R-devel/library/base/html/cbind.html
>
> The ability to name columns is also nice to have.
>
> In R one does:
>
> df <- data.frame(c(1,2,3))
> df <- cbind(df, c(4,5,6))
> df <- cbind(df, c(7,8,9))
> names(df) <- c("C1", "C2", "C3")
>
> The names can be retrieved with:
>
> names(df)
>
> A Smalltalkish style would be welcome.
>
>
> Interesting! Are you coming to PharoDays? We can talk about that if we find time.
>
> Maybe looking at the Voyage queries can be helpful.
>
> Phil
>
>
>> Try adding an extension method to OrderedCollection or SequenceableCollection.
>>
>> If the Pharo by Example chapter or the MOOC is not enough, read the source itself in the core to see how basic methods are implemented (it is less scary than it sounds).
>>
>> You can also try Chapters 9, 10, 11 of the Blue Book (some API changes may apply):
>>
>> http://sdmeta.gforge.inria.fr/FreeBooks/BlueBook/Bluebook.pdf
>>
>>>    - Or what is the best way of implementing this index: *dataset[["name"]]* (extracts a named row), *dataset[[1]]* (extracts the first row)? Should I create two separate messages: *dataset rowNamed: 'name'* and *dataset rowAt: 1*?
>>>
> rowNamed:
> rowAt:
>
> Yes, looks like it.
>
> But if we want to model things like R data frames, for example, this has to be seen as a vectorized operation, so that you can use row slices, column slices, and logical indexes.
>
> Check this out:
>
> http://www.r-tutor.com/r-introduction/data-frame/data-frame-row-slice
> https://www.r-bloggers.com/working-with-data-frames/
>
>
>> The internal representation of your data structure can be anything at the moment, *as long as you encapsulate it.*
>>
>> (i.e. it can be nested OrderedCollections with metadata mapping column names to indexes, or a dictionary of collections, etc.)
>>
>> *If you don't expose it to the user* (i.e. return it from the public API, or expect knowledge of it in argument passing), we can easily change it later. So *first make it work, and we optimize later ;)*
>>
>> For your case it will be a little bit trickier because *you also have the notions of a) rows and b) columns*, which are exposed to the user. So *you would need to create abstractions* for these too.
>>
>> Cheers,
>>
>> Nick
>>
>>> If someone else is having problems with Iceberg on Linux, try downloading the threaded VM:
>>>
>>> wget -O- get.pharo.org/vmT60 | bash
>>>
>>> And use the SSH (not HTTPS) remote URL.
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "Pharo Google Summer of Code" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> To view this discussion on the web visit https://groups.google.com/d/msgid/pharo-gsoc/CAEp0Uzu-8fw3dA6ezVoj-QptvLcB8cWPHvZ1tfLg1Ce8qkTqfQ%40mail.gmail.com.
>>> For more options, visit https://groups.google.com/d/optout.
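
For anyone following the thread, the two indexing conventions discussed above can be tried in a Pharo Playground. This is only a rough sketch under assumptions: the `rows` and `rowNames` collections and the `rowAt` and `rowNamed` blocks are hypothetical stand-ins for the `rowAt:` and `rowNamed:` messages proposed in the thread, not part of any existing TabularDataset API. It follows Nick's hint (the index of the last element equals the collection size) to get Wolfram-style negative indexes, and contrasts that with the existing `atWrap:`, which treats 0 as the last element.

```smalltalk
"Playground sketch. All names below are hypothetical, for illustration only."
| rowNames rows rowAt rowNamed |
rowNames := #('first' 'second' 'third').
rows := #( #(1 2) #(3 4) #(5 6) ).

"Positive indexes count from the start; negative indexes count from the end,
 so -1 is the last row, as in Wolfram's dataset[[-1]]."
rowAt := [ :index |
	index > 0
		ifTrue: [ rows at: index ]
		ifFalse: [ rows at: rows size + index + 1 ] ].

"Look a row up by name through a parallel collection of row names."
rowNamed := [ :name | rows at: (rowNames indexOf: name) ].

rowAt value: 1.            "#(1 2)"
rowAt value: -1.           "#(5 6), the last row"
rowNamed value: 'second'.  "#(3 4)"

"atWrap: already exists, but it is shifted by one relative to Wolfram."
rows atWrap: 0.            "#(5 6), the last row"
rows atWrap: -1.           "#(3 4), the second-to-last row"
```

In a real implementation these blocks would presumably become methods on a dataset class that hides `rows` and `rowNames`, in line with the advice above to encapsulate the internal representation so that it can be changed later.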
