Check https://medium.com/@i.oleks/interface-for-selecting-data-from-a-dataset-d7c6b5fb378f
https://medium.com/@i.oleks/an-example-of-test-driven-development-c9ce033f05ef

Loading a dataframe from a dataset and then using transducers for a
transformation pipeline would be great. DataFrame is part of PolyMath.

https://github.com/PolyMathOrg/PolyMath
http://www.smalltalkhub.com/#!/~PolyMath/PolyMath

Phil

On Tue, Jun 6, 2017 at 1:09 PM, Steffen Märcker <merk...@web.de> wrote:
> Hi Phil,
>
>> Coupling this with Olek's work on the DataFrame could really come in handy.
>
> I am new to this list. Could you please elaborate?
>
> Cheers!
> Steffen
>
>
>> On Mon, Jun 5, 2017 at 9:14 AM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>>
>>> Hi Steffen
>>>
>>> > The short answer is that the compact notation turned out to work much better
>>> > for me in my code, especially if multiple transducers are involved. But
>>> > that's my personal taste. You can choose which suits you better. In fact,
>>> >
>>> >     1000 take.
>>> >
>>> > just sits on top and simply calls
>>> >
>>> >     Take number: 1000.
>>>
>>> To me this is much, much better.
>>>
>>> > If the need arises, we could of course factor the compact notation out into
>>> > a separate package.
>>>
>>> Good idea.
>>>
>>> > Btw, would you prefer (Take n: 1000) over (Take number: 1000)?
>>>
>>> I tend to prefer explicit selectors :)
>>>
>>> > Damien, you're right, I experimented with additional styles. Right now, we
>>> > already have in the basic Transducers package:
>>> >
>>> >     collection transduce: #squared map * 1000 take. "which is equal to"
>>> >     (collection transduce: #squared map) transduce: 1000 take.
>>> >
>>> > Basically, one can split #transduce:reduce:init: into single calls of
>>> > #transduce:, #reduce:, and #init:, depending on the needs.
>>> > I also have an (unfinished) extension that allows one to write:
>>> >
>>> >     (collection transduce map: #squared) take: 1000.
>>>
>>> To me this is much more readable.
>>> I cannot and do not want to use the other forms.
>>> > This feels familiar, but becomes a bit hard to read if more than two
>>> > steps are needed.
>>> >
>>> >     collection transduce
>>> >         map: #squared;
>>> >         take: 1000.
>>>
>>> Why would this be hard to read? We do that all the time, everywhere.
>>>
>>> > I think this alternative would read nicely. But as the message chain has
>>> > to modify the underlying object (an eduction), very sneaky side effects
>>> > may occur. E.g., consider
>>> >
>>> >     eduction := collection transduce.
>>> >     squared := eduction map: #squared.
>>> >     take := squared take: 1000.
>>> >
>>> > Now, all three variables hold onto the same object, which first squares
>>> > all elements and then takes the first 1000.
>>>
>>> This is because the programmer did not understand what he did. No?
>>>
>>> Stef
>>>
>>> PS: I played with infinite streams and iteration back in 1993 in CLOS.
>>> Now I do not like to mix things because it breaks my flow of thinking.
>>>
>>> > Best,
>>> > Steffen
>>> >
>>> > On .06.2017 at 21:28, Damien Pollet
>>> > <damien.pollet+ph...@gmail.com> wrote:
>>> >
>>> >> If I recall correctly, there is an alternate protocol that looks more
>>> >> like Xtreams or the traditional select/collect iterations.
>>> >>
>>> >> On 2 June 2017 at 21:12, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>>> >>
>>> >>> I have a design question:
>>> >>>
>>> >>> Why is the library implemented in a functional style rather than with
>>> >>> messages? I do not see why this is needed. To my eyes, the compact
>>> >>> notation goes against the readability of code, and it feels ad hoc in
>>> >>> Smalltalk.
>>> >>>
>>> >>> I really prefer
>>> >>>
>>> >>>     square := Map function: #squared.
>>> >>>     take := Take number: 1000.
>>> >>>
>>> >>> because I know that I can read and understand it.
>>> >>> From that perspective I prefer Xtreams.
>>> >>> Stef
>>> >>>
>>> >>> On Wed, May 31, 2017 at 2:23 PM, Steffen Märcker <merk...@web.de> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I am the developer of the library 'Transducers' for VisualWorks. It
>>> >>>> was formerly known as 'Reducers', but that name was a poor choice. I'd
>>> >>>> like to port it to Pharo, if there is any interest on your side. I
>>> >>>> hope to learn more about Pharo in this process, since I am mainly a VW
>>> >>>> guy. And most likely, I will come up with a bunch of questions. :-)
>>> >>>>
>>> >>>> Meanwhile, I'll cross-post the introduction from VWnc below. I'd be
>>> >>>> very happy to hear your opinions and questions, and I hope we can
>>> >>>> start a fruitful discussion - even if there is no Pharo port yet.
>>> >>>>
>>> >>>> Best, Steffen
>>> >>>>
>>> >>>>
>>> >>>> Transducers are building blocks that encapsulate how to process
>>> >>>> elements of a data sequence independently of the underlying input and
>>> >>>> output source.
>>> >>>>
>>> >>>>
>>> >>>> # Overview
>>> >>>>
>>> >>>> ## Encapsulate
>>> >>>> Implementations of enumeration methods, such as #collect:, share the
>>> >>>> logic of how to process a single element.
>>> >>>> However, that logic is reimplemented each and every time. Transducers
>>> >>>> make it explicit and facilitate reuse and coherent behavior.
>>> >>>> For example:
>>> >>>> - #collect: requires mapping: (aBlock1 map)
>>> >>>> - #select: requires filtering: (aBlock2 filter)
>>> >>>>
>>> >>>>
>>> >>>> ## Compose
>>> >>>> In practice, algorithms often require multiple processing steps, e.g.,
>>> >>>> mapping only a filtered set of elements.
>>> >>>> Transducers are inherently composable, and thereby allow making the
>>> >>>> combination of steps explicit.
>>> >>>> Since transducers do not build intermediate collections, their
>>> >>>> composition is memory-efficient.
>>> >>>> For example:
>>> >>>> - (aBlock1 filter) * (aBlock2 map)   "(1.) filter and (2.) map elements"
>>> >>>>
>>> >>>>
>>> >>>> ## Re-Use
>>> >>>> Transducers are decoupled from the input and output sources, and hence
>>> >>>> they can be reused in different contexts.
>>> >>>> For example:
>>> >>>> - enumeration of collections
>>> >>>> - processing of streams
>>> >>>> - communicating via channels
>>> >>>>
>>> >>>>
>>> >>>> # Usage by Example
>>> >>>>
>>> >>>> We build a coin-flipping experiment and count the occurrences of heads
>>> >>>> and tails.
>>> >>>>
>>> >>>> First, we associate random numbers with the sides of a coin.
>>> >>>>
>>> >>>>     scale := [:x | (x * 2 + 1) floor] map.
>>> >>>>     sides := #(heads tails) replace.
>>> >>>>
>>> >>>> Scale is a transducer that maps numbers x between 0 and 1 to 1 and 2.
>>> >>>> Sides is a transducer that replaces the numbers with heads and tails
>>> >>>> by lookup in an array.
>>> >>>> Next, we choose a number of samples.
>>> >>>>
>>> >>>>     count := 1000 take.
>>> >>>>
>>> >>>> Count is a transducer that takes 1000 elements from a source.
>>> >>>> We keep track of the occurrences of heads and tails using a bag.
>>> >>>>
>>> >>>>     collect := [:bag :c | bag add: c; yourself].
>>> >>>>
>>> >>>> Collect is a binary block (reducing function) that collects events in
>>> >>>> a bag.
>>> >>>> We assemble the experiment by transforming the block using the
>>> >>>> transducers.
>>> >>>>
>>> >>>>     experiment := (scale * sides * count) transform: collect.
>>> >>>>
>>> >>>> From left to right we see the steps involved: scale, sides, count, and
>>> >>>> collect.
>>> >>>> Transforming assembles these steps into a binary block (reducing
>>> >>>> function) we can use to run the experiment.
>>> >>>>
>>> >>>>     samples := Random new
>>> >>>>                    reduce: experiment
>>> >>>>                    init: Bag new.
>>> >>>>
>>> >>>> Here, we use #reduce:init:, which is mostly similar to #inject:into:.
>>> >>>> To execute a transformation and a reduction together, we can use
>>> >>>> #transduce:reduce:init:.
>>> >>>>
>>> >>>>     samples := Random new
>>> >>>>                    transduce: scale * sides * count
>>> >>>>                    reduce: collect
>>> >>>>                    init: Bag new.
>>> >>>>
>>> >>>> We can also express the experiment as a data flow using #<~.
>>> >>>> This enables us to build objects that can be re-used in other
>>> >>>> experiments.
>>> >>>>
>>> >>>>     coin := sides <~ scale <~ Random new.
>>> >>>>     flip := Bag <~ count.
>>> >>>>
>>> >>>> Coin is an eduction, i.e., it binds transducers to a source and
>>> >>>> understands #reduce:init:, among others.
>>> >>>> Flip is a transformed reduction, i.e., it binds transducers to a
>>> >>>> reducing function and an initial value.
>>> >>>> By sending #<~, we draw further samples from flipping the coin.
>>> >>>>
>>> >>>>     samples := flip <~ coin.
>>> >>>>
>>> >>>> This yields a new Bag with another 1000 samples.
>>> >>>>
>>> >>>>
>>> >>>> # Basic Concepts
>>> >>>>
>>> >>>> ## Reducing Functions
>>> >>>>
>>> >>>> A reducing function represents a single step in processing a data
>>> >>>> sequence.
>>> >>>> It takes an accumulated result and a value, and returns a new
>>> >>>> accumulated result.
>>> >>>> For example:
>>> >>>>
>>> >>>>     collect := [:col :e | col add: e; yourself].
>>> >>>>     sum := #+.
>>> >>>>
>>> >>>> A reducing function can also be ternary, i.e., it takes an accumulated
>>> >>>> result, a key, and a value.
>>> >>>> For example:
>>> >>>>
>>> >>>>     collect := [:dict :k :v | dict at: k put: v; yourself].
>>> >>>>
>>> >>>> Reducing functions may be equipped with an optional completing action.
>>> >>>> After processing finishes, it is invoked exactly once, e.g., to free
>>> >>>> resources.
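[The reducing-function protocol just described can be sketched outside Smalltalk as well. Below is a rough Python analogue, for readers unfamiliar with the Smalltalk syntax; all names (`collect`, `reduce_with_completion`) are invented for this sketch and are not part of the library.]

```python
# Rough Python analogue of a binary reducing function plus an optional
# completing action. Invented names; the real protocol is the Smalltalk
# library's #reduce:init: together with #completing:.

def collect(acc, value):
    """Binary reducing function: accumulate values into a list."""
    acc.append(value)
    return acc

def reduce_with_completion(step, init, source, complete=None):
    """Fold `source` with `step`; invoke the optional completing action
    exactly once after processing finishes (e.g. to free resources)."""
    acc = init
    for value in source:
        acc = step(acc, value)
    return complete(acc) if complete is not None else acc

result = reduce_with_completion(collect, [], [1, 2, 3])
print(result)  # [1, 2, 3]
```

[The `complete=abs` variant mirrors the `absSum := #+ completing: #abs` example that follows.]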
>>> >>>>
>>> >>>>     stream := [:str :e | str nextPut: e; yourself] completing: #close.
>>> >>>>     absSum := #+ completing: #abs.
>>> >>>>
>>> >>>> A reducing function can end processing early by signaling Reduced with
>>> >>>> a result.
>>> >>>> This mechanism also enables the treatment of infinite sources.
>>> >>>>
>>> >>>>     nonNil := [:res :e | e ifNil: [Reduced signalWith: res] ifNotNil: [res]].
>>> >>>>
>>> >>>> The primary approach to processing a data sequence is the reducing
>>> >>>> protocol, with the messages #reduce:init: and #transduce:reduce:init:
>>> >>>> if transducers are involved.
>>> >>>> The behavior is similar to #inject:into:, but in addition it takes
>>> >>>> care of:
>>> >>>> - handling binary and ternary reducing functions,
>>> >>>> - invoking the completing action after finishing, and
>>> >>>> - stopping the reduction if Reduced is signaled.
>>> >>>> The message #transduce:reduce:init: just combines the transformation
>>> >>>> and the reducing step.
>>> >>>>
>>> >>>> However, as reducing functions are step-wise in nature, an application
>>> >>>> may choose other means to process its data.
>>> >>>>
>>> >>>>
>>> >>>> ## Reducibles
>>> >>>>
>>> >>>> A data source is called reducible if it implements the reducing
>>> >>>> protocol.
>>> >>>> Default implementations are provided for collections and streams.
>>> >>>> Additionally, blocks without arguments are reducible, too.
>>> >>>> This allows adapting to custom data sources without additional effort.
>>> >>>> For example:
>>> >>>>
>>> >>>>     "Xtreams adaptor"
>>> >>>>     xstream := filename reading.
>>> >>>>     reducible := [[xstream get] on: Incomplete do: [Reduced signal]].
>>> >>>>
>>> >>>>     "natural numbers"
>>> >>>>     n := 0.
>>> >>>>     reducible := [n := n + 1].
>>> >>>>
>>> >>>>
>>> >>>> ## Transducers
>>> >>>>
>>> >>>> A transducer is an object that transforms a reducing function into
>>> >>>> another.
>>> >>>> Transducers encapsulate common steps in processing data sequences,
>>> >>>> such as map, filter, concatenate, and flatten.
>>> >>>> A transducer transforms a reducing function into another via
>>> >>>> #transform: in order to add those steps.
>>> >>>> Transducers can be composed using #*, which yields a new transducer
>>> >>>> that does both transformations.
>>> >>>> Most transducers require an argument, typically blocks, symbols, or
>>> >>>> numbers:
>>> >>>>
>>> >>>>     square := Map function: #squared.
>>> >>>>     take := Take number: 1000.
>>> >>>>
>>> >>>> To facilitate compact notation, the argument types implement
>>> >>>> corresponding methods:
>>> >>>>
>>> >>>>     squareAndTake := #squared map * 1000 take.
>>> >>>>
>>> >>>> Transducers requiring no argument are singletons and can be accessed
>>> >>>> by their class name.
>>> >>>>
>>> >>>>     flattenAndDedupe := Flatten * Dedupe.
>>> >>>>
>>> >>>>
>>> >>>> # Advanced Concepts
>>> >>>>
>>> >>>> ## Data Flows
>>> >>>>
>>> >>>> Processing a sequence of data can often be regarded as a data flow.
>>> >>>> The operator #<~ allows defining a flow from a data source through
>>> >>>> processing steps to a drain.
>>> >>>> For example:
>>> >>>>
>>> >>>>     squares := Set <~ 1000 take <~ #squared map <~ (1 to: 1000).
>>> >>>>     fileOut writeStream <~ #isSeparator filter <~ fileIn readStream.
>>> >>>>
>>> >>>> In both examples, #<~ is only used to set up the data flow using
>>> >>>> reducing functions and transducers.
>>> >>>> In contrast to streams, transducers are completely independent of
>>> >>>> input and output sources.
>>> >>>> Hence, we have a clear separation of reading data, writing data, and
>>> >>>> processing elements:
>>> >>>> - Sources know how to iterate over data with a reducing function,
>>> >>>>   e.g., via #reduce:init:.
>>> >>>> - Drains know how to collect data using a reducing function.
>>> >>>> - Transducers know how to process single elements.
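[The core mechanic above (a transducer transforms a reducing function; #* composes transducers; Reduced stops a reduction early) can be sketched compactly in Python. All names below are invented for illustration; the library itself is Smalltalk.]

```python
# Python sketch of transducers as reducing-function transformers.

class Reduced(Exception):
    """Signals early termination, carrying the final result."""

def mapping(f):
    """Transducer: transform each element with f before the next step."""
    def transducer(step):
        return lambda acc, x: step(acc, f(x))
    return transducer

def taking(n):
    """Transducer: pass at most n elements, then signal Reduced."""
    def transducer(step):
        seen = [0]  # mutable counter captured by the closure
        def new_step(acc, x):
            if seen[0] >= n:
                raise Reduced(acc)
            seen[0] += 1
            return step(acc, x)
        return new_step
    return transducer

def compose(t1, t2):
    """Like #*: yields a transducer whose t1-transformation acts on each
    element before the t2-transformation."""
    return lambda step: t1(t2(step))

def transduce(xform, step, init, source):
    """Run `source` through the transformed reducing function, honoring
    early termination via Reduced."""
    new_step = xform(step)
    acc = init
    try:
        for x in source:
            acc = new_step(acc, x)
    except Reduced as r:
        acc = r.args[0]
    return acc

# square each element and keep only the first three results
squares = transduce(compose(mapping(lambda x: x * x), taking(3)),
                    lambda acc, x: acc + [x], [], range(1, 1000))
print(squares)  # [1, 4, 9]
```

[Note how `taking` lets the pipeline stop after three elements even though the source has 999: this is the same early-termination idea as signaling Reduced above.]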
>>> >>>>
>>> >>>> ## Reductions
>>> >>>>
>>> >>>> A reduction binds an initial value, or a block yielding an initial
>>> >>>> value, to a reducing function.
>>> >>>> The idea is to define a ready-to-use process that can be applied in
>>> >>>> different contexts.
>>> >>>> Reducibles handle reductions via #reduce: and #transduce:reduce:.
>>> >>>> For example:
>>> >>>>
>>> >>>>     sum := #+ init: 0.
>>> >>>>     sum1 := #(1 1 1) reduce: sum.
>>> >>>>     sum2 := (1 to: 1000) transduce: #odd filter reduce: sum.
>>> >>>>
>>> >>>>     asSet := [:set :e | set add: e; yourself] initializer: [Set new].
>>> >>>>     set1 := #(1 1 1) reduce: asSet.
>>> >>>>     set2 := (1 to: 1000) transduce: #odd filter reduce: asSet.
>>> >>>>
>>> >>>> By combining a transducer with a reduction, a process can be further
>>> >>>> modified.
>>> >>>>
>>> >>>>     sumOdds := sum <~ #odd filter.
>>> >>>>     setOdds := asSet <~ #odd filter.
>>> >>>>
>>> >>>>
>>> >>>> ## Eductions
>>> >>>>
>>> >>>> An eduction combines a reducible data source with a transducer.
>>> >>>> The idea is to define a transformed (virtual) data source that need
>>> >>>> not be stored in memory.
>>> >>>>
>>> >>>>     odds1 := #odd filter <~ #(1 2 3) readStream.
>>> >>>>     odds2 := #odd filter <~ (1 to: 1000).
>>> >>>>
>>> >>>> Depending on the underlying source, eductions can be processed once
>>> >>>> (streams, e.g., odds1) or multiple times (collections, e.g., odds2).
>>> >>>> Since no intermediate data is stored, transducer actions are lazy,
>>> >>>> i.e., they are invoked each time the eduction is processed.
>>> >>>>
>>> >>>>
>>> >>>> # Origins
>>> >>>>
>>> >>>> Transducers is based on the same-named Clojure library and its ideas.
>>> >>>> Please see:
>>> >>>> http://clojure.org/transducers
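[Editorial sketch: the reduction/eduction pairing described in the introduction can be approximated in a few lines of Python. Class names and the simplification of the eduction to a filtering step are inventions of this sketch, not part of the library.]

```python
# Minimal Python sketch of a "reduction" (reducing function bound to an
# initial-value factory) and an "eduction" (a virtual, unstored sequence
# bound to a source). Invented names for illustration only.

class Reduction:
    """Binds a reducing function to a factory yielding its initial value."""
    def __init__(self, step, make_init):
        self.step = step
        self.make_init = make_init

    def run(self, source):
        acc = self.make_init()
        for x in source:
            acc = self.step(acc, x)
        return acc

class Eduction:
    """Binds a filtering step to a source: a virtual sequence recomputed
    lazily on every traversal, with no intermediate collection."""
    def __init__(self, keep, source):
        self.keep = keep
        self.source = source

    def __iter__(self):
        return (x for x in self.source if self.keep(x))

def add_step(s, e):
    """Reducing function collecting elements into a set (cf. asSet)."""
    s.add(e)
    return s

as_set = Reduction(add_step, set)
odds = Eduction(lambda x: x % 2 == 1, range(1, 10))
print(as_set.run(odds))  # {1, 3, 5, 7, 9}
print(list(odds))        # collection-backed, so it can be traversed again
```

[As with the Smalltalk odds2 example, the collection-backed eduction can be processed multiple times, while a stream-backed one would be exhausted after the first run.]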