Check https://medium.com/@i.oleks/interface-for-selecting-data-from-a-dataset-d7c6b5fb378f
https://medium.com/@i.oleks/an-example-of-test-driven-development-c9ce033f05ef

Loading a dataframe from a dataset and then using transducers for a
transformation pipeline would be great. DataFrame is part of PolyMath.

https://github.com/PolyMathOrg/PolyMath
http://www.smalltalkhub.com/#!/~PolyMath/PolyMath

Phil

On Tue, Jun 6, 2017 at 1:09 PM, Steffen Märcker <merk...@web.de> wrote:
> Hi Phil,
>
>> Coupling this with Olek's work on the DataFrame could really come in handy.
>
> I am new to this list. Could you please elaborate?
>
> Cheers!
> Steffen
>
>
>> On Mon, Jun 5, 2017 at 9:14 AM, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>>
>>> Hi Steffen
>>>
>>> > The short answer is that the compact notation turned out to work much better
>>> > for me in my code, especially if multiple transducers are involved. But
>>> > that's my personal taste. You can choose which suits you better. In fact,
>>> >
>>> >     1000 take.
>>> >
>>> > just sits on top and simply calls
>>> >
>>> >     Take number: 1000.
>>>
>>> To me this is much, much better.
>>>
>>> > If the need arises, we could of course factor the compact notation out into
>>> > a separate package.
>>>
>>> Good idea.
>>>
>>> > Btw, would you prefer (Take n: 1000) over (Take number: 1000)?
>>>
>>> I tend to prefer explicit selectors :)
>>>
>>> > Damien, you're right, I experimented with additional styles. Right now, we
>>> > already have in the basic Transducers package:
>>> >
>>> >     collection transduce: #squared map * 1000 take. "which is equal to"
>>> >     (collection transduce: #squared map) transduce: 1000 take.
>>> >
>>> > Basically, one can split #transduce:reduce:init: into single calls of
>>> > #transduce:, #reduce:, and #init:, depending on the needs.
>>> > I also have an (unfinished) extension that allows one to write:
>>> >
>>> >     (collection transduce map: #squared) take: 1000.
>>>
>>> To me this is much more readable.
>>> I cannot and do not want to use the other forms.
>>> > This feels familiar, but becomes a bit hard to read if more than two
>>> > steps are needed.
>>> >
>>> >     collection transduce
>>> >         map: #squared;
>>> >         take: 1000.
>>>
>>> Why would this be hard to read? We do that all the time, everywhere.
>>>
>>> > I think this alternative would read nicely. But as the message chain has
>>> > to modify the underlying object (an eduction), very sneaky side effects
>>> > may occur. E.g., consider
>>> >
>>> >     eduction := collection transduce.
>>> >     squared := eduction map: #squared.
>>> >     take := squared take: 1000.
>>> >
>>> > Now, all three variables hold onto the same object, which first squares
>>> > all elements and then takes the first 1000.
>>>
>>> This is because the programmer did not understand what he did. No?
>>>
>>> Stef
>>>
>>> PS: I played with infinite streams and iteration back in 1993 in CLOS.
>>> Now I do not like to mix things because it breaks my flow of thinking.
>>>
>>> > Best,
>>> > Steffen
>>> >
>>> > On .06.2017 at 21:28, Damien Pollet
>>> > <damien.pollet+ph...@gmail.com> wrote:
>>> >
>>> >> If I recall correctly, there is an alternate protocol that looks more
>>> >> like Xtreams or the traditional select/collect iterations.
>>> >>
>>> >> On 2 June 2017 at 21:12, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>>> >>
>>> >>> I have a design question:
>>> >>>
>>> >>> Why is the library implemented in a functional style rather than with
>>> >>> messages? I do not see why this is needed. To my eyes, the compact
>>> >>> notation goes against the readability of code, and it feels ad hoc in
>>> >>> Smalltalk.
>>> >>>
>>> >>> I really prefer
>>> >>>
>>> >>>     square := Map function: #squared.
>>> >>>     take := Take number: 1000.
>>> >>>
>>> >>> because I know that I can read and understand it.
>>> >>> From that perspective I prefer Xtreams.
>>> >>> Stef
>>> >>>
>>> >>> On Wed, May 31, 2017 at 2:23 PM, Steffen Märcker <merk...@web.de> wrote:
>>> >>>
>>> >>>> Hi,
>>> >>>>
>>> >>>> I am the developer of the library 'Transducers' for VisualWorks. It
>>> >>>> was formerly known as 'Reducers', but that name was a poor choice. I'd
>>> >>>> like to port it to Pharo, if there is any interest on your side. I
>>> >>>> hope to learn more about Pharo in this process, since I am mainly a VW
>>> >>>> guy. And most likely, I will come up with a bunch of questions. :-)
>>> >>>>
>>> >>>> Meanwhile, I'll cross-post the introduction from VWnc below. I'd be
>>> >>>> very happy to hear your opinions and questions, and I hope we can
>>> >>>> start a fruitful discussion - even if there is no Pharo port yet.
>>> >>>>
>>> >>>> Best, Steffen
>>> >>>>
>>> >>>>
>>> >>>> Transducers are building blocks that encapsulate how to process
>>> >>>> elements of a data sequence independently of the underlying input and
>>> >>>> output source.
>>> >>>>
>>> >>>>
>>> >>>> # Overview
>>> >>>>
>>> >>>> ## Encapsulate
>>> >>>> Implementations of enumeration methods, such as #collect:, share the
>>> >>>> logic of how to process a single element.
>>> >>>> However, that logic is reimplemented each and every time. Transducers
>>> >>>> make it explicit and facilitate reuse and coherent behavior.
>>> >>>> For example:
>>> >>>> - #collect: requires mapping: (aBlock1 map)
>>> >>>> - #select: requires filtering: (aBlock2 filter)
>>> >>>>
>>> >>>>
>>> >>>> ## Compose
>>> >>>> In practice, algorithms often require multiple processing steps, e.g.,
>>> >>>> mapping only a filtered set of elements.
>>> >>>> Transducers are inherently composable, and thereby allow making the
>>> >>>> combination of steps explicit.
>>> >>>> Since transducers do not build intermediate collections, their
>>> >>>> composition is memory-efficient.
>>> >>>> For example:
>>> >>>> - (aBlock1 filter) * (aBlock2 map)   "(1.) filter and (2.) map elements"
>>> >>>>
>>> >>>>
>>> >>>> ## Re-Use
>>> >>>> Transducers are decoupled from the input and output sources, and hence
>>> >>>> they can be reused in different contexts.
>>> >>>> For example:
>>> >>>> - enumeration of collections
>>> >>>> - processing of streams
>>> >>>> - communicating via channels
>>> >>>>
>>> >>>>
>>> >>>> # Usage by Example
>>> >>>>
>>> >>>> We build a coin-flipping experiment and count the occurrences of heads
>>> >>>> and tails.
>>> >>>>
>>> >>>> First, we associate random numbers with the sides of a coin.
>>> >>>>
>>> >>>>     scale := [:x | (x * 2 + 1) floor] map.
>>> >>>>     sides := #(heads tails) replace.
>>> >>>>
>>> >>>> Scale is a transducer that maps numbers x between 0 and 1 to 1 and 2.
>>> >>>> Sides is a transducer that replaces the numbers with heads and tails
>>> >>>> by lookup in an array.
>>> >>>> Next, we choose a number of samples.
>>> >>>>
>>> >>>>     count := 1000 take.
>>> >>>>
>>> >>>> Count is a transducer that takes 1000 elements from a source.
>>> >>>> We keep track of the occurrences of heads and tails using a bag.
>>> >>>>
>>> >>>>     collect := [:bag :c | bag add: c; yourself].
>>> >>>>
>>> >>>> Collect is a binary block (reducing function) that collects events in
>>> >>>> a bag.
>>> >>>> We assemble the experiment by transforming the block using the
>>> >>>> transducers.
>>> >>>>
>>> >>>>     experiment := (scale * sides * count) transform: collect.
>>> >>>>
>>> >>>> From left to right we see the steps involved: scale, sides, count, and
>>> >>>> collect.
>>> >>>> Transforming assembles these steps into a binary block (reducing
>>> >>>> function) we can use to run the experiment.
>>> >>>>
>>> >>>>     samples := Random new
>>> >>>>                    reduce: experiment
>>> >>>>                    init: Bag new.
>>> >>>>
>>> >>>> Here, we use #reduce:init:, which is mostly similar to #inject:into:.
>>> >>>> To execute a transformation and a reduction together, we can use
>>> >>>> #transduce:reduce:init:.
>>> >>>>
>>> >>>>     samples := Random new
>>> >>>>                    transduce: scale * sides * count
>>> >>>>                    reduce: collect
>>> >>>>                    init: Bag new.
>>> >>>>
>>> >>>> We can also express the experiment as a data flow using #<~.
>>> >>>> This enables us to build objects that can be re-used in other
>>> >>>> experiments.
>>> >>>>
>>> >>>>     coin := sides <~ scale <~ Random new.
>>> >>>>     flip := Bag <~ count.
>>> >>>>
>>> >>>> Coin is an eduction, i.e., it binds transducers to a source and
>>> >>>> understands #reduce:init:, among others.
>>> >>>> Flip is a transformed reduction, i.e., it binds transducers to a
>>> >>>> reducing function and an initial value.
>>> >>>> By sending #<~, we draw further samples from flipping the coin.
>>> >>>>
>>> >>>>     samples := flip <~ coin.
>>> >>>>
>>> >>>> This yields a new Bag with another 1000 samples.
>>> >>>>
>>> >>>>
>>> >>>> # Basic Concepts
>>> >>>>
>>> >>>> ## Reducing Functions
>>> >>>>
>>> >>>> A reducing function represents a single step in processing a data
>>> >>>> sequence.
>>> >>>> It takes an accumulated result and a value, and returns a new
>>> >>>> accumulated result.
>>> >>>> For example:
>>> >>>>
>>> >>>>     collect := [:col :e | col add: e; yourself].
>>> >>>>     sum := #+.
>>> >>>>
>>> >>>> A reducing function can also be ternary, i.e., it takes an accumulated
>>> >>>> result, a key, and a value.
>>> >>>> For example:
>>> >>>>
>>> >>>>     collect := [:dict :k :v | dict at: k put: v; yourself].
>>> >>>>
>>> >>>> Reducing functions may be equipped with an optional completing action.
>>> >>>> After processing finishes, it is invoked exactly once, e.g., to free
>>> >>>> resources.
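[The reducing-function protocol just described can be sketched outside Smalltalk as well. Below is a rough Python analogue, for readers unfamiliar with the Smalltalk syntax; all names (`collect`, `reduce_with_completion`) are invented for this sketch and are not part of the library.]

```python
# Rough Python analogue of a binary reducing function plus an optional
# completing action. Invented names; the real protocol is the Smalltalk
# library's #reduce:init: together with #completing:.

def collect(acc, value):
    """Binary reducing function: accumulate values into a list."""
    acc.append(value)
    return acc

def reduce_with_completion(step, init, source, complete=None):
    """Fold `source` with `step`; invoke the optional completing action
    exactly once after processing finishes (e.g. to free resources)."""
    acc = init
    for value in source:
        acc = step(acc, value)
    return complete(acc) if complete is not None else acc

result = reduce_with_completion(collect, [], [1, 2, 3])
print(result)  # [1, 2, 3]
```

[The `complete=abs` variant mirrors the `absSum := #+ completing: #abs` example that follows.]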
>>> >>>>
>>> >>>>     stream := [:str :e | str nextPut: e; yourself] completing: #close.
>>> >>>>     absSum := #+ completing: #abs.
>>> >>>>
>>> >>>> A reducing function can end processing early by signaling Reduced with
>>> >>>> a result.
>>> >>>> This mechanism also enables the treatment of infinite sources.
>>> >>>>
>>> >>>>     nonNil := [:res :e | e ifNil: [Reduced signalWith: res] ifNotNil: [res]].
>>> >>>>
>>> >>>> The primary approach to processing a data sequence is the reducing
>>> >>>> protocol, with the messages #reduce:init: and #transduce:reduce:init:
>>> >>>> if transducers are involved.
>>> >>>> The behavior is similar to #inject:into:, but in addition it takes
>>> >>>> care of:
>>> >>>> - handling binary and ternary reducing functions,
>>> >>>> - invoking the completing action after finishing, and
>>> >>>> - stopping the reduction if Reduced is signaled.
>>> >>>> The message #transduce:reduce:init: just combines the transformation
>>> >>>> and the reducing step.
>>> >>>>
>>> >>>> However, as reducing functions are step-wise in nature, an application
>>> >>>> may choose other means to process its data.
>>> >>>>
>>> >>>>
>>> >>>> ## Reducibles
>>> >>>>
>>> >>>> A data source is called reducible if it implements the reducing
>>> >>>> protocol.
>>> >>>> Default implementations are provided for collections and streams.
>>> >>>> Additionally, blocks without arguments are reducible, too.
>>> >>>> This allows adapting to custom data sources without additional effort.
>>> >>>> For example:
>>> >>>>
>>> >>>>     "Xtreams adaptor"
>>> >>>>     xstream := filename reading.
>>> >>>>     reducible := [[xstream get] on: Incomplete do: [Reduced signal]].
>>> >>>>
>>> >>>>     "natural numbers"
>>> >>>>     n := 0.
>>> >>>>     reducible := [n := n + 1].
>>> >>>>
>>> >>>>
>>> >>>> ## Transducers
>>> >>>>
>>> >>>> A transducer is an object that transforms a reducing function into
>>> >>>> another.
>>> >>>> Transducers encapsulate common steps in processing data sequences,
>>> >>>> such as map, filter, concatenate, and flatten.
>>> >>>> A transducer transforms a reducing function into another via
>>> >>>> #transform: in order to add those steps.
>>> >>>> Transducers can be composed using #*, which yields a new transducer
>>> >>>> that does both transformations.
>>> >>>> Most transducers require an argument, typically blocks, symbols, or
>>> >>>> numbers:
>>> >>>>
>>> >>>>     square := Map function: #squared.
>>> >>>>     take := Take number: 1000.
>>> >>>>
>>> >>>> To facilitate compact notation, the argument types implement
>>> >>>> corresponding methods:
>>> >>>>
>>> >>>>     squareAndTake := #squared map * 1000 take.
>>> >>>>
>>> >>>> Transducers requiring no argument are singletons and can be accessed
>>> >>>> by their class name.
>>> >>>>
>>> >>>>     flattenAndDedupe := Flatten * Dedupe.
>>> >>>>
>>> >>>>
>>> >>>> # Advanced Concepts
>>> >>>>
>>> >>>> ## Data Flows
>>> >>>>
>>> >>>> Processing a sequence of data can often be regarded as a data flow.
>>> >>>> The operator #<~ allows defining a flow from a data source through
>>> >>>> processing steps to a drain.
>>> >>>> For example:
>>> >>>>
>>> >>>>     squares := Set <~ 1000 take <~ #squared map <~ (1 to: 1000).
>>> >>>>     fileOut writeStream <~ #isSeparator filter <~ fileIn readStream.
>>> >>>>
>>> >>>> In both examples, #<~ is only used to set up the data flow using
>>> >>>> reducing functions and transducers.
>>> >>>> In contrast to streams, transducers are completely independent of
>>> >>>> input and output sources.
>>> >>>> Hence, we have a clear separation of reading data, writing data, and
>>> >>>> processing elements:
>>> >>>> - Sources know how to iterate over data with a reducing function,
>>> >>>>   e.g., via #reduce:init:.
>>> >>>> - Drains know how to collect data using a reducing function.
>>> >>>> - Transducers know how to process single elements.
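[The core mechanic above (a transducer transforms a reducing function; #* composes transducers; Reduced stops a reduction early) can be sketched compactly in Python. All names below are invented for illustration; the library itself is Smalltalk.]

```python
# Python sketch of transducers as reducing-function transformers.

class Reduced(Exception):
    """Signals early termination, carrying the final result."""

def mapping(f):
    """Transducer: transform each element with f before the next step."""
    def transducer(step):
        return lambda acc, x: step(acc, f(x))
    return transducer

def taking(n):
    """Transducer: pass at most n elements, then signal Reduced."""
    def transducer(step):
        seen = [0]  # mutable counter captured by the closure
        def new_step(acc, x):
            if seen[0] >= n:
                raise Reduced(acc)
            seen[0] += 1
            return step(acc, x)
        return new_step
    return transducer

def compose(t1, t2):
    """Like #*: yields a transducer whose t1-transformation acts on each
    element before the t2-transformation."""
    return lambda step: t1(t2(step))

def transduce(xform, step, init, source):
    """Run `source` through the transformed reducing function, honoring
    early termination via Reduced."""
    new_step = xform(step)
    acc = init
    try:
        for x in source:
            acc = new_step(acc, x)
    except Reduced as r:
        acc = r.args[0]
    return acc

# square each element and keep only the first three results
squares = transduce(compose(mapping(lambda x: x * x), taking(3)),
                    lambda acc, x: acc + [x], [], range(1, 1000))
print(squares)  # [1, 4, 9]
```

[Note how `taking` lets the pipeline stop after three elements even though the source has 999: this is the same early-termination idea as signaling Reduced above.]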
>>> >>>>
>>> >>>> ## Reductions
>>> >>>>
>>> >>>> A reduction binds an initial value, or a block yielding an initial
>>> >>>> value, to a reducing function.
>>> >>>> The idea is to define a ready-to-use process that can be applied in
>>> >>>> different contexts.
>>> >>>> Reducibles handle reductions via #reduce: and #transduce:reduce:.
>>> >>>> For example:
>>> >>>>
>>> >>>>     sum := #+ init: 0.
>>> >>>>     sum1 := #(1 1 1) reduce: sum.
>>> >>>>     sum2 := (1 to: 1000) transduce: #odd filter reduce: sum.
>>> >>>>
>>> >>>>     asSet := [:set :e | set add: e; yourself] initializer: [Set new].
>>> >>>>     set1 := #(1 1 1) reduce: asSet.
>>> >>>>     set2 := (1 to: 1000) transduce: #odd filter reduce: asSet.
>>> >>>>
>>> >>>> By combining a transducer with a reduction, a process can be further
>>> >>>> modified.
>>> >>>>
>>> >>>>     sumOdds := sum <~ #odd filter.
>>> >>>>     setOdds := asSet <~ #odd filter.
>>> >>>>
>>> >>>>
>>> >>>> ## Eductions
>>> >>>>
>>> >>>> An eduction combines a reducible data source with a transducer.
>>> >>>> The idea is to define a transformed (virtual) data source that need
>>> >>>> not be stored in memory.
>>> >>>>
>>> >>>>     odds1 := #odd filter <~ #(1 2 3) readStream.
>>> >>>>     odds2 := #odd filter <~ (1 to: 1000).
>>> >>>>
>>> >>>> Depending on the underlying source, eductions can be processed once
>>> >>>> (streams, e.g., odds1) or multiple times (collections, e.g., odds2).
>>> >>>> Since no intermediate data is stored, transducer actions are lazy,
>>> >>>> i.e., they are invoked each time the eduction is processed.
>>> >>>>
>>> >>>>
>>> >>>> # Origins
>>> >>>>
>>> >>>> Transducers is based on the same-named Clojure library and its ideas.
>>> >>>> Please see:
>>> >>>> http://clojure.org/transducers
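[Editorial sketch: the reduction/eduction pairing described in the introduction can be approximated in a few lines of Python. Class names and the simplification of the eduction to a filtering step are inventions of this sketch, not part of the library.]

```python
# Minimal Python sketch of a "reduction" (reducing function bound to an
# initial-value factory) and an "eduction" (a virtual, unstored sequence
# bound to a source). Invented names for illustration only.

class Reduction:
    """Binds a reducing function to a factory yielding its initial value."""
    def __init__(self, step, make_init):
        self.step = step
        self.make_init = make_init

    def run(self, source):
        acc = self.make_init()
        for x in source:
            acc = self.step(acc, x)
        return acc

class Eduction:
    """Binds a filtering step to a source: a virtual sequence recomputed
    lazily on every traversal, with no intermediate collection."""
    def __init__(self, keep, source):
        self.keep = keep
        self.source = source

    def __iter__(self):
        return (x for x in self.source if self.keep(x))

def add_step(s, e):
    """Reducing function collecting elements into a set (cf. asSet)."""
    s.add(e)
    return s

as_set = Reduction(add_step, set)
odds = Eduction(lambda x: x % 2 == 1, range(1, 10))
print(as_set.run(odds))  # {1, 3, 5, 7, 9}
print(list(odds))        # collection-backed, so it can be traversed again
```

[As with the Smalltalk odds2 example, the collection-backed eduction can be processed multiple times, while a stream-backed one would be exhausted after the first run.]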