On 11/22/2017 10:54 AM, Friedrich Rentsch wrote:
On 11/21/2017 03:26 PM, Jason wrote:
On Monday, November 20, 2017 at 10:49:01 AM UTC-5, Jason wrote:
a pipeline can be described as a sequence of functions that are
applied to an input with each subsequent function getting the output
of the preceding function:
out = f6(f5(f4(f3(f2(f1(in))))))
However this isn't very readable and does not support conditionals.
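For illustration only, the nested call can be flattened into a left-to-right list of steps with functools.reduce. This is a minimal sketch, not anything from the thread: pipeline, f1, f2 and f3 are hypothetical stand-ins, and conditionals are still not handled.

```python
from functools import reduce


def pipeline(*funcs):
    """Return a function that applies funcs left to right."""
    def run(value):
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), funcs, value)
    return run


# Hypothetical stand-ins for f1..f3.
def f1(x):
    return x + 1


def f2(x):
    return x * 2


def f3(x):
    return x - 3


out = pipeline(f1, f2, f3)(5)  # same as f3(f2(f1(5)))
```

The steps now read in execution order, which helps readability, but every step still has to take exactly one argument.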
TensorFlow has tensor-focused pipelines:
fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
                             scope='fc1')
fc2 = layers.fully_connected(fc1, 256,
                             activation_fn=tf.nn.relu, scope='fc2')
out = layers.fully_connected(fc2, 10, activation_fn=None,
                             scope='out')
I have some code which allows me to mimic this, but with an implied
parameter.
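import functools
from functools import reduce  # builtin in Python 2, must be imported in Python 3

def executePipeline(steps, collection_funcs=(map, filter, reduce)):
    results = None
    for func, params in steps:
        if func in collection_funcs:
            print(func, params[0])
            # Bind the extra parameters, then apply over the current rows.
            results = func(functools.partial(params[0], *params[1:]),
                           results)
        else:
            print(func)
            if results is None:
                results = func(*params)
            else:
                # The previous result is always appended as the last argument.
                results = func(*(params + (results,)))
    return results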
executePipeline([
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file',)),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count,)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file',)),
    (write_rows, (out_file,)),
])
Which gets me close, but I can't control where rows gets passed in.
In the above code, it is always the last parameter.
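One possible way around the "always the last parameter" limitation is a placeholder that marks where the previous result should go. The sketch below is only an assumption about how that could look: PIPE, execute_pipeline and the toy step functions are hypothetical names, and the special handling of map/filter/reduce from executePipeline is left out to keep it short.

```python
PIPE = object()  # hypothetical sentinel: "put the previous result here"


def execute_pipeline(steps):
    """Run (func, params) steps, placing the previous result at PIPE."""
    result = None
    for index, (func, params) in enumerate(steps):
        params = tuple(params)
        if any(p is PIPE for p in params):
            # Explicit position: substitute the previous result for PIPE.
            args = tuple(result if p is PIPE else p for p in params)
        elif index == 0:
            # The first step has no upstream result to receive.
            args = params
        else:
            # Default: append the previous result as the last argument.
            args = params + (result,)
        result = func(*args)
    return result


# Toy steps standing in for read_rows, vectorize_rows, etc.
def make_rows(count):
    return list(range(count))


def scale(rows, factor):
    return [row * factor for row in rows]


def offset(amount, rows):
    return [row + amount for row in rows]


result = execute_pipeline([
    (make_rows, (3,)),
    (scale, (PIPE, 10)),  # rows passed as the first argument
    (offset, (1,)),       # rows appended last, as before
])
```

Identity comparison (is) is used for the sentinel so that parameters with unusual equality behaviour, such as arrays, are not compared by value.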
I feel like I'm reinventing a wheel here. I was wondering if
there's already something that exists?
Why do I want this? Because I'm tired of writing code that is locked
away in a bespoke function. I'd have an army of functions all
slightly different in functionality. I require flexibility in
defining pipelines, and I don't want a custom pipeline to require any
low-level coding. I just want to feed a sequence of functions to a
script and have it process it. A middle ground between the shell |
operator and bespoke python code. Sure, I could write many binaries
bound by shell, but there are some things done far easier in python
because of its extensive libraries and it can exist throughout the
execution of the pipeline whereas any temporary persistence has to
be through environment variables or files.
Well after examining your feedback, it looks like Grapevine has 99%
of the concepts that I wanted to invent, even if the | operator seems
a bit clunky. I personally prefer the fluent interface convention.
But this should work.
Kamaelia could also work, but it seems a little bit more grandiose.
Thanks everyone who chimed in!
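For readers unfamiliar with the fluent interface convention mentioned above: each method returns an object that supports the next call, so steps chain with dots instead of the | operator. A rough sketch follows. The Pipeline class, lower_row and has_text are hypothetical and not part of any library named in this thread.

```python
class Pipeline:
    """Hypothetical fluent wrapper: every step returns a new Pipeline."""

    def __init__(self, items):
        self._items = items

    def map(self, func, *args):
        # Lazily apply func(item, *args) to every item.
        return Pipeline(func(item, *args) for item in self._items)

    def filter(self, pred, *args):
        # Lazily keep the items for which pred(item, *args) is true.
        return Pipeline(item for item in self._items if pred(item, *args))

    def collect(self):
        # Run the chain and return the results as a list.
        return list(self._items)


def lower_row(row, field):
    return {**row, field: row[field].lower()}


def has_text(row, field):
    return bool(row[field])


rows = [{"text": "Hello World"}, {"text": ""}, {"text": "PIPES"}]

result = (Pipeline(rows)
          .map(lower_row, "text")
          .filter(has_text, "text")
          .collect())
```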
This looks very much like what I have been working on of late: a
generic processing paradigm based on chainable building blocks. I call
them Workshops, because the base class can be thought of as a workshop
that takes some raw material, processes it and delivers the product
(to the next in line). Your example might look something like this:
>>> import workshops as WS
>>> Vectorizer = WS.Chain (
...     WS.File_Reader (),        # WS provides
...     WS.Map (lower_row),       # WS provides (wrapped builtin)
...     Row_Stasher (),           # You provide
...     WS.Map (lemmatize_row),   # WS provides
...     Row_Vectorizer (),        # Yours
...     Row_Evaluator (),         # Yours
...     Row_Recombiner (),
...     WS.File_Writer (),
...     _name = 'Vectorizer'
... )
Parameters are process-control settings that travel through a
subscription-based mailing system separate from the payload pipe.
>>> # Set all parameters that control the entire run.
>>> Vectorizer.post (min_count = ..., )
>>> # Addressed, not meant for File_Reader
>>> Vectorizer.post ("File_Writer", file_name = 'output_file_name')
Run
>>> # File Writer returns 0 if the Chain completes successfully.
>>> Vectorizer ('input_file_name')
0
If you would provide a list of your functions (input, output,
parameters) I'd be happy to show a functioning solution. Writing a
Shop follows a simple standard pattern: Naming the subscriptions, if
any, and writing a single method that reads the subscribed parameters,
if any, then takes payload, processes it and returns the product.
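To make that pattern concrete, here is a guess at what it might look like in miniature. This is not the workshops module, which has not been published. Shop, Chain, Lower, MinLength and the min_length parameter are all hypothetical, and the "mailing system" is reduced to a post() method that delivers only subscribed settings.

```python
class Shop:
    """Hypothetical base class: name the subscriptions, implement process()."""

    subscriptions = ()

    def __init__(self):
        self.params = {}

    def post(self, **settings):
        # Keep only the settings this shop has subscribed to.
        for key, value in settings.items():
            if key in self.subscriptions:
                self.params[key] = value

    def process(self, payload):
        raise NotImplementedError

    def __call__(self, payload):
        return self.process(payload)


class Chain(Shop):
    """Hypothetical chain: passes the payload through each shop in order."""

    def __init__(self, *shops):
        super().__init__()
        self.shops = shops

    def post(self, *address, **settings):
        # With an address, deliver only to shops whose class name matches.
        for shop in self.shops:
            if not address or type(shop).__name__ in address:
                shop.post(**settings)

    def process(self, payload):
        for shop in self.shops:
            payload = shop(payload)
        return payload


class Lower(Shop):
    def process(self, payload):
        return [word.lower() for word in payload]


class MinLength(Shop):
    subscriptions = ("min_length",)

    def process(self, payload):
        limit = self.params.get("min_length", 0)
        return [word for word in payload if len(word) >= limit]


chain = Chain(Lower(), MinLength())
chain.post(min_length=3)  # broadcast; only MinLength subscribes
product = chain(["To", "Pipe", "OR", "Not"])
```

Keeping parameters out of the payload path means each shop's process() takes exactly one argument, so the question of where the rows get passed in does not arise.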
I intend to share the system, provided there's an interest. I'd
have to tidy it up quite a bit, though, before daring to release it.
There's a lot more to it . . .
Frederic
I'm sorry, I made a mistake with the "From" item. My address is
obviously not "python-list". It is "anthra.nor...@bluewin.ch".
Frederic
--
https://mail.python.org/mailman/listinfo/python-list