I think that getting rid of manifest.json and introducing a new kind of
resource-id under which an engine is registered is a good idea.

Currently there are three important keys in the repository:
* engine id
* engine version - depends only on the path the engine was built at, to
distinguish copies
* engine instance id - because of its name it may be associated with the
engine itself, but in fact it identifies the trained models for an
engine.
When running deploy you either get the latest trained model for the
engine-id and engine-version, which strictly ties it to the location it was
compiled at, or you specify an engine instance id. I am not sure, but I think
that in the latter case you could get a model for a completely different
engine, which could potentially fail because of initialization with improper
parameters.
What is more, engine object creation relies only on the full name of
the EngineFactory, so the actual engine that gets loaded is determined by
the current CLASSPATH. I guess this is probably the place that should
be modified if we want a multi-tenant architecture.
I have to admit that these things were not completely clear to me until
I went through the code.
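
To make the deploy lookup above concrete, here is a rough sketch of how I
read it. This is not the actual PIO code: the names EngineInstance and
resolve_instance and the field names are made up for illustration, and the
second branch reflects the behaviour I am unsure about.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EngineInstance:
    """Hypothetical record for one trained model (names are assumptions)."""
    instance_id: str
    engine_id: str
    engine_version: str  # in practice derived from the build path
    trained_at: int      # any increasing timestamp


def resolve_instance(instances: List[EngineInstance],
                     engine_id: str,
                     engine_version: str,
                     instance_id: Optional[str] = None) -> Optional[EngineInstance]:
    if instance_id is not None:
        # Lookup by instance id alone: nothing here checks that the
        # instance belongs to the engine we are about to load.
        for inst in instances:
            if inst.instance_id == instance_id:
                return inst
        return None
    # Otherwise: latest trained instance for this engine id and version,
    # which ties the result to the directory the engine was built in.
    candidates = [i for i in instances
                  if i.engine_id == engine_id
                  and i.engine_version == engine_version]
    return max(candidates, key=lambda i: i.trained_at, default=None)
```

With this shape, passing an instance id trained for another engine returns
that other engine's model, which is the mismatch I am worried about.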

We could introduce a new type of service for engine and model management. I
like the idea of a repository to which built engines are pushed under chosen
ids. We could also add versioning for them if necessary.
I treat this approach purely as a kind of package management system.

As Pat said, such an approach would let us rely only on the repository
and, thanks to that, run pio commands regardless of the machine and location.

Separating the engine part from the rest of PIO could potentially enable us
to come up with different architectures in the future and push us towards
a micro-services ecosystem.

What do you think of separating models from engines in a more visible way? I
mean that engine variants, in terms of algorithm parameters, are more like
model variants. I see an engine only as code that is a dependency for
application-related models/algorithms. So you would register an engine (as
code) once and run training for some domain-specific data (app) and
algorithm parameters, which would result in a different identifier that
would later be used for deployment.
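
As a minimal sketch of the split I have in mind (everything here is
hypothetical: the Repository class, its method names, the artifact URI and
the way the model id is derived are only assumptions for illustration, not
an existing PIO API):

```python
import hashlib
import json


class Repository:
    """Toy in-memory stand-in for the engine/model repository."""

    def __init__(self):
        self.engines = {}  # engine id -> location of the built code
        self.models = {}   # model id  -> what it was trained from

    def register_engine(self, engine_id, artifact_uri):
        # Done once per engine: the engine is only code.
        self.engines[engine_id] = artifact_uri

    def train(self, engine_id, app, params):
        if engine_id not in self.engines:
            raise KeyError("engine not registered: " + engine_id)
        # The identifier depends on engine, app data and algorithm
        # parameters, so each variant gets its own id. (A real system
        # would also store the trained model itself.)
        payload = json.dumps(
            {"engine": engine_id, "app": app, "params": params},
            sort_keys=True,
        )
        model_id = hashlib.sha1(payload.encode("utf-8")).hexdigest()[:12]
        self.models[model_id] = {
            "engine_id": engine_id, "app": app, "params": params,
        }
        return model_id

    def resolve_for_deploy(self, model_id):
        # Deployment needs only the model id; the engine code follows
        # from it, so a model cannot be paired with the wrong engine.
        model = self.models[model_id]
        return self.engines[model["engine_id"]], model
```

Training the same engine for the same app with different parameters then
yields two model ids, and deploy takes one of those ids and nothing else.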

Regards,
Marcin




On Sun, 18 Sep 2016 at 20:02, Pat Ferrel <p...@occamsmachete.com> wrote:

> This sounds like a good case for Donald’s suggestion.
>
> What I was trying to add to the discussion is a way to make all commands
> rely on state in the metastore, rather than any file on any machine in a
> cluster or on ordering of execution or execution from a location in a
> directory structure. All commands would then be stateless.
>
> This enables real use cases like provisioning PIO machines and running
> `pio deploy <resource-id>` to get a new PredictionServer. Provisioning can
> be container and discovery based rather cleanly.
>
>
> On Sep 17, 2016, at 5:26 PM, Mars Hall <m...@heroku.com> wrote:
>
> Hello folks,
>
> Great to hear about this possibility. I've been working on running
> PredictionIO on Heroku https://www.heroku.com
>
> Heroku's 12-factor architecture https://12factor.net prefers "stateless
> builds" to ensure that compiled artifacts result in processes which may be
> cheaply restarted, replaced, and scaled via process count & size. I imagine
> this stateless property would be valuable for others as well.
>
> The fact that `pio build` inserts stateful metadata into a database causes
> ripples throughout the lifecycle of PIO engines on Heroku:
>
> * An engine cannot be built for production without the production database
> available. When a production database contains PII (personally identifiable
> information) which has security compliance requirements, the build system
> may not be privileged to access that PII data. This also affects CI
> (continuous integration/testing), where engines would need to be rebuilt in
> production, defeating assurances CI is supposed to provide.
>
> * The build artifacts cannot be reliably reused. "Slugs" at Heroku are
> intended to be stateless, so that you can rollback to a previous version
> during the lifetime of an app. With `pio build` causing database
> side-effects, there's a greater-than-zero probability of slug-to-metadata
> inconsistencies eventually surfacing in a long-running system.
>
>
> From my user-perspective, a few changes to the CLI would fix it:
>
> 1. add a "skip registration" option, `pio build
> --without-engine-registration`
> 2. a new command `pio app register` that could be run separately in the
> built engine (before training)
>
> Alas, I do not know PredictionIO internals, so I can only offer a
> suggestion for how this might be solved.
>
>
> Donald, one specific note,
>
> Regarding "No automatic version matching of PIO binary distribution and
> artifacts version used in the engine template":
>
> The Heroku slug contains the PredictionIO binary distribution used to
> build the engine, so there's never a version matching issue. I guess some
> systems might deploy only the engine artifacts to production where a
> pre-existing PIO binary is available, but that seems like a risky practice
> for long-running systems.
>
>
> Thanks for listening,
>
> Mars Hall
> Customer Facing Architect
> Salesforce App Cloud / Heroku
> San Francisco, California
>
> > On Sep 16, 2016, at 10:42, Donald Szeto <don...@apache.org> wrote:
> >
> > Hi all,
> >
> > I want to start the discussion of removing engine registration. How many
> > people actually take advantage of being able to run pio commands everywhere
> > outside of an engine template directory? This will be a nontrivial change
> > on the operational side so I want to gauge the potential impact to existing
> > users.
> >
> > Pros:
> > - Stateless build. This would work well with many PaaS.
> > - Eliminate the "pio build" command once and for all.
> > - Ability to use your own build system, i.e. Maven, Ant, Gradle, etc.
> > - Potentially better experience with IDE since engine templates no
> > longer depend on an SBT plugin.
> >
> > Cons:
> > - Inability to run pio engine training and deployment commands outside
> > of engine template directory.
> > - No automatic version matching of PIO binary distribution and artifacts
> > version used in the engine template.
> > - A less unified user experience: from pio-build-train-deploy to build,
> > then pio-train-deploy.
> >
> > Regards,
> > Donald
>
>
>
