I agree, having a data model defined and documented would help a lot in
separating processing from a specific ingest flow.

Michael

On Thu, Jun 22, 2017 at 1:31 PM, Jonathan Natkins <[email protected]>
wrote:

> Personally, I'd love for there to be more information about the expected
> schema for the ML jobs, as well as information about where the data can be
> picked up from. The documentation seems to be mostly written with a
> specific example in mind, so it is not very helpful when trying to
> integrate new data sources. A data dictionary would help with mapping
> fields from other data formats (other logs, etc.) to fields that spot-ml
> can process.
>
> Whatever happened to the open data model that was being discussed for Spot?
>
> Thanks!
> Natty
>
> On Thu, Jun 22, 2017 at 10:10 AM Barona, Ricardo <[email protected]> wrote:
>
> > Hi everyone.
> >
> > I’m happy to see more and more people playing with Spot, and
> > particularly with spot-ml.
> >
> > Something that I’ve noticed thanks to these two Jira issues (
> > https://issues.apache.org/jira/browse/SPOT-149 and
> > https://issues.apache.org/jira/browse/SPOT-174) is that users sometimes
> > want to try spot-ml without ingesting data through spot-ingest. I think
> > that’s cool, but it seems that it can lead to schema inconsistency
> > issues.
> >
> > I’d like to know what you think would be the best approach to deal
> > with this. I’m thinking that we could add schema validation to spot-ml
> > before anything else happens, but I don’t know if that would lock things
> > down too much.
> >
> > Please share your thoughts.
> >
> > Thanks,
> > Ricardo Barona
> >
> --
> Jonathan "Natty" Natkins
> StreamSets | Field Engineering Director
> mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
>



-- 
Michael Ridley <[email protected]>
office: (650) 352-1337
mobile: (571) 438-2420
Senior Solutions Architect
Cloudera, Inc.
