Errata: if people think it’s ok…
On 6/22/17, 1:48 PM, "Barona, Ricardo" <[email protected]> wrote:
Completely agree. We recently added this Markdown document to the
spot-ml folder:
https://github.com/apache/incubator-spot/blob/master/spot-ml/SUSPICIOUS_CONNECTS_SCHEMA.md.
But we can always improve.
Going back to the main issue, if people think it’s ok I’ll create issues
for:
- Spot-ml: check the schema of Flow, DNS and Proxy input data
- Make the documentation about the schema spot-ml requires when not using
spot-ingest more consistent
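The first item, an up-front schema check, could look roughly like the Python sketch below. It is a hypothetical illustration, not spot-ml code (spot-ml itself runs on Spark): the `EXPECTED_SCHEMAS` table, its abbreviated flow field names and types, and the `schema_errors`/`validate_input` helpers are all assumptions, and SUSPICIOUS_CONNECTS_SCHEMA.md remains the authoritative field list.

```python
# Hypothetical sketch of an up-front input schema check; not actual spot-ml code.

# Assumed, abbreviated schema for flow input: field name -> expected Python type.
# DNS and Proxy entries would be added the same way from the schema document.
EXPECTED_SCHEMAS = {
    "flow": {
        "sip": str,    # source IP
        "dip": str,    # destination IP
        "sport": int,  # source port
        "dport": int,  # destination port
        "ipkt": int,   # input packets
        "ibyt": int,   # input bytes
    },
}


def schema_errors(source, record):
    """Return the problems found in one input record; an empty list means valid."""
    errors = []
    for field, field_type in EXPECTED_SCHEMAS[source].items():
        if field not in record:
            errors.append(f"missing field '{field}'")
        elif not isinstance(record[field], field_type):
            errors.append(
                f"field '{field}' should be {field_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Fields not listed in the schema are ignored, so extra columns pass through.
    return errors


def validate_input(source, records):
    """Fail fast, before any scoring work starts, on the first invalid record."""
    for index, record in enumerate(records):
        errors = schema_errors(source, record)
        if errors:
            raise ValueError(f"{source} record {index}: " + "; ".join(errors))
```

Because the check only requires the listed fields and ignores extra columns, it would stop a job early on missing or mistyped data without locking down everything else in the input, which is the trade-off raised in the original message of this thread.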
On 6/22/17, 1:37 PM, "Michael Ridley" <[email protected]> wrote:
I agree, having a data model defined and documented would help a lot in
separating processing from a specific ingest flow.
Michael
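A documented data model or data dictionary could be as simple as a table that renames fields from another log format to the names spot-ml expects. The Python sketch below is hypothetical: the firewall-log field names, the spot-ml-side names, the `FIREWALL_TO_FLOW` dictionary and the `map_record` helper are all invented for illustration and do not come from Spot's documentation.

```python
# Hypothetical data-dictionary sketch; field names on both sides are assumptions.

# Maps fields of a made-up firewall log format to assumed spot-ml flow field names.
FIREWALL_TO_FLOW = {
    "src_addr": "sip",
    "dst_addr": "dip",
    "src_port": "sport",
    "dst_port": "dport",
    "packets_in": "ipkt",
    "bytes_in": "ibyt",
}


def map_record(record, dictionary):
    """Rename a record's fields using the dictionary; unmapped fields are dropped."""
    missing = [name for name in dictionary if name not in record]
    if missing:
        # Refuse to emit a partial record rather than guessing values.
        raise KeyError(f"source record lacks fields: {missing}")
    return {target: record[source] for source, target in dictionary.items()}


raw = {
    "src_addr": "10.0.0.1",
    "dst_addr": "10.0.0.2",
    "src_port": 51000,
    "dst_port": 443,
    "packets_in": 10,
    "bytes_in": 840,
    "rule_id": 7,  # no spot-ml equivalent, so it is dropped
}
print(map_record(raw, FIREWALL_TO_FLOW))
```

Keeping the mapping as data rather than code is one way to separate processing from a specific ingest flow: supporting a new source would mean writing a new dictionary, not changing the ML jobs.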
On Thu, Jun 22, 2017 at 1:31 PM, Jonathan Natkins <[email protected]> wrote:
> Personally, I'd love for there to be more information about the expected
> schema for the ML jobs, as well as information about where the data can be
> picked up from. The documentation seems to be mostly written with a
> specific example in mind, so it is not very helpful when trying to
> integrate new data sources. A data dictionary would make it possible to
> map fields from other data formats (other logs, etc.) to fields that
> spot-ml can process.
>
> Whatever happened to the open data model that was being discussed for
> Spot?
>
> Thanks!
> Natty
>
> On Thu, Jun 22, 2017 at 10:10 AM Barona, Ricardo <[email protected]>
> wrote:
>
> > Hi everyone.
> >
> > I’m happy to see more and more people playing with Spot, and particularly
> > with spot-ml.
> >
> > Something I’ve noticed thanks to these two Jira issues
> > (https://issues.apache.org/jira/browse/SPOT-149 and
> > https://issues.apache.org/jira/browse/SPOT-174) is that users sometimes
> > want to try spot-ml without ingesting data through spot-ingest. I think
> > that’s cool, but it seems it can lead to issues with inconsistent schemas.
> >
> > I’d like to know what you think would be the best approach to deal with
> > this. I’m thinking we could add schema validation to spot-ml before
> > anything else happens, but I don’t know if that would lock things down
> > too much.
> >
> > Please share your thoughts.
> >
> > Thanks,
> > Ricardo Barona
> >
> --
> Jonathan "Natty" Natkins
> StreamSets | Field Engineering Director
> mobile: 609.577.1600 | linkedin <http://www.linkedin.com/in/nattyice>
>
--
Michael Ridley <[email protected]>
office: (650) 352-1337
mobile: (571) 438-2420
Senior Solutions Architect
Cloudera, Inc.