Re: [DISCUSS] Google Summer of Code 2018

2018-02-28 Thread Pierre Villard
A lot of great ideas here!
I think some cool processors or new controller services would make a lot of
sense for GSoC, since students wouldn't need deep knowledge of the NiFi
framework to get started.
Anything around provenance would certainly be attractive from a student's
perspective: graph analytics and machine learning are trendy subjects.

Pierre



Re: [DISCUSS] Google Summer of Code 2018

2018-02-28 Thread Matt Burgess
Yes, I do! Sorry all, I sent the original message in haste to get the
information out for discussion, but didn't have time then to share
everything else, including my enthusiasm and some actual ideas :) Here
are some I came up with; note that many may not be "industrial-strength",
but they would still make interesting student projects:

- Anything to do with provenance. Uwe has a wonderful idea that I will
respond to separately, but there are lots of applications and
approaches that can make use of provenance, such as graph analytics (e.g.,
finding flow bottlenecks), machine learning (predicting the likelihood of
reaching a failure connection based on attributes and/or content),
etc.
- An Apache Calcite adapter that can read from a NiFi Output Port.
This probably makes more sense from a streaming SQL perspective than as an
emulated relational DB, but it is an interesting application of
Calcite and NiFi.
- An UpdateAttributeUsingJava processor (with a better name). This
could use Janino to quickly evaluate Java expressions that leverage
attributes, and perhaps all of Expression Language, to perform more
powerful functions without needing a full scripted processor.
- A RouteOnProbability processor, to support Monte Carlo simulations.
User-defined properties could have values whose sum is 1 and whose
keys become the outgoing relationship names.
- A SampleReservoir processor, to do reservoir sampling (good for
testing downstream flows without throwing a ton of data at them).
- YAML Record Reader/Writer
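To make the RouteOnProbability idea concrete, here is a minimal sketch of the selection logic only, assuming the user-defined properties have already been parsed into a map from relationship name to probability. ProbabilityRouter and choose() are hypothetical names, not NiFi API; a real processor would also need to create the relationships dynamically and transfer each FlowFile to the chosen one. It uses the standard cumulative-sum (roulette-wheel) technique: draw u uniformly from [0, 1) and pick the first relationship whose running total exceeds u.

```java
import java.util.Map;
import java.util.Random;

// Hypothetical helper (not NiFi API): the routing decision a RouteOnProbability
// processor might make, given property values already parsed into a map of
// relationship name -> probability.
final class ProbabilityRouter {
    private final String[] names;       // relationship names, in map iteration order
    private final double[] cumulative;  // running sums of the probabilities
    private final int lastPositive;     // index of the last relationship with probability > 0
    private final Random random;

    ProbabilityRouter(Map<String, Double> probabilities, Random random) {
        names = new String[probabilities.size()];
        cumulative = new double[probabilities.size()];
        double sum = 0.0;
        int last = -1;
        int i = 0;
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            double p = entry.getValue();
            if (p < 0.0 || Double.isNaN(p)) {
                throw new IllegalArgumentException("invalid probability for " + entry.getKey());
            }
            sum += p;
            names[i] = entry.getKey();
            cumulative[i] = sum;
            if (p > 0.0) {
                last = i;
            }
            i++;
        }
        // The values are meant to sum to 1; allow for floating-point rounding.
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException("probabilities sum to " + sum + ", not 1");
        }
        this.lastPositive = last;
        this.random = random;
    }

    /** Returns a relationship name, each chosen with its configured probability. */
    String choose() {
        double u = random.nextDouble(); // uniform in [0, 1)
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) {
                return names[i];
            }
        }
        // Only reachable if rounding left the total slightly below 1.
        return names[lastPositive];
    }
}
```

For example, properties success=0.7, retry=0.2 and failure=0.1 would send roughly 70% of FlowFiles to success. Passing in a seeded Random makes a simulation run reproducible, which could be handy when comparing Monte Carlo experiments.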
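Reservoir sampling itself is small enough to sketch. The following assumes the classic Algorithm R: keep the first k items, then replace a random slot with the n-th item with probability k/n, which leaves every item seen so far equally likely to be in the sample. Reservoir, offer() and sample() are hypothetical names; how a NiFi processor would hold FlowFiles (or references to them) across invocations, and when it would emit the sample, is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical core of a SampleReservoir processor using Algorithm R:
// a uniform random sample of up to k items from a stream of unknown length.
final class Reservoir<T> {
    private final int capacity;   // k, the sample size
    private final List<T> sample; // current reservoir contents
    private final Random random;
    private long seen = 0;        // n, the number of items offered so far

    Reservoir(int capacity, Random random) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.sample = new ArrayList<>(capacity);
        this.random = random;
    }

    void offer(T item) {
        seen++;
        if (sample.size() < capacity) {
            sample.add(item); // the first k items always go into the reservoir
            return;
        }
        // Draw j uniformly from [0, n); the n-th item is kept only when j < k,
        // i.e. with probability k/n, and then it replaces slot j.
        long j = (long) (random.nextDouble() * seen);
        if (j < capacity) {
            sample.set((int) j, item);
        }
    }

    /** Returns a copy of the current sample. */
    List<T> sample() {
        return new ArrayList<>(sample);
    }

    long seen() {
        return seen;
    }
}
```

Memory stays at O(k) no matter how long the stream is, which is what would make this a fit for sampling a flow of unknown size.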

It looks like proposals are being accepted on March 18 (I don't know if
that's for students proposing/selecting projects or for organizations
proposing possible projects), but a number of Apache Jira issues are
already tagged as gsoc2018 [1].

Regards,
Matt

[1] http://s.apache.org/gsoc2018ideas




Re: [DISCUSS] Google Summer of Code 2018

2018-02-27 Thread u...@moosheimer.com
Hi Matt,

I'm not sure it's a good fit for GSoC, but I am thinking about process mining:
take the provenance/data lineage information from the NiFi provenance
repository or from Apache Atlas (and maybe some additional information from
the processors), analyze whether the processes are optimal, and display the
results graphically.

See https://en.wikipedia.org/wiki/Process_mining or
https://coda.fluxicon.com/book/intro.html
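One basic process-mining building block that fits this suggestion is the directly-follows graph: for each case, order its events by time and count how often activity B immediately follows activity A. The sketch below assumes each FlowFile is a case and each component that produced a provenance event is an activity. TraceEvent and DirectlyFollows are hypothetical types standing in for whatever would actually be read from the provenance repository or Atlas; they are not NiFi's provenance API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified event: only the three fields process mining needs.
final class TraceEvent {
    final String caseId;    // e.g. a FlowFile UUID
    final long timestamp;   // event time, e.g. epoch millis
    final String activity;  // e.g. the name of the component that produced the event

    TraceEvent(String caseId, long timestamp, String activity) {
        this.caseId = caseId;
        this.timestamp = timestamp;
        this.activity = activity;
    }
}

final class DirectlyFollows {
    /** Counts "A -> B" edges: activity B occurred right after activity A within one case. */
    static Map<String, Integer> graph(List<TraceEvent> log) {
        // Group the event log into one trace per case.
        Map<String, List<TraceEvent>> byCase = new HashMap<>();
        for (TraceEvent e : log) {
            byCase.computeIfAbsent(e.caseId, k -> new ArrayList<>()).add(e);
        }
        Map<String, Integer> edges = new TreeMap<>(); // sorted keys for readable output
        for (List<TraceEvent> trace : byCase.values()) {
            // Stable sort: events with equal timestamps keep their log order.
            trace.sort(Comparator.comparingLong((TraceEvent e) -> e.timestamp));
            for (int i = 1; i < trace.size(); i++) {
                String edge = trace.get(i - 1).activity + " -> " + trace.get(i).activity;
                edges.merge(edge, 1, Integer::sum);
            }
        }
        return edges;
    }
}
```

Edge counts alone show which paths FlowFiles actually take; attaching the time between each pair of events to its edge would be one way to start spotting slow transitions and possible bottlenecks.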

Best Regards,
Uwe





Re: [DISCUSS] Google Summer of Code 2018

2018-02-27 Thread Joe Witt
Matt

Did you have some ideas/features/enhancements in mind you think would
be good to propose?

Thanks
Joe



[DISCUSS] Google Summer of Code 2018

2018-02-27 Thread Matt Burgess
If you haven't heard yet, the Apache Software Foundation was selected
as an organization for this year's Google Summer of Code [1]. I've
seen activity on other Apache projects' mailing lists requesting ideas
for issues, features, components, etc. that could make good GSoC
proposals, and I'd like to make the same request of this community.

As Michael Mior (of the Apache Calcite PMC) eloquently put it: "It's no
guarantee we would get someone to work on it, but it could be a good
push to move some isolated bits of functionality forward that may not
get much attention otherwise."

Thoughts?

Thanks in advance,
Matt

[1] https://summerofcode.withgoogle.com/organizations/5718432427802624/