Re: [DISCUSS] Google Summer of Code 2018
A lot of great ideas here!

I think some cool processors or new controller services would make a
lot of sense for GSoC: no need to have a deep knowledge of the NiFi
framework to get started.

Everything around provenance would certainly sound attractive from a
student perspective: graph analytics and machine learning are trendy
subjects.

Pierre

2018-02-28 14:33 GMT+01:00 Matt Burgess:

> Yes I do! Sorry all, I had sent the original message in haste to get
> the information out for discussion, but didn't have the time at that
> moment to share everything else, including my enthusiasm and some
> actual ideas :) Here are some I came up with, note that many may not
> be "industrial-strength" but still interesting student projects:
>
> - Anything to do with provenance. Uwe has a wonderful idea that I will
> respond to separately, but there are lots of applications and
> approaches that can make use of provenance, such as graph analytics
> (find flow bottlenecks, e.g.), machine learning (predict likelihood of
> reaching a failure connection based on attributes and/or content),
> etc.
> - An Apache Calcite adapter that can read from a NiFi Output Port.
> This probably makes more sense from a SQL Streaming perspective than
> emulating a relational DB, but is an interesting application of
> Calcite and NiFi.
> - An UpdateAttributeUsingJava processor (with a better name), this
> could use Janino to quickly evaluate Java expressions that can
> leverage attributes and perhaps all of Expression Language to perform
> more powerful functions (without needing a full scripted processor)
> - A RouteOnProbability processor, to support Monte Carlo simulations.
> User-defined properties could have values whose sum is 1 and whose
> keys become the outgoing relationship names.
> - A SampleReservoir processor, to do reservoir sampling (good for
> testing downstream flows without throwing a ton of data at it)
> - YAML Record Reader/Writer
>
> Looks like proposals are being accepted on March 18 (I don't know if
> that's for students proposing/selecting projects or for organizations
> to propose possible projects), but there are a number of Apache Jira
> issues already tagged as gsoc2018 [1].
>
> Regards,
> Matt
>
> [1] http://s.apache.org/gsoc2018ideas
>
>
> On Wed, Feb 28, 2018 at 12:12 AM, Joe Witt wrote:
> > Matt
> >
> > Did you have some ideas/features/enhancements in mind you think would
> > be good to propose?
> >
> > Thanks
> > Joe
> >
> > On Tue, Feb 27, 2018 at 6:56 PM, Matt Burgess wrote:
> >> If you haven't heard yet, the Apache Software Foundation was selected
> >> as an organization for this year's Google Summer of Code [1]. I've
> >> seen activity on other Apache projects' mailing lists requesting ideas
> >> for issues, features, components, etc. that could be good
> >> proposals/ideas for GSoC, and I'd like to also make that request of
> >> this community.
> >>
> >> As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
> >> guarantee we would get someone to work on it, but it could be a good
> >> push to move some isolated bits of functionality forward that may not
> >> get much attention otherwise."
> >>
> >> Thoughts?
> >>
> >> Thanks in advance,
> >> Matt
> >>
> >> [1] https://summerofcode.withgoogle.com/organizations/5718432427802624/
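The SampleReservoir idea quoted above refers to reservoir sampling: keeping a fixed-size uniform random sample of a stream whose length is not known in advance. What follows is only a minimal sketch of the classic "Algorithm R" in plain Python, independent of the NiFi API. The name `reservoir_sample` and its parameters are illustrative assumptions and not part of any proposal in this thread; an actual processor would presumably apply the same logic to incoming FlowFiles.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    items: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Hypothetical helper for illustration only (Algorithm R).
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with
            # probability k / (i + 1), replacing a random old item.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 100)` would keep 100 of a million items while holding only 100 in memory at a time, which is the property that makes the technique attractive for testing downstream flows on a small sample.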
Re: [DISCUSS] Google Summer of Code 2018
Yes I do! Sorry all, I had sent the original message in haste to get
the information out for discussion, but didn't have the time at that
moment to share everything else, including my enthusiasm and some
actual ideas :) Here are some I came up with, note that many may not
be "industrial-strength" but still interesting student projects:

- Anything to do with provenance. Uwe has a wonderful idea that I will
respond to separately, but there are lots of applications and
approaches that can make use of provenance, such as graph analytics
(find flow bottlenecks, e.g.), machine learning (predict likelihood of
reaching a failure connection based on attributes and/or content),
etc.
- An Apache Calcite adapter that can read from a NiFi Output Port.
This probably makes more sense from a SQL Streaming perspective than
emulating a relational DB, but is an interesting application of
Calcite and NiFi.
- An UpdateAttributeUsingJava processor (with a better name), this
could use Janino to quickly evaluate Java expressions that can
leverage attributes and perhaps all of Expression Language to perform
more powerful functions (without needing a full scripted processor)
- A RouteOnProbability processor, to support Monte Carlo simulations.
User-defined properties could have values whose sum is 1 and whose
keys become the outgoing relationship names.
- A SampleReservoir processor, to do reservoir sampling (good for
testing downstream flows without throwing a ton of data at it)
- YAML Record Reader/Writer

Looks like proposals are being accepted on March 18 (I don't know if
that's for students proposing/selecting projects or for organizations
to propose possible projects), but there are a number of Apache Jira
issues already tagged as gsoc2018 [1].

Regards,
Matt

[1] http://s.apache.org/gsoc2018ideas

On Wed, Feb 28, 2018 at 12:12 AM, Joe Witt wrote:
> Matt
>
> Did you have some ideas/features/enhancements in mind you think would
> be good to propose?
>
> Thanks
> Joe
>
> On Tue, Feb 27, 2018 at 6:56 PM, Matt Burgess wrote:
>> If you haven't heard yet, the Apache Software Foundation was selected
>> as an organization for this year's Google Summer of Code [1]. I've
>> seen activity on other Apache projects' mailing lists requesting ideas
>> for issues, features, components, etc. that could be good
>> proposals/ideas for GSoC, and I'd like to also make that request of
>> this community.
>>
>> As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
>> guarantee we would get someone to work on it, but it could be a good
>> push to move some isolated bits of functionality forward that may not
>> get much attention otherwise."
>>
>> Thoughts?
>>
>> Thanks in advance,
>> Matt
>>
>> [1] https://summerofcode.withgoogle.com/organizations/5718432427802624/
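The RouteOnProbability idea above (property values that sum to 1, with the property keys becoming relationship names) amounts to weighted random selection. A minimal sketch in plain Python follows, independent of the NiFi API. The class name `ProbabilityRouter`, its `route` method, and the example relationship names are all hypothetical and chosen only for illustration; how a real processor would read dynamic properties and register relationships is not specified in this thread.

```python
import bisect
import itertools
import math
import random
from typing import Dict, Optional


class ProbabilityRouter:
    """Pick a relationship name at random according to configured weights.

    Hypothetical illustration only; not an existing NiFi component.
    """

    def __init__(
        self, weights: Dict[str, float], rng: Optional[random.Random] = None
    ) -> None:
        if not weights:
            raise ValueError("at least one relationship is required")
        if any(w < 0 for w in weights.values()):
            raise ValueError("probabilities must be non-negative")
        if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
            raise ValueError("probabilities must sum to 1")
        self._names = list(weights)
        # Running totals, e.g. [0.7, 0.9, 1.0] for weights 0.7, 0.2, 0.1.
        self._cumulative = list(itertools.accumulate(weights.values()))
        self._rng = rng or random.Random()

    def route(self) -> str:
        """Return the relationship name for the next item."""
        u = self._rng.random()  # uniform in [0.0, 1.0)
        idx = bisect.bisect_right(self._cumulative, u)
        # Clamp in case floating-point rounding left the final total
        # slightly below 1.0.
        return self._names[min(idx, len(self._names) - 1)]
```

With `ProbabilityRouter({"success": 0.7, "retry": 0.2, "failure": 0.1})`, roughly 70% of calls to `route()` would be expected to return `"success"` over a long run, which is the behaviour a Monte Carlo style flow would rely on.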
Re: [DISCUSS] Google Summer of Code 2018
Hi Matt,

I'm not sure whether it matches GSoC, but I am thinking about process
mining: take the provenance/data lineage information from the NiFi
repository or from Apache Atlas (and maybe some additional information
from the processors), analyze whether the processes are optimal, and
display the result graphically.

See https://en.wikipedia.org/wiki/Process_mining or
https://coda.fluxicon.com/book/intro.html

Best Regards,
Uwe

On 28.02.2018 at 06:12, Joe Witt wrote:
> Matt
>
> Did you have some ideas/features/enhancements in mind you think would
> be good to propose?
>
> Thanks
> Joe
>
> On Tue, Feb 27, 2018 at 6:56 PM, Matt Burgess wrote:
>> If you haven't heard yet, the Apache Software Foundation was selected
>> as an organization for this year's Google Summer of Code [1]. I've
>> seen activity on other Apache projects' mailing lists requesting ideas
>> for issues, features, components, etc. that could be good
>> proposals/ideas for GSoC, and I'd like to also make that request of
>> this community.
>>
>> As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
>> guarantee we would get someone to work on it, but it could be a good
>> push to move some isolated bits of functionality forward that may not
>> get much attention otherwise."
>>
>> Thoughts?
>>
>> Thanks in advance,
>> Matt
>>
>> [1] https://summerofcode.withgoogle.com/organizations/5718432427802624/
Re: [DISCUSS] Google Summer of Code 2018
Matt

Did you have some ideas/features/enhancements in mind you think would
be good to propose?

Thanks
Joe

On Tue, Feb 27, 2018 at 6:56 PM, Matt Burgess wrote:
> If you haven't heard yet, the Apache Software Foundation was selected
> as an organization for this year's Google Summer of Code [1]. I've
> seen activity on other Apache projects' mailing lists requesting ideas
> for issues, features, components, etc. that could be good
> proposals/ideas for GSoC, and I'd like to also make that request of
> this community.
>
> As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
> guarantee we would get someone to work on it, but it could be a good
> push to move some isolated bits of functionality forward that may not
> get much attention otherwise."
>
> Thoughts?
>
> Thanks in advance,
> Matt
>
> [1] https://summerofcode.withgoogle.com/organizations/5718432427802624/
[DISCUSS] Google Summer of Code 2018
If you haven't heard yet, the Apache Software Foundation was selected
as an organization for this year's Google Summer of Code [1]. I've
seen activity on other Apache projects' mailing lists requesting ideas
for issues, features, components, etc. that could be good
proposals/ideas for GSoC, and I'd like to also make that request of
this community.

As Michael Mior (of Apache Calcite PMC) eloquently put it: "It's no
guarantee we would get someone to work on it, but it could be a good
push to move some isolated bits of functionality forward that may not
get much attention otherwise."

Thoughts?

Thanks in advance,
Matt

[1] https://summerofcode.withgoogle.com/organizations/5718432427802624/