Hello,

I mentored Arnaud when he contributed the sketching extension to Beam, and from a quick look at Alex's paper + implementation, I think this should be an independent extension. Sketching is a collection of transforms that rely on probabilistic data structures to give approximate results, and they clearly belong to the data sketching category.
Alex's work is clearly in a different area: it is more about data preprocessing and feature extraction, so I think it should be in a different module. I agree 100% that the best option is a rewrite in Java; this also has the advantage of easier maintainability. It would be really nice to have a new extension for this in Beam, so don't hesitate to ask on the mailing list / Slack if you have questions.

Regards,
Ismaël

On Mon, Oct 29, 2018 at 10:38 AM Maximilian Michels <m...@apache.org> wrote:
>
> Hey Alex,
>
> No need to reimplement. Java is the best option, since we don't
> currently have a Scala API in Beam.
>
> Cheers,
> Max
>
> On 25.10.18 21:50, Alex wrote:
> > Great! Right now there is a lot on that code I do not understand, hope in
> > the next days I can document myself.
> >
> > Should I reimplement my algorithms in Scala? Or could I create a wrapper
> > that interfaces with the sketching extension?
> >
> > Cheers.
> >
> > On Oct 24, 2018 15:00, Maximilian Michels <m...@apache.org> wrote:
> >>
> >> Welcome Alejandro! Interesting work. The sketching extension looks like
> >> a good place for your algorithms.
> >>
> >> -Max
> >>
> >> On 23.10.18 19:05, Lukasz Cwik wrote:
> >>> Arnoud Fournier (afourn...@talend.com)
> >>> started by adding a library to support sketching
> >>> (https://github.com/apache/beam/tree/master/sdks/java/extensions/sketching),
> >>> I feel as though some of these could be added there or possibly within
> >>> another extension.
> >>>
> >>> On Tue, Oct 23, 2018 at 9:54 AM Austin Bennett
> >>> <whatwouldausti...@gmail.com> wrote:
> >>>
> >>> Hi Beam Devs,
> >>>
> >>> Alejandro, copied, is an enthusiastic developer, who recently coded
> >>> up:
> >>> https://github.com/elbaulp/DPASF (associated paper found:
> >>> https://arxiv.org/abs/1810.06021).
> >>>
> >>> He had been looking to contribute that code to FlinkML, at which
> >>> point I found him and alerted him to Beam. He has been learning a
> >>> bit on Beam recently. Would this data-preprocessing be a welcome
> >>> contribution to the project? If yes, perhaps others better versed
> >>> in internals (I'm not there yet -- though could follow along!) would
> >>> be willing to provide feedback to shape this to be a suitable Beam
> >>> contribution.
> >>>
> >>> Cheers,
> >>> Austin