Hi Anant,
The blog post about R-on-Dataflow should work for R-on-Beam -- it just
predates Beam; there is no longer any Dataflow Python that isn't based on
Beam :)
What have you tried?
Thanks,
Dan
On Mon, Apr 10, 2017 at 11:18 PM, Anant Bhandarkar <
anant.bhandar...@impactanalytics.co> wrote:
> Hi,
> We are using BigQuery for our querying needs.
> We are also looking to use Dataflow with some of the statistical
> libraries. We are using R libraries to build these statistical models.
>
> We are looking to run our data through statistical models such as ELM,
> GAM, ARIMA, etc. We see that Python doesn't have all of these libraries,
> which we get as CRAN packages in R.
>
> We have seen these examples, which show that it is possible to run R on
> Dataflow:
>
>
> https://medium.com/google-cloud/cloud-dataflow-can-autoscale-r-programs-for-massively-parallel-data-processing-492b57bd732d
> https://github.com/gregmcinnes/incubator-beam/blob/python-sdk/sdks/python/apache_beam/examples/complete/wordlength/wordlength_R/wordlength_R.py
>
> If we are able to use the parallelization provided by Dataflow along with R
> libraries, this would be great for us as a team and also for the whole data
> science community, which relies on R packages.
>
> We would need some help from the Beam community to achieve this.
>
> I see this as a very good use case for the whole data science community,
> one that would enable usage of both Python and R on Beam and Dataflow.
>
> Regards,
> Anant