Hi Piper, RE: https://github.com/perl-spark
Thank you for the reply. There seem to be two issues here: 1) 'What is going on with Perl-Spark?' and 2) 'Can we make an effort to produce Raku-Spark?'. Below I only address the former question.

The "perl-spark" GitHub project appears to contain 13 repos (you graciously provided the link to the "Spark" repo). The only person I see associated with the "perl-spark" GitHub project is Kent Fredric, who sadly passed away earlier this year:

http://blogs.perl.org/users/neilb/2021/04/kent-fredrics-cpan-distributions.html
https://forums.gentoo.org/viewtopic-t-1130094-postdays-0-postorder-asc-start-0.html
https://lwn.net/Articles/846054/
https://givealittle.co.nz/cause/kent-fredrics-funeral-costs

I recall Kent was active in the "Raku" GitHub-renaming discussion (e.g. https://github.com/Raku/problem-solving/issues/81#issuecomment-528756303), and he wanted Raku (née Perl6) to have a fresh start.

While I see efforts are underway to have Kent's CPAN distributions adopted, I don't know of a similar process on GitHub. While each of the 13 repositories under the "perl-spark" umbrella can easily be forked, it's unclear to me whether the entire GitHub project can similarly be forked. I have copied some active members of the Perl community on this email, in the hopes that they can help transfer the "perl-spark" GitHub project.

Best Regards,

Bill.

On Sun, Nov 28, 2021 at 10:34 PM Piper H <pott...@gmail.com> wrote:
>
> William, I didn't use SparkR. I use R primarily for plotting.
>
> Spark's basic API is quite simple, it does the distributed computing of map,
> filter, group, reduce etc, which are all covered by perl's map, sort, grep
> functions IMO.
>
> for instance, this common statistics on Spark:
>
> >>> fruit.take(5)
> [('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
> >>>
> >>>
> >>> fruit.filter(lambda x:x[0] == 'apple').reduceByKey(lambda x,y:x+y).collect()
> [('apple', 86)]
>
> Which is easily implemented by perl's grep and map functions.
> But we need a distributed computing framework of perl6.
>
> Yes there is already the perl-spark project:
> https://github.com/perl-spark/Spark
> Which didn't get updated for many years. I don't think it's still in active
> development.
>
> So I asked the original question.
>
> Thank you.
> Piper
>
>
> On Mon, Nov 29, 2021 at 1:44 PM William Michels <w...@caa.columbia.edu> wrote:
>>
>> Hi Piper!
>>
>> Have you used SparkR (R on Spark)?
>>
>> https://spark.apache.org/docs/latest/sparkr.html
>>
>> I'm encouraged by the data-type mapping between R and Spark. It
>> suggests to me that with a reasonable Spark API, mapping data types
>> between Raku and Spark should be straightforward:
>>
>> https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark
>>
>> Best Regards,
>>
>> Bill.
>>
>>
>> On Sat, Nov 27, 2021 at 12:16 AM Piper H <pott...@gmail.com> wrote:
>> >
>> > I use perl5 everyday for data statistics.
>> > The scripts are running on a single server for the computing tasks.
>> > I also use R, which has the similar usage.
>> > When we face very large data, we change to Apache Spark for distributed
>> > computing.
>> > Spark's interface languages (python, scala, even ruby) are not flexible,
>> > but their computing capability is amazing, due to the whole cluster
>> > contributing the computing powers.
>> > Yes I know perl5 is somewhat old, but in perl6 why won't we make that a
>> > distributed computing framework like Spark? Then it will help a lot to the
>> > data programmer who already knows perl.
>> > I expect a lot from this project.
>> >
>> > Thanks.
>> > Piper
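
A side note on the PySpark snippet quoted in the thread: the filter-then-sum-by-key step can be reproduced on a single machine without Spark, which is roughly the point made about grep and map. Below is a minimal sketch in plain Python. The first five pairs are the ones shown by `fruit.take(5)`; the extra `('apple', 4)` pair and the helper name `reduce_by_key` are made up for illustration and are not part of Spark or of the data in the thread.

```python
from operator import add

# First five pairs come from the quoted fruit.take(5) output;
# the last pair is a hypothetical extra record so there is something to sum.
fruit = [
    ("peach", 1), ("apricot", 2), ("apple", 3),
    ("haw", 1), ("persimmon", 9), ("apple", 4),
]


def reduce_by_key(pairs, func):
    """Combine values that share a key (a local stand-in for reduceByKey)."""
    acc = {}
    for key, value in pairs:
        # The first value for a key is stored as-is; later ones are folded in.
        acc[key] = func(acc[key], value) if key in acc else value
    return list(acc.items())


# Equivalent of: fruit.filter(lambda x: x[0] == 'apple')
apples = [pair for pair in fruit if pair[0] == "apple"]

# Equivalent of: .reduceByKey(lambda x, y: x + y).collect()
result = reduce_by_key(apples, add)
print(result)
```

This only mirrors the single-machine semantics of the two calls. It does not distribute any work across a cluster, which is the part the original question is about.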