Hi Piper,

RE:
https://github.com/perl-spark

Thank you for the reply. There seem to be two issues here: 1) 'What
is going on with Perl-Spark?' and 2) 'Can we make an effort to
produce Raku-Spark?'. Below I address only the former question.

The "perl-spark" GitHub project appears to contain 13 repos (you
graciously provided the link to the "Spark" repo).  The only person I
see associated with the "perl-spark" GitHub project is Kent Fredric,
who sadly passed away earlier this year:

http://blogs.perl.org/users/neilb/2021/04/kent-fredrics-cpan-distributions.html
https://forums.gentoo.org/viewtopic-t-1130094-postdays-0-postorder-asc-start-0.html
https://lwn.net/Articles/846054/
https://givealittle.co.nz/cause/kent-fredrics-funeral-costs

I recall Kent was active in the "Raku" GitHub-renaming discussion
(e.g. https://github.com/Raku/problem-solving/issues/81#issuecomment-528756303),
and he wanted Raku (née Perl6) to have a fresh start. While I see
efforts are underway to have Kent's CPAN distributions adopted, I
don't know of a similar process on GitHub. While each of the 13
repositories under the "perl-spark" umbrella can easily be forked,
it's unclear to me whether the entire GitHub project can similarly be
forked.

I have copied some active members of the Perl community on this email,
in the hope that they can help transfer the "perl-spark" GitHub
project.

Best Regards, Bill.



On Sun, Nov 28, 2021 at 10:34 PM Piper H <pott...@gmail.com> wrote:
>
> William, I didn't use SparkR. I use R primarily for plotting.
>
> Spark's basic API is quite simple: it does the distributed computing of map, 
> filter, group, reduce etc., which are all covered by perl's map, sort, grep 
> functions IMO.
>
> For instance, this common statistic on Spark:
>
> >>> fruit.take(5)
> [('peach', 1), ('apricot', 2), ('apple', 3), ('haw', 1), ('persimmon', 9)]
> >>>
> >>>
> >>> fruit.filter(lambda x:x[0] == 'apple').reduceByKey(lambda x,y:x+y).collect()
> [('apple', 86)]
>
> This is easily implemented with perl's grep and map functions.
> But we need a distributed computing framework for perl6.
>
> Yes, there is already the perl-spark project:
> https://github.com/perl-spark/Spark
> It hasn't been updated for many years, and I don't think it's still in 
> active development.
>
> So I asked the original question.
>
> Thank you.
> Piper
>
>
> On Mon, Nov 29, 2021 at 1:44 PM William Michels <w...@caa.columbia.edu> wrote:
>>
>> Hi Piper!
>>
>> Have you used SparkR (R on Spark)?
>>
>> https://spark.apache.org/docs/latest/sparkr.html
>>
>> I'm encouraged by the data-type mapping between R and Spark. It
>> suggests to me that with a reasonable Spark API, mapping data types
>> between Raku and Spark should be straightforward:
>>
>> https://spark.apache.org/docs/latest/sparkr.html#data-type-mapping-between-r-and-spark
>>
>> Best Regards,
>>
>> Bill.
>>
>>
>> On Sat, Nov 27, 2021 at 12:16 AM Piper H <pott...@gmail.com> wrote:
>> >
>> > I use perl5 every day for data statistics.
>> > The scripts run on a single server for the computing tasks.
>> > I also use R, which has similar usage.
>> > When we face very large data, we change to Apache Spark for distributed 
>> > computing.
>> > Spark's interface languages (python, scala, even ruby) are not flexible, 
>> > but their computing capability is amazing, due to the whole cluster 
>> > contributing its computing power.
>> > Yes, I know perl5 is somewhat old, but in perl6 why don't we make a 
>> > distributed computing framework like Spark? Then it will help a lot for the 
>> > data programmers who already know perl.
>> > I expect a lot from this project.
>> >
>> > Thanks.
>> > Piper
