The PySpark vs. Scala question has turned into a big discussion. I have a
little experience automating CI/CD for JVM-based languages, and I would
like to take this as an opportunity to understand the end-to-end CI/CD
flow for PySpark-based ETL pipelines.

Could someone please list the steps involved in automating PySpark-based
pipelines in production?

//William

On Fri, Oct 23, 2020 at 11:24 AM Wim Van Leuven <
wim.vanleu...@highestpoint.biz> wrote:

> I think Sean is right, but in your argument you mention that
> 'functionality is sacrificed in favour of the availability of resources'.
> That's where I disagree with you but agree with Sean. That is mostly not
> true.
>
> In your previous posts you also mentioned this. The only reason we
> sometimes have to bail out to Scala is for performance with certain UDFs.
>
> On Thu, 22 Oct 2020 at 23:11, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Thanks for the feedback Sean.
>>
>> Kind regards,
>>
>> Mich
>>
>> On Thu, 22 Oct 2020 at 20:34, Sean Owen <sro...@gmail.com> wrote:
>>
>>> I don't find this trolling; I agree with the observation that 'the
>>> skills you have' are a valid and important determiner of what tools you
>>> pick.
>>> I disagree that you just have to pick the optimal tool for everything.
>>> Sounds good until that comes in contact with the real world.
>>> For Spark, Python vs Scala just doesn't matter a lot, especially if
>>> you're doing DataFrame operations. By design. So I can't see there being
>>> one answer to this.
>>>
>>> On Thu, Oct 22, 2020 at 2:23 PM Gourav Sengupta <
>>> gourav.sengu...@gmail.com> wrote:
>>>
>>>> Hi Mich,
>>>>
>>>> this is turning into a troll now, can you please stop this?
>>>>
>>>> No one uses Scala where Python should be used, and no one uses Python
>>>> where Scala should be used - it all depends on requirements. Everyone
>>>> understands polyglot programming and how to use relevant technologies best
>>>> to their advantage.
>>>>
>>>>
>>>> Regards,
>>>> Gourav Sengupta
>>>>
>>>>

-- 
Regards,
William R
+919037075164
