Re: SparkSQL vs Dataframe vs Dataset

2021-12-06 Thread yonghua
From my experience, SQL is easy for people who already know SQL syntax. With the correct indexing, SQL is also fast. But within programs, a DataFrame is much faster and more convenient for loading large data structures from external sources.   From: "rajat kumar" To: "user @spark" Sent: Monday 6

SparkSQL vs Dataframe vs Dataset

2021-12-06 Thread rajat kumar
Hi Users, Is there any use case where we need to choose between SQL, DataFrame, and Dataset? Is there a recommended approach, or any advantage or performance gain of one over the others? Thanks, Rajat

Re: Conda Python Env in K8S

2021-12-06 Thread Mich Talebzadeh
Hi Meikel, Well, the short answer is that it is what it is, for one reason or another. If someone else has managed to make it work, then I will no doubt be delighted to hear it. Until then I prefer the built-in docker image. Also, by centralising this in the docker image, it will be available if a node fails

RE: Conda Python Env in K8S

2021-12-06 Thread Bode, Meikel, NMA-CFD
Hi Mich, Thanks for your response. Yes, the --py-files option works. I also tested it. The question is why the --archives option doesn't. From Jira I can see that it should be available since 3.1.0: https://issues.apache.org/jira/browse/SPARK-33530 https://issues.apache.org/jira/browse/SPARK-33615