exclude rules in analyzer

2022-03-16 Thread Shi Yuhang
I have found that we can use `spark.sql.optimizer.excludedRules` to exclude rules in the optimizer, but we can't exclude rules in the analyzer. I wonder why this is not supported, and whether there is any plan to support it.
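A minimal sketch of the optimizer-side setting mentioned above, assuming a spark-shell session (so `spark` is an active SparkSession); ConstantFolding is just one example of an existing optimizer rule:

    // Exclude an optimizer rule by its fully qualified class name.
    // There is no analogous setting for analyzer rules, which is what the question asks about.
    spark.conf.set(
      "spark.sql.optimizer.excludedRules",
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding")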

Re: pivoting panda dataframe

2022-03-16 Thread ayan guha
Column bind is called join in the relational world, and Spark uses the same. Pivot in the true sense is harder to achieve because you really don't know how many columns you will end up with, but Spark has a pivot function. On Thu, 17 Mar 2022 at 9:16 am, Mich Talebzadeh wrote: > OK this is the version that
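A minimal Scala sketch of the two operations mentioned above (join as the relational "column bind", and pivot, whose output width depends on the data); it assumes a spark-shell session, and all data and column names are illustrative:

    import org.apache.spark.sql.functions.first

    // "Column bind" in the relational world: join two DataFrames on a key.
    val left  = Seq((1, "a"), (2, "b")).toDF("id", "x")
    val right = Seq((1, 10), (2, 20)).toDF("id", "y")
    val bound = left.join(right, Seq("id"))

    // Pivot: the resulting columns come from the distinct values of `key`,
    // so the width of the output is not known up front.
    val long = Seq((1, "k1", 1.0), (1, "k2", 2.0), (2, "k1", 3.0)).toDF("id", "key", "value")
    val pivoted = long.groupBy("id").pivot("key").agg(first("value"))
    pivoted.show()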

Re: pivoting panda dataframe

2022-03-16 Thread Mich Talebzadeh
OK this is the version that works with Pandas only, without Spark: import random import string import math import datetime import time import pandas as pd class UsedFunctions: def randomString(self,length): letters = string.ascii_letters result_str = ''.join(random.choice(letters) for i

Unsubscribe

2022-03-16 Thread van wilson
> On Mar 16, 2022, at 7:38 AM, wrote: > > Thanks, Jayesh and all. I finally got the correlation data frame using agg > with a list of functions. > I think the list of functions which generates a column should be described > in more detail. > > Liang > > - Original Message - > From: "Lalwani,

Re: Re: Re: Re: Re: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread ckgppl_yan
Thanks, Jayesh and all. I finally got the correlation data frame using agg with a list of functions. I think the list of functions which generates a column should be described in more detail. Liang - Original Message - From: "Lalwani, Jayesh" To: "ckgppl_...@sina.cn" , Enrico Minack , Sean Owen

Skip single integration test case in Spark on K8s

2022-03-16 Thread Pralabh Kumar
Hi Spark team, I am running the Spark Kubernetes integration test suite on the cloud: build/mvn install \ -f pom.xml \ -pl resource-managers/kubernetes/integration-tests -am -Pscala-2.12 -Phadoop-3.1.1 -Phive -Phive-thriftserver -Pyarn -Pkubernetes -Pkubernetes-integration-tests \ -Djava.version=8 \
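As an aside, Spark's Maven build documents running a single Scala suite via the scalatest-maven-plugin; whether this applies cleanly to the kubernetes integration-tests module is an assumption here, and the suite name is only an example of the mechanism (selecting what runs rather than skipping a single case):

    # Run only one suite (illustrative; assumes the usual K8s integration-test profiles):
    build/mvn test -pl resource-managers/kubernetes/integration-tests -am \
      -Pkubernetes -Pkubernetes-integration-tests \
      -Dtest=none -DwildcardSuites=org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite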

Re: Re: Re: Re: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread Lalwani, Jayesh
No, you don't need 30 dataframes and self joins. Convert a list of columns to a list of functions, and then pass the list of functions to the agg function. From: "ckgppl_...@sina.cn" Reply-To: "ckgppl_...@sina.cn" Date: Wednesday, March 16, 2022 at 8:16 AM To: Enrico Minack , Sean Owen Cc:
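A minimal sketch of the approach described above, assuming a spark-shell session; the DataFrame, the grouping column and the column names are all illustrative:

    import org.apache.spark.sql.functions.{col, corr}

    val df = Seq(("g1", 1.0, 2.0, 10.0), ("g1", 2.0, 1.0, 20.0), ("g2", 3.0, 4.0, 30.0))
      .toDF("grp", "c1", "c2", "target")

    // Convert a list of column names into a list of aggregate expressions...
    val names = Seq("c1", "c2")
    val aggs  = names.map(n => corr(col(n), col("target")).alias(s"corr_$n"))

    // ...and pass the whole list to agg (one leading Column plus varargs).
    val result = df.groupBy("grp").agg(aggs.head, aggs.tail: _*)
    result.show()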

Play data development with Scala and Spark

2022-03-16 Thread Bitfox
Hello, I have written a free book which is available online, giving a beginner introduction to Scala and Spark development. https://github.com/bitfoxtop/Play-Data-Development-with-Scala-and-Spark/blob/main/PDDWS2-v1.pdf If you can read Chinese then you are welcome to give any feedback. I will

Re: Re: Re: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread ckgppl_yan
Thanks, Enrico. I just found that I need to group the data frame and then calculate the correlation. So I will get a list of data frames, not columns. So I used the following solution: use the following code to create a mutable data frame df_all. I used the first datacol to calculate the correlation.

Re: 回复:Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread Enrico Minack
If you have a list of Columns called `columns`, you can pass them to the `agg` method as: agg(columns.head, columns.tail: _*) Enrico On 16.03.22 at 08:02, ckgppl_...@sina.cn wrote: Thanks, Sean. I modified the code and have generated a list of columns. I am working on converting a list of
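For context, the head/tail split above exists because of agg's signature; a short sketch, reusing the illustrative df, grp and aggs names from the sketch further up:

    // RelationalGroupedDataset.agg is declared (roughly) as agg(expr: Column, exprs: Column*),
    // so a Seq[Column] has to be expanded as one leading element plus varargs:
    df.groupBy("grp").agg(aggs.head, aggs.tail: _*)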

Re: Question on List to DF

2022-03-16 Thread Gourav Sengupta
Hi Jayesh, thanks, I found your email quite interesting :) Regards, Gourav On Wed, Mar 16, 2022 at 8:02 AM Bitfox wrote: > Thank you, that makes sense. > > On Wed, Mar 16, 2022 at 2:03 PM Lalwani, Jayesh > wrote: > >> The toDF function in Scala uses a bit of Scala magic that allows you to >>

Re: spark 3.2.1: Unexpected reuse of dynamic PVC

2022-03-16 Thread Andreas Weise
Minor correction: >> (hence our *ReadWriteOnce* storage should be sufficient, right?... On Wed, Mar 16, 2022 at 11:33 AM Andreas Weise wrote: > Hi, > > when using dynamic allocation on k8s with dynamic PVC reuse, I find that > only a few executors are running. 2 of 4 are stuck in
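For context, PVC reuse with dynamic allocation on K8s is driven by settings along these lines (spark-defaults style); the volume name "data" and the mount path are illustrative, and treating these keys as the complete picture is an assumption based on the Spark 3.2 documentation:

    spark.kubernetes.driver.ownPersistentVolumeClaim=true
    spark.kubernetes.driver.reusePersistentVolumeClaim=true
    # On-demand (dynamically created) PVC for an executor volume named "data":
    spark.kubernetes.executor.volumes.persistentVolumeClaim.data.options.claimName=OnDemand
    spark.kubernetes.executor.volumes.persistentVolumeClaim.data.mount.path=/data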

Re: Question on List to DF

2022-03-16 Thread Bitfox
Thank you. that makes sense. On Wed, Mar 16, 2022 at 2:03 PM Lalwani, Jayesh wrote: > The toDF function in scala uses a bit of Scala magic that allows you to > add methods to existing classes. Here’s a link to explanation >

Re: Re: calculate correlation between multiple columns and one specific column after groupby the spark data frame

2022-03-16 Thread ckgppl_yan
Thanks, Sean. I modified the code and have generated a list of columns. I am working on converting a list of columns to a new data frame. It seems that there is no direct API to do this. - Original Message - From: Sean Owen To: ckgppl_...@sina.cn Cc: user Subject: Re: calculate correlation between multiple
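On the "list of columns to a new data frame" point: a hedged sketch, assuming the columns are expressions over an existing DataFrame (spark-shell session, illustrative names), is to pass the list to select:

    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.col

    val df = Seq((1, 2, 3), (4, 5, 6)).toDF("a", "b", "c")
    val columns: Seq[Column] = Seq(col("a"), col("c"))

    // select takes varargs, so a Seq[Column] can be expanded with : _*
    val newDf = df.select(columns: _*)
    newDf.show()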

Re: Question on List to DF

2022-03-16 Thread Lalwani, Jayesh
The toDF function in Scala uses a bit of Scala magic that allows you to add methods to existing classes. Here's a link to an explanation: https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch01s11.html In short, you can implement a class that extends the List class and add methods
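A minimal sketch of the "enrich my library" pattern the linked recipe covers (pasteable into a Scala REPL); the class and method names are illustrative and are not Spark's actual toDF machinery:

    // Wrapping List[A] in an implicit class "adds" a method to it at the call site.
    implicit class RichList[A](underlying: List[A]) {
      def describe: String = s"List of ${underlying.size} element(s)"
    }

    List(1, 2, 3).describe   // "List of 3 element(s)"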