Re: HPA - Kubernetes for Spark

2021-01-10 Thread Sachit Murarka
Hi, Yes, I know that by enabling the shuffle tracking property we can use DRA. But it is marked as experimental; is it advisable to use? Also, regarding HPA: we do not have HPA separately as such for Spark, right? Kind Regards, Sachit Murarka On Mon, Jan 11, 2021 at 2:17 AM Sandish Kumar HN
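For reference, the shuffle tracking property referred to here is presumably spark.dynamicAllocation.shuffleTracking.enabled, which the Spark 3.x docs describe (and flag as experimental) as the way to use dynamic allocation without an external shuffle service. A minimal PySpark sketch of the relevant settings, with illustrative executor counts rather than recommendations:

# Sketch only: executor counts are placeholders, not tuned values.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dra-shuffle-tracking-sketch")
    # Dynamic resource allocation without an external shuffle service.
    .config("spark.dynamicAllocation.enabled", "true")
    # Shuffle tracking keeps executors holding live shuffle data alive,
    # standing in for the external shuffle service (marked experimental in the 3.x docs).
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "1")
    .config("spark.dynamicAllocation.maxExecutors", "10")
    .getOrCreate()
)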

[Spark SQL] HiveQL and Spark SQL producing different results

2021-01-10 Thread Ying Zhou
Hi, I run some SQL using both Hive and Spark. Usually we get the same results; however, when a window function is in the script, Hive and Spark can produce different results. Is this intended behavior, or does either Hive or Spark have a bug? Thanks, Ying
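One common cause of such differences, assuming the script orders a window on a non-unique key (the original query is not shown, so this is only a guess), is that functions like ROW_NUMBER() become non-deterministic on ties, so both engines can be correct yet disagree. A minimal PySpark sketch with a made-up table:

# Hypothetical illustration; the table and data are invented, not taken from this thread.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("window-tie-sketch").getOrCreate()

spark.createDataFrame(
    [("a", 1), ("a", 1), ("a", 2)],  # two rows tie on the ORDER BY key
    ["grp", "val"],
).createOrReplaceTempView("t")

# With ties on val, ROW_NUMBER() assigns 1 and 2 to the tied rows in an
# engine-dependent order, so Hive and Spark may legitimately disagree on which row gets rn = 1.
spark.sql("""
    SELECT grp, val,
           ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val) AS rn
    FROM t
""").show()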

Re: HPA - Kubernetes for Spark

2021-01-10 Thread Sandish Kumar HN
Sachit, K8s-based Spark dynamic allocation is only available on Spark 3.0.x+, and it works without an External Shuffle Service. https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#dynamic-allocation

HPA - Kubernetes for Spark

2021-01-10 Thread Sachit Murarka
Hi All, I have read about HPA (Horizontal Pod Autoscaling) for pod scaling. I understand it can be achieved by setting the resource requests and limits in the YAML: kubectl autoscale deploy/application-cpu --cpu-percent=95 --min=1 --max=10 // example command. But does Kubernetes actually work

Re: Spark 3.0.1 not connecting with Hive 2.1.1

2021-01-10 Thread michael.yang
Hi Pradyumn, It seems you did not configure the spark-defaults.conf file correctly. The configurations below are needed to use Hive 2.1.1 as the metastore and execution engine: spark.sql.hive.metastore.version=2.1.1 spark.sql.hive.metastore.jars=/opt/cloudera/parcels/CDH/lib/hive/lib/* Thanks. Michael Yang --
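The same settings can also be applied programmatically if editing spark-defaults.conf is not convenient; a minimal PySpark sketch, reusing the CDH jar path quoted above (which is environment-specific):

# Sketch only: the metastore version and jar path must match the local Hive installation.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-metastore-2.1.1-sketch")
    .config("spark.sql.hive.metastore.version", "2.1.1")
    .config("spark.sql.hive.metastore.jars", "/opt/cloudera/parcels/CDH/lib/hive/lib/*")
    .enableHiveSupport()
    .getOrCreate()
)

# Simple smoke test against the configured metastore.
spark.sql("SHOW DATABASES").show()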