Re: Elasticsearch support for Spark 3.x

2023-09-08 Thread Alfie Davidson
I am pretty certain you need to change the write.format from “es” to “org.elasticsearch.spark.sql”.
Sent from my iPhone
> On 8 Sep 2023, at 03:10, Dipayan Dev wrote:
> ++ Dev
>> On Thu, 7 Sep 2023 at 10:22 PM, Dipayan Dev wrote:
>> Hi, Can you please elaborate your last response? I
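
For illustration, the suggested change could look roughly like the PySpark sketch below. The helper name `write_to_elasticsearch`, the index name, and the host and port values are assumptions made for this example, not details from the thread; `es.nodes` and `es.port` are standard elasticsearch-hadoop connector settings, and the connector jar is assumed to be on the classpath.

```python
def write_to_elasticsearch(df, index, nodes="localhost", port="9200"):
    """Write a Spark DataFrame to Elasticsearch using the fully qualified
    data source name instead of the short "es" alias.

    `df` is expected to be a pyspark.sql.DataFrame. The default host and
    port are placeholders (assumptions), not values from the thread.
    """
    (
        df.write
        .format("org.elasticsearch.spark.sql")  # instead of .format("es")
        .option("es.nodes", nodes)               # Elasticsearch host(s)
        .option("es.port", port)                 # Elasticsearch REST port
        .mode("append")
        .save(index)                             # target index (hypothetical name)
    )
```

A call such as `write_to_elasticsearch(df, "my-index")` would then append the rows of `df` to that index, assuming the cluster is reachable.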

Re: please help the problem of big parquet file can not be splitted to read

2023-03-23 Thread Alfie Davidson
I’m pretty sure a snappy file is not splittable. That’s why you have a single task (and most likely a single core) reading the 1.9GB snappy file.
Sent from my iPhone
> On 23 Mar 2023, at 07:36, yangjie01 wrote:
>
> Is there only one RowGroup for this file? You can check this by printing the
> file's
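
To see why the RowGroup question matters, here is a small pure-Python sketch of how row groups might be spread over read tasks. It assumes the reader hands each row group to the split that contains the row group's midpoint, which is how the Parquet reader is generally understood to behave; the function name, the 128 MB split size, and the file layouts are illustrative assumptions, not figures from the thread.

```python
MB = 1024 * 1024

def assign_row_groups(row_groups, file_size, split_size=128 * MB):
    """Return, per split, the indices of the row groups it would read.

    row_groups: list of (start_offset, total_bytes) tuples.
    Assumption: a row group belongs to the split containing its midpoint.
    """
    n_splits = max(1, -(-file_size // split_size))  # ceiling division
    tasks = [[] for _ in range(n_splits)]
    for idx, (start, size) in enumerate(row_groups):
        midpoint = start + size // 2
        tasks[min(midpoint // split_size, n_splits - 1)].append(idx)
    return tasks

def busy_tasks(tasks):
    """Count the splits that actually have a row group to read."""
    return sum(1 for t in tasks if t)

# Hypothetical layout 1: one huge row group covering a 1900 MB file.
single = assign_row_groups([(0, 1900 * MB)], 1900 * MB)

# Hypothetical layout 2: the same bytes in nineteen 100 MB row groups.
many = assign_row_groups([(i * 100 * MB, 100 * MB) for i in range(19)], 1900 * MB)

print(busy_tasks(single), "task(s) with data for the single-row-group layout")
print(busy_tasks(many), "task(s) with data for the multi-row-group layout")
```

Under these assumptions, a file with a single row group leaves all but one split empty, regardless of how many splits the file is divided into.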

[SPARK SQL] Make max multi table join limit configurable in OptimizeSkewedJoin

2020-09-12 Thread Alfie Davidson
Hi All, First time contributing, so reaching out by email before creating a JIRA ticket and PR. I would like to propose a small change/enhancement to OptimizeSkewedJoin. Currently, OptimizeSkewedJoin has a hardcoded limit for multi table joins (limit = 2). For processes that have multiple