[ https://issues.apache.org/jira/browse/SPARK-45876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alexander Petrossian (PAF) updated SPARK-45876:
-----------------------------------------------
    Attachment: Снимок экрана 2023-11-14 в 16.58.56.png

> Filters are not pushed down across lateral view
> -----------------------------------------------
>
>                 Key: SPARK-45876
>                 URL: https://issues.apache.org/jira/browse/SPARK-45876
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.5.0
>            Reporter: Alexander Petrossian (PAF)
>            Priority: Major
>         Attachments: Снимок экрана 2023-11-14 в 16.33.48.png, Снимок экрана 2023-11-14 в 16.35.56.png, Снимок экрана 2023-11-14 в 16.55.11.png, Снимок экрана 2023-11-14 в 16.55.31.png, Снимок экрана 2023-11-14 в 16.58.56.png
>
>
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.config("spark.sql.catalogImplementation", "hive").appName("Write ORC File").getOrCreate()
> spark.sql('drop TABLE if exists test').show()
> spark.sql('CREATE EXTERNAL TABLE test (request struct<characteristic:array<struct<id:string,value:string>>>)'
>     'ROW FORMAT SERDE "org.apache.hadoop.hive.ql.io.orc.OrcSerde" '
>     'STORED AS INPUTFORMAT "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat" '
>     'OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat" '
>     'LOCATION "testfolder"').show()
> spark.sql("select request from test lateral view explode(request.characteristic) cTable as c where c.value='79640000000'").explain()
> {code}
> shows
> {code}
> == Physical Plan ==
> *(1) Project [request#2]
> +- *(1) Filter (isnotnull(c#4.value) AND (c#4.value = 79640000000))
>    +- *(1) Generate explode(request#2.characteristic), [request#2], false, [c#4]
>       +- *(1) ColumnarToRow
>          +- FileScan orc spark_catalog.default.test[request#2] Batched: true, DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 paths)[file:/Users/paf/Downloads/spark-warehouse/testfolder], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<request:struct<characteristic:array<struct<id:string,value:string>>>>
> {code}
> The plan shows PushedFilters: [], so the predicate on c.value is not pushed down to the ORC scan, which makes the query extremely slow.
> Suppose I search for a value that lies entirely outside the column's min/max statistics range: the scan could skip that data and finish much faster, but it does not.
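
The symptom reported above is the empty "PushedFilters: []" entry on the FileScan node. As a rough aid for checking other queries in the same way, here is a minimal sketch that counts scan nodes with an empty PushedFilters list in pasted explain() output. The function name empty_pushdown_scans and the constant REPORTED_SCAN are made up for this illustration and are not part of Spark. The sketch only inspects plan text copied from the report and assumes the "PushedFilters: [...]" layout shown there.

```python
import re

# Matches "PushedFilters: [" and, optionally, an immediately closing "]".
# Group 1 is set only when the list is empty.
_PUSHED = re.compile(r"PushedFilters:\s*\[(\s*\])?")


def empty_pushdown_scans(plan_text: str) -> int:
    """Count scan nodes in explain() text whose PushedFilters list is empty."""
    return sum(1 for m in _PUSHED.finditer(plan_text) if m.group(1) is not None)


# FileScan line copied from the plan in the report above.
REPORTED_SCAN = (
    "FileScan orc spark_catalog.default.test[request#2] Batched: true, "
    "DataFilters: [], Format: ORC, Location: InMemoryFileIndex(1 "
    "paths)[file:/Users/paf/Downloads/spark-warehouse/testfolder], "
    "PartitionFilters: [], PushedFilters: [], ReadSchema: "
    "struct<request:struct<characteristic:array<struct<id:string,value:string>>>>"
)

if __name__ == "__main__":
    # A non-zero count means at least one scan received no pushed-down filters.
    print(empty_pushdown_scans(REPORTED_SCAN))
```

A non-zero count only says that nothing was pushed to that scan. It does not say whether pushing the predicate through the Generate node would be valid for a given query.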