[jira] [Comment Edited] (SPARK-25185) CBO rowcount statistics doesn't work for partitioned parquet external table
[ https://issues.apache.org/jira/browse/SPARK-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000775#comment-17000775 ]

Zhenhua Wang edited comment on SPARK-25185 at 12/20/19 9:58 AM:

Hi, [~imamitsehgal] and [~raoyvn], could you please check if [SPARK-30269|https://issues.apache.org/jira/browse/SPARK-30269] solves your problem?

was (Author: zhenhuawang):

Hi, [~imamitsehgal] and [~raoyvn], could you please check if [SPARK-30269](https://issues.apache.org/jira/browse/SPARK-30269) solves your problem?

> CBO rowcount statistics doesn't work for partitioned parquet external table
> ---------------------------------------------------------------------------
>
> Key: SPARK-25185
> URL: https://issues.apache.org/jira/browse/SPARK-25185
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 2.2.1, 2.3.0
> Environment: Tried on Ubuntu, FreeBSD, and Windows, running spark-shell in local mode, reading data from the local file system
> Reporter: Amit
> Priority: Major
>
> Created dummy partitioned data with a string-type partition column (col1=a and col1=b).
> Added CSV data -> read it through Spark -> created a partitioned external table -> ran msck repair table to load the partitions. Ran ANALYZE on all columns, including the partition column.
>
> println(spark.sql("select * from test_p where e='1a'").queryExecution.toStringWithStats)
> val op = spark.sql("select * from test_p where e='1a'").queryExecution.optimizedPlan
> // e is the partition column
> val stat = op.stats(spark.sessionState.conf)
> print(stat.rowCount)
>
> When the table is created the same way with Parquet, the row count shows as None, whereas with CSV it comes up correctly.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
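The reproduction steps described in the issue can be sketched in spark-shell roughly as follows. This is a hypothetical sketch, not the reporter's exact script: the path /tmp/test_p, the column names d and e, and the sample values are placeholder assumptions, and it requires a Spark session with Hive support and CBO enabled.

```scala
// Hypothetical reproduction sketch for SPARK-25185 (table name, path,
// and column names are placeholders, not the reporter's originals).
spark.conf.set("spark.sql.cbo.enabled", "true")

import spark.implicits._

// Write dummy partitioned data; repeat with .csv(...) to compare formats.
Seq(("1a", "a"), ("2b", "b")).toDF("d", "e")
  .write.partitionBy("e").parquet("/tmp/test_p")

// Create a partitioned external table over the files and load partitions.
spark.sql(
  """CREATE EXTERNAL TABLE test_p (d STRING)
    |PARTITIONED BY (e STRING)
    |STORED AS PARQUET
    |LOCATION '/tmp/test_p'""".stripMargin)
spark.sql("MSCK REPAIR TABLE test_p")

// Collect statistics on all columns, including the partition column.
spark.sql("ANALYZE TABLE test_p COMPUTE STATISTICS FOR COLUMNS d, e")

// Inspect the row-count estimate on the optimized plan.
// (In Spark 2.2 this was op.stats(spark.sessionState.conf); in 2.3+ it is op.stats.)
val op = spark.sql("SELECT * FROM test_p WHERE e = 'a'").queryExecution.optimizedPlan
println(op.stats.rowCount)
```

Per the report, rowCount is populated when the same table is backed by CSV but comes back as None for Parquet.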
[jira] [Comment Edited] (SPARK-25185) CBO rowcount statistics doesn't work for partitioned parquet external table
[ https://issues.apache.org/jira/browse/SPARK-25185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974724#comment-16974724 ]

venkata yerubandi edited comment on SPARK-25185 at 11/15/19 1:21 AM:

Is there any update on this issue? We are facing the same issue in Spark 2.4.0.

was (Author: raoyvn):

Is there any update on this issue? We are facing the same issue.