[ https://issues.apache.org/jira/browse/HUDI-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yue Zhang updated HUDI-5609:
----------------------------
    Fix Version/s: 0.14.0
                       (was: 0.13.1)

> Hudi table not queryable by SQL on Databricks Spark
> ---------------------------------------------------
>
>                 Key: HUDI-5609
>                 URL: https://issues.apache.org/jira/browse/HUDI-5609
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>             Fix For: 0.14.0
>
>
> Customer: I’ve tried this with 0.12.2 and still receive the same error. Does
> the table format version also need to be updated? I.e., we’re writing with
> Hudi 0.11.1 using EMR but reading from Databricks using Hudi 0.12.2 and Spark
> 3.3.
>
> What has been tried so far on 0.12.2:
>
> 1. Spark SQL
> Just tried Spark SQL and it doesn’t work (different issue):
> SET hoodie.file.index.enable=false
> select count(*) from validated_sales;
> Returns a count of 0 but no errors.
>
> 2. With the wildcard, when running via pyspark
> %python
> df = spark.read.format('hudi')\
>     .load('s3://<bucket>/validated_sales/*/*/*')
> df.count()
> All is good with Hudi 0.12.2 and Databricks 11.3 (Spark 3.3).
>
> 3. Without the wildcard in pyspark
> %python
> df = spark.read.format('hudi')\
>     .load('s3://<bucket>/validated_sales')
> df.count()
> count = 0
>
> 4. Without the wildcard but with the recursive option set in pyspark
> %python
> df = spark.read.format('hudi')\
>     .option("recursiveFileLookup","true")\
>     .load('s3://<bucket>/validated_sales')
> df.count()
> count = 250k

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
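
For anyone trying to reproduce the report, the three pyspark reads above can be run side by side. The following is only a minimal sketch, not part of the original report: the names ReadVariant, build_variants, count_rows and compare_counts are hypothetical, and it assumes an existing `spark` session (for example in a Databricks notebook) with the Hudi bundle on the classpath. The base path keeps the `<bucket>` placeholder from the report.

```python
from typing import Dict, List, NamedTuple


class ReadVariant(NamedTuple):
    """One way of loading the table (hypothetical helper type)."""
    label: str
    path: str
    options: Dict[str, str]


def build_variants(base_path: str) -> List[ReadVariant]:
    """Return the three pyspark read variants described in the report."""
    base = base_path.rstrip("/")
    return [
        # Attempt 2: three-level partition wildcard.
        ReadVariant("wildcard", base + "/*/*/*", {}),
        # Attempt 3: table base path only.
        ReadVariant("no wildcard", base, {}),
        # Attempt 4: base path plus Spark's recursive file lookup.
        ReadVariant("no wildcard, recursive", base,
                    {"recursiveFileLookup": "true"}),
    ]


def count_rows(spark, variant: ReadVariant) -> int:
    """Load the table with the given variant and count its rows."""
    reader = spark.read.format("hudi").options(**variant.options)
    return reader.load(variant.path).count()


def compare_counts(spark, base_path: str) -> Dict[str, int]:
    """Map each variant label to the row count it produces."""
    return {v.label: count_rows(spark, v) for v in build_variants(base_path)}
```

In a notebook this would be called as compare_counts(spark, 's3://<bucket>/validated_sales'). Differing counts across the three labels would correspond to the behaviour described in the report.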