[ https://issues.apache.org/jira/browse/SPARK-31139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun resolved SPARK-31139.
-----------------------------------
    Resolution: Invalid

I am closing this blocker issue as `Invalid` because this is not a regression.
Please feel free to reopen it if there is a valid regression case.

> Fileformat datasources (ORC, Json) case sensitivity regressions
> ---------------------------------------------------------------
>
>                 Key: SPARK-31139
>                 URL: https://issues.apache.org/jira/browse/SPARK-31139
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Tae-kyeom, Kim
>            Priority: Blocker
>         Attachments: FileBasedDataSourceSuite.scala.diff
>
>
> In addition to https://issues.apache.org/jira/browse/SPARK-31116:
> not only Parquet, but JSON and ORC also have case sensitivity issues.
> The following test failures are based on SPARK-31116's test cases (the diff of
> FileBasedDataSourceSuite is in the attachment).
> ----
>
> {code:java}
> [info] - SPARK-31116: Select simple columns correctly in case insensitive manner *** FAILED *** (4 seconds, 277 milliseconds)
> [info] Results do not match for query:
> [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
> [info] Timezone Env:
> [info]
> [info] == Parsed Logical Plan ==
> [info] Relation[camelcase#56] json
> [info]
> [info] == Analyzed Logical Plan ==
> [info] camelcase: string
> [info] Relation[camelcase#56] json
> [info]
> [info] == Optimized Logical Plan ==
> [info] Relation[camelcase#56] json
> [info]
> [info] == Physical Plan ==
> [info] FileScan json [camelcase#56] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-95f1357a-85c9-444f-bdcc-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<camelcase:string>
> [info]
> [info] == Results ==
> [info]
> [info] == Results ==
> [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> [info] !struct<>                   struct<camelcase:string>
> [info] ![A]                        [null] (QueryTest.scala:248)
> [info] - SPARK-31116: Select nested columns correctly in case insensitive manner *** FAILED *** (2 seconds, 117 milliseconds)
> [info] Results do not match for query:
> [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
> [info] Timezone Env:
> [info]
> [info] == Parsed Logical Plan ==
> [info] Relation[StructColumn#147] json
> [info]
> [info] == Analyzed Logical Plan ==
> [info] StructColumn: struct<LowerCase:bigint,camelcase:bigint>
> [info] Relation[StructColumn#147] json
> [info]
> [info] == Optimized Logical Plan ==
> [info] Relation[StructColumn#147] json
> [info]
> [info] == Physical Plan ==
> [info] FileScan json [StructColumn#147] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-f9ecd1a4-e5aa-4dd7-bdfd-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
> [info]
> [info] == Results ==
> [info]
> [info] == Results ==
> [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> [info] !struct<>                   struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
> [info] ![[0,1]]                    [[null,null]] (QueryTest.scala:248)
> [info] - SPARK-31116: Select nested columns correctly in case sensitive manner *** FAILED *** (871 milliseconds)
> [info] Results do not match for query:
> [info] Timezone: sun.util.calendar.ZoneInfo[id="America/Los_Angeles",offset=-28800000,dstSavings=3600000,useDaylight=true,transitions=185,lastRule=java.util.SimpleTimeZone[id=America/Los_Angeles,offset=-28800000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTime=7200000,endTimeMode=0]]
> [info] Timezone Env:
> [info]
> [info] == Parsed Logical Plan ==
> [info] Relation[StructColumn#329] json
> [info]
> [info] == Analyzed Logical Plan ==
> [info] StructColumn: struct<LowerCase:bigint,camelcase:bigint>
> [info] Relation[StructColumn#329] json
> [info]
> [info] == Optimized Logical Plan ==
> [info] Relation[StructColumn#329] json
> [info]
> [info] == Physical Plan ==
> [info] FileScan json [StructColumn#329] Batched: false, DataFilters: [], Format: JSON, Location: InMemoryFileIndex[file:/Users/kimtkyeom/Dev/spark_devel/target/tmp/spark-612baf76-a9d0-41e5-89f4-..., PartitionFilters: [], PushedFilters: [], ReadSchema: struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
> [info]
> [info] == Results ==
> [info]
> [info] == Results ==
> [info] !== Correct Answer - 1 ==   == Spark Answer - 1 ==
> [info] !struct<>                   struct<StructColumn:struct<LowerCase:bigint,camelcase:bigint>>
> [info] ![null]                     [[null,null]] (QueryTest.scala:248)
> {code}
>
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)