[ https://issues.apache.org/jira/browse/SPARK-34435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dc-heros resolved SPARK-34435.
------------------------------
    Resolution: Cannot Reproduce

> ArrayIndexOutOfBoundsException when select in different case
> ------------------------------------------------------------
>
>                 Key: SPARK-34435
>                 URL: https://issues.apache.org/jira/browse/SPARK-34435
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 3.0.1
>            Reporter: Enver Osmanov
>            Priority: Trivial
>
> h5. Actual behavior:
> Selecting a column by a name that differs only in case, after a remapping, fails with an ArrayIndexOutOfBoundsException.
>
> h5. Expected behavior:
> Spark shouldn't fail with an ArrayIndexOutOfBoundsException. Spark is case-insensitive by default, so the select should return the selected column.
>
> h5. Test case:
> {code:java}
> case class User(aA: String, bb: String)
> // ...
> val user = User("John", "Doe")
> val ds = Seq(user).toDS().map(identity)
> ds.select("aa").show(false)
> {code}
>
> h5. Additional notes:
> The test case is reproducible with Spark 3.0.1. There are no errors with Spark 2.4.7.
> I believe the problem could be solved by changing the filter in `SchemaPruning#pruneDataSchema` from this:
> {code:java}
> val dataSchemaFieldNames = dataSchema.fieldNames.toSet
> val mergedDataSchema =
>   StructType(mergedSchema.filter(f => dataSchemaFieldNames.contains(f.name)))
> {code}
> to this:
> {code:java}
> val dataSchemaFieldNames = dataSchema.fieldNames.map(_.toLowerCase).toSet
> val mergedDataSchema =
>   StructType(mergedSchema.filter(f =>
>     dataSchemaFieldNames.contains(f.name.toLowerCase)))
> {code}
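The matching logic behind the proposed fix can be illustrated standalone, without a Spark dependency. This is only a sketch of the case-insensitive name filter on plain strings (the field names below are the ones from the reporter's `User` case class; the object name and method signature are hypothetical, not Spark's actual `SchemaPruning` API):

{code:java}
// Sketch of the proposed case-insensitive filter from
// SchemaPruning#pruneDataSchema, using plain field names instead of
// StructField. Lower-casing both sides makes "aa" match "aA".
object CaseInsensitivePrune {
  def prune(dataSchemaFieldNames: Seq[String],
            mergedSchemaFieldNames: Seq[String]): Seq[String] = {
    // Build a lower-cased lookup set, mirroring
    // dataSchema.fieldNames.map(_.toLowerCase).toSet in the proposed change.
    val lowered = dataSchemaFieldNames.map(_.toLowerCase).toSet
    // Keep merged-schema fields whose lower-cased name is present.
    mergedSchemaFieldNames.filter(f => lowered.contains(f.toLowerCase))
  }

  def main(args: Array[String]): Unit = {
    // "aa" (from the select) survives pruning against "aA" (from the case class).
    println(prune(Seq("aA", "bb"), Seq("aa")))
  }
}
{code}

With the original, case-sensitive filter (`dataSchemaFieldNames.contains(f.name)`), the `"aa"` field would be dropped here, which is consistent with the pruned schema losing the column the query still expects.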