[ https://issues.apache.org/jira/browse/SPARK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke updated SPARK-28189:
-------------------------
    Summary: Pyspark - df.drop() is Case Sensitive when Referring to Upstream Tables
      (was: Pyspark - df.drop is Case Sensitive when Referring to Upstream Tables)

Pyspark - df.drop() is Case Sensitive when Referring to Upstream Tables
------------------------------------------------------------------------

                 Key: SPARK-28189
                 URL: https://issues.apache.org/jira/browse/SPARK-28189
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.0
            Reporter: Luke
            Priority: Minor

Column names are generally case insensitive in PySpark, and df.drop("col") is normally case insensitive as well.

However, when referring to an upstream table, such as one side of a join:

{code:java}
vals1 = [('Pirate', 1), ('Monkey', 2), ('Ninja', 3), ('Spaghetti', 4)]
df1 = spark.createDataFrame(vals1, ['KEY', 'field'])

vals2 = [('Rutabaga', 1), ('Pirate', 2), ('Ninja', 3), ('Darth Vader', 4)]
df2 = spark.createDataFrame(vals2, ['KEY', 'CAPS'])

df_joined = df1.join(df2, df1['key'] == df2['key'], "left")
{code}

drop() becomes case sensitive. e.g.

{code:java}
# from above, df1 consists of columns ['KEY', 'field']
# from above, df2 consists of columns ['KEY', 'CAPS']
df_joined.select(df2['key'])  # resolves and gives a result
df_joined.drop('caps')        # also resolves and drops the column
{code}

however, note the following

{code:java}
df_joined.drop(df2['key'])   # no-op
df_joined.drop(df2['caps'])  # no-op
df_joined.drop(df2['KEY'])   # drops the column as expected
df_joined.drop(df2['CAPS'])  # drops the column as expected
{code}

So in summary, df.drop(df2['col']) does not follow the expected case insensitivity for column names, even though select, join, and string-based drop are generally case insensitive.
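For clarity, the inconsistency above can be sketched in plain Python. This is only an illustration of the expected resolution behavior, not Spark's actual analyzer code; resolve_column is a hypothetical helper:

{code:java}
def resolve_column(columns, name, case_sensitive=False):
    # Resolve `name` against the available column names, mimicking the
    # case-insensitive matching PySpark applies to string arguments
    # (spark.sql.caseSensitive defaults to false).
    for col in columns:
        if col == name or (not case_sensitive and col.lower() == name.lower()):
            return col
    return None  # no match: drop() silently becomes a no-op

# String-based drop resolves case-insensitively, as expected:
print(resolve_column(['KEY', 'CAPS'], 'caps'))                       # 'CAPS'
# Column-reference drop in this report behaves as if case-sensitive:
print(resolve_column(['KEY', 'CAPS'], 'caps', case_sensitive=True))  # None
{code}

The bug is that df.drop(df2['col']) behaves like the case_sensitive=True branch while every other column lookup behaves like the default.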
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org