[ https://issues.apache.org/jira/browse/SPARK-28189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke updated SPARK-28189:
-------------------------
    Description:
Column names in general are case insensitive in Pyspark, and df.drop() in general is also case insensitive.

However, when referring to an upstream table, such as from a join, e.g.
{code:java}
vals1 = [('Pirate',1),('Monkey',2),('Ninja',3),('Spaghetti',4)]
df1 = spark.createDataFrame(vals1, ['KEY','field'])
vals2 = [('Rutabaga',1),('Pirate',2),('Ninja',3),('Darth Vader',4)]
df2 = spark.createDataFrame(vals2, ['KEY','CAPS'])
df_joined = df1.join(df2, df1['key'] == df2['key'], "left")
{code}
drop will become case sensitive. e.g.
{code:java}
# from above, df1 consists of columns ['KEY', 'field']
# from above, df2 consists of columns ['KEY', 'CAPS']
df_joined.select(df2['key']) # will give a result
df_joined.drop('caps') # will also give a result
{code}
however, note the following
{code:java}
df_joined.drop(df2['key']) # no-op
df_joined.drop(df2['caps']) # no-op
df_joined.drop(df2['KEY']) # will drop column as expected
df_joined.drop(df2['CAPS']) # will drop column as expected
{code}
so in summary, using df.drop(df2['col']) doesn't align with expected case insensitivity for column names, even though functions like select, join, and dropping a column generally are case insensitive.

  was:
Column names in general are case insensitive in Pyspark, and df.drop() in general is also case insensitive.

However, when referring to an upstream table, such as from a join, e.g.
{code:java}
vals1 = [('Pirate',1),('Monkey',2),('Ninja',3),('Spaghetti',4)]
df1 = spark.createDataFrame(vals1, ['KEY','field'])
vals2 = [('Rutabaga',1),('Pirate',2),('Ninja',3),('Darth Vader',4)]
df2 = spark.createDataFrame(vals2, ['KEY','CAPS'])
df_joined = df1.join(df2, df1['key'] == df2['key'], "left")
{code}
drop will become case sensitive. e.g.
{code:java}
# from above, df1 consists of columns ['KEY', 'field']
# from above, df2 consists of columns ['KEY', 'CAPS']
df_joined.select(df2['key']) # will give a result
df_joined.drop(caps) # will also give a result
{code}
however, note the following
{code:java}
df_joined.drop(df2['key']) # no-op
df_joined.drop(df2['caps']) # no-op
df_joined.drop(df2['KEY']) # will drop column as expected
df_joined.drop(df2['CAPS']) # will drop column as expected
{code}
so in summary, using df.drop(df2['col']) doesn't align with expected case insensitivity for column names, even though functions like select, join, and dropping a column generally are case insensitive.

> Pyspark - df.drop() is Case Sensitive when Referring to Upstream Tables
> ------------------------------------------------------------------------
>
>                 Key: SPARK-28189
>                 URL: https://issues.apache.org/jira/browse/SPARK-28189
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Luke
>            Priority: Minor
>
> Column names in general are case insensitive in Pyspark, and df.drop() in
> general is also case insensitive.
> However, when referring to an upstream table, such as from a join, e.g.
> {code:java}
> vals1 = [('Pirate',1),('Monkey',2),('Ninja',3),('Spaghetti',4)]
> df1 = spark.createDataFrame(vals1, ['KEY','field'])
> vals2 = [('Rutabaga',1),('Pirate',2),('Ninja',3),('Darth Vader',4)]
> df2 = spark.createDataFrame(vals2, ['KEY','CAPS'])
> df_joined = df1.join(df2, df1['key'] == df2['key'], "left")
> {code}
>
> drop will become case sensitive. e.g.
> {code:java}
> # from above, df1 consists of columns ['KEY', 'field']
> # from above, df2 consists of columns ['KEY', 'CAPS']
> df_joined.select(df2['key']) # will give a result
> df_joined.drop('caps') # will also give a result
> {code}
> however, note the following
> {code:java}
> df_joined.drop(df2['key']) # no-op
> df_joined.drop(df2['caps']) # no-op
> df_joined.drop(df2['KEY']) # will drop column as expected
> df_joined.drop(df2['CAPS']) # will drop column as expected
> {code}
>
>
> so in summary, using df.drop(df2['col']) doesn't align with expected case
> insensitivity for column names, even though functions like select, join, and
> dropping a column generally are case insensitive.
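
A possible workaround, sketched here under assumptions rather than taken from the report or from Spark itself: since the report says df_joined.drop(df2['CAPS']) works when the case matches exactly, the exact-cased name can be looked up from the upstream DataFrame's column list before calling drop. The helper name resolve_column_name is hypothetical and is plain Python; it only assumes it is given a list of column names such as df2.columns.

```python
def resolve_column_name(columns, name):
    """Return the entry of `columns` that equals `name` ignoring case.

    Hypothetical helper (not part of PySpark). Raises ValueError when the
    name is missing or matches more than one column.
    """
    matches = [c for c in columns if c.lower() == name.lower()]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one match for %r, found %r" % (name, matches)
        )
    return matches[0]


# Column names of df2 as given in the report.
df2_columns = ['KEY', 'CAPS']
exact_name = resolve_column_name(df2_columns, 'caps')
print(exact_name)  # CAPS

# With a live SparkSession and the DataFrames from the report, the intended
# use would be (assumption, based on the behaviour described for 2.4.0):
#   df_joined.drop(df2[resolve_column_name(df2.columns, 'caps')])
```

The lookup is done against df2.columns rather than df_joined.columns because the joined DataFrame contains 'KEY' twice, which would make the match ambiguous.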