[jira] [Commented] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
[ https://issues.apache.org/jira/browse/SPARK-39732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566839#comment-17566839 ]

Andreas Saltveit commented on SPARK-39732:
------------------------------------------

3.2.0
[https://spark.apache.org/docs/3.3.0/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.drop.html#pyspark.pandas.DataFrame.drop]

3.3.0
[https://spark.apache.org/docs/3.3.0/api/python/reference/pyspark.pandas/api/pyspark.pandas.DataFrame.drop.html#pyspark.pandas.DataFrame.drop]

> pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
> -------------------------------------------------------------------
>
>                 Key: SPARK-39732
>                 URL: https://issues.apache.org/jira/browse/SPARK-39732
>             Project: Spark
>          Issue Type: Bug
>          Components: Pandas API on Spark
>    Affects Versions: 3.3.0
>            Reporter: Andreas Saltveit
>            Priority: Major
>
> import pyspark.pandas as pd
> data = [{"Category": 'A', "ID": 1, "Value": 121.44, "Truth": True},
>         {"Category": 'B', "ID": 2, "Value": 300.01, "Truth": False},
>         {"Category": 'C', "ID": 3, "Value": 10.99, "Truth": None},
>         {"Category": 'E', "ID": 4, "Value": 33.87, "Truth": True}
>         ]
> df = pd.DataFrame(data)
> df.display()
> --drops dataframe "Query returned no results"
> df1=df.drop(["ID","Category"])
> df1.display()
> --works
> df2=df.drop(["ID","Category"], 1)
> df2.display()

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
[ https://issues.apache.org/jira/browse/SPARK-39732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566840#comment-17566840 ]

Andreas Saltveit commented on SPARK-39732:
------------------------------------------

Seems weird to change default.
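
The default in question can be read directly off the method signature, which is one way to check what an installed version does before relying on it. The following is a minimal sketch using only the standard-library `inspect` module. The helper name `default_axis` is hypothetical, and `drop_like_3_3` is only a stand-in whose parameter list is an assumption about the 3.3.0 signature, included so the snippet runs without a Spark installation.

```python
import inspect


def default_axis(drop_method):
    """Return the default value of the `axis` parameter of a drop-like callable."""
    return inspect.signature(drop_method).parameters["axis"].default


# Stand-in with the parameter list ASSUMED for 3.3.0; this is not the real
# pyspark implementation, it only carries the same keyword defaults.
def drop_like_3_3(self, labels=None, axis=0, index=None, columns=None):
    raise NotImplementedError("signature-only stand-in")


print(default_axis(drop_like_3_3))  # prints 0
```

Against a real installation, `pyspark.pandas.DataFrame.drop` would be passed in place of the stand-in. A result of 0 means that a bare `df.drop(["ID","Category"])` is interpreted as a request to drop index labels rather than columns.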
[jira] [Commented] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
[ https://issues.apache.org/jira/browse/SPARK-39732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17566835#comment-17566835 ]

Andreas Saltveit commented on SPARK-39732:
------------------------------------------

This is a behavior change from older Spark versions. I have used this in production code and had an incident because of it. The default axis used to be 1.
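
Code that must behave the same regardless of the `axis` default can name the columns with the `columns=` keyword instead of passing labels positionally. The sketch below illustrates the idea. The wrapper `drop_columns` and the `FakeFrame` class are hypothetical names introduced only for illustration; `FakeFrame` imitates a frame whose `drop` defaults to `axis=0` so the example runs without Spark.

```python
def drop_columns(df, names):
    """Drop columns by name via `columns=`, never via the `axis` default."""
    return df.drop(columns=list(names))


class FakeFrame:
    """Hypothetical stand-in for a frame whose drop() defaults to axis=0."""

    def __init__(self, columns):
        self.columns = list(columns)

    def drop(self, labels=None, axis=0, index=None, columns=None):
        # Only the column path is modelled in this stand-in.
        if columns is None and axis == 1:
            columns = labels
        if columns is None:
            raise NotImplementedError("row drops are not modelled here")
        if isinstance(columns, str):
            columns = [columns]
        to_drop = set(columns)
        return FakeFrame(c for c in self.columns if c not in to_drop)


frame = FakeFrame(["Category", "ID", "Value", "Truth"])
print(drop_columns(frame, ["ID", "Category"]).columns)  # prints ['Value', 'Truth']
```

With a real pandas-on-Spark frame the equivalent call would be `df.drop(columns=["ID","Category"])`. Passing `axis=1` explicitly, as `df2` in the report does, is the other way to avoid depending on the default.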
[jira] [Commented] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
[ https://issues.apache.org/jira/browse/SPARK-39732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564767#comment-17564767 ]

Andreas Saltveit commented on SPARK-39732:
------------------------------------------

Introduced after 2022.07.04
[jira] [Updated] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
[ https://issues.apache.org/jira/browse/SPARK-39732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Saltveit updated SPARK-39732:
-------------------------------------
    Description:
import pyspark.pandas as pd
data = [{"Category": 'A', "ID": 1, "Value": 121.44, "Truth": True},
        {"Category": 'B', "ID": 2, "Value": 300.01, "Truth": False},
        {"Category": 'C', "ID": 3, "Value": 10.99, "Truth": None},
        {"Category": 'E', "ID": 4, "Value": 33.87, "Truth": True}
        ]
df = pd.DataFrame(data)
df.display()
--drops dataframe "Query returned no results"
df1=df.drop(["ID","Category"])
df1.display()
--works
df2=df.drop(["ID","Category"], 1)
df2.display()

  was:
import pyspark.pandas as pd
data = [{"Category": 'A', "ID": 1, "Value": 121.44, "Truth": True},
        {"Category": 'B', "ID": 2, "Value": 300.01, "Truth": False},
        {"Category": 'C', "ID": 3, "Value": 10.99, "Truth": None},
        {"Category": 'E', "ID": 4, "Value": 33.87, "Truth": True}
        ]
df = pd.DataFrame(data)
df.display()
# drops dataframe "Query returned no results"
df1=df.drop(["ID","Category"])
df1.display()
# works
df2=df.drop(["ID","Category"], 1)
df2.display()
[jira] [Created] (SPARK-39732) pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
Andreas Saltveit created SPARK-39732:
-------------------------------------

             Summary: pyspark.pandas.DataFrame.drop drops dataframe if axis not specified
                 Key: SPARK-39732
                 URL: https://issues.apache.org/jira/browse/SPARK-39732
             Project: Spark
          Issue Type: Bug
          Components: Pandas API on Spark
    Affects Versions: 3.3.0
            Reporter: Andreas Saltveit

import pyspark.pandas as pd
data = [{"Category": 'A', "ID": 1, "Value": 121.44, "Truth": True},
        {"Category": 'B', "ID": 2, "Value": 300.01, "Truth": False},
        {"Category": 'C', "ID": 3, "Value": 10.99, "Truth": None},
        {"Category": 'E', "ID": 4, "Value": 33.87, "Truth": True}
        ]
df = pd.DataFrame(data)
df.display()
# drops dataframe "Query returned no results"
df1=df.drop(["ID","Category"])
df1.display()
# works
df2=df.drop(["ID","Category"], 1)
df2.display()