[ https://issues.apache.org/jira/browse/SPARK-49041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated SPARK-49041:
-----------------------------------
    Labels: pull-request-available  (was: )

> Raise proper error for dropDuplicates when wrong subset is given
> ----------------------------------------------------------------
>
>                 Key: SPARK-49041
>                 URL: https://issues.apache.org/jira/browse/SPARK-49041
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 4.0.0
>            Reporter: Haejoon Lee
>            Priority: Major
>              Labels: pull-request-available
>
> Currently, dropDuplicates raises an unrelated internal error, so we should improve it.
> {code:java}
> >>> df.dropDuplicates(None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/.../spark/python/pyspark/sql/classic/dataframe.py", line 1249, in dropDuplicates
>     jdf = self._jdf.dropDuplicates(self._jseq(subset))
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
>     return_value = get_return_value(
>   File "/.../spark/python/pyspark/errors/exceptions/captured.py", line 247, in deco
>     return f(*a, **kw)
>   File "/.../spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
>     raise Py4JJavaError(
> py4j.protocol.Py4JJavaError: An error occurred while calling o56.dropDuplicates.
> : org.apache.spark.SparkException: [INTERNAL_ERROR] Undefined error message parameter for error class: '_LEGACY_ERROR_TEMP_1201', MessageTemplate: Cannot resolve column name "<colName>" among (<fieldNames>)., Parameters: Map(colName -> null, fieldNames -> name, age) SQLSTATE: XX000
>       at org.apache.spark.SparkException$.internalError(SparkException.scala:107)
>       at org.apache.spark.ErrorClassesJsonReader.getErrorMessage(ErrorClassesJSONReader.scala:58)
>       at org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:56)
>       at org.apache.spark.SparkThrowableHelper$.getMessage(SparkThrowableHelper.scala:43)
>       at org.apache.spark.sql.AnalysisException.<init>(AnalysisException.scala:47)
>       at org.apache.spark.sql.AnalysisException.<init>(AnalysisException.scala:82)
>       at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotResolveColumnNameAmongAttributesError(QueryCompilationErrors.scala:2257)
>       at org.apache.spark.sql.Dataset.$anonfun$groupColsFromDropDuplicates$1(Dataset.scala:3267)
>       at scala.collection.immutable.List.flatMap(List.scala:294)
>       at scala.collection.immutable.List.flatMap(List.scala:79)
>       at org.apache.spark.sql.Dataset.groupColsFromDropDuplicates(Dataset.scala:3261)
>       at org.apache.spark.sql.Dataset.dropDuplicates(Dataset.scala:3126)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>       at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.base/java.lang.reflect.Method.invoke(Method.java:568)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
>       at py4j.Gateway.invoke(Gateway.java:282)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
>       at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
>       at java.base/java.lang.Thread.run(Thread.java:840)
> Caused by: java.lang.IllegalArgumentException: Cannot resolve variable 'colName' (enableSubstitutionInVariables=false).
>       at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1535)
>       at org.apache.commons.text.StringSubstitutor.substitute(StringSubstitutor.java:1392)
>       at org.apache.commons.text.StringSubstitutor.replace(StringSubstitutor.java:896)
>       at org.apache.spark.ErrorClassesJsonReader.getErrorMessage(ErrorClassesJSONReader.scala:53)
>       ... 22 more
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
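
Editorial note on the report above: the trace shows a null column name (colName -> null) reaching the JVM, where formatting the "Cannot resolve column name" message fails and surfaces as INTERNAL_ERROR. As an illustration only, and not the patch attached to SPARK-49041, a Python-side check along the following lines could reject such input before any Py4J call. The helper name validate_subset, its signature, and the exception types are assumptions made for this sketch; it uses plain Python so it runs without a Spark session.

```python
from collections.abc import Iterable
from typing import List, Sequence


def validate_subset(subset: object, columns: Sequence[str]) -> List[str]:
    """Hypothetical helper: check a dropDuplicates-style subset up front.

    `subset` is expected to be an iterable of column-name strings and
    `columns` the DataFrame's column names. Neither the name nor the
    behaviour is taken from the PySpark code base.
    """
    # A bare string or a non-iterable (such as None) is not a list of names.
    if isinstance(subset, (str, bytes)) or not isinstance(subset, Iterable):
        raise TypeError(
            "subset must be an iterable of column names, "
            f"got {type(subset).__name__}"
        )

    names = list(subset)
    for name in names:
        # Catch entries such as None here, before they reach the JVM as null.
        if not isinstance(name, str):
            raise TypeError(
                "subset entries must be column-name strings, "
                f"got {type(name).__name__}"
            )
        # Report unknown columns with the same wording as the message template.
        if name not in columns:
            raise ValueError(
                f'Cannot resolve column name "{name}" among ({", ".join(columns)})'
            )
    return names
```

With the columns from the report, validate_subset(["name"], ["name", "age"]) returns ["name"], while validate_subset([None], ["name", "age"]) raises a TypeError that names the offending type instead of an internal error.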