[ https://issues.apache.org/jira/browse/SPARK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Reynold Xin resolved SPARK-5462.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.3.0
         Assignee: Josh Rosen

> Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in DataFrames returned from sqlCtx.sql()
> ---------------------------------------------------------------------
>
>                 Key: SPARK-5462
>                 URL: https://issues.apache.org/jira/browse/SPARK-5462
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>             Fix For: 1.3.0
>
>
> When trying to access fields on a Python DataFrame created via inferSchema, I ran into a confusing Catalyst Py4J error. Here's a reproduction:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
> sc = SparkContext("local", "test")
> sqlContext = SQLContext(sc)
> # Load a text file and convert each line to a Row.
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
> # Infer the schema, and register the SchemaRDD as a table.
> schemaPeople = sqlContext.inferSchema(people)
> schemaPeople.registerTempTable("people")
> # SQL can be run over SchemaRDDs that have been registered as a table.
> teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
> print teenagers.name
> {code}
> This fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/spark/sqltest.py", line 19, in <module>
>     print teenagers.name
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/sql.py", line 2154, in __getattr__
>     return Column(self._jdf.apply(name))
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o66.apply.
> : org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to qualifiers on unresolved object, tree: 'name
>       at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:50)
>       at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:46)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:143)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:140)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>       at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:140)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:126)
>       at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:122)
>       at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:237)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:259)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> This is distinct from the helpful error message that I get when trying to access a non-existent column. This error didn't occur when I tried the same thing with a DataFrame created via jsonRDD.
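
The stack trace above shows LogicalPlan.resolve walking the plan's output and calling qualifiers on an UnresolvedAttribute, which throws. Below is a toy model of that failure mode in plain Python. It assumes, as a guess from the trace rather than anything confirmed in the report, that the plan behind the sql() result still held unresolved attributes. Every class and function name here is a hypothetical stand-in, not Spark's API.

```python
# Toy model only: hypothetical stand-ins, not Spark or Catalyst classes.

class UnresolvedException(Exception):
    """Raised when resolved-only information is requested too early."""


class AttributeReference:
    """A resolved attribute: it knows its name and its table qualifiers."""

    def __init__(self, name, qualifiers):
        self.name = name
        self.qualifiers = list(qualifiers)


class UnresolvedAttribute:
    """A placeholder that has a name but no qualifiers yet."""

    def __init__(self, name):
        self.name = name

    @property
    def qualifiers(self):
        raise UnresolvedException(
            "Invalid call to qualifiers on unresolved object, tree: '%s"
            % self.name)


def resolve(output, name):
    """Look up `name` in a plan's output, reading qualifiers on every attribute."""
    matches = []
    for attr in output:
        # Reading attr.qualifiers is the step that fails for placeholders.
        candidates = [attr.name] + [
            "%s.%s" % (q, attr.name) for q in attr.qualifiers]
        if name in candidates:
            matches.append(attr)
    if not matches:
        # The "helpful" path: the column simply does not exist.
        raise KeyError("No such column: %s" % name)
    return matches[0]


# Assumed scenario: one plan whose output is resolved, one whose output is not.
analyzed_output = [AttributeReference("name", ["people"])]
unanalyzed_output = [UnresolvedAttribute("name")]
```

In this sketch, resolve(analyzed_output, "name") succeeds and resolve(analyzed_output, "age") raises the ordinary missing-column error. resolve(unanalyzed_output, "name") raises UnresolvedException before any name comparison happens, which would explain why the message differs from the one for a non-existent column.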