[ https://issues.apache.org/jira/browse/SPARK-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen reassigned SPARK-5462:
---------------------------------

    Assignee: Josh Rosen

> Catalyst UnresolvedException "Invalid call to qualifiers on unresolved object" error when accessing fields in Python DataFrame
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-5462
>                 URL: https://issues.apache.org/jira/browse/SPARK-5462
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 1.3.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Blocker
>
> When trying to access fields on a Python DataFrame created via inferSchema, I 
> ran into a confusing Catalyst Py4J error.  Here's a reproduction:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import SQLContext, Row
> sc = SparkContext("local", "test")
> sqlContext = SQLContext(sc)
> # Load a text file and convert each line to a Row.
> lines = sc.textFile("examples/src/main/resources/people.txt")
> parts = lines.map(lambda l: l.split(","))
> people = parts.map(lambda p: Row(name=p[0], age=int(p[1])))
> # Infer the schema, and register the SchemaRDD as a table.
> schemaPeople = sqlContext.inferSchema(people)
> schemaPeople.registerTempTable("people")
> # SQL can be run over SchemaRDDs that have been registered as a table.
> teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
> print teenagers.name
> {code}
> This fails with the following error:
> {code}
> Traceback (most recent call last):
>   File "/Users/joshrosen/Documents/spark/sqltest.py", line 19, in <module>
>     print teenagers.name
>   File "/Users/joshrosen/Documents/Spark/python/pyspark/sql.py", line 2154, in __getattr__
>     return Column(self._jdf.apply(name))
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
>   File "/Users/joshrosen/Documents/Spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o66.apply.
> : org.apache.spark.sql.catalyst.analysis.UnresolvedException: Invalid call to qualifiers on unresolved object, tree: 'name
>       at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:50)
>       at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute.qualifiers(unresolved.scala:46)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:143)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$2.apply(LogicalPlan.scala:140)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>       at scala.collection.immutable.List.foreach(List.scala:318)
>       at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>       at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:140)
>       at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:126)
>       at org.apache.spark.sql.DataFrame.resolve(DataFrame.scala:122)
>       at org.apache.spark.sql.DataFrame.apply(DataFrame.scala:237)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:259)
>       at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>       at py4j.commands.CallCommand.execute(CallCommand.java:79)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:745)
> {code}
> This is distinct from the helpful error message that I get when trying to 
> access a non-existent column.  This error didn't occur when I tried the same 
> thing with a DataFrame created via jsonRDD.
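
For readers without a Spark build at hand, the Python side of the failing call can be modelled in plain Python. This is a minimal sketch, not PySpark code: `FakeDataFrame`, `FakeJavaDataFrame`, the `resolved` flag, `access_outcome`, and the `KeyError` used for a missing column are all hypothetical stand-ins. Only the body of `__getattr__` mirrors the line shown in the traceback. The sketch assumes, based on the stack trace alone, that the JVM-side lookup fails because the plan's output still holds an unresolved attribute; it does not claim to identify the root cause.

```python
class UnresolvedException(Exception):
    """Hypothetical Python stand-in for Catalyst's UnresolvedException."""


class FakeJavaDataFrame:
    """Hypothetical stand-in for the JVM DataFrame reached through Py4J."""

    def __init__(self, columns, resolved):
        self._columns = list(columns)
        # Assumption: models whether the plan's output attributes are resolved.
        self._resolved = resolved

    def apply(self, name):
        if not self._resolved:
            # Models the reported failure: the lookup itself raises,
            # even though the requested column exists.
            raise UnresolvedException(
                "Invalid call to qualifiers on unresolved object, tree: '%s" % name
            )
        if name not in self._columns:
            # Placeholder for the "helpful" missing-column error; the real
            # message text is not given in the report.
            raise KeyError(name)
        return name


class Column:
    """Thin wrapper, analogous in shape to the Column in the traceback."""

    def __init__(self, jc):
        self._jc = jc


class FakeDataFrame:
    def __init__(self, jdf):
        self._jdf = jdf

    def __getattr__(self, name):
        # Same shape as the traceback line:
        #     return Column(self._jdf.apply(name))
        return Column(self._jdf.apply(name))


def access_outcome(df, name):
    """Return 'ok', or the class name of the exception raised by df.<name>."""
    try:
        getattr(df, name)
        return "ok"
    except Exception as exc:
        return type(exc).__name__


resolved_df = FakeDataFrame(FakeJavaDataFrame(["name"], resolved=True))
unresolved_df = FakeDataFrame(FakeJavaDataFrame(["name"], resolved=False))

print(access_outcome(resolved_df, "name"))    # existing column, resolved plan
print(access_outcome(resolved_df, "age"))     # missing column
print(access_outcome(unresolved_df, "name"))  # existing column, unresolved plan
```

The three calls separate the cases described in the report: a successful field access, a missing column, and an existing column whose lookup still fails inside `apply`.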


