[ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-8670:
------------------------------
    Affects Version/s: 1.5.0
                       1.4.1

> Nested columns can't be referenced (but they can be selected)
> -------------------------------------------------------------
>
>                 Key: SPARK-8670
>                 URL: https://issues.apache.org/jira/browse/SPARK-8670
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark, SQL
>    Affects Versions: 1.4.0, 1.4.1, 1.5.0
>            Reporter: Nicholas Chammas
>
> This is strange and looks like a regression from 1.3.
> {code}
> import json
> daterz = [
>     {
>         'name': 'Nick',
>         'stats': {
>             'age': 28
>         }
>     },
>     {
>         'name': 'George',
>         'stats': {
>             'age': 31
>         }
>     }
> ]
> df = sqlContext.jsonRDD(sc.parallelize(daterz).map(lambda x: json.dumps(x)))
> df.select('stats.age').show()
> df['stats.age']  # 1.4 fails on this line
> {code}
> On 1.3 this works and yields:
> {code}
> age
> 28
> 31
> Out[1]: Column<stats.age AS age#2958L>
> {code}
> On 1.4, however, the last line raises an error:
> {code}
> +---+
> |age|
> +---+
> | 28|
> | 31|
> +---+
> ---------------------------------------------------------------------------
> IndexError                                Traceback (most recent call last)
> <ipython-input-1-04bd990e94c6> in <module>()
>      19
>      20 df.select('stats.age').show()
> ---> 21 df['stats.age']
> /path/to/spark/python/pyspark/sql/dataframe.pyc in __getitem__(self, item)
>     678         if isinstance(item, basestring):
>     679             if item not in self.columns:
> --> 680                 raise IndexError("no such column: %s" % item)
>     681             jc = self._jdf.apply(item)
>     682             return Column(jc)
> IndexError: no such column: stats.age
> {code}
> This means, among other things, that you can't join DataFrames on nested columns.
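A possible interim workaround, not part of the report above: build the nested reference as a Column expression rather than going through DataFrame.__getitem__ with a dotted string, since __getitem__ only checks top-level column names. This is a sketch assuming the 1.4 PySpark API (pyspark.sql.functions.col and field access on a Column); df is the DataFrame from the snippet above and df2 is a hypothetical second DataFrame for the join example.

{code}
# Sketch of a workaround, assuming Spark 1.4 PySpark APIs.
from pyspark.sql.functions import col

# Build Column expressions directly; the dotted path is resolved by the
# analyzer at query time instead of being checked against df.columns.
age_a = col('stats.age')
# 'stats' is a top-level column, so __getitem__ accepts it; ['age'] then
# accesses the nested field on the resulting Column.
age_b = df['stats']['age']

df.select(age_a).show()
df.select(age_b.alias('age')).show()

# Joining on a nested field with a hypothetical second DataFrame df2:
# df.join(df2, df['stats']['age'] == df2['stats']['age']).show()
{code}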