[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-08-13 Thread Michael Armbrust (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-8670:

Assignee: Wenchen Fan

 Nested columns can't be referenced (but they can be selected)
 --------------------------------------------------------------

 Key: SPARK-8670
 URL: https://issues.apache.org/jira/browse/SPARK-8670
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 1.4.0, 1.4.1, 1.5.0
Reporter: Nicholas Chammas
Assignee: Wenchen Fan
Priority: Blocker

 This is strange and looks like a regression from 1.3.
 {code}
 import json

 daterz = [
   {
     'name': 'Nick',
     'stats': {
       'age': 28
     }
   },
   {
     'name': 'George',
     'stats': {
       'age': 31
     }
   }
 ]

 df = sqlContext.jsonRDD(sc.parallelize(daterz).map(lambda x: json.dumps(x)))

 df.select('stats.age').show()
 df['stats.age']  # 1.4 fails on this line
 {code}
 On 1.3 this works and yields:
 {code}
 age
 28 
 31 
 Out[1]: Column<stats.age AS age#2958L>
 {code}
 On 1.4, however, this gives an error on the last line:
 {code}
 +---+
 |age|
 +---+
 | 28|
 | 31|
 +---+
 ---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
 <ipython-input-1-04bd990e94c6> in <module>()
      19 
      20 df.select('stats.age').show()
 ---> 21 df['stats.age']

 /path/to/spark/python/pyspark/sql/dataframe.pyc in __getitem__(self, item)
     678         if isinstance(item, basestring):
     679             if item not in self.columns:
 --> 680                 raise IndexError("no such column: %s" % item)
     681             jc = self._jdf.apply(item)
     682             return Column(jc)

 IndexError: no such column: stats.age
 {code}
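 The membership check in __getitem__ is presumably what rejects the dotted path: DataFrame.columns only lists top-level field names, so 'stats.age' never matches even though select() can resolve it. A minimal illustration, assuming the inferred schema has the top-level fields name and stats:
 {code}
 # Only top-level fields appear in df.columns, so the dotted path is
 # rejected before the lookup ever reaches the JVM side.
 df.columns                       # ['name', 'stats']
 'stats.age' in df.columns        # False -> the IndexError above
 df.select('stats.age').columns   # ['age'], since select() resolves the path
 {code}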
 This means, among other things, that you can't join DataFrames on nested 
 columns.
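 A possible workaround, sketched against the 1.4 Column API and not verified here, is to build the nested column expression explicitly (via Column.getField or pyspark.sql.functions.col) instead of asking DataFrame.__getitem__ to resolve the dotted path:
 {code}
 from pyspark.sql.functions import col

 # Neither of these relies on DataFrame.__getitem__ resolving the dotted
 # path, so they sidestep the `no such column` check shown above.
 age_col = df['stats'].getField('age')   # field access on the parent struct column
 age_col = col('stats.age')              # or build the expression from the dotted name

 df.select(age_col).show()

 # Hypothetical join on the nested column; `other` is an assumed second
 # DataFrame with a top-level `age` column.
 # joined = df.join(other, col('stats.age') == other['age'])
 {code}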






[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-08-12 Thread Josh Rosen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-8670:
--
Priority: Critical  (was: Major)




[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-08-12 Thread Josh Rosen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-8670:
--
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-9564




[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-08-12 Thread Michael Armbrust (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust updated SPARK-8670:

Target Version/s: 1.5.0
Priority: Blocker  (was: Critical)




[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-08-12 Thread Josh Rosen (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-8670:
--
Affects Version/s: 1.5.0
   1.4.1




[jira] [Updated] (SPARK-8670) Nested columns can't be referenced (but they can be selected)

2015-06-26 Thread Nicholas Chammas (JIRA)

 [ https://issues.apache.org/jira/browse/SPARK-8670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas updated SPARK-8670:

Description: 
This is strange and looks like a regression from 1.3.

{code}
import json

daterz = [
  {
'name': 'Nick',
'stats': {
  'age': 28
}
  },
  {
'name': 'George',
'stats': {
  'age': 31
}
  }
]

df = sqlContext.jsonRDD(sc.parallelize(daterz).map(lambda x: json.dumps(x)))

df.select('stats.age').show()
df['stats.age']  # 1.4 fails on this line
{code}

On 1.3 this works and yields:

{code}
age
28 
31 
Out[1]: Column<stats.age AS age#2958L>
{code}

On 1.4, however, this gives an error on the last line:

{code}
+---+
|age|
+---+
| 28|
| 31|
+---+

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-04bd990e94c6> in <module>()
     19 
     20 df.select('stats.age').show()
---> 21 df['stats.age']

/path/to/spark/python/pyspark/sql/dataframe.pyc in __getitem__(self, item)
    678         if isinstance(item, basestring):
    679             if item not in self.columns:
--> 680                 raise IndexError("no such column: %s" % item)
    681             jc = self._jdf.apply(item)
    682             return Column(jc)

IndexError: no such column: stats.age
{code}

This means, among other things, that you can't join DataFrames on nested 
columns.

  was:
This is strange and looks like a regression from 1.3.

{code}
import json

daterz = [
  {
'name': 'Nick',
'stats': {
  'age': 28
}
  },
  {
'name': 'George',
'stats': {
  'age': 31
}
  }
]

df = sqlContext.jsonRDD(sc.parallelize(daterz).map(lambda x: json.dumps(x)))

df.select('stats.age').show()
df['stats.age']  # 1.4 fails on this line
{code}

On 1.3 this works and yields:

{code}
age
28 
31 
Out[1]: Column<stats.age AS age#2958L>
{code}

On 1.4, however, this gives an error on the last line:

{code}
+---+
|age|
+---+
| 28|
| 31|
+---+

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-1-04bd990e94c6> in <module>()
     19 
     20 df.select('stats.age').show()
---> 21 df['stats.age']

/path/to/spark/python/pyspark/sql/dataframe.pyc in __getitem__(self, item)
    678         if isinstance(item, basestring):
    679             if item not in self.columns:
--> 680                 raise IndexError("no such column: %s" % item)
    681             jc = self._jdf.apply(item)
    682             return Column(jc)

IndexError: no such column: stats.age
{code}

