Re: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-03-03 Thread Michael Armbrust
In Spark 1.2 you'll have to create a partitioned Hive table in order to
read Parquet data in this format. In Spark 1.3 the Parquet data source will
automatically discover partitions when they are laid out in this format.
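Some background on "this format": in a Hive/Impala-style partitioned table, the partition value lives only in the directory name (`key=20141001`), not inside the Parquet files. Reading the files directly therefore yields no `key` column. The plain-Python sketch below illustrates conceptually what partition discovery does, deriving column values from `name=value` path segments. It is not Spark's implementation. `discover_partition_values` is a hypothetical helper, and turning digit-only values into integers is a simplifying assumption.

```python
from typing import Dict, List, Union

Value = Union[int, str]


def discover_partition_values(path: str) -> Dict[str, Value]:
    """Collect `name=value` directory segments from a file path.

    Hypothetical helper: a simplified model of Hive-style partition
    discovery, not Spark's actual code. Digit-only values become ints;
    everything else stays a string.
    """
    values: Dict[str, Value] = {}
    # Skip the last segment: it is the data file name, not a directory.
    for segment in path.strip("/").split("/")[:-1]:
        if "=" in segment:
            name, _, raw = segment.partition("=")
            values[name] = int(raw) if raw.isdigit() else raw
    return values


paths: List[str] = [
    "/user/hive/warehouse/impala_new_4/key=20141001/"
    "69446344000a3a17-c90aac1f33a0fbc_875501925_data.0.parq",
    "/user/hive/warehouse/impala_new_4/key=20141001/"
    "f1448ca083a5e224-159572f61b50d7a3_854675293_data.4.parq",
]

for p in paths:
    print(discover_partition_values(p))  # {'key': 20141001}
```

The value comes only from the path. This is why pointing `parquetFile` at individual files, rather than at a partition-aware table, loses the `key` column.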

Michael

On Mon, Jan 5, 2015 at 1:01 PM, Sam Flint  wrote:

2015-03-03 Thread Gustavo Enrique Salazar Torres
Hi Sam:

Shouldn't you define the table schema? I had the same problem in Scala and
solved it by defining the schema. I did this:

sqlContext.applySchema(dataRDD, tableSchema).registerTempTable(tableName)
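`applySchema` pairs an RDD of plain rows with an explicit schema. In PySpark 1.2 that schema is a `StructType` of `StructField`s. The plain-Python sketch below models only the idea: it checks each positional row against a list of `(name, type)` pairs and produces named records. `apply_schema`, the column names `col_a`/`col_b`, and the sample rows are all hypothetical. A schema alone cannot recover a `key` value that is absent from the files. In this sketch, the value is therefore appended to each row by hand, taken from the `key=...` directory name.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical schema of (column name, Python type) pairs. The real table's
# columns are not shown in the thread, so these names are placeholders.
Schema = List[Tuple[str, type]]


def apply_schema(rows: Sequence[Sequence[Any]], schema: Schema) -> List[Dict[str, Any]]:
    """Turn positional rows into named records, validating arity and types."""
    records = []
    for row in rows:
        if len(row) != len(schema):
            raise ValueError(f"row {row!r} has {len(row)} fields, schema has {len(schema)}")
        record = {}
        for (name, expected), value in zip(schema, row):
            if not isinstance(value, expected):
                raise TypeError(f"{name}: expected {expected.__name__}, got {type(value).__name__}")
            record[name] = value
        records.append(record)
    return records


file_rows = [("a", 1), ("b", 2)]  # placeholder rows as read from the files
partition_key = 20141001          # value taken from the key=... directory name
rows_with_key = [row + (partition_key,) for row in file_rows]

schema: Schema = [("col_a", str), ("col_b", int), ("key", int)]
table = apply_schema(rows_with_key, schema)
print([r for r in table if r["key"] == 20141001])
```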

Hope it helps.

On Mon, Jan 5, 2015 at 7:01 PM, Sam Flint  wrote:

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: pyspark on yarn

2015-01-05 Thread Sam Flint
Below is the code that I am running. I get an error for unresolved
attributes. Can anyone point me in the right direction? I'm running it from
the pyspark shell on YARN ("MASTER=yarn-client pyspark").

The error is below the code:


# Import SQLContext and data types
from pyspark.sql import *

# sc is an existing SparkContext.
sqlContext = SQLContext(sc)

# The result of loading a parquet file is also a SchemaRDD.
# Try loading all data that you have
# Load the 17 data files of the key=20141001 partition as one
# comma-separated path string.
base = "/user/hive/warehouse/impala_new_4/key=20141001/"
files = (["69446344000a3a17-c90aac1f33a0fbc_875501925_data.%d.parq" % i for i in range(12)] +
         ["f1448ca083a5e224-159572f61b50d7a3_854675293_data.%d.parq" % i for i in range(5)])
parquetFile = sqlContext.parquetFile(",".join(base + f for f in files))



# Parquet files can also be registered as tables and then used in SQL statements.
parquetFile.registerTempTable("parquetFileone")


results = sqlContext.sql("SELECT * FROM parquetFileone where key=20141001 ")

#print results
for result in results.collect():
  print result



Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/spark/python/pyspark/sql.py", line 1615, in collect
    rows = RDD.collect(self)
  File "/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/spark/python/pyspark/rdd.py", line 678, in collect
    bytesInJava = self._jrdd.collect().iterator()
  File "/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/spark/python/pyspark/sql.py", line 1527, in _jrdd
    self._lazy_jrdd = self._jschema_rdd.javaToPython()
  File "/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.2.1-1.cdh5.2.1.p0.12/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o29.javaToPython.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: *, tree:
Project [*]
 Filter ('key = 20141001)
  Subquery parquetFileone
   ParquetRelation
/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.0.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.1.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.10.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.11.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.2.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.3.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.4.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.5.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.6.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.7.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.8.parq,/user/hive/warehouse/impala_new_4/key=20141001/69446344000a3a17-c90aac1f33a0fbc_875501925_data.9.parq,/user/hive/warehouse/impala_new_4/key=20141001/f1448ca083a5e224-159572f61b50d7a3_854675293_data.0.parq,/user/hive/warehouse/impala_new_4/key=20141001/f1448ca083a5e224-159572f61b50d7a3_854675293_
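In the tree above, the leading quote in `'key` is Catalyst's marker for an attribute that has not been resolved. The Parquet files carry no `key` column, so `'key` cannot be bound to anything in the `ParquetRelation`. The `*` above it most likely stays unexpanded because its child is unresolved, which would explain why the message names `*` rather than `key`. The toy sketch below models only the name-resolution check. It is not Catalyst's code. `unresolved_attributes` and the column names are hypothetical, and the sketch uses exact, case-sensitive matching as a simplifying assumption.

```python
from typing import Iterable, List


def unresolved_attributes(referenced: Iterable[str], available: Iterable[str]) -> List[str]:
    """Return referenced column names that the input relation does not provide.

    Toy model of analyzer name resolution (hypothetical helper); matching is
    exact and case-sensitive in this sketch.
    """
    known = set(available)
    return [name for name in referenced if name not in known]


# Placeholder column names: the thread does not show the files' real schema.
file_columns = ["col_a", "col_b"]
print(unresolved_attributes(["key"], file_columns))            # ['key']
print(unresolved_attributes(["key"], file_columns + ["key"]))  # []
```

Once `key` is supplied as a column (through a partitioned Hive table in Spark 1.2, or partition discovery in Spark 1.3), the reference resolves and the query can run.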