Re: Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot

2016-08-11 Thread cdecleene
The data is not corrupted: I can create the dataframe from the underlying raw
Parquet in Spark 2.0.0 if, instead of using SparkSession.sql() to create the
dataframe, I use SparkSession.read.parquet().
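
For reference, a minimal sketch of the two code paths (the warehouse path
below is a hypothetical stand-in for wherever the table's files actually
live):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Reading through the Hive metastore: row and column counts are right,
# but every value comes back as None.
df_sql = spark.sql("SELECT * FROM dra_agency_analytics.raw_ewt_agcy_dim")

# Reading the same files directly as Parquet returns the values intact.
df_parquet = spark.read.parquet(
    "hdfs:///user/hive/warehouse/dra_agency_analytics.db/raw_ewt_agcy_dim")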








Re: Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot

2016-08-10 Thread cdecleene
Using the Scala API instead of the Python API yields the same results.






Re: Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot

2016-08-09 Thread Mich Talebzadeh
Hi,

Is this table created as an external table in Hive?

Do you see the data through spark-sql or the Hive Thrift Server?

There is an issue with Zeppelin seeing data when it connects to the Spark
Thrift Server: rows display null values.
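
If it helps, the table definition can be inspected from the same pyspark
shell. A sketch, using the table name from this thread; DESCRIBE FORMATTED
reports the table type (managed vs. external) among other details:

spark.sql("DESCRIBE FORMATTED dra_agency_analytics.raw_ewt_agcy_dim") \
    .show(50, truncate=False)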

HTH



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot

2016-08-09 Thread Davies Liu
Can you get all the fields back using Scala or SQL (bin/spark-sql)?




Spark 1.6.2 can read hive tables created with sqoop, but Spark 2.0.0 cannot

2016-08-09 Thread cdecleene
Some details of an example Hive table that Spark 2.0 could not read...

SerDe Library:  
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:   
org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat

COLUMN_STATS_ACCURATE   false
kite.compression.type   snappy
numFiles                0
numRows                 -1
rawDataSize             -1
totalSize               0

All fields within the table are of type "string", and there are fewer than 20
of them.

When I say that Spark 2.0 cannot read the Hive table, I mean that when I
execute the following from a pyspark shell...

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.sql("SELECT * FROM dra_agency_analytics.raw_ewt_agcy_dim")

... the dataframe df has the correct number of rows and the correct columns,
but all values read as "None". 
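
For concreteness, a sketch of how the symptom presents (the column names here
are hypothetical, since the real ones aren't shown above; every column behaves
the same way):

df.count()   # correct row count
df.columns   # correct column names
df.take(2)   # e.g. [Row(agcy_id=None, agcy_nm=None, ...),
             #       Row(agcy_id=None, agcy_nm=None, ...)]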



