Hello,

Is there a way to find the DDL of a “temporary” view created in the current 
session with Spark SQL?

For example:

create or replace temporary view tmp_v as
select c1 from table_x;

“Show create table” does not work in this case because tmp_v is not a table, 
and “describe” only shows the columns, not the DDL.
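(A temporary view is stored in the session catalog as a resolved logical plan rather than as SQL text, so there may be no exact DDL to recover. One workaround, a sketch rather than an exact DDL dump, is to print the view's plan with EXPLAIN:

```sql
-- Shows the parsed/analyzed/optimized logical plans and the physical plan
-- for the view, not the original CREATE VIEW statement.
explain extended select * from tmp_v;
```

The analyzed plan usually contains enough information, such as the projected columns and source tables, to reconstruct the view definition by hand.)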


Thanks very much.
Keith

From: Anastasios Zouzias [mailto:zouz...@gmail.com]
Sent: Sunday, October 1, 2017 3:05 PM
To: Kanagha Kumar <kpra...@salesforce.com>
Cc: user @spark <user@spark.apache.org>
Subject: Re: Error - Spark reading from HDFS via dataframes - Java

Hi,

Set the inferSchema option to true in spark-csv. You may also want to set the 
mode option. See the README below:

https://github.com/databricks/spark-csv/blob/master/README.md
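For example (a sketch assuming the same spark variable as in your code below; DROPMALFORMED is just one of the mode values documented in the README):

```java
// Ask the CSV reader to sample the input and infer column types
// instead of defaulting every column to StringType.
Dataset<Row> df = new SQLContext(spark).read()
    .option("header", "false")
    .option("inferSchema", "true")    // infer LongType for numeric columns, etc.
    .option("mode", "DROPMALFORMED")  // drop rows that do not match the inferred schema
    .csv("hdfs:/inputpath/*");
```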

Best,
Anastasios

On 01.10.2017 at 07:58, "Kanagha Kumar" <kpra...@salesforce.com> wrote:
Hi,

I'm trying to read data from HDFS in Spark as dataframes. Printing the schema, 
I see that all columns are being read as strings. I'm converting the result to 
an RDD and creating another dataframe by passing in the correct schema (i.e., 
how the rows should ultimately be interpreted).

I'm getting the following error:

Caused by: java.lang.RuntimeException: java.lang.String is not a valid external 
type for schema of bigint


Spark read API:

Dataset<Row> hdfs_dataset = new SQLContext(spark).read()
    .option("header", "false")
    .csv("hdfs:/inputpath/*");

Dataset<Row> ds = new SQLContext(spark)
    .createDataFrame(hdfs_dataset.toJavaRDD(), conversionSchema);

This is the schema to be converted to:
StructType(StructField(COL1,StringType,true),
StructField(COL2,StringType,true),
StructField(COL3,LongType,true),
StructField(COL4,StringType,true),
StructField(COL5,StringType,true),
StructField(COL6,LongType,true))

This is the original schema obtained once read API was invoked
StructType(StructField(_c1,StringType,true),
StructField(_c2,StringType,true),
StructField(_c3,StringType,true),
StructField(_c4,StringType,true),
StructField(_c5,StringType,true),
StructField(_c6,StringType,true))

My interpretation is that even when the JavaRDD is converted to a dataframe by 
passing in the new schema, the values are not being type cast. This occurs 
because the read API above reads all data from HDFS as string types.

How can I convert an RDD to a dataframe by passing in the correct schema once 
it is read?
How can the values be type cast correctly during this RDD to dataframe 
conversion?

Or how can I read data from HDFS with an input schema in java?
Any suggestions are helpful. Thanks!
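(Since the CSV reader returns every column as a string, one way out is to cast each value explicitly while mapping the RDD, before calling createDataFrame. The per-column conversion below is a plain-Java sketch with no Spark dependency; the class and method names are mine, and the column layout follows conversionSchema above. Inside Spark, convert would be applied per row via hdfs_dataset.toJavaRDD().map(...) and the result wrapped with RowFactory.create(...):

```java
import java.util.Arrays;

public class RowConverter {
    // Which columns are LongType in conversionSchema:
    // COL1, COL2 strings; COL3 long; COL4, COL5 strings; COL6 long.
    static final boolean[] IS_LONG = {false, false, true, false, false, true};

    // Convert one all-string CSV row into properly typed values, so that
    // LongType columns carry java.lang.Long instead of String.
    static Object[] convert(String[] row) {
        Object[] out = new Object[row.length];
        for (int i = 0; i < row.length; i++) {
            out[i] = IS_LONG[i] ? (Object) Long.parseLong(row[i]) : row[i];
        }
        return out;
    }

    public static void main(String[] args) {
        Object[] typed = convert(new String[] {"a", "b", "42", "c", "d", "7"});
        System.out.println(typed[2] instanceof Long);  // true
        System.out.println(Arrays.toString(typed));    // [a, b, 42, c, d, 7]
    }
}
```

Alternatively, the reader can be given the target schema up front via .schema(conversionSchema) before .csv(...), or the types can be inferred with inferSchema as Anastasios suggests, avoiding the manual cast entirely.)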

