Hi Cheng, thank you very much for helping me finally uncover the secret
of this magic...
actually, we defined this external table with:
SID STRING
REQUEST_ID STRING
TIMES_DQ TIMESTAMP
TOTAL_PRICE FLOAT
...
using "desc table ext_fullorders", it is shown only as:
[# col_name data_type comment ]
...
[times_dq string from deserializer ]
[total_price string from deserializer ]
...
because, as you said, CSVSerde sets all field object inspectors to
javaStringObjectInspector,
and that is why the columns carry the "from deserializer" comments.
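To make that concrete, here is a minimal sketch in plain Java (a hypothetical stand-in, not the real CSVSerde API) of what "all field object inspectors are javaStringObjectInspector" means in practice: every deserialized field comes back as a java.lang.String, no matter what type the DDL declared.

```java
import java.util.Arrays;
import java.util.List;

public class CsvDeserializeSketch {
    // Hypothetical stand-in for CSVSerde.deserialize(): split the CSV line
    // and return the raw fields. Nothing is converted to TIMESTAMP or FLOAT.
    public static List<Object> deserialize(String csvLine) {
        return Arrays.asList((Object[]) csvLine.split(","));
    }

    public static void main(String[] args) {
        List<Object> row = deserialize("s1,r1,2014-09-01 12:00:00,19.99");
        // total_price was declared FLOAT, but the value is still a String:
        System.out.println(row.get(3).getClass().getName()); // java.lang.String
    }
}
```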
but the StorageDescriptor keeps the real user-defined types;
using "desc extended table ext_fullorders" we can see that its
sd:StorageDescriptor
is:
FieldSchema(name:times_dq, type:timestamp, comment:null),
FieldSchema(name:total_price, type:float, comment:null)
and Spark's HiveContext reads the schema info from this StorageDescriptor:
https://github.com/apache/spark/blob/7e191fe29bb09a8560cd75d453c4f7f662dff406/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala#L316
so, in the SchemaRDD, the fields of each Row were filled with strings (via
fillObject, all of the values were retrieved from CSVSerDe with
javaStringObjectInspector),
but Spark considers some of them to be float or timestamp (the schema info
came from sd:StorageDescriptor)
crazy...
and sorry for the update on the weekend...
a little more about how I found this problem and why it is a trouble for us.
we use the new Spark Thrift Server; querying a normal managed Hive table
works fine,
but when we try to access external tables with a custom SerDe such as this
CSVSerDe, we get a ClassCastException, such as:
java.lang.ClassCastException: java.lang.String cannot be cast to
java.lang.Float
the reason is here:
https://github.com/apache/spark/blob/d94a44d7caaf3fe7559d9ad7b10872fa16cf81ca/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/server/SparkSQLOperationManager.scala#L104-L105
here Spark's Thrift Server tries to get a float value from the Spark Row,
because in the schema info (sd:StorageDescriptor) this column is float, but
in the Spark Row this field was actually filled with a string value...
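The crash can be reproduced in isolation; here is a sketch in plain Java (no Spark involved) of what happens when code trusts a schema that says float while the value is really a String. The exact exception message varies by JDK version, but it is the same ClassCastException quoted above.

```java
public class FloatCastSketch {
    public static void main(String[] args) {
        // What fillObject put into the Row: a raw String from the SerDe.
        Object totalPrice = "19.99";

        // What the Thrift server does, because sd:StorageDescriptor says FLOAT:
        try {
            Float f = (Float) totalPrice;
            System.out.println(f);
        } catch (ClassCastException e) {
            // java.lang.String cannot be cast to java.lang.Float
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```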
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/HiveContext-schemaRDD-printSchema-get-different-dataTypes-feature-or-a-bug-really-strange-and-surpri-tp8035p8157.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.