Hi Team,

Data is returned when the table is queried from Hive,
but not from Spark. Could you help me find the gap?

Details below:

******************************Approach 1 --- successful****************************

select * from emp_cow limit 2;
20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  1461868200000  401217383000
20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  1516473000000  816189568000

******************************Approach 2 --- successful****************************

spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
+-------------------+--------------------+------------------+----------------------+--------------------+------+------------------+---------+-------------+---------+---------+-------------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|emp_id|          emp_name|emp_short|           ts| emp_long|emp_float|     emp_date|emp_timestamp|
+-------------------+--------------------+------------------+----------------------+--------------------+------+------------------+---------+-------------+---------+---------+-------------+-------------+
|     20190503171506|20190503171506_0_424|                 4|               default|71ff4cc6-bd8e-4c4...|     4|   13Vivian Walter|    -1641|1556883906604|608806001|   511.63|1461868200000| 401217383000|
+----

******************************Approach 3 --- No records****************************


***Reading the RO (read-optimized) table as a Hive table using Spark***
But when I read it from Spark as a Hive table, no records are returned.


sqlContext.sql("select * from hudi.emp_cow_03").show   ---- in the Scala console (spark-shell)
select * from hudi.emp_cow_03;                         ---- in the Spark SQL console

No results: only the headers/column names are printed.


FYI, the table DDL:


CREATE EXTERNAL TABLE `emp_cow`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `emp_id` int,
  `emp_name` string,
  `emp_short` int,
  `ts` bigint,
  `emp_long` bigint,
  `emp_float` float,
  `emp_date` bigint,
  `emp_timestamp` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.uber.hoodie.hadoop.HoodieInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://nn10.htrunk.com/apps/hive/warehouse/emp_cow'
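The DDL above is for emp_cow (LOCATION .../emp_cow), while Approach 2 loads /apps/hive/warehouse/emp_cow_03/default/* and Approach 3 queries hudi.emp_cow_03, whose DDL is not shown. One quick sanity check is whether the Hive table's LOCATION and the datasource path resolve to the same Hudi base path. Below is a minimal sketch in plain Scala; HudiPathCheck and its methods are hypothetical helpers, and it assumes the load path has the shape <basePath>/<partition>/* with a single partition level (the "default" directory here).

```scala
// Hypothetical helper: checks whether a Hive table LOCATION and a Spark
// datasource load path point at the same Hudi base path.
// Assumption: the load path looks like <basePath>/<partition>/*, i.e. one
// partition level followed by a "*" glob, as in Approach 2.
object HudiPathCheck {

  // Drops a leading "scheme://authority" such as "hdfs://nn10.htrunk.com".
  def stripSchemeAndAuthority(path: String): String =
    path.replaceFirst("^[a-zA-Z][a-zA-Z0-9+.-]*://[^/]*", "")

  // Removes trailing slashes so "/a/b/" and "/a/b" compare equal.
  def trimTrailingSlash(path: String): String =
    path.replaceAll("/+$", "")

  // Normalizes a table LOCATION to a bare filesystem path.
  def tableBasePath(location: String): String =
    trimTrailingSlash(stripSchemeAndAuthority(location))

  // Turns "<basePath>/<partition>/*" into "<basePath>".
  def basePathFromGlob(loadPath: String, partitionDepth: Int = 1): String = {
    val segments = tableBasePath(loadPath).split("/").toList
    // Drop the trailing "*" glob segment, if present.
    val withoutGlob =
      if (segments.lastOption.contains("*")) segments.init else segments
    // Drop the partition directories (e.g. "default").
    withoutGlob.dropRight(partitionDepth).mkString("/")
  }

  // True when both inputs reduce to the same base path.
  def samePath(location: String, loadPath: String): Boolean =
    tableBasePath(location) == basePathFromGlob(loadPath)
}
```

For the emp_cow LOCATION above and the Approach 2 path, samePath returns false (/apps/hive/warehouse/emp_cow vs /apps/hive/warehouse/emp_cow_03). In spark-shell, the LOCATION actually registered for hudi.emp_cow_03 can be read from the Location row of spark.sql("DESCRIBE FORMATTED hudi.emp_cow_03").show(100, false) and passed in instead; a match only rules out this one cause.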
