Hi Vinod,

Following up on your email: are there any examples of HiveSyncTool being used
programmatically to drive the creation and management of a Hive table?
I need to create and manage a Hive table programmatically, so any example
would help.
Alternatively, if others here have used the APIs to drive this, hearing about
it would help me decide whether to definitively spend time on this approach.
Thanks
Kabeer.
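[Editor's note: a minimal sketch of the kind of programmatic sync being asked about, based on the `com.uber.hoodie` package line used elsewhere in this thread. The exact constructor signature, public config fields, and JDBC URL below are assumptions — verify them against the HiveSyncTool source in your Hudi version before relying on this.]

```scala
import com.uber.hoodie.hive.{HiveSyncConfig, HiveSyncTool}
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.hive.conf.HiveConf

// Hypothetical wiring -- field names may differ between Hudi releases.
val cfg = new HiveSyncConfig()
cfg.databaseName    = "hudi"                                    // target Hive database
cfg.tableName       = "emp_cow"                                 // table to create/sync
cfg.basePath        = "/apps/hive/warehouse/emp_cow"            // Hudi dataset base path
cfg.partitionFields = java.util.Arrays.asList("default")        // partition column(s)
cfg.jdbcUrl         = "jdbc:hive2://localhost:10000"            // assumed HiveServer2 URL
cfg.hiveUser        = "hive"
cfg.hivePass        = ""

val hiveConf = new HiveConf()
val fs       = FileSystem.get(hiveConf)

// Creates the table in Hive if it does not exist and registers
// newly committed files/partitions with the metastore.
new HiveSyncTool(cfg, hiveConf, fs).syncHoodieTable()
```

Run after each write (or on a schedule) so the Hive metastore stays in step with the Hudi timeline.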

On May 17 2019, at 4:03 pm, Vinoth Chandar <vin...@apache.org> wrote:
> Glad you got it working. Any reason why you are not using the Hive sync
> tool to manage the table creation/registration in Hive?
>
> On Fri, May 17, 2019 at 7:04 AM satish.sidnakoppa...@gmail.com <
> satish.sidnakoppa...@gmail.com> wrote:
>
> >
> >
> > On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com <
> > satish.sidnakoppa...@gmail.com> wrote:
> > >
> > >
> > > On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com <
> > satish.sidnakoppa...@gmail.com> wrote:
> > > > Hi Team,
> > > >
> > > > Data is returned when queried from Hive,
> > > > but not from Spark. Could you assist in finding the gap?
> > > >
> > > > Details below
> > > > ******************************Approach 1 --- successful****************************
> > > >
> > > > select * from emp_cow limit 2;
> > > > 20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  1461868200000  401217383000
> > > > 20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  1516473000000  816189568000
> > > >
> > > > ******************************Approach 2 --- successful****************************
> > > >
> > > > spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
> > > >
> > > > |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name|emp_id|emp_name|emp_short|ts|emp_long|emp_float|emp_date|emp_timestamp|
> > > > |20190503171506|20190503171506_0_424|4|default|71ff4cc6-bd8e-4c4...|4|13Vivian Walter|-1641|1556883906604|608806001|511.63|1461868200000|401217383000|
> > > > ******************************Approach 3 --- No records****************************
> > > >
> > > > ***To read the RO table as a Hive table using Spark***
> > > > But when I read it from Spark as a Hive table, no records are returned.
> > > >
> > > > sqlContext.sql("select * from hudi.emp_cow").show   ---- in the Scala console
> > > > select * from hudi.emp_cow                          ---- in the Spark SQL console
> > > >
> > > > No result. Only headers/column names are printed.
> > > >
> > > > FYI Table DDL
> > > >
> > > > CREATE EXTERNAL TABLE `emp_cow`(
> > > > `_hoodie_commit_time` string,
> > > > `_hoodie_commit_seqno` string,
> > > > `_hoodie_record_key` string,
> > > > `_hoodie_partition_path` string,
> > > > `_hoodie_file_name` string,
> > > > `emp_id` int,
> > > > `emp_name` string,
> > > > `emp_short` int,
> > > > `ts` bigint,
> > > > `emp_long` bigint,
> > > > `emp_float` float,
> > > > `emp_date` bigint,
> > > > `emp_timestamp` bigint)
> > > > ROW FORMAT SERDE
> > > > 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> > > > STORED AS INPUTFORMAT
> > > > 'com.uber.hoodie.hadoop.HoodieInputFormat'
> > > > OUTPUTFORMAT
> > > > 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> > > > LOCATION
> > > > '/apps/hive/warehouse/emp_cow'
> > > >
> > >
> > >
> > >
> > > Fixed the typo:
> > > the path is /apps/hive/warehouse/emp_cow
> > > and the table name is emp_cow
> > >
> >
> >
> > Issue fixed.
> > The path in the table creation was incorrect:
> > LOCATION '/apps/hive/warehouse/emp_cow'
> > should be
> > LOCATION '/apps/hive/warehouse/emp_cow/default'
>
>
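[Editor's note: for anyone hitting the same symptom, the location can also be corrected in place with standard Hive DDL, without dropping and recreating the table. The database and table names below are the ones used in this thread.]

```sql
ALTER TABLE hudi.emp_cow
SET LOCATION '/apps/hive/warehouse/emp_cow/default';
```

After the change, re-run the Spark SQL query; a table whose LOCATION points above the actual data directory will list only column headers, exactly as described in Approach 3.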
