RE: Question about the Payload in Hudi
Hi,

I am very interested in fixing this behavior. I have already implemented a new Payload for our use case that can handle both the delta record and the parquet record, but that implementation still has a problem. Going back to question 3: the Payload has three methods, and my question is about the 'combineAndGetUpdateValue' method. The 'preCombine' method picks the payload with the greatest ordering value; that comparison is between two payloads, and both carry the comparable field 'orderingVal' that indicates the order. In 'combineAndGetUpdateValue', however, the comparison is between the current Payload and the current IndexedRecord from parquet. The question is: how can I get the 'orderingVal' of this IndexedRecord? After searching nearly all the code related to the Payload, I found that Hudi first reads 'PRECOMBINE_FIELD_OPT_KEY' from the Hudi config, then extracts that field's value as the orderingVal and combines the 'orderingVal' and the 'record' into a Payload object. So inside the Payload object I cannot access the value of 'PRECOMBINE_FIELD_OPT_KEY', yet that value is necessary for extracting the orderingVal from the 'IndexedRecord'.

In our case I can hard-code the field name in 'combineAndGetUpdateValue'; below is the method I currently use. But I don't like this approach, and it is obviously not a good way to do it.

@Override
public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
  GenericRecord record = HoodieAvroUtils.bytesToAvro(this.recordBytes, schema);
  Long thisDATE = (Long) record.get("RESULT_DATE_UTS");
  Long currDATE = (Long) ((GenericRecord) currentValue).get("RESULT_DATE_UTS");
  if (currDATE.compareTo(thisDATE) < 0) {
    return Optional.of(record);
  } else {
    return Optional.of(currentValue);
  }
}

One idea is to store the 'PRECOMBINE_FIELD_OPT_KEY' value in the Payload itself, but that means I would have to change a lot of files in Hudi. Would you mind me doing this? Otherwise, if you have any other idea or suggestion about how to fix this behavior, we can discuss it. I am glad to make a patch for it.

Thanks so much for the reply and help.

Mit freundlichen Grüßen / Best regards

Yuanbin Cheng
CR/PJ-AI-S1

-----Original Message-----
From: Vinoth Chandar
Sent: Friday, May 17, 2019 8:02 AM
To: dev@hudi.apache.org
Subject: Re: Question about the Payload in Hudi

Hi,

What you mentioned is correct.

@Override
public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
  // combining strategy here trivially ignores currentValue on disk and writes this record
  return getInsertValue(schema);
}

I think we could change this behavior to match pre-combining. Are you interested in sending a patch?

Thanks
Vinoth

On Fri, May 17, 2019 at 7:18 AM Vinoth Chandar wrote:

> Thanks for the clear example. Let me check this out and get back shortly.
>
> On Thu, May 16, 2019 at 5:29 PM Yanjia Li wrote:
>
>> Hello Vinoth,
>>
>> I could add an example here to clarify this question.
>>
>> We have DF1{id:1, ts:9} and DF2{id:1, ts:1; id:1, ts:2}. We save DF1
>> first, then upsert DF2 to DF1. With the default payload, we will have
>> the final result DF{id:1, ts:2}. But we are looking for DF{id:1, ts:9}.
>> If I didn't understand wrong, the precombine only combines the data in
>> the delta dataframe, which is DF2 in the example. And the default
>> payload only guarantees that we keep the latest timestamp within the
>> current batch. In this example, the newer data arrived before the older
>> data. We would like to confirm whether we will need to write our own
>> payload to handle this case. It would also be helpful to know if anyone
>> else has had a similar issue before.
>>
>> Thanks so much!
>> Gary
>>
>> On Thu, May 16, 2019 at 2:49 PM Vinoth Chandar wrote:
>>
>>> Hi,
>>>
>>> (Please subscribe to the mailing list, so the message actually comes
>>> over directly to the list.)
>>>
>>> On 1, the default payload overwrites the record on storage with the
>>> newly incoming record, if the precombine field has a higher value. For
>>> e.g., if you use a timestamp field, then it will overwrite with the
>>> latest record, while it will not overwrite if you accidentally write a
>>> much older record.
>>>
>>> On 2, I think you can achieve this by setting the precombine key
>>> properly.. IIUC, you don't want the older record to overwrite the
>>> newer record?
>>>
>>> On 3, you can configure the PRECOMBINE key as documented here:
>>> http://hudi.apache.org/configurations.html#PRECOMBINE_FIELD_OPT_KEY
>>>
>>> Hope that helps. Please let me know if I missed something.
>>>
>>> Thanks
>>> Vinoth
>>>
>>> On Thu, May 16, 2019 at 7:07 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:
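The workaround proposed above — carrying the precombine field name inside the payload so that 'combineAndGetUpdateValue' can look up the ordering value on the on-disk record — could look roughly like the sketch below. This is only an illustration, not Hudi's actual implementation: the stock payload constructors take only (GenericRecord record, Comparable orderingVal), so the extra 'orderingField' constructor argument is precisely the hypothetical change under discussion, and the package names assume the pre-Apache (com.uber.hoodie) releases this thread refers to.

import java.io.IOException;
import java.util.Optional;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.IndexedRecord;

import com.uber.hoodie.common.model.OverwriteWithLatestAvroPayload;
import com.uber.hoodie.common.util.HoodieAvroUtils;

// Sketch only: a payload that is handed the precombine field name, so the
// ordering value of the record already on disk can be read without
// hard-coding a column name such as "RESULT_DATE_UTS".
public class OrderingAwarePayload extends OverwriteWithLatestAvroPayload {

  private final String orderingField; // hypothetical extra constructor argument

  public OrderingAwarePayload(GenericRecord record, Comparable orderingVal, String orderingField) {
    super(record, orderingVal);
    this.orderingField = orderingField;
  }

  @Override
  public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
    GenericRecord incoming = HoodieAvroUtils.bytesToAvro(this.recordBytes, schema);
    // Extract the ordering value from both the incoming record and the record on disk.
    Comparable incomingOrdering = (Comparable) incoming.get(orderingField);
    Comparable currentOrdering = (Comparable) ((GenericRecord) currentValue).get(orderingField);
    // Keep whichever record has the greater ordering value, mirroring preCombine.
    if (currentOrdering.compareTo(incomingOrdering) > 0) {
      return Optional.of(currentValue);
    }
    return Optional.of(incoming);
  }
}

The open design question in the thread is exactly who supplies 'orderingField': today the writer reads PRECOMBINE_FIELD_OPT_KEY from the config and passes only the extracted value into the payload, so plumbing the field name through would touch the record-construction path.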
Re: Upgrade HUDI to Hive 2.x
I am in favor of deprecating Hive 1.x unless someone has a strong objection. Most cloud offerings like EMR/Dataproc all support Hive 2.x, and Hive 3.x is going to grow. This seems like a move in the right direction.

/thanks/vinoth

On Fri, May 17, 2019 at 11:55 AM nishith agarwal wrote:

> All,
>
> Is anyone using Hudi with Hive 1.x? Currently, Hudi has a dependency on
> Hive 1.x and works against Hive 2.x by using specific profiles. There are
> non-backwards-compatible changes in the HiveRecordReader between Hive 1.x
> and Hive 2.x. I'm planning to upgrade to Hive 2.x, which would essentially
> mean Hudi's realtime view (HoodieRealtimeInputFormat) will NOT work with
> Hive 1.x anymore (mostly if the schema has nested columns). Also, I'm
> unsure if the Hive 2.x protocol is backward compatible with Hive 1.x (we
> depend on forward compatibility right now for Hudi to work with 2.x and
> beyond).
> Let me know what you guys think.
>
> Thanks,
> Nishith
Upgrade HUDI to Hive 2.x
All,

Is anyone using Hudi with Hive 1.x? Currently, Hudi has a dependency on Hive 1.x and works against Hive 2.x by using specific profiles. There are non-backwards-compatible changes in the HiveRecordReader between Hive 1.x and Hive 2.x. I'm planning to upgrade to Hive 2.x, which would essentially mean Hudi's realtime view (HoodieRealtimeInputFormat) will NOT work with Hive 1.x anymore (mostly if the schema has nested columns). Also, I'm unsure if the Hive 2.x protocol is backward compatible with Hive 1.x (we depend on forward compatibility right now for Hudi to work with 2.x and beyond).

Let me know what you guys think.

Thanks,
Nishith
Re: Read RO table in Spark as hive table | No records returned
Glad you got it working.. Any reason why you are not using the Hive sync tool to manage the table creation/registration in Hive?

On Fri, May 17, 2019 at 7:04 AM satish.sidnakoppa...@gmail.com wrote:

> On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com wrote:
>
>> On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com wrote:
>>
>>> Hi Team,
>>>
>>> Data is returned when queried from Hive, but not from Spark. Could you assist in finding the gap?
>>>
>>> Details below.
>>>
>>> ***** Approach 1 --- successful *****
>>>
>>> select * from emp_cow limit 2;
>>> 20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  146186820  401217383000
>>> 20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  151647300  816189568000
>>>
>>> ***** Approach 2 --- successful *****
>>>
>>> spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
>>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
>>> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|emp_id|        emp_name|emp_short|           ts| emp_long|emp_float| emp_date|emp_timestamp|
>>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
>>> |     20190503171506|20190503171506_0_424|                 4|               default|71ff4cc6-bd8e-4c4...|     4| 13Vivian Walter|    -1641|1556883906604|608806001|   511.63|146186820| 401217383000|
>>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+---------+---------+---------+---------+-------------+
>>>
>>> ***** Approach 3 --- No records *****
>>>
>>> To read the RO table as a Hive table using Spark:
>>> when I read it from Spark as a Hive table - no records are returned.
>>>
>>> sqlContext.sql("select * from hudi.emp_cow").show;   -- in the scala console
>>> select * from hudi.emp_cow                           -- in the spark console
>>>
>>> NO result. Only headers/column names are printed.
>>>
>>> FYI, the table DDL:
>>>
>>> CREATE EXTERNAL TABLE `emp_cow`(
>>>   `_hoodie_commit_time` string,
>>>   `_hoodie_commit_seqno` string,
>>>   `_hoodie_record_key` string,
>>>   `_hoodie_partition_path` string,
>>>   `_hoodie_file_name` string,
>>>   `emp_id` int,
>>>   `emp_name` string,
>>>   `emp_short` int,
>>>   `ts` bigint,
>>>   `emp_long` bigint,
>>>   `emp_float` float,
>>>   `emp_date` bigint,
>>>   `emp_timestamp` bigint)
>>> ROW FORMAT SERDE
>>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>>> STORED AS INPUTFORMAT
>>>   'com.uber.hoodie.hadoop.HoodieInputFormat'
>>> OUTPUTFORMAT
>>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>>> LOCATION
>>>   '/apps/hive/warehouse/emp_cow'
>>
>> Fixed the typo mistake:
>>
>> the path is /apps/hive/warehouse/emp_cow
>> the table name is emp_cow
>
> Issue fixed. The path in the table creation was incorrect.
>
> LOCATION '/apps/hive/warehouse/emp_cow'
> should be
> LOCATION '/apps/hive/warehouse/emp_cow/default'
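For anyone reading this later: the Hive sync tool Vinoth mentions registers (and keeps in sync) the Hive table definition from the dataset on storage, instead of hand-writing the DDL shown in the thread. A rough sketch of invoking it for this emp_cow example follows — class and field names are as I recall them from the com.uber.hoodie-era hoodie-hive module, and the JDBC URL, credentials, and partition field are placeholders, so verify all of this against your Hudi version before use.

import java.util.Collections;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hive.conf.HiveConf;

import com.uber.hoodie.hive.HiveSyncConfig;
import com.uber.hoodie.hive.HiveSyncTool;

// Rough, hedged sketch (field names from memory): sync the Hoodie dataset at
// basePath into the Hive metastore, creating/updating the table definition.
public class SyncEmpCow {
  public static void run(FileSystem fs) {
    HiveSyncConfig cfg = new HiveSyncConfig();
    cfg.databaseName = "hudi";
    cfg.tableName = "emp_cow";
    cfg.basePath = "/apps/hive/warehouse/emp_cow";
    cfg.partitionFields = Collections.singletonList("partition_col"); // placeholder
    cfg.jdbcUrl = "jdbc:hive2://localhost:10000";                     // placeholder
    cfg.hiveUser = "hive";                                            // placeholder
    cfg.hivePass = "hive";                                            // placeholder
    new HiveSyncTool(cfg, new HiveConf(), fs).syncHoodieTable();
  }
}

Using the sync tool would also have avoided the root cause found below: it derives the table LOCATION (including the partition layout) from the dataset itself rather than from a hand-typed path.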
Re: Question about the Payload in Hudi
Hi,

What you mentioned is correct.

@Override
public Optional<IndexedRecord> combineAndGetUpdateValue(IndexedRecord currentValue, Schema schema) throws IOException {
  // combining strategy here trivially ignores currentValue on disk and writes this record
  return getInsertValue(schema);
}

I think we could change this behavior to match pre-combining. Are you interested in sending a patch?

Thanks
Vinoth

On Fri, May 17, 2019 at 7:18 AM Vinoth Chandar wrote:

> Thanks for the clear example. Let me check this out and get back shortly.
>
> On Thu, May 16, 2019 at 5:29 PM Yanjia Li wrote:
>
>> Hello Vinoth,
>>
>> I could add an example here to clarify this question.
>>
>> We have DF1{id:1, ts:9} and DF2{id:1, ts:1; id:1, ts:2}. We save DF1
>> first, then upsert DF2 to DF1. With the default payload, we will have
>> the final result DF{id:1, ts:2}. But we are looking for DF{id:1, ts:9}.
>> If I didn't understand wrong, the precombine only combines the data in
>> the delta dataframe, which is DF2 in the example. And the default
>> payload only guarantees that we keep the latest timestamp within the
>> current batch. In this example, the newer data arrived before the older
>> data. We would like to confirm whether we will need to write our own
>> payload to handle this case. It would also be helpful to know if anyone
>> else has had a similar issue before.
>>
>> Thanks so much!
>> Gary
>>
>> On Thu, May 16, 2019 at 2:49 PM Vinoth Chandar wrote:
>>
>>> Hi,
>>>
>>> (Please subscribe to the mailing list, so the message actually comes
>>> over directly to the list.)
>>>
>>> On 1, the default payload overwrites the record on storage with the
>>> newly incoming record, if the precombine field has a higher value. For
>>> e.g., if you use a timestamp field, then it will overwrite with the
>>> latest record, while it will not overwrite if you accidentally write a
>>> much older record.
>>>
>>> On 2, I think you can achieve this by setting the precombine key
>>> properly.. IIUC, you don't want the older record to overwrite the
>>> newer record?
>>>
>>> On 3, you can configure the PRECOMBINE key as documented here:
>>> http://hudi.apache.org/configurations.html#PRECOMBINE_FIELD_OPT_KEY
>>>
>>> Hope that helps. Please let me know if I missed something.
>>>
>>> Thanks
>>> Vinoth
>>>
>>> On Thu, May 16, 2019 at 7:07 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:
>>>
>>>> Hi Hudi,
>>>>
>>>> We want to use Apache Hudi to migrate our data pipeline from batch to
>>>> incremental. We face several questions about Hudi, and we would
>>>> appreciate your help figuring them out.
>>>>
>>>> 1. In the default Payload (OverwriteWithLatestAvroPayload), the
>>>> payload only considers and merges records with the same key within
>>>> the delta dataframe (the newly arriving records), right?
>>>>
>>>> 2. In our use case, we want to keep the latest record in our system.
>>>> However, with the default Payload, if the delta dataframe contains a
>>>> record older than the one already written to Hudi, it will simply
>>>> overwrite it, which is not what we want. Do you have any suggestions
>>>> about how to keep the globally latest record in Hudi?
>>>>
>>>> 3. We have implemented a custom Payload class in order to keep the
>>>> globally latest record. However, we found that in the Payload class
>>>> we have to hard-code the PRECOMBINE_FIELD_OPT_KEY value to read the
>>>> corresponding value from currentValue in order to compare them. Is
>>>> there any way to get PRECOMBINE_FIELD_OPT_KEY inside the Payload, or
>>>> is there a suggested way of dealing with this issue?
>>>>
>>>> Thanks so much!
>>>>
>>>> Mit freundlichen Grüßen / Best regards
>>>>
>>>> Yuanbin Cheng
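For contrast with the combineAndGetUpdateValue shown above, the pre-combining that Vinoth wants it to match does consult the ordering value. Paraphrasing the default payload from memory (a sketch, not the verbatim Hudi source), preCombine looks roughly like this:

// Among two incoming records with the same key in one batch, keep the one
// whose orderingVal (the value of the configured precombine field) is greater.
@Override
public OverwriteWithLatestAvroPayload preCombine(OverwriteWithLatestAvroPayload another) {
  if (another.orderingVal.compareTo(orderingVal) > 0) {
    return another; // the other record has the higher ordering value; keep it
  }
  return this;
}

The asymmetry the thread is about: preCombine compares ordering values between two payloads in the same batch, while combineAndGetUpdateValue unconditionally discards the record already on disk.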
Re: Question about the Payload in Hudi
Thanks for the clear example. Let me check this out and get back shortly.

On Thu, May 16, 2019 at 5:29 PM Yanjia Li wrote:

> Hello Vinoth,
>
> I could add an example here to clarify this question.
>
> We have DF1{id:1, ts:9} and DF2{id:1, ts:1; id:1, ts:2}. We save DF1
> first, then upsert DF2 to DF1. With the default payload, we will have
> the final result DF{id:1, ts:2}. But we are looking for DF{id:1, ts:9}.
> If I didn't understand wrong, the precombine only combines the data in
> the delta dataframe, which is DF2 in the example. And the default
> payload only guarantees that we keep the latest timestamp within the
> current batch. In this example, the newer data arrived before the older
> data. We would like to confirm whether we will need to write our own
> payload to handle this case. It would also be helpful to know if anyone
> else has had a similar issue before.
>
> Thanks so much!
> Gary
>
> On Thu, May 16, 2019 at 2:49 PM Vinoth Chandar wrote:
>
>> Hi,
>>
>> (Please subscribe to the mailing list, so the message actually comes
>> over directly to the list.)
>>
>> On 1, the default payload overwrites the record on storage with the
>> newly incoming record, if the precombine field has a higher value. For
>> e.g., if you use a timestamp field, then it will overwrite with the
>> latest record, while it will not overwrite if you accidentally write a
>> much older record.
>>
>> On 2, I think you can achieve this by setting the precombine key
>> properly.. IIUC, you don't want the older record to overwrite the
>> newer record?
>>
>> On 3, you can configure the PRECOMBINE key as documented here:
>> http://hudi.apache.org/configurations.html#PRECOMBINE_FIELD_OPT_KEY
>>
>> Hope that helps. Please let me know if I missed something.
>>
>> Thanks
>> Vinoth
>>
>> On Thu, May 16, 2019 at 7:07 AM FIXED-TERM Cheng Yuanbin (CR/PJ-AI-S1) <fixed-term.yuanbin.ch...@us.bosch.com> wrote:
>>
>>> Hi Hudi,
>>>
>>> We want to use Apache Hudi to migrate our data pipeline from batch to
>>> incremental. We face several questions about Hudi, and we would
>>> appreciate your help figuring them out.
>>>
>>> 1. In the default Payload (OverwriteWithLatestAvroPayload), the
>>> payload only considers and merges records with the same key within
>>> the delta dataframe (the newly arriving records), right?
>>>
>>> 2. In our use case, we want to keep the latest record in our system.
>>> However, with the default Payload, if the delta dataframe contains a
>>> record older than the one already written to Hudi, it will simply
>>> overwrite it, which is not what we want. Do you have any suggestions
>>> about how to keep the globally latest record in Hudi?
>>>
>>> 3. We have implemented a custom Payload class in order to keep the
>>> globally latest record. However, we found that in the Payload class
>>> we have to hard-code the PRECOMBINE_FIELD_OPT_KEY value to read the
>>> corresponding value from currentValue in order to compare them. Is
>>> there any way to get PRECOMBINE_FIELD_OPT_KEY inside the Payload, or
>>> is there a suggested way of dealing with this issue?
>>>
>>> Thanks so much!
>>>
>>> Mit freundlichen Grüßen / Best regards
>>>
>>> Yuanbin Cheng
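As a concrete illustration of the point 3 quoted above, setting the precombine field through the Spark datasource might look like the following sketch. The option keys are the ones on the configurations page linked in the thread (PRECOMBINE_FIELD_OPT_KEY and friends); the table name and base path are made-up placeholders.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// Sketch of an upsert where "ts" is the precombine field, so that two records
// with the same key within one batch are deduplicated by the larger "ts".
// Table name and base path below are hypothetical placeholders.
public class PrecombineExample {
  static void upsertWithPrecombine(Dataset<Row> df) {
    df.write()
      .format("com.uber.hoodie")
      .option("hoodie.datasource.write.recordkey.field", "id")
      .option("hoodie.datasource.write.precombine.field", "ts") // PRECOMBINE_FIELD_OPT_KEY
      .option("hoodie.table.name", "my_table")
      .mode(SaveMode.Append)
      .save("/path/to/basePath");
  }
}

Note that, per the thread, this deduplicates within the incoming batch (DF2 in Gary's example); overriding how the batch merges against what is already on storage is exactly the combineAndGetUpdateValue behavior discussed above.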
Re: Read RO table in Spark as hive table | No records returned
On 2019/05/17 12:45:26, satish.sidnakoppa...@gmail.com wrote:

> On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com wrote:
>
>> Hi Team,
>>
>> Data is returned when queried from Hive, but not from Spark. Could you assist in finding the gap?
>>
>> Details below.
>>
>> ***** Approach 1 --- successful *****
>>
>> select * from emp_cow limit 2;
>> 20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  146186820  401217383000
>> 20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  151647300  816189568000
>>
>> ***** Approach 2 --- successful *****
>>
>> spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
>> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|emp_id|        emp_name|emp_short|           ts| emp_long|emp_float| emp_date|emp_timestamp|
>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
>> |     20190503171506|20190503171506_0_424|                 4|               default|71ff4cc6-bd8e-4c4...|     4| 13Vivian Walter|    -1641|1556883906604|608806001|   511.63|146186820| 401217383000|
>> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+---------+---------+---------+---------+-------------+
>>
>> ***** Approach 3 --- No records *****
>>
>> To read the RO table as a Hive table using Spark:
>> when I read it from Spark as a Hive table - no records returned.
>>
>> sqlContext.sql("select * from hudi.emp_cow").show;   -- in the scala console
>> select * from hudi.emp_cow                           -- in the spark console
>>
>> NO result. Only headers/column names are printed.
>>
>> FYI, the table DDL:
>>
>> CREATE EXTERNAL TABLE `emp_cow`(
>>   `_hoodie_commit_time` string,
>>   `_hoodie_commit_seqno` string,
>>   `_hoodie_record_key` string,
>>   `_hoodie_partition_path` string,
>>   `_hoodie_file_name` string,
>>   `emp_id` int,
>>   `emp_name` string,
>>   `emp_short` int,
>>   `ts` bigint,
>>   `emp_long` bigint,
>>   `emp_float` float,
>>   `emp_date` bigint,
>>   `emp_timestamp` bigint)
>> ROW FORMAT SERDE
>>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
>> STORED AS INPUTFORMAT
>>   'com.uber.hoodie.hadoop.HoodieInputFormat'
>> OUTPUTFORMAT
>>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
>> LOCATION
>>   '/apps/hive/warehouse/emp_cow'
>
> Fixed the typo mistake:
>
> the path is /apps/hive/warehouse/emp_cow
> the table name is emp_cow

Issue fixed. The path in the table creation was incorrect.

LOCATION '/apps/hive/warehouse/emp_cow'
should be
LOCATION '/apps/hive/warehouse/emp_cow/default'
Re: Read RO table in Spark as hive table | No records returned
On 2019/05/17 12:37:10, satish.sidnakoppa...@gmail.com wrote:

> Hi Team,
>
> Data is returned when queried from Hive, but not from Spark. Could you assist in finding the gap?
>
> Details below.
>
> ***** Approach 1 --- successful *****
>
> select * from emp_cow limit 2;
> 20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  146186820  401217383000
> 20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  151647300  816189568000
>
> ***** Approach 2 --- successful *****
>
> spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
> |_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|emp_id|        emp_name|emp_short|           ts| emp_long|emp_float| emp_date|emp_timestamp|
> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
> |     20190503171506|20190503171506_0_424|                 4|               default|71ff4cc6-bd8e-4c4...|     4| 13Vivian Walter|    -1641|1556883906604|608806001|   511.63|146186820| 401217383000|
> +-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+---------+---------+---------+---------+-------------+
>
> ***** Approach 3 --- No records *****
>
> To read the RO table as a Hive table using Spark:
> when I read it from Spark as a Hive table - no records returned.
>
> sqlContext.sql("select * from hudi.emp_cow").show;   -- in the scala console
> select * from hudi.emp_cow                           -- in the spark console
>
> NO result. Only headers/column names are printed.
>
> FYI, the table DDL:
>
> CREATE EXTERNAL TABLE `emp_cow`(
>   `_hoodie_commit_time` string,
>   `_hoodie_commit_seqno` string,
>   `_hoodie_record_key` string,
>   `_hoodie_partition_path` string,
>   `_hoodie_file_name` string,
>   `emp_id` int,
>   `emp_name` string,
>   `emp_short` int,
>   `ts` bigint,
>   `emp_long` bigint,
>   `emp_float` float,
>   `emp_date` bigint,
>   `emp_timestamp` bigint)
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> STORED AS INPUTFORMAT
>   'com.uber.hoodie.hadoop.HoodieInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION
>   '/apps/hive/warehouse/emp_cow'

Fixed the typo mistake:

the path is /apps/hive/warehouse/emp_cow
the table name is emp_cow
Read RO table in Spark as hive table | No records returned
Hi Team,

Data is returned when queried from Hive, but not from Spark. Could you assist in finding the gap?

Details below.

***** Approach 1 --- successful *****

select * from emp_cow limit 2;
20190503171506  20190503171506_0_424  4  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  4  13Vivian Walter  -1641  1556883906604  608806001  511.63  146186820  401217383000
20190503171506  20190503171506_0_425  8  default  71ff4cc6-bd8e-4c48-a075-98f32efc14b2_0_20190503171506.parquet  8  13Oprah Gross  -32255  1556883906604  761166471  536.4  151647300  816189568000

***** Approach 2 --- successful *****

spark.read.format("com.uber.hoodie").load("/apps/hive/warehouse/emp_cow_03/default/*").show
+-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|emp_id|        emp_name|emp_short|           ts| emp_long|emp_float| emp_date|emp_timestamp|
+-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+-------------+---------+---------+---------+-------------+
|     20190503171506|20190503171506_0_424|                 4|               default|71ff4cc6-bd8e-4c4...|     4| 13Vivian Walter|    -1641|1556883906604|608806001|   511.63|146186820| 401217383000|
+-------------------+--------------------+------------------+----------------------+--------------------+------+----------------+---------+---------+---------+---------+---------+-------------+

***** Approach 3 --- No records *****

To read the RO table as a Hive table using Spark:
when I read it from Spark as a Hive table - no records are returned.

sqlContext.sql("select * from hudi.emp_cow_03").show;   -- in the scala console
select * from hudi.emp_cow_03                           -- in the spark console

NO result. Only headers/column names are printed.

FYI, the table DDL:

CREATE EXTERNAL TABLE `emp_cow`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `emp_id` int,
  `emp_name` string,
  `emp_short` int,
  `ts` bigint,
  `emp_long` bigint,
  `emp_float` float,
  `emp_date` bigint,
  `emp_timestamp` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'com.uber.hoodie.hadoop.HoodieInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://nn10.htrunk.com/apps/hive/warehouse/emp_cow'