Re: Hudi Query Latest Records

Balaji Varadarajan Fri, 09 Oct 2020 01:41:25 -0700

 The table description looks OK. Are you seeing an exception or incorrect data?
This might require some debugging. Please open a support ticket on GitHub and we
will look at it. Please provide the output of the same query in Hive and Spark, along
with file listings of your dataset and its .hoodie folder.
Thanks,
Balaji.V
    On Friday, October 9, 2020, 01:25:58 AM PDT, Ranganath Tirumala 
<[email protected]> wrote:  
 
 Hi Balaji,


Here is the desc formatted output:

col_name    data_type    comment    
# col_name                data_type              comment                
    NULL    NULL    
_hoodie_commit_time    string        
_hoodie_commit_seqno    string        
_hoodie_record_key    string        
_hoodie_partition_path    string        
_hoodie_file_name    string        
ee_id    bigint        
er_id    bigint        
evnt_src    string        
evnt_typ    string        
evnt_confidence    string        
evnt_yr    string        
evnt_src_id    string        
evnt_amt    string        
evnt_prtn    string        
evnt_sys_dt    string        
evnt_bus_dt    string        
evnt_strt_dt    string        
evnt_end_dt    string        
evnt_id    string        
    NULL    NULL    
# Detailed Table Information    NULL    NULL    
Database:              default              NULL    
OwnerType:              USER                    NULL    
Owner:                  user999                  NULL    
CreateTime:            Wed Oct 07 22:17:42 AEDT 2020    NULL    
LastAccessTime:        UNKNOWN                NULL    
Retention:              0                      NULL    
Location:              hdfs://path-to-external-table    NULL    
Table Type:            EXTERNAL_TABLE          NULL    
Table Parameters:    NULL    NULL    
    EXTERNAL                TRUE                    
    last_commit_time_sync    20201009072526          
    numFiles                2619                    
    totalSize              51903292933            
    transient_lastDdlTime    1602069462              
    NULL    NULL    
# Storage Information    NULL    NULL    
SerDe Library:          org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe    NULL
InputFormat:            org.apache.hudi.hadoop.HoodieParquetInputFormat    NULL
OutputFormat:           org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat    NULL
Compressed:            No                      NULL    
Num Buckets:            -1                      NULL    
Bucket Columns:        []                      NULL    
Sort Columns:          []                      NULL    
Storage Desc Params:    NULL    NULL    
    serialization.format    1


On Fri, 9 Oct 2020 at 19:07, Balaji Varadarajan <[email protected]>
wrote:

>  Can you paste the detailed Hive table description? (desc formatted .....)
> Balaji.V
>    On Friday, October 9, 2020, 12:37:19 AM PDT, Ranganath Tirumala <
> [email protected]> wrote:
>
>  Hi Balaji,
>
> I cannot get this to work on hive / hue.
> It works as expected using spark shell.
>
> Any idea how I can get this to work in hive / hue?
>
> Regards,
>
> Ranganath
>
> On Thu, 1 Oct 2020 at 09:45, Balaji Varadarajan <[email protected]>
> wrote:
>
> >  Assuming commit1 happened before commit2, this is what you should expect
> > when running a standard query through query engines.
> > Balaji.V
> >
> >    On Tuesday, September 29, 2020, 03:04:17 PM PDT, Ranganath Tirumala <
> > [email protected]> wrote:
> >
> >  Hi,
> >
> > Is there a way we can query to get the latest record across commits?
> >
> > e.g.
> > commit-1
> > Record-1, Value A
> > Record-2, Value A
> >
> > commit-2
> > Record-1, Value B
> > Record-3, Value B
> >
> > desired output
> > Record-1, Value B
> > Record-2, Value A
> > Record-3, Value B
> >
> > --
> > Regards,
> >
> > Ranganath Tirumala
> >
>
>
>
> --
> Regards,
>
> Ranganath Tirumala
>



-- 
Regards,

Ranganath Tirumala
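
For readers following the thread, the behaviour described above (a standard query returning, for each record key, the value from the most recent commit) can be sketched with a toy model. This is only an illustration of the expected result, assuming commits are ordered by their commit time; it is not how Hudi implements the query, and the function name latest_records and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: each commit is (commit_time, [(record_key, value), ...]).
Commit = Tuple[str, List[Tuple[str, str]]]


def latest_records(commits: List[Commit]) -> Dict[str, str]:
    """Return, for every record key, the value written by the latest commit."""
    latest: Dict[str, str] = {}
    # Walk commits oldest-first so that later commits overwrite earlier ones.
    for _commit_time, records in sorted(commits, key=lambda c: c[0]):
        for key, value in records:
            latest[key] = value
    return latest


# The example from the original question in this thread.
commits = [
    ("commit-1", [("Record-1", "Value A"), ("Record-2", "Value A")]),
    ("commit-2", [("Record-1", "Value B"), ("Record-3", "Value B")]),
]

for key, value in sorted(latest_records(commits).items()):
    print(f"{key}, {value}")
```

With the two commits from the question, this prints the three rows listed under "desired output". If a query engine returns both versions of Record-1 instead, that points to the kind of discrepancy between Hive and Spark discussed above.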
