Re: [PR] First commit on supporting parquet [incubator-xtable]

via GitHub Mon, 24 Feb 2025 11:57:24 -0800


unical1988 commented on PR #650:
URL: https://github.com/apache/incubator-xtable/pull/650#issuecomment-2679491739


   > Thanks for working on the PR @unical1988, added comments.
   > 
   > There seems to be some confusion about extracting partition values, let me 
know what you think of this.
   > 
   > ```
   > basePath/ 
   >                 p1/.. (Can be recursive partitions for parquet files)
   >                 p2/ ..
   >                 p3/.. 
   >                 .hoodie/  (Hudi Metadata)
   >                 metadata/ (Iceberg metadata) 
   >                 _delta_log/ (Delta metadata) 
   > ```
   > 
   > To extract the partition fields (emphasis on fields here, not the actual 
values), we can do it in two ways:
   > 
   > 1. Assume the table is not partitioned. This would just sync the parquet files 
in the target formats using the physical paths you have extracted in one of the 
classes. When you read those tables, partition pruning won't work.
   > 2. Ask for user input (from the YAML configuration) for the partition fields from 
the parquet file schema. Many of these analytical datasets are partitioned by 
date, either through an actual date column in the parquet file or through a timestamp 
field from which the date is extracted.
   
   We would want to read the configuration (or the partition fields) into a 
Java object, if I am not mistaken. p1/ could then be the date (year, month, day), 
p2/ could be the location, and p3/ could be an ID. Given these fields, we could 
extract the partitionValues from the corresponding subdirectories for a 
specific parquet file. Is that correct? If yes, how could the Java object be 
defined?
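
   To make the question concrete, here is a minimal sketch of what such an object could look like. It is only an illustration: `PartitionFieldsConfig` and `extractPartitionValues` are hypothetical names, not existing XTable classes or methods. It assumes the partition fields come from the YAML configuration as an ordered list, and that each field corresponds to one directory level under basePath, written either as a bare value or Hive-style as `field=value`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the user-supplied partition fields (not an XTable class).
final class PartitionFieldsConfig {
  // Field names in directory order, e.g. ["date", "location", "id"].
  private final List<String> fieldNames;

  PartitionFieldsConfig(List<String> fieldNames) {
    this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
  }

  List<String> getFieldNames() {
    return fieldNames;
  }

  // Maps each configured field to the directory segment found at the same
  // depth below basePath. Assumption: one directory level per field.
  Map<String, String> extractPartitionValues(String basePath, String filePath) {
    String base = basePath.endsWith("/") ? basePath : basePath + "/";
    if (!filePath.startsWith(base)) {
      throw new IllegalArgumentException(filePath + " is not under " + basePath);
    }
    String[] segments = filePath.substring(base.length()).split("/");
    // The last segment is the file name, so it is not a partition directory.
    int directoryCount = segments.length - 1;
    if (directoryCount < fieldNames.size()) {
      throw new IllegalArgumentException(
          "Expected at least " + fieldNames.size() + " partition directories in " + filePath);
    }
    Map<String, String> values = new LinkedHashMap<>();
    for (int i = 0; i < fieldNames.size(); i++) {
      String field = fieldNames.get(i);
      String segment = segments[i];
      String hivePrefix = field + "=";
      // Accept both "location=US" and a bare "US" directory name.
      String value =
          segment.startsWith(hivePrefix) ? segment.substring(hivePrefix.length()) : segment;
      values.put(field, value);
    }
    return values;
  }
}
```

   With fields `date`, `location`, `id`, a file under `basePath/2025-02-24/location=US/42/` would map to those three values in that order. Whether the real implementation should work from paths like this or from the parquet schema is exactly the open question above.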


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
