[ 
https://issues.apache.org/jira/browse/IMPALA-10186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

guojingfeng updated IMPALA-10186:
---------------------------------
    Description: 
The current Parquet writer writes -1 to PageLocation.offset and 
PageLocation.first_row_index when it encounters an empty page. 

 hdfs-parquet-file-writer.cc, lines 808-819:
{code:java}
  // Write data pages
  for (const DataPage& page : pages_) {
    if (page.header.data_page_header.num_values == 0) {
      // Skip empty pages
      location.offset = -1;
      location.compressed_page_size = 0;
      location.first_row_index = -1;
      AddLocationToOffsetIndex(location);
      continue;
    }
{code}
But these -1 values can cause the ComputeCandidatePages function to end up in 
an unexpected state:
{code:java}
bool ComputeCandidatePages(
    const vector<parquet::PageLocation>& page_locations,
    const vector<RowRange>& candidate_ranges,
    const int64_t num_rows, vector<int>* candidate_pages) {
  if (!ValidatePageLocations(page_locations, num_rows)) return false;
{code}
which in turn causes IMPALA-9952.
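
To illustrate why the sentinel values are a problem, here is a minimal, self-contained sketch. It is not Impala's code: the PageLocation struct is a simplified stand-in for the Thrift-generated parquet::PageLocation, and ValidatePageLocationsSketch encodes assumed rules (non-negative offsets and sizes, first_row_index strictly increasing and below num_rows) that may differ from what the real ValidatePageLocations checks. The helpers MakeValidIndex and MakeIndexWithEmptyPage are hypothetical names used only for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for parquet::PageLocation (assumption: same three fields).
struct PageLocation {
  int64_t offset;
  int32_t compressed_page_size;
  int64_t first_row_index;
};

// Assumed validation rules, NOT copied from Impala:
//  - offset and compressed_page_size must be non-negative
//  - first_row_index must be in [0, num_rows) and strictly increasing
bool ValidatePageLocationsSketch(const std::vector<PageLocation>& locations,
                                 int64_t num_rows) {
  int64_t prev_first_row = -1;
  for (std::size_t i = 0; i < locations.size(); ++i) {
    const PageLocation& loc = locations[i];
    if (loc.offset < 0 || loc.compressed_page_size < 0) return false;
    if (loc.first_row_index >= num_rows) return false;
    if (loc.first_row_index <= prev_first_row) return false;
    prev_first_row = loc.first_row_index;
  }
  return true;
}

// Two non-empty pages covering rows [0, 50) and [50, 100).
std::vector<PageLocation> MakeValidIndex() {
  return {PageLocation{4, 100, 0}, PageLocation{104, 100, 50}};
}

// Same pages, with an empty page recorded between them using the -1
// sentinels that the writer loop above emits.
std::vector<PageLocation> MakeIndexWithEmptyPage() {
  return {PageLocation{4, 100, 0}, PageLocation{-1, 0, -1},
          PageLocation{104, 100, 50}};
}
```

Under these assumptions, ValidatePageLocationsSketch(MakeValidIndex(), 100) returns true, while ValidatePageLocationsSketch(MakeIndexWithEmptyPage(), 100) returns false because of the -1 entry, so a caller shaped like ComputeCandidatePages would return early for the whole column chunk.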

 

  was:
The current Parquet writer writes -1 to PageLocation.offset and 
PageLocation.first_row_index when it encounters an empty page. 

 hdfs-parquet-file-writer.cc, lines 808-819:
{code:java}
  // Write data pages
  for (const DataPage& page : pages_) {
    parquet::PageLocation location;
    if (page.header.data_page_header.num_values == 0) {
      // Skip empty pages
      location.offset = -1;
      location.compressed_page_size = 0;
      location.first_row_index = -1;
      AddLocationToOffsetIndex(location);
      continue;
    }
{code}
But these -1 values can cause the ComputeCandidatePages function to end up in 
an unexpected state:
{code:java}
bool ComputeCandidatePages(
    const vector<parquet::PageLocation>& page_locations,
    const vector<RowRange>& candidate_ranges,
    const int64_t num_rows, vector<int>* candidate_pages) {
  if (!ValidatePageLocations(page_locations, num_rows)) return false;
{code}
which in turn causes IMPALA-9952.

 


> Invalid Parquet PageLocations written for tables sorted by some columns
> ------------------------------------------------------------------------
>
>                 Key: IMPALA-10186
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10186
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: guojingfeng
>            Priority: Major


