[GitHub] [incubator-doris] morningman opened a new issue #6318: [Enhance] Cancel the load job ASAP when encounter unqualified data

GitBox Sat, 24 Jul 2021 09:17:33 -0700


morningman opened a new issue #6318:
URL: https://github.com/apache/incubator-doris/issues/6318



   **Is your feature request related to a problem? Please describe.**
   
   If the `max_filter_ratio` of a load job is set to 0, then in principle the 
entire load job can be terminated as soon as a single row of unqualified data 
is encountered. In the current implementation, however, the load job fails only 
after all the files have been read. When loading large volumes of data, this 
means data quality problems are discovered only after all data processing has 
completed, which delays troubleshooting.
   
   So I decided to optimize this part of the logic so that a load job that is 
bound to fail is cancelled as soon as possible.
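
The early-cancel idea can be sketched as follows. This is a minimal, language-neutral illustration in Python, not the actual Doris implementation: `load_rows`, `is_qualified`, and `LoadCancelled` are hypothetical names introduced here, and only `max_filter_ratio` comes from the issue. It assumes rows arrive as a stream and that a non-zero ratio can only be checked once the total row count is known.

```python
class LoadCancelled(Exception):
    """Hypothetical error: the load job cannot meet max_filter_ratio."""


def load_rows(rows, is_qualified, max_filter_ratio=0.0):
    """Consume rows, returning (loaded, filtered) or raising LoadCancelled."""
    loaded = 0
    filtered = 0
    for row in rows:
        if is_qualified(row):
            loaded += 1
            continue
        filtered += 1
        # With zero tolerance, a single unqualified row already decides the
        # outcome, so stop here instead of reading the remaining input.
        if max_filter_ratio == 0:
            raise LoadCancelled(
                f"unqualified row found after reading {loaded + filtered} rows"
            )
    # For a non-zero tolerance, the ratio is checked once all rows are read.
    total = loaded + filtered
    if total and filtered / total > max_filter_ratio:
        raise LoadCancelled("filtered ratio exceeds max_filter_ratio")
    return loaded, filtered
```

Because `rows` is consumed lazily, raising inside the loop means the rest of the input is never read, which is the behaviour the issue asks for when the ratio is 0.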
   
   Secondly, by default, the load job outputs only the first 100 rows of 
unqualified data to help users troubleshoot data quality problems. However, the 
`Tuple::to_string()` method is called for every row of unqualified data, and 
this method consumes a lot of CPU resources.
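
One possible way to avoid this cost is to format a row only when it will actually be output. The sketch below is an assumption about how that could look, not the change made in Doris: `ErrorRowSampler` and its `fmt` callback are hypothetical, with `fmt` standing in for an expensive call such as `Tuple::to_string()`, and the limit of 100 is taken from the default described above.

```python
class ErrorRowSampler:
    """Counts every unqualified row but formats only the first `limit`."""

    def __init__(self, fmt, limit=100):
        self._fmt = fmt      # expensive row-to-string function (assumed)
        self._limit = limit  # number of sample rows kept for the user
        self.samples = []
        self.filtered = 0

    def record(self, row):
        # Counting is cheap and is always done.
        self.filtered += 1
        # Formatting is expensive, so skip it once enough samples exist.
        if len(self.samples) < self._limit:
            self.samples.append(self._fmt(row))
```

With this shape, the number of formatting calls is bounded by the sample limit rather than by the number of unqualified rows.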


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


