AshinGau opened a new pull request, #22187:
URL: https://github.com/apache/doris/pull/22187
## Proposed changes
Fix errors when analyzing Hudi tables or reading only Hudi partition columns.
When a query reads only partition columns, `required_fields` is an empty
string, so an error is thrown while parsing the column types:
```
W0724 16:44:38.431789 296139 jni-util.cpp:239] java.util.NoSuchElementException: key not found:
    at scala.collection.MapLike.default(MapLike.scala:236)
    at scala.collection.MapLike.default$(MapLike.scala:235)
    at scala.collection.AbstractMap.default(Map.scala:65)
    at scala.collection.MapLike.apply(MapLike.scala:144)
    at scala.collection.MapLike.apply$(MapLike.scala:143)
    at scala.collection.AbstractMap.apply(Map.scala:65)
    at org.apache.doris.hudi.HoodieSplit.$anonfun$requiredTypes$1(BaseSplitReader.scala:89)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at org.apache.doris.hudi.HoodieSplit.<init>(BaseSplitReader.scala:88)
    at org.apache.doris.hudi.HudiJniScanner.<init>(HudiJniScanner.java:66)
```
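The empty key in `key not found:` is consistent with an empty string being split into field names. The following is a minimal Java sketch of that behavior, not the Doris code: `EmptyRequiredFields`, `parseFields`, and `typeOf` are hypothetical names, and the comma delimiter is an assumption. It only relies on the fact that `String.split` returns a single empty element for an empty input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration; none of these names come from the Doris code base.
class EmptyRequiredFields {
    // Split a comma-separated field list (the delimiter is an assumption).
    static String[] parseFields(String requiredFields) {
        return requiredFields.split(",");
    }

    // Stand-in for the column-type lookup; it throws on a missing key,
    // similar to what Scala's Map.apply does.
    static String typeOf(Map<String, String> columnTypes, String name) {
        String type = columnTypes.get(name);
        if (type == null) {
            throw new NoSuchElementException("key not found: " + name);
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> columnTypes = new HashMap<>();
        columnTypes.put("_hoodie_record_key", "string");

        // An empty string does not produce an empty array: it produces
        // one element that is itself the empty string.
        String[] fields = parseFields("");
        System.out.println("field count: " + fields.length);

        try {
            typeOf(columnTypes, fields[0]);
        } catch (NoSuchElementException e) {
            // The missing key is the empty string, so nothing follows the colon.
            System.out.println(e.getMessage());
        }
    }
}
```

Under these assumptions, the lookup fails on the empty field name rather than on any real column, which matches the blank key in the log above.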
Because `HudiJniScanner.<init>` fails, `_jni_scanner_obj` is never
initialized, so no methods can be called on it. `JniConnector::close()` does
so anyway during destruction:
```
5# JNI_ArgumentPusherVaArg::JNI_ArgumentPusherVaArg(_jmethodID*, __va_list_tag*) in /mnt/datadisk0/gaoxin/app/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
6# jni_CallObjectMethodV in /mnt/datadisk0/gaoxin/app/jdk1.8.0_131/jre/lib/amd64/server/libjvm.so
7# JNIEnv_::CallObjectMethod(_jobject*, _jmethodID*, ...) in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
8# doris::vectorized::JniConnector::close() in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
9# doris::vectorized::JniConnector::~JniConnector() in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
10# doris::vectorized::HudiJniReader::~HudiJniReader() in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
11# doris::vectorized::VFileScanner::~VFileScanner() in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
12# doris::vectorized::VFileScanner::~VFileScanner() in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
13# doris::vectorized::ScannerContext::_close_and_clear_scanners(doris::vectorized::VScanNode*, doris::RuntimeState*) in /mnt/datadisk0/gaoxin/doris/output/be/lib/doris_be
```
## How to fix
1. When only partition columns are read, `JniConnector` produces an empty list
of required fields, so `HudiJniScanner` should read at least the
`_hoodie_record_key` field to learn how many rows the current Hudi split
contains. Even if `JniConnector` never reads this field, the `releaseTable`
call in `JniConnector` reclaims the resource.
2. To prevent the BE from failing and exiting, `JniConnector` may call the
release method only after `HudiJniScanner` has been initialized. Note that
`VectorTable` is created lazily in `JniScanner`, so there is no resource to
reclaim when `HudiJniScanner` fails to initialize.
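
The two fixes can be modeled roughly as follows. This is a simplified Java sketch under stated assumptions, not the code in this PR: the real guard lives in the C++ `JniConnector`, and `ScannerLifecycleSketch`, `resolveRequiredFields`, `Connector`, and `releaseCalls` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two fixes; not the Doris implementation.
final class ScannerLifecycleSketch {
    static final String RECORD_KEY = "_hoodie_record_key";

    // Fix 1: never hand the reader an empty projection. Fall back to the
    // record key so the reader can still count the rows in the split.
    static List<String> resolveRequiredFields(String requiredFields) {
        if (requiredFields == null || requiredFields.trim().isEmpty()) {
            List<String> fallback = new ArrayList<>();
            fallback.add(RECORD_KEY);
            return fallback;
        }
        return new ArrayList<>(Arrays.asList(requiredFields.split(",")));
    }

    // Fix 2: release scanner resources only if initialization succeeded.
    static final class Connector implements AutoCloseable {
        private boolean scannerInitialized = false;
        int releaseCalls = 0; // counts how often the release path actually ran

        // failInit simulates an exception thrown by the scanner constructor.
        void open(String requiredFields, boolean failInit) {
            if (failInit) {
                throw new IllegalStateException("scanner initialization failed");
            }
            resolveRequiredFields(requiredFields);
            scannerInitialized = true;
        }

        @Override
        public void close() {
            if (!scannerInitialized) {
                return; // nothing was created, so there is nothing to release
            }
            releaseCalls++;
            scannerInitialized = false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveRequiredFields(""));

        Connector failed = new Connector();
        try {
            failed.open("", true);
        } catch (IllegalStateException e) {
            // Initialization failed; close() must still be safe to call.
        }
        failed.close();
        System.out.println("release calls after failed init: " + failed.releaseCalls);
    }
}
```

The flag makes `close()` safe to call from a destructor-like path regardless of how far initialization got, which is the property the second fix needs.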
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]