This is an automated email from the ASF dual-hosted git repository.
morningman pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/doris.git
The following commit(s) were added to refs/heads/master by this push:
new 6e51632ca9 [docs](kerberos)add FAQ cases and enable krb5 debug (#22821)
6e51632ca9 is described below
commit 6e51632ca9c72a0d6d44ca71772532b11e0d741d
Author: slothever <[email protected]>
AuthorDate: Thu Aug 17 14:25:09 2023 +0800
[docs](kerberos)add FAQ cases and enable krb5 debug (#22821)
---
conf/be.conf | 4 +-
conf/fe.conf | 4 +-
docs/en/docs/lakehouse/faq.md | 166 +++++++++++++++++++------------------
docs/zh-CN/docs/lakehouse/faq.md | 172 ++++++++++++++++++++-------------------
4 files changed, 182 insertions(+), 164 deletions(-)
diff --git a/conf/be.conf b/conf/be.conf
index 6a7324bcb6..326bba2b9b 100644
--- a/conf/be.conf
+++ b/conf/be.conf
@@ -19,10 +19,10 @@ CUR_DATE=`date +%Y%m%d-%H%M%S`
PPROF_TMPDIR="$DORIS_HOME/log/"
-JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log
-Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE
-Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE
-XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100
-DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
+JAVA_OPTS="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log
-Xloggc:$DORIS_HOME/log/be.gc.log.$CUR_DATE
-Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true
-Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1
-DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
# For jdk 9+, this JAVA_OPTS will be used as default JVM options
-JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
+JAVA_OPTS_FOR_JDK_9="-Xmx1024m -DlogPath=$DORIS_HOME/log/jni.log -Xlog:gc:$DORIS_HOME/log/be.gc.log.$CUR_DATE -Djavax.security.auth.useSubjectCredsOnly=false -Dsun.security.krb5.debug=true -Dsun.java.command=DorisBE -XX:-CriticalJNINatives -DJDBC_MIN_POOL=1 -DJDBC_MAX_POOL=100 -DJDBC_MAX_IDLE_TIME=300000 -DJDBC_MAX_WAIT_TIME=5000"
# since 1.2, the JAVA_HOME need to be set to run BE process.
# JAVA_HOME=/path/to/jdk/
diff --git a/conf/fe.conf b/conf/fe.conf
index c46887af0f..82701115b9 100644
--- a/conf/fe.conf
+++ b/conf/fe.conf
@@ -26,10 +26,10 @@ CUR_DATE=`date +%Y%m%d-%H%M%S`
# the output dir of stderr and stdout
LOG_DIR = ${DORIS_HOME}/log
-JAVA_OPTS="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m
-XX:+UseMembar -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseConcMarkSweepGC
-XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0
-Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"
+JAVA_OPTS="-Dsun.security.krb5.debug=true
-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:+UseMembar
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0
-Xloggc:$DORIS_HOME/log/fe.gc.log.$CUR_DATE"
# For jdk 9+, this JAVA_OPTS will be used as default JVM options
-JAVA_OPTS_FOR_JDK_9="-Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"
+JAVA_OPTS_FOR_JDK_9="-Dsun.security.krb5.debug=true -Djavax.security.auth.useSubjectCredsOnly=false -Xss4m -Xmx8192m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=7 -XX:+CMSClassUnloadingEnabled -XX:-CMSParallelRemarkEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -Xlog:gc*:$DORIS_HOME/log/fe.gc.log.$CUR_DATE:time"
##
## the lowercase properties are read by main program.
diff --git a/docs/en/docs/lakehouse/faq.md b/docs/en/docs/lakehouse/faq.md
index 9a756bf830..2ae9159638 100644
--- a/docs/en/docs/lakehouse/faq.md
+++ b/docs/en/docs/lakehouse/faq.md
@@ -27,19 +27,10 @@ under the License.
# FAQ
-1. What to do with errors such as `failed to get schema` and `Storage schema reading not supported` when accessing Iceberg tables via Hive Metastore?
-
-   To fix this, place the jar package of the `iceberg` runtime in the `lib/` directory of Hive.
+## Kerberos
- And configure as follows in `hive-site.xml` :
- ```
-
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
- ```
-
- After configuring, please restart Hive Metastore.
-
-2. What to do with the `GSS initiate failed` error when connecting to Hive
Metastore with Kerberos authentication?
+1. What to do with the `GSS initiate failed` error when connecting to Hive
Metastore with Kerberos authentication?
   This is usually caused by incorrect Kerberos authentication information. You can troubleshoot with the following steps (a sample catalog definition with explicit Kerberos settings is sketched after the list):
@@ -52,37 +43,80 @@ under the License.
3. Try to replace the IP in the principal with a domain name (do not use
the default `_HOST` placeholder)
4. Confirm that the `/etc/krb5.conf` file exists on all FE and BE nodes.
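   For reference, a minimal Hive catalog definition with explicit Kerberos settings might look like the sketch below (the metastore URI, realm, principals, and keytab path are placeholders, not values from this commit):

   ```sql
   CREATE CATALOG hive_kerberos PROPERTIES (
       'type' = 'hms',
       'hive.metastore.uris' = 'thrift://metastore.example.com:9083',
       'hive.metastore.sasl.enabled' = 'true',
       'hive.metastore.kerberos.principal' = 'hive/[email protected]',
       'hadoop.security.authentication' = 'kerberos',
       'hadoop.kerberos.principal' = 'doris/[email protected]',
       'hadoop.kerberos.keytab' = '/etc/doris/conf/doris.keytab'
   );
   ```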
-3. What to do with the `java.lang.VerifyError: xxx` error when accessing HDFS 3.x?
+2. An error is reported when connecting to the Hive database through the Hive
Catalog: `RemoteException: SIMPLE authentication is not enabled. Available:
[TOKEN, KERBEROS]`
-   Doris 1.2.1 and older versions rely on Hadoop 2.8. Please upgrade Hadoop to 2.10.2, or upgrade Doris to 1.2.2 or newer.
+   If both `show databases` and `show tables` work, and the above error occurs when querying, perform the following two operations (as sketched below):
+   - Place `core-site.xml` and `hdfs-site.xml` in the `fe/conf` and `be/conf` directories
+   - Run `kinit` for the Kerberos principal on each BE node, restart the BE, and then run the query.
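   The two steps amount to copying the Hadoop client configs and obtaining a ticket; a minimal sketch, in which all paths and the principal are placeholders:

   ```
   # Copy the Hadoop client configuration into the FE and BE conf directories.
   cp /path/to/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml} ${DORIS_HOME}/fe/conf/
   cp /path/to/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml} ${DORIS_HOME}/be/conf/

   # On each BE node: obtain a Kerberos ticket, then restart the BE.
   kinit -kt /etc/doris/conf/doris.keytab doris/[email protected]
   sh ${DORIS_HOME}/be/bin/stop_be.sh
   sh ${DORIS_HOME}/be/bin/start_be.sh --daemon
   ```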
-4. An error is reported when using KMS to access HDFS:
`java.security.InvalidKeyException: Illegal key size`
+3. If an error is reported while querying the catalog with Kerberos: `GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)`.
+   - Restarting FE and BE can solve the problem in most cases.
+   - Before restarting all the nodes, you can add `-Djavax.security.auth.useSubjectCredsOnly=false` to the `JAVA_OPTS` in `"${DORIS_HOME}/be/conf/be.conf"`, so that credentials are obtained through the underlying mechanism rather than through the application.
+   - More solutions to common JAAS errors can be found in the [JAAS Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html) guide.
+4. Solutions for the error `Unable to obtain password from user` when configuring Kerberos in the catalog:
+   - The principal used must exist in the keytab; check with `klist -kt your.keytab` (see the sketch after this list).
+   - Ensure the catalog configuration is correct, for example that `yarn.resourcemanager.principal` is not missing.
+   - If the preceding checks pass, the JDK installed by yum or another package-management utility may lack the required encryption algorithms. It is recommended to install a JDK yourself and set the `JAVA_HOME` environment variable.
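   As an illustration of the keytab check (keytab path, principal, and the sample output are placeholders), `klist -kt` lists every principal the keytab contains:

   ```
   klist -kt /etc/doris/conf/doris.keytab
   # Keytab name: FILE:/etc/doris/conf/doris.keytab
   # KVNO Timestamp           Principal
   # ---- ------------------- ------------------------------------------
   #    2 08/01/2023 10:00:00 doris/[email protected]
   ```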
+
+5. An error is reported when using KMS to access HDFS:
`java.security.InvalidKeyException: Illegal key size`
+
   Upgrade the JDK to Java 8u162 or later, or download and install the JCE Unlimited Strength Jurisdiction Policy Files matching your JDK; a quick check is sketched below.
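   One quick, unofficial way to check whether the unlimited policy is active is via `jrunscript` (shipped with the JDK):

   ```
   jrunscript -e 'print(javax.crypto.Cipher.getMaxAllowedKeyLength("AES"))'
   # Prints 2147483647 when the unlimited policy is in effect, 128 otherwise.
   ```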
-5. When querying a table in ORC format, FE reports an error `Could not obtain
block` or `Caused by: java.lang.NoSuchFieldError: types`
+6. If an error is reported while configuring Kerberos in the catalog: `SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`.
+
+   Put `core-site.xml` into the `"${DORIS_HOME}/be/conf"` directory.
- For ORC files, by default, FE will access HDFS to obtain file information
and split files. In some cases, FE may not be able to access HDFS. It can be
solved by adding the following parameters:
+   If an error is reported while accessing HDFS: `No common protection layer between client and server`, check that `hadoop.rpc.protection` is consistent on the client and the server (see the property sketch after the XML block below).
- `"hive.exec.orc.split.strategy" = "BI"`
+ ```
+ <?xml version="1.0" encoding="UTF-8"?>
+ <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+ <configuration>
+
+ <property>
+ <name>hadoop.security.authentication</name>
+ <value>kerberos</value>
+ </property>
+
+ </configuration>
+ ```
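   To align `hadoop.rpc.protection`, a property entry like the following could be added on both sides (`privacy` is only an example value; use whatever the server side is configured with):

   ```
   <property>
       <name>hadoop.rpc.protection</name>
       <value>privacy</value>
   </property>
   ```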
- Other options: HYBRID (default), ETL.
+## JDBC Catalog
-6. An error is reported when connecting to SQLServer through JDBC Catalog:
`unable to find valid certification path to requested target`
+1. An error is reported when connecting to SQLServer through JDBC Catalog:
`unable to find valid certification path to requested target`
   Please add the `trustServerCertificate=true` option in `jdbc_url`.
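   For example, a `jdbc_url` with this option might look like the following (host and database are placeholders):

   ```
   jdbc:sqlserver://sqlserver.example.com:1433;databaseName=mydb;trustServerCertificate=true
   ```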
-7. When connecting to the MySQL database through the JDBC Catalog, the Chinese
characters are garbled, or the Chinese character condition query is incorrect
+2. When connecting to a MySQL database through the JDBC Catalog, Chinese characters are garbled, or queries with Chinese-character conditions return incorrect results
Please add `useUnicode=true&characterEncoding=utf-8` in `jdbc_url`
> Note: After version 1.2.3, these parameters will be automatically added
when using JDBC Catalog to connect to the MySQL database.
-8. An error is reported when connecting to the MySQL database through the JDBC
Catalog: `Establishing SSL connection without server's identity verification is
not recommended`
+3. An error is reported when connecting to the MySQL database through the JDBC
Catalog: `Establishing SSL connection without server's identity verification is
not recommended`
Please add `useSSL=true` in `jdbc_url`
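   A `jdbc_url` combining the parameters from items 2 and 3 might look like this (host and database are placeholders):

   ```
   jdbc:mysql://mysql.example.com:3306/mydb?useUnicode=true&characterEncoding=utf-8&useSSL=true
   ```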
-9. An error is reported when connecting Hive Catalog: `Caused by:
java.lang.NullPointerException`
+4. When using the JDBC Catalog to synchronize MySQL data to Doris, date data fails to synchronize. Check whether the MySQL driver package matches the MySQL version; for example, MySQL 8 and above requires the driver com.mysql.cj.jdbc.Driver (see the sketch below).
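   A minimal sketch of such a catalog, assuming a MySQL 8 server and the corresponding Connector/J jar (all names, paths, and credentials are placeholders):

   ```sql
   CREATE CATALOG mysql_jdbc PROPERTIES (
       'type' = 'jdbc',
       'user' = 'doris',
       'password' = '******',
       'jdbc_url' = 'jdbc:mysql://mysql.example.com:3306/mydb?useUnicode=true&characterEncoding=utf-8',
       'driver_url' = 'mysql-connector-java-8.0.25.jar',
       'driver_class' = 'com.mysql.cj.jdbc.Driver'
   );
   ```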
+
+
+## Hive Catalog
+
+1. What to do with errors such as `failed to get schema` and `Storage schema reading not supported` when accessing Iceberg tables via Hive Metastore?
+
+   To fix this, place the jar package of the `iceberg` runtime in the `lib/` directory of Hive.
+
+ And configure as follows in `hive-site.xml` :
+
+ ```
+
metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
+ ```
+
+ After configuring, please restart Hive Metastore.
+
+2. An error is reported when connecting to the Hive Catalog: `Caused by: java.lang.NullPointerException`
   If there is a stack trace in fe.log:
@@ -97,19 +131,40 @@ under the License.
   Try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` in the `create catalog` statement, as in the sketch below.
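   For illustration (the metastore URI is a placeholder), the hook can be set when creating the catalog:

   ```sql
   CREATE CATALOG hive PROPERTIES (
       'type' = 'hms',
       'hive.metastore.uris' = 'thrift://metastore.example.com:9083',
       'metastore.filter.hook' = 'org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl'
   );
   ```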
-10. An error is reported when connecting to the Hive database through the Hive
Catalog: `RemoteException: SIMPLE authentication is not enabled. Available:
[TOKEN, KERBEROS]`
+3. If `show tables` works after creating the Hive Catalog, but queries report `java.net.UnknownHostException: xxxxx`
- If both `show databases` and `show tables` are OK, and the above error
occurs when querying, we need to perform the following two operations:
- - Core-site.xml and hdfs-site.xml need to be placed in the fe/conf and
be/conf directories
- - The BE node executes the kinit of Kerberos, restarts the BE, and then
executes the query.
+   Add a property in the CATALOG PROPERTIES:
+ ```
+ 'fs.defaultFS' = 'hdfs://<your_nameservice_or_actually_HDFS_IP_and_port>'
+ ```
-11. If the `show tables` is normal after creating the Hive Catalog, but the
query report `java.net.UnknownHostException: xxxxx`
+4. ORC-format tables from Hive 1.x may have system column names such as `_col0`, `_col1`, `_col2`... in the underlying ORC file schema. In this case, set `hive.version` to 1.x.x in the catalog configuration so that the column names from the Hive table are used for the mapping.
- Add a property in CATALOG:
+ ```sql
+ CREATE CATALOG hive PROPERTIES (
+ 'hive.version' = '1.x.x'
+ );
```
- 'fs.defaultFS' = 'hdfs://<your_nameservice_or_actually_HDFS_IP_and_port>'
+
+5. If an error related to the Hive Metastore is reported while querying the catalog: `Invalid method name`.
+
+   Configure `hive.version` to match the Hive Metastore version.
+
+ ```sql
+ CREATE CATALOG hive PROPERTIES (
+ 'hive.version' = '2.x.x'
+ );
```
-12. The values of the partition fields in the hudi table can be found on hive,
but they cannot be found on doris.
+
+6. When querying a table in ORC format, FE reports an error `Could not obtain
block` or `Caused by: java.lang.NoSuchFieldError: types`
+
+   For ORC files, by default, FE accesses HDFS to obtain file information and split the files. In some cases, FE may not be able to access HDFS. This can be solved by adding the following parameter:
+
+ `"hive.exec.orc.split.strategy" = "BI"`
+
+ Other options: HYBRID (default), ETL.
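   For example (a sketch; the metastore URI is a placeholder), the parameter can be set in the catalog properties:

   ```sql
   CREATE CATALOG hive PROPERTIES (
       'type' = 'hms',
       'hive.metastore.uris' = 'thrift://metastore.example.com:9083',
       'hive.exec.orc.split.strategy' = 'BI'
   );
   ```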
+
+7. The values of the partition fields in the Hudi table can be found in Hive, but cannot be found in Doris.
   Doris and Hive currently query Hudi differently. Doris needs the partition fields added to the avsc file of the Hudi table structure; if they are not added, Doris will read partition_val as empty (even if hoodie.datasource.hive_sync.partition_fields=partition_val is set)
@@ -140,56 +195,13 @@ under the License.
}
```
-13. The table in orc format of Hive 1.x may encounter system column names such
as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which need
to be specified in the catalog configuration. Add `hive.version` to 1.x.x so
that it will use the column names in the hive table for mapping.
-
- ```sql
- CREATE CATALOG hive PROPERTIES (
- 'hive.version' = '1.x.x'
- );
- ```
-
-14. When using JDBC Catalog to synchronize MySQL data to Doris, the date data
synchronization error occurs. It is necessary to check whether the MySQL
version corresponds to the MySQL driver package. For example, the driver
com.mysql.cj.jdbc.Driver is required for MySQL8 and above.
+## HDFS
-15. If an error is reported while configuring Kerberos in the catalog: `SIMPLE
authentication is not enabled. Available:[TOKEN, KERBEROS]`.
-
- Need to put `core-site.xml` to the `"${DORIS_HOME}/be/conf"` directory.
+1. What to do with the `java.lang.VerifyError: xxx` error when accessing HDFS 3.x?
- If an error is reported while accessing HDFS: `No common protection layer
between client and server`, check the `hadoop.rpc.protection` on the client and
server to make them consistent.
-
- ```
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
-
- <property>
- <name>hadoop.security.authentication</name>
- <value>kerberos</value>
- </property>
-
- </configuration>
- ```
-
-16. The solutions when configuring Kerberos in the catalog and encounter an
error: `Unable to obtain password from user`.
- - The principal used must exist in the klist, use `klist -kt your.keytab`
to check.
- - Ensure the catalog configuration correct, such as missing the
`yarn.resourcemanager.principal`.
- - If the preceding checks are correct, the JDK version installed by yum or
other package-management utility in the current system maybe have an
unsupported encryption algorithm. It is recommended to install JDK by yourself
and set `JAVA_HOME` environment variable.
-
-17. If an error is reported while querying the catalog with Kerberos:
`GSSException: No valid credentials provided (Mechanism level: Failed to find
any Kerberos Ticket)`.
- - Restarting FE and BE can solve the problem in most cases.
- - Before the restart all the nodes, can put
`-Djavax.security.auth.useSubjectCredsOnly=false` to the `JAVA_OPTS` in
`"${DORIS_HOME}/be/conf/be.conf"`, which can obtain credentials through the
underlying mechanism, rather than through the application.
- - Get more solutions to common JAAS errors from the [JAAS
Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html).
-
-18. If an error related to the Hive Metastore is reported while querying the
catalog: `Invalid method name`.
-
- Configure the `hive.version`.
+   Doris 1.2.1 and older versions rely on Hadoop 2.8. Please upgrade Hadoop to 2.10.2, or upgrade Doris to 1.2.2 or newer.
- ```sql
- CREATE CATALOG hive PROPERTIES (
- 'hive.version' = '2.x.x'
- );
- ```
-19. Use Hedged Read to optimize the problem of slow HDFS reading.
+2. Use Hedged Read to mitigate slow HDFS reads.
   In some cases, high load on HDFS may cause reads of the data on HDFS to take a long time, slowing down the overall query efficiency. For this, the HDFS client provides Hedged Read.
   This feature starts another read thread to read the same data when a read request has not returned within a certain threshold, and whichever returns first is used; a configuration sketch follows.
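   Hedged Read is controlled by two standard HDFS client settings, typically placed in `hdfs-site.xml` or the HDFS configuration used by the catalog (the values below are illustrative, not recommendations):

   ```
   dfs.client.hedged.read.threadpool.size = 128   # enables the feature when > 0
   dfs.client.hedged.read.threshold.millis = 500  # wait time before hedging a read
   ```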
diff --git a/docs/zh-CN/docs/lakehouse/faq.md b/docs/zh-CN/docs/lakehouse/faq.md
index 11e2cc937f..50c06d6623 100644
--- a/docs/zh-CN/docs/lakehouse/faq.md
+++ b/docs/zh-CN/docs/lakehouse/faq.md
@@ -27,21 +27,11 @@ under the License.
# FAQ
-1. Accessing Iceberg tables via Hive Metastore reports: `failed to get schema` or `Storage schema reading not supported`
-
-   Place the jar packages related to the `iceberg` runtime in Hive's lib/ directory.
-
-   Configure in `hive-site.xml`:
-
-   ```
-   metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
-   ```
-
-   Restart Hive Metastore after configuring.
+## Kerberos
-2. Connecting to a Kerberos-authenticated Hive Metastore reports: `GSS initiate failed`
+1. Connecting to a Kerberos-authenticated Hive Metastore reports: `GSS initiate failed`
-   This is usually caused by incorrect Kerberos authentication information. You can troubleshoot with the following steps:
+   This is usually caused by incorrect Kerberos authentication information. You can troubleshoot with the following steps:
   1. In versions before 1.2.1, the libhdfs3 library that Doris depends on did not enable gsasl. Please update to 1.2.2 or later.
   2. Confirm that the correct keytab and principal are set for each component, and that the keytab file exists on all FE and BE nodes.
@@ -51,40 +41,82 @@ under the License.
   3. Try replacing the IP in the principal with a domain name (do not use the default `_HOST` placeholder)
   4. Confirm that the `/etc/krb5.conf` file exists on all FE and BE nodes.
-
-3. Accessing HDFS 3.x reports: `java.lang.VerifyError: xxx`
-   In versions before 1.2.1, Doris depended on Hadoop 2.8. Update Hadoop to 2.10.2, or update Doris to 1.2.2 or later.
+2. Connecting to the Hive database through the Hive Catalog reports: `RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`.
+
+   If both `show databases` and `show tables` work, and the above error occurs when querying, perform the following two operations:
+   - Place core-site.xml and hdfs-site.xml in the fe/conf and be/conf directories
+   - Run Kerberos `kinit` on the BE nodes, restart the BE, and then run the query.
+
+3. Querying an external table configured with Kerberos reports: `GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)`. Restarting FE and BE usually solves the problem.
+
+   - Before restarting all nodes, you can add `-Djavax.security.auth.useSubjectCredsOnly=false` to the `JAVA_OPTS` in `"${DORIS_HOME}/be/conf/be.conf"`, so that JAAS credentials are obtained through the underlying mechanism rather than through the application.
+   - More solutions to common JAAS errors can be found in [JAAS Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html).
+
+4. Solutions for the error `Unable to obtain password from user` when configuring Kerberos in the catalog:
+
+   - The principal used must exist in the keytab; check with `klist -kt your.keytab`.
+   - Check whether the catalog configuration is correct, for example a missing `yarn.resourcemanager.principal`.
+   - If the above checks pass, the JDK installed by yum or another package manager may lack the required encryption algorithms; it is recommended to install a JDK yourself and set the `JAVA_HOME` environment variable.
-4. Using KMS to access HDFS reports: `java.security.InvalidKeyException: Illegal key size`
+5. Using KMS to access HDFS reports: `java.security.InvalidKeyException: Illegal key size`
   Upgrade the JDK to Java 8u162 or later, or download and install the JCE Unlimited Strength Jurisdiction Policy Files matching the JDK.
-5. When querying a table in ORC format, FE reports `Could not obtain block` or `Caused by: java.lang.NoSuchFieldError: types`
+6. When configuring Kerberos in the catalog, if the error `SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]` is reported, put the `core-site.xml` file into the `"${DORIS_HOME}/be/conf"` directory.
-   For ORC files, by default, FE accesses HDFS to obtain file information and split the files. In some cases, FE may not be able to access HDFS. This can be solved by adding the following parameter:
+   If accessing HDFS reports `No common protection layer between client and server`, check the `hadoop.rpc.protection` property on the client and server and make them consistent.
- `"hive.exec.orc.split.strategy" = "BI"`
+ ```
+ <?xml version="1.0" encoding="UTF-8"?>
+ <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
+
+ <configuration>
+
+ <property>
+ <name>hadoop.security.authentication</name>
+ <value>kerberos</value>
+ </property>
+
+ </configuration>
+ ```
-   Other options: HYBRID (default), ETL.
+## JDBC Catalog
-6. Connecting to SQLServer through the JDBC Catalog reports: `unable to find valid certification path to requested target`
+1. Connecting to SQLServer through the JDBC Catalog reports: `unable to find valid certification path to requested target`
   Please add the `trustServerCertificate=true` option in `jdbc_url`.
-7. Connecting to a MySQL database through the JDBC Catalog, Chinese characters are garbled, or queries with Chinese-character conditions are incorrect
+2. Connecting to a MySQL database through the JDBC Catalog, Chinese characters are garbled, or queries with Chinese-character conditions are incorrect
   Please add `useUnicode=true&characterEncoding=utf-8` in `jdbc_url`
   > Note: Since version 1.2.3, these parameters are added automatically when connecting to a MySQL database through the JDBC Catalog.
-8. Connecting to a MySQL database through the JDBC Catalog reports: `Establishing SSL connection without server's identity verification is not recommended`
+3. Connecting to a MySQL database through the JDBC Catalog reports: `Establishing SSL connection without server's identity verification is not recommended`
   Please add `useSSL=true` in `jdbc_url`
-9. Connecting to the Hive Catalog reports: `Caused by: java.lang.NullPointerException`
+4. When using the JDBC Catalog to synchronize MySQL data to Doris, date data fails to synchronize. Check whether the MySQL version matches the MySQL driver package; for example, MySQL 8 and above requires the driver com.mysql.cj.jdbc.Driver.
+
+
+## Hive Catalog
+
+1. Accessing Iceberg tables via Hive Metastore reports: `failed to get schema` or `Storage schema reading not supported`
+
+   Place the jar packages related to the `iceberg` runtime in Hive's lib/ directory.
+
+   Configure in `hive-site.xml`:
+
+   ```
+   metastore.storage.schema.reader.impl=org.apache.hadoop.hive.metastore.SerDeStorageSchemaReader
+   ```
-   If there is a stack trace like the following in fe.log:
+   Restart Hive Metastore after configuring.
+
+2. Connecting to the Hive Catalog reports: `Caused by: java.lang.NullPointerException`
+
+   If there is a stack trace like the following in fe.log:
```
Caused by: java.lang.NullPointerException
@@ -95,22 +127,39 @@ under the License.
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
~[?:1.8.0_181]
```
-   You can try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` in the `create catalog` statement to solve it.
-
-10. Connecting to the Hive database through the Hive Catalog reports: `RemoteException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]`
-
-   If both `show databases` and `show tables` work, and the above error occurs when querying, perform the following two operations:
-- Place core-site.xml and hdfs-site.xml in the fe/conf and be/conf directories
-   - Run Kerberos kinit on the BE nodes, restart the BE, and then run the query.
+   You can try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` in the `create catalog` statement to solve it.
-
-11. If `show tables` works after creating the Hive Catalog, but queries report `java.net.UnknownHostException: xxxxx`
+3. If `show tables` works after creating the Hive Catalog, but queries report `java.net.UnknownHostException: xxxxx`
   You can add a property in the CATALOG PROPERTIES:
```
'fs.defaultFS' = 'hdfs://<your_nameservice_or_actually_HDFS_IP_and_port>'
```
-12. The values of the Hudi table's partition fields can be queried in Hive, but not in Doris.
+4. ORC-format tables from Hive 1.x may have system column names such as `_col0`, `_col1`, `_col2`... in the underlying ORC file schema. In this case, set `hive.version` to 1.x.x in the catalog configuration so that the column names from the Hive table are used for the mapping.
+
+ ```sql
+ CREATE CATALOG hive PROPERTIES (
+ 'hive.version' = '1.x.x'
+ );
+ ```
+
+5. If a Hive Metastore-related error `Invalid method name` is reported while querying table data through the catalog, set the `hive.version` parameter.
+
+ ```sql
+ CREATE CATALOG hive PROPERTIES (
+ 'hive.version' = '2.x.x'
+ );
+ ```
+
+6. When querying a table in ORC format, FE reports `Could not obtain block` or `Caused by: java.lang.NoSuchFieldError: types`
+
+   For ORC files, by default, FE accesses HDFS to obtain file information and split the files. In some cases, FE may not be able to access HDFS. This can be solved by adding the following parameter:
+
+ `"hive.exec.orc.split.strategy" = "BI"`
+
+   Other options: HYBRID (default), ETL.
+
+7. The values of the partition fields in the Hudi table can be found in Hive, but cannot be found in Doris.
   Doris and Hive currently query Hudi differently. Doris needs the partition fields added to the avsc file of the Hudi table structure; if they are not added, Doris will read partition_val as empty (even if hoodie.datasource.hive_sync.partition_fields=partition_val is set)
```
@@ -140,54 +189,13 @@ under the License.
}
```
-13. ORC-format tables from Hive 1.x may have system column names such as `_col0`, `_col1`, `_col2`... in the underlying ORC file schema. In this case, set `hive.version` to 1.x.x in the catalog configuration so that the column names from the Hive table are used for the mapping.
-
- ```sql
- CREATE CATALOG hive PROPERTIES (
- 'hive.version' = '1.x.x'
- );
- ```
+## HDFS
-14. When using the JDBC Catalog to synchronize MySQL data to Doris, date data fails to synchronize. Check whether the MySQL version matches the MySQL driver package; for example, MySQL 8 and above requires the driver com.mysql.cj.jdbc.Driver.
+1. Accessing HDFS 3.x reports: `java.lang.VerifyError: xxx`
-15. When configuring Kerberos in the catalog, if the error `SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]` is reported, put the `core-site.xml` file into the `"${DORIS_HOME}/be/conf"` directory.
-
-   If accessing HDFS reports `No common protection layer between client and server`, check the `hadoop.rpc.protection` property on the client and server and make them consistent.
-
- ```
- <?xml version="1.0" encoding="UTF-8"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
-
- <configuration>
-
- <property>
- <name>hadoop.security.authentication</name>
- <value>kerberos</value>
- </property>
-
- </configuration>
- ```
-
-16. Solutions for the error `Unable to obtain password from user` when configuring Kerberos in the catalog:
-
-   - The principal used must exist in the keytab; check with `klist -kt your.keytab`.
-   - Check whether the catalog configuration is correct, for example a missing `yarn.resourcemanager.principal`.
-   - If the above checks pass, the JDK installed by yum or another package manager may lack the required encryption algorithms; it is recommended to install a JDK yourself and set the `JAVA_HOME` environment variable.
-
-17. Querying an external table configured with Kerberos reports: `GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos Ticket)`. Restarting FE and BE usually solves the problem.
-
-   - Before restarting all nodes, you can add `-Djavax.security.auth.useSubjectCredsOnly=false` to the `JAVA_OPTS` in `"${DORIS_HOME}/be/conf/be.conf"`, so that JAAS credentials are obtained through the underlying mechanism rather than through the application.
-   - More solutions to common JAAS errors can be found in [JAAS Troubleshooting](https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/Troubleshooting.html).
-
-18. If a Hive Metastore-related error `Invalid method name` is reported while querying table data through the catalog, set the `hive.version` parameter.
-
- ```sql
- CREATE CATALOG hive PROPERTIES (
- 'hive.version' = '1.x.x'
- );
- ```
+   In versions before 1.2.1, Doris depended on Hadoop 2.8. Update Hadoop to 2.10.2, or update Doris to 1.2.2 or later.
-19. Use Hedged Read to mitigate slow HDFS reads.
+2. Use Hedged Read to mitigate slow HDFS reads.
   In some cases, high load on HDFS may cause reads of a data replica on HDFS to take a long time, slowing down the overall query efficiency. The HDFS client provides the Hedged Read feature.
   This feature starts another read thread to read the same data when a read request has not returned within a certain threshold, and whichever returns first is used.
@@ -230,5 +238,3 @@ under the License.
   Note that these values are cumulative for a single HDFS client, not per query. The same HDFS client is reused by multiple queries.
-
-
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]