[DISCUSS] StreamPark Platform configuration files improvements

Huajie Wang Sat, 30 Mar 2024 03:09:42 -0700

Hi devs,

Currently, the StreamPark platform provides multiple configuration
files for users, such as application.yml, application-pgsql.yml,
application-mysql.yml, kerberos.yml, and so on. I think we can improve
these files. Many of the settings in them are internal system
configuration. In application.yml, for example, a large number of
entries are internal to the platform, such as the Jackson config for
Spring Boot integration, the swagger-ui config, and Spring's
'allow-circular-references' parameter. Users do not need to configure
these, and they should not be exposed to users.

application.yml:
```yaml

server:
  port: 10000
  undertow:
    buffer-size: 1024
    direct-buffers: true
    threads:
      io: 4
      worker: 20

logging:
  level:
    root: info

knife4j:
  enable: true
  basic:
    # basic authentication, used to access swagger-ui and doc
    enable: false
    username: admin
    password: streampark

springdoc:
  api-docs:
    enabled: true
  swagger-ui:
    path: /swagger-ui.html
  packages-to-scan: org.apache.streampark.console

spring:
  profiles.active: h2 #[h2,pgsql,mysql]
  application.name: StreamPark
  devtools.restart.enabled: false
  mvc.pathmatch.matching-strategy: ant_path_matcher
  servlet:
    multipart:
      enabled: true
      max-file-size: 500MB
      max-request-size: 500MB
  aop.proxy-target-class: true
  messages.encoding: utf-8
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    deserialization:
      fail-on-unknown-properties: false
  main:
    allow-circular-references: true
    banner-mode: off
  mvc:
    converters:
      preferred-json-mapper: jackson

management:
  endpoints:
    web:
      exposure:
        include: [ 'health', 'httptrace', 'metrics' ]
  endpoint:
    health:
      enabled: true
      show-details: always
      probes:
        enabled: true
  health:
    ldap:
      enabled: false

streampark:
  proxy:
    # knox process address https://cdpsit02.example.cn:8443/gateway/cdp-proxy/yarn
    yarn-url:
    # lark alert proxy,default https://open.feishu.cn
    lark-url:
  yarn:
    # default simple, or kerberos
    http-auth: simple

  # HADOOP_USER_NAME
  hadoop-user-name: hdfs
  # local workspace, used to store source code and build dir etc.
  workspace:
    local: /opt/streampark_workspace
    remote: hdfs:///streampark   # supports hdfs:///streampark/, /streampark, hdfs://host:port/streampark/

  # remote docker registry namespace for streampark
  docker:
    # instantiating DockerHttpClient
    http-client:
      max-connections: 10000
      connection-timeout-sec: 10000
      response-timeout-sec: 12000
      docker-host: ""

  # flink-k8s tracking configuration
  flink-k8s:
    tracking:
      silent-state-keep-sec: 10
      polling-task-timeout-sec:
        job-status: 120
        cluster-metric: 120
      polling-interval-sec:
        job-status: 2
        cluster-metric: 3
    # If you need to specify an ingress controller, you can use this.
    ingress:
      class: nginx

  # packer garbage resources collection configuration
  packer-gc:
    # maximum retention time for temporary build resources
    max-resource-expired-hours: 120
    # gc task running interval hours
    exec-cron: 0 0 0/6 * * ?

  shiro:
    # token timeout, unit second
    jwtTimeOut: 86400
    # backend authentication-free resources url
    anonUrl: >

ldap:
  # Is ldap enabled? If so, please modify the urls
  enable: false
  ## AD server IP, default port 389
  urls: ldap://99.99.99.99:389
  ## Login Account
  base-dn: dc=streampark,dc=com
  username: cn=Manager,dc=streampark,dc=com
  password: streampark
  user:
    identity-attribute: uid
    email-attribute: mail

```


So I propose that we improve these configurations by providing users
with a single configuration file. The settings in this file should be
entirely user-facing: clear, core configuration only.
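
To make the idea concrete, here is a minimal sketch (in Python, purely for illustration) of how a flat, user-facing file could be layered on top of built-in defaults that users never see. The names `INTERNAL_DEFAULTS`, `expand_flat_keys`, and `overlay` are hypothetical and are not part of StreamPark; the default values are copied from the application.yml above.

```python
from copy import deepcopy

# Hypothetical internal defaults that would ship inside the platform
# instead of being exposed in a user-editable file.
INTERNAL_DEFAULTS = {
    "server": {"port": 10000},
    "spring": {
        "main": {"allow-circular-references": True},
        "jackson": {"time-zone": "GMT+8"},
    },
}


def expand_flat_keys(flat):
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def overlay(defaults, overrides):
    """Return a copy of defaults with overrides merged in recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = overlay(merged[key], value)
        else:
            merged[key] = value
    return merged


# The user only sets the keys they care about, in flat dotted form.
user_config = {"server.port": 8080, "logging.level.root": "debug"}
effective = overlay(INTERNAL_DEFAULTS, expand_flat_keys(user_config))
```

With this layering, the internal Spring settings stay in effect even though the user file never mentions them, and only the keys the user sets are overridden.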

An example of such a file:
```yaml

# logging level
logging.level.root: info
# server port
server.port: 10000
# The user's login session has a validity period. If it exceeds this time,
# the user will be automatically logged out.
# unit: s|m|h|d, s: second, m: minute, h: hour, d: day
server.session.ttl: 2h # unit[s|m|h|d], e.g: 24h, 2d....

# see: https://github.com/undertow-io/undertow/blob/master/core/src/main/java/io/undertow/Undertow.java
server.undertow.direct-buffers: true
server.undertow.buffer-size: 1024
server.undertow.threads.io: 16
server.undertow.threads.worker: 256

# system database, default h2, one of mysql|pgsql|h2
datasource.dialect: h2 # h2, mysql, pgsql
#-------if datasource.dialect is mysql or pgsql, it is necessary to set-------
datasource.username:
datasource.password:
# mysql jdbc url example:
# datasource.url: jdbc:mysql://localhost:3306/streampark?useUnicode=true&characterEncoding=UTF-8&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=GMT%2B8
# postgresql jdbc url example:
# datasource.url: jdbc:postgresql://localhost:5432/streampark?stringtype=unspecified
datasource.url:
#---------------------------------------------------------------------------------

# Directory for storing locally built project
streampark.workspace.local: /tmp/streampark
# The root hdfs path of the jars, same as yarn.provided.lib.dirs for flink on yarn-application
# and same as --jars for spark on yarn
streampark.workspace.remote: hdfs:///streampark/
# hadoop yarn proxy path, e.g: knox process address https://streampark.com:8443/proxy/yarn
streampark.proxy.yarn-url:
# lark proxy address, default https://open.feishu.cn
streampark.proxy.lark-url:
# flink on yarn or spark on yarn, monitoring job status from yarn,
# it is necessary to set hadoop.http.authentication.type
streampark.yarn.http-auth: simple  # default simple, or kerberos
# flink on yarn or spark on yarn, it is necessary to set the hadoop user name
streampark.hadoop-user-name: hdfs
# flink on k8s ingress setting. If an ingress controller is specified in the configuration,
#  the ingress class kubernetes.io/ingress.class must be specified when creating the ingress,
#  since there are often multiple ingress controllers in a production environment.
streampark.flink-k8s.ingress.class: nginx

# sign in to streampark with ldap.
ldap.enable: false  # ldap enabled
ldap.urls: ldap://99.99.99.99:389 #AD server IP, default port 389
ldap.base-dn: dc=streampark,dc=com  # Login Account
ldap.username: cn=Manager,dc=streampark,dc=com
ldap.password: streampark
ldap.user.identity-attribute: uid
ldap.user.email-attribute: mail

# flink on yarn or spark on yarn, when the hadoop cluster enables kerberos authentication,
# it is necessary to set up Kerberos authentication related parameters.
security.kerberos.login.enable: false
security.kerberos.login.debug: false
# kerberos principal path
security.kerberos.login.principal:
security.kerberos.login.krb5:
security.kerberos.login.keytab:
security.kerberos.ttl: 2h # unit [s|m|h|d]

```
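
The `server.session.ttl` and `security.kerberos.ttl` values above use a number followed by a unit (s|m|h|d). As a rough illustration of how such a value could be turned into seconds, here is a small Python sketch. The function name `parse_ttl` is hypothetical and not part of StreamPark, and it assumes only whole numbers with a single unit suffix are allowed.

```python
import re

# Seconds per unit, matching the s|m|h|d units described in the comments.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"([0-9]+)([smhd])")


def parse_ttl(text):
    """Convert a value such as '2h' or '2d' into a number of seconds."""
    match = _TTL_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"invalid ttl {text!r}, expected e.g. '24h' or '2d'")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


session_ttl_seconds = parse_ttl("2h")
```

Under this reading, the existing `jwtTimeOut: 86400` (seconds) would be written as `24h`.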

The related issue: https://github.com/apache/incubator-streampark/issues/3641

What do you think? Feedback and discussion are welcome.



Best,
Huajie Wang
