Druid stores strings as UTF-8, so for both storage and querying it should
work fine with any language. The "wikiticker-2015-09-12-sampled.json.gz"
dataset used in the tutorial has strings in a variety of languages (check
the "page" field): https://druid.apache.org/docs/latest/tutorials/index.html

So I wonder whether the problem is the encoding of your input data.
If it's in a text format, it should be encoded as UTF-8 for Druid to be
able to read it properly.
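
One way to test that theory before re-ingesting is to check the raw bytes
of the input directly. The snippet below is only a minimal sketch in plain
Python, not anything Druid provides: the helper names are made up for
illustration, and "euc-kr" is just an example of a legacy encoding, since I
don't know what your data actually uses.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode bytes from source_encoding to UTF-8.

    source_encoding is an assumption supplied by the caller; this
    helper does not try to detect it.
    """
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical sample: the same text in a legacy encoding and in UTF-8.
    legacy = "한글".encode("euc-kr")
    print(is_valid_utf8(legacy))
    print(is_valid_utf8(to_utf8(legacy, "euc-kr")))
```

For a real file, you could read it with open(path, "rb").read() and pass
the result to is_valid_utf8. If that returns False, convert the file to
UTF-8 before handing it to Druid.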


On Fri, Jul 16, 2021 at 7:51 AM Y H <yurim2...@gmail.com> wrote:

> Hi, I am using Druid to develop an analytics web application,
> and I found that Druid can't parse languages other than English.
>
> [image: image.png]
>
> Is there a UTF-8 option, or some other way to parse strings correctly?
>
> I have attached my Druid environment file.
> Please let me know how to parse strings in Druid.
>
> Thanks.
>
>
>
> environment
> ___________________________________________________
> DRUID_XMS=1g
> DRUID_MAXNEWSIZE=250m
> DRUID_NEWSIZE=250m
> DRUID_MAXDIRECTMEMORYSIZE=6172m
>
> druid_emitter_logging_logLevel=debug
>
> druid_extensions_loadList=["druid-stats","druid-histogram",
> "druid-datasketches", "druid-lookups-cached-global",
> "postgresql-metadata-storage", "druid-kafka-indexing-service",
> "druid-kafka-extraction-namespace"]
>
> druid_zk_service_host=zookeeper
>
> # kafka config
> listeners=PLAINTEXT://211.253.8.155:59092
>
>
> # druid_metadata_storage_host=
> druid_metadata_storage_type=postgresql
>
> druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
> druid_metadata_storage_connector_user=druid
> druid_metadata_storage_connector_password=FoolishPassword
>
> druid_coordinator_balancer_strategy=cachingCost
>
> druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g",
> "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC",
> "-Dfile.encoding=UTF-8",
> "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
> druid_indexer_fork_property_druid_processing_buffer_sizeBytes=268435456
>
> druid_storage_type=local
> druid_storage_storageDirectory=/opt/data/segments
> druid_indexer_logs_type=file
> druid_indexer_logs_directory=/opt/data/indexing-logs
>
> druid_processing_numThreads=2
> druid_processing_numMergeBuffers=2
>
>
> DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration
> status="WARN"><Appenders><Console name="Console"
> target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c -
> %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef
> ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog"
> additivity="false" level="DEBUG"><AppenderRef
> ref="Console"/></Logger></Loggers></Configuration>
>
>
