Hey,
I use DUCC for English-language text and it works without any problem.
But lately I tried deploying a job for Arabic, and all of the Arabic
text is replaced by *'?'* (question marks).
I am extracting data from Accumulo and, after processing, sending it to ES6.
When I checked the JD log files, they show that the Arabic data arrives
in the CR without any problem.
But another log file shows that the moment the data enters my AE, the
Arabic content is replaced by question marks.
Please find the log files attached to this mail.
I think this may be a problem with the CM, because the data is fine
inside the CR. Most interestingly, the same pipeline runs through CPM
without any problem, which suggests the issue is on the DUCC side.
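One thing that may be worth ruling out is the JVM default encoding. The environment in the attached log shows LANG=en_IN with no UTF-8 suffix, and on Java 8 the default charset is derived from the locale. If a DUCC-launched JVM ends up with a non-Unicode default and any step converts a String to bytes using that default, every Arabic character would come out as '?'. The sketch below is not DUCC code; the class name EncodingCheck and the method roundTrip are made up for illustration, and ISO-8859-1 stands in for whatever non-Unicode default the process might actually have.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Encodes text to bytes and decodes it again with the same charset,
    // which is what happens when a String crosses a process boundary
    // using a fixed encoding. (Hypothetical helper, for illustration.)
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Five Arabic letters, written as escapes so the source file's
        // own encoding cannot affect the result.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        // What this JVM uses when no charset is given explicitly.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // A charset that cannot represent Arabic substitutes '?' for
        // each unmappable character.
        System.out.println("ISO-8859-1: " + roundTrip(arabic, StandardCharsets.ISO_8859_1));

        // UTF-8 preserves the text.
        System.out.println("UTF-8 intact: "
                + roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic));
    }
}
```

Printing Charset.defaultCharset() from inside the AE under DUCC and again under CPM would show whether the two environments differ. If they do, adding -Dfile.encoding=UTF-8 to the JVM arguments of the job might be a possible workaround, though I have not confirmed that this is the cause here.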
I look forward to your reply.
--
Best Regards,
*Rohit Yadav*
Wed Jun 13 18:25:47 2018
050 ducc_ling Version 2.2.1 compiled Nov 22 2017 at 15:59:27
4050 Limits: CORE soft[0] hard[-1]
4050 Limits:CPU soft[-1] hard[-1]
4050 Limits: DATA soft[-1] hard[-1]
4050 Limits: FSIZE soft[-1] hard[-1]
4050 Limits:MEMLOCK soft[65536] hard[65536]
4050 Limits: NOFILE soft[65000] hard[65000]
4050 Limits: NPROC soft[256774] hard[256774]
4050 Limits:RSS soft[-1] hard[-1]
4050 Limits: STACK soft[8388608] hard[-1]
4050 Limits: AS soft[-1] hard[-1]
4050 Limits: LOCKS soft[-1] hard[-1]
4050 Limits: SIGPENDING soft[256774] hard[256774]
4050 Limits: MSGQUEUE soft[819200] hard[819200]
4050 Limits: NICE soft[0] hard[0]
4050 Limits: STACK soft[8388608] hard[-1]
4050 Limits: RTPRIO soft[0] hard[0]
1120 Changed to working directory /mario/Uima_Arabic_new_v_1.1
Environ[0] = DUCC_PROCESSID=0
Environ[1] = DUCC_UMASK=002
Environ[2] = USER=mario
Environ[3] = LANG=en_IN
Environ[4] = DUCC_STATE_UPDATE_PORT=52048
Environ[5] = DUCC_PROCESS_UNIQUEID=2196a7f9-1ecd-4716-9139-861ca674f834
Environ[6] = DUCC_JOBID=41010
Environ[7] = DUCC_IP=192.168.10.145
Environ[8] = DUCC_PROCESS_LOG_PREFIX=/mario/ducc/logs/41010/41010-JD-S145
Environ[9] = HOME=/mario
Environ[10] = DUCC_NODENAME=S145
1000 Command to exec: /usr/local/java/jdk1.8.0_25/jre/bin/java
arg[1]:
-Dducc.deploy.configuration=/mario/apache-uima-ducc-2.2.1/resources/ducc.properties
arg[2]: -Dducc.deploy.components=jd
arg[3]: -Dducc.job.id=41010
arg[4]: -Xmx300M
arg[5]: -Dducc.deploy.JobId=41010
arg[6]:
-Dducc.deploy.CollectionReaderXml=desc/orkash/Reader/Accumlo_collectionReaderDescriptor
arg[7]:
-Dducc.deploy.UserClasspath=/mario/apache-uima-ducc-2.2.1/lib/uima-ducc/user/*:UimaArabicES6.jar
arg[8]: -Dducc.deploy.WorkItemTimeout=10
arg[9]: -Dducc.deploy.JobDirectory=/mario/ducc/logs
arg[10]: -Dducc.deploy.JpFlowController=org.apache.uima.ducc.FlowController
arg[11]:
-Dducc.deploy.JpAeDescriptor=desc/orkash/Aggregate/Aggregate1_aeDescriptor
arg[12]:
-Dducc.deploy.JpCcDescriptor=desc/orkash/CASConsumer/casConsumer_Descriptor
arg[13]: -Dducc.deploy.JpThreadCount=5
arg[14]: -DDUCC_HOME=/mario/apache-uima-ducc-2.2.1
arg[15]: -Dducc.deploy.JpUniqueId=2196a7f9-1ecd-4716-9139-861ca674f834
arg[16]: -Dducc.process.log.dir=/mario/ducc/logs/41010/
arg[17]: -Dducc.process.log.basename=41010-JD-S145
arg[18]: -classpath
arg[19]:
/mario/apache-uima-ducc-2.2.1/lib/uima-ducc/*:/mario/apache-uima-ducc-2.2.1/lib/uima-ducc/user/*:/mario/apache-uima-ducc-2.2.1/apache-uima/lib/uima-core.jar:/mario/apache-uima-ducc-2.2.1/lib/apache-log4j/*:/mario/apache-uima-ducc-2.2.1/webserver/lib/*:/mario/apache-uima-ducc-2.2.1/apache-uima/apache-activemq/lib/*:/mario/apache-uima-ducc-2.2.1/apache-uima/apache-activemq/lib/optional/*:/mario/apache-uima-ducc-2.2.1/lib/apache-camel/*:/mario/apache-uima-ducc-2.2.1/lib/apache-commons/*:/mario/apache-uima-ducc-2.2.1/lib/google-gson/*:/mario/apache-uima-ducc-2.2.1/lib/springframework/*
arg[20]: org.apache.uima.ducc.common.main.DuccService
1001 Command launching...
13 Jun 2018 18:25:48,534 INFO DUCC.DuccService - J[N/A] T[1] Component
Starting Component
{ducc.agent.exclusion.file=/mario/apache-uima-ducc-2.2.1/resources/exclusion.nodes,
file.encoding.pkg=sun.io, ducc.orchestrator.http.node=S145,
ducc.sm.meta.ping.stability=10, ducc.rm.admin.endpoint.type=queue,
ducc.default.process.per.item.time.max=1440,
ducc.agent.managed.process.state.update.endpoint.type=socket,
java.home=/usr/local/java/jdk1.8.0_25/jre,
ducc.jd.share.quantum.reserve.count=3, ducc.sm.http.port=19988,
ducc.jd.communications.scheme=https, ducc.rm.reserve_overage=0, ducc.head=S145,
ducc.rm.class.definitions=ducc.classes, ducc.agent.jvm.args=-Xmx500M,
ducc.jd.queue.timeout.minutes=5,
ducc.daemons.state.change.endpoint=activemq:queue:ducc.daemons.state.change,
ducc.broker.memory.options=-Xmx1G,
java.endorsed.dirs=/usr/local/java/jdk1.8.0_25/jre/lib/endorsed,
ducc.sm.api.endpoint=activemq:queue:ducc.sm.api,
ducc.orchestrator.state.