Re: IgniteInterruptedException: Node is stopping

2017-11-27 Thread Denis Mekhanikov
Hi Hyma!

It looks like you have run into a classic deadlock. It happens because you
put values into the cache in arbitrary order.
This line causes the problem:
*companyDao.nameCache.putAll(kvs)*

When multiple threads try to acquire the same locks in different orders,
the operations end up waiting for each other.
To avoid this, sort the data by key before calling *putAll* on it. You can
do that with a TreeMap. Sorry, I'm not sure how to do it in Scala.
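
For reference, a minimal Scala sketch of the sorted-batch idea described above. It assumes the cache accepts a java.util.Map in putAll (as the original snippet implies by calling asJava). CompanyVO here is a hypothetical stand-in that keeps only the wcaId field from the thread, and SortedBatch.toSortedBatch is an illustrative helper name, not part of Ignite.

```scala
import java.util.{Map => JMap, TreeMap => JTreeMap}

// Hypothetical stand-in for the CompanyVO class used in the thread;
// only the wcaId field is taken from the original snippet.
final case class CompanyVO(wcaId: String, name: String)

object SortedBatch {
  // Copies the entries into a java.util.TreeMap, which iterates its keys in
  // natural (here: String) order. Every batch built this way presents its
  // keys in the same order, regardless of the order of the input.
  def toSortedBatch(companies: Iterator[CompanyVO]): JMap[String, CompanyVO] = {
    val sorted = new JTreeMap[String, CompanyVO]()
    companies.foreach(c => sorted.put(c.wcaId, c))
    sorted
  }
}
```

Inside mapPartitions, the call would then read companyDao.nameCache.putAll(SortedBatch.toSortedBatch(x)). Note that building the batch consumes the iterator x.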

Let me know if it helps.

Denis

Thu, 23 Nov 2017 at 21:14, Hyma:

> Below is the code at which the Ignite step hung.
>
> logInfo("Populating the canonical name Cache on Ignite Nodes")
> val time = System.currentTimeMillis()
> companyVORDD.mapPartitions(x => {
>   val kvs = x.map(comp => (comp.wcaId, comp)).toMap[String, CompanyVO].asJava
>   companyDao.nameCache.putAll(kvs)
>   x
> }).count()
>
> For your information, most of the time we see no issues with this; the
> hang I mentioned above happens only occasionally.
>
> --
> Sent from: http://apache-ignite-users.70518.x6.nabble.com/
>


Re: IgniteInterruptedException: Node is stopping

2017-11-23 Thread Hyma
Below is the code at which the Ignite step hung.

logInfo("Populating the canonical name Cache on Ignite Nodes")
val time = System.currentTimeMillis()
companyVORDD.mapPartitions(x => {
  val kvs = x.map(comp => (comp.wcaId, comp)).toMap[String, CompanyVO].asJava
  companyDao.nameCache.putAll(kvs)
  x
}).count()

For your information, most of the time we see no issues with this; the
hang I mentioned above happens only occasionally.



Re: IgniteInterruptedException: Node is stopping

2017-11-22 Thread Michael Cherkasov
Hi Hyma,

Could you please share the code snippet where it hangs?

Thanks,
Mike.

2017-11-22 12:48 GMT+03:00 Hyma :

> Thanks Mikhail.
>
> I suspected I would need to increase the Spark heartbeat/network timeout.
> But my question is this: if an executor is lost, the corresponding Ignite
> node also leaves the cluster. In that case, Ignite takes care of
> rebalancing among the remaining active nodes, right? My Spark job was not
> killed; it keeps running until I terminate it, and meanwhile it hangs at
> the Ignite cache load step for hours.
>


Re: IgniteInterruptedException: Node is stopping

2017-11-22 Thread Hyma
Thanks Mikhail.

I suspected I would need to increase the Spark heartbeat/network timeout.
But my question is this: if an executor is lost, the corresponding Ignite
node also leaves the cluster. In that case, Ignite takes care of
rebalancing among the remaining active nodes, right? My Spark job was not
killed; it keeps running until I terminate it, and meanwhile it hangs at
the Ignite cache load step for hours.



Re: IgniteInterruptedException: Node is stopping

2017-11-17 Thread Mikhail
Hi Hyma,

It looks like your job takes too long: it hit some timeout and Spark
killed it.
I don't see any other errors or warnings in your logs, so it's very likely
that you need to increase some timeout in Spark.
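
As a hedged illustration of what raising a Spark timeout can look like, the sketch below sets the two properties usually meant by "heartbeat/network timeout", spark.network.timeout and spark.executor.heartbeatInterval, on a SparkConf. The values are placeholders rather than recommendations, TimeoutConf.withLongerTimeouts is an illustrative helper name, and the thread does not establish which timeout (if any) was actually hit.

```scala
import org.apache.spark.SparkConf

object TimeoutConf {
  // Placeholder values chosen for illustration only; tune them for your cluster.
  // The heartbeat interval should stay well below the network timeout.
  def withLongerTimeouts(base: SparkConf): SparkConf =
    base
      .set("spark.network.timeout", "600s")
      .set("spark.executor.heartbeatInterval", "60s")
}
```

The same two properties can also be passed on the command line as --conf key=value pairs to spark-submit.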

thanks,
Mike.





IgniteInterruptedException: Node is stopping

2017-11-17 Thread Hyma
Hi,

When loading the Ignite cache, we saw the Spark job hang at this step.
One of the executor tasks has been running for hours; below are the logs
from the executor that had the failure.

Stdout log
Launch class org.apache.spark.executor.CoarseGrainedExecutorBackend by
calling
co.cask.cdap.app.runtime.spark.distributed.SparkContainerLauncher.launch
13:12:58.115 [main] INFO  c.c.c.l.a.LogAppenderInitializer - Initializing
log appender KafkaLogAppender
13:12:58.679 [authorization-enforcement-service] INFO 
c.c.c.s.a.AbstractAuthorizationService - Started authorization enforcement
service...
13:12:59.391 [main] INFO  c.c.c.c.g.LocationRuntimeModule - HDFS namespace
is /project/ecpprodcdap
13:12:59.438 [main] INFO  c.c.c.a.r.s.d.SparkContainerLauncher - Launch main
class
org.apache.spark.executor.CoarseGrainedExecutorBackend.main([--driver-url,
spark://CoarseGrainedScheduler@10.214.4.161:33947, --executor-id, 29,
--hostname, c893ach.ecom.bigdata.int.thomsonreuters.com, --cores, 5,
--app-id, application_1506331241975_7951, --user-class-path,
file:/data/7/yarn/nm/usercache/bigdata-app-ecplegalanalytics-svc/appcache/application_1506331241975_7951/container_e28_1506331241975_7951_01_30/__app__.jar])
13:12:59.501 [main] WARN  c.c.c.i.a.Classes - Cannot patch method
obtainTokenForHiveMetastore in
org.apache.spark.deploy.yarn.YarnSparkHadoopUtil due to non-void return
type: (Lorg/apache/hadoop/conf/Configuration;)Lscala/Option;
13:12:59.501 [main] WARN  c.c.c.i.a.Classes - Cannot patch method
obtainTokenForHBase in org.apache.spark.deploy.yarn.YarnSparkHadoopUtil due
to non-void return type:
(Lorg/apache/hadoop/conf/Configuration;)Lscala/Option;
13:13:26.130 [Executor task launch worker-0] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:26 INFO
dataloader.IgniteDataLoader: Starting the Ignite node on - 10.214.4.161
13:13:26.134 [Executor task launch worker-3] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:26 INFO
dataloader.IgniteDataLoader: Starting the Ignite node on - 10.214.4.161
13:13:26.134 [Executor task launch worker-2] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:26 INFO
dataloader.IgniteDataLoader: Starting the Ignite node on - 10.214.4.161
13:13:26.135 [Executor task launch worker-1] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:26 INFO
dataloader.IgniteDataLoader: Starting the Ignite node on - 10.214.4.161
13:13:26.135 [Executor task launch worker-4] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:26 INFO
dataloader.IgniteDataLoader: Starting the Ignite node on - 10.214.4.161
13:13:26.281 [Executor task launch worker-0] ERROR  - Failed to resolve
default logging config file: config/java.util.logging.properties
13:13:26.283 [Executor task launch worker-0] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - Console logging handler is not
configured.
[13:13:26]    __________  ________________ 
[13:13:26]   /  _/ ___/ |/ /  _/_  __/ __/ 
[13:13:26]  _/ // (7 7    // /  / / / _/   
[13:13:26] /___/\___/_/|_/___/ /_/ /___/  
[13:13:26] 
[13:13:26] ver. 1.8.0#20161205-sha1:9ca40dbe
[13:13:26] 2016 Copyright(C) Apache Software Foundation
[13:13:26] 
[13:13:26] Ignite documentation: http://ignite.apache.org
[13:13:26] 
[13:13:26] Quiet mode.
[13:13:26]   ^-- To see **FULL** console log here add -DIGNITE_QUIET=false
or "-v" to ignite.{sh|bat}
[13:13:26] 
[13:13:26] OS: Linux 3.10.0-514.16.1.el7.x86_64 amd64
[13:13:26] VM information: Java(TM) SE Runtime Environment 1.8.0_121-b13
Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.121-b13
[13:13:26] Configured plugins:
[13:13:26]   ^-- None
[13:13:26] 
[13:13:26] Security status [authentication=off, tls/ssl=off]
[13:13:27] Topology snapshot [ver=3, servers=3, clients=0, CPUs=48,
heap=96.0GB]
[13:13:27] To start Console Management & Monitoring run
ignitevisorcmd.{sh|bat}
[13:13:27] 
[13:13:27] Ignite node started OK (id=e98b003d,
grid=WCAGridapplication_1506331241975_7951)
[13:13:27] Topology snapshot [ver=2, servers=2, clients=0, CPUs=48,
heap=66.0GB]
13:13:27.660 [Executor task launch worker-0] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:27 INFO
dataloader.IgniteDataLoader: Started the Ignite node on - 10.214.4.161
13:13:27.660 [Executor task launch worker-2] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:27 INFO
dataloader.IgniteDataLoader: Started the Ignite node on - 10.214.4.161
13:13:27.661 [Executor task launch worker-3] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:27 INFO
dataloader.IgniteDataLoader: Started the Ignite node on - 10.214.4.161
13:13:27.661 [Executor task launch worker-1] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:27 INFO
dataloader.IgniteDataLoader: Started the Ignite node on - 10.214.4.161
13:13:27.661 [Executor task launch worker-4] WARN 
o.a.s.e.CoarseGrainedExecutorBackend - 17/11/16 13:13:27 INFO
dataloader.IgniteDataLoader: Started the Ignite node on - 10.214.4.161
13:13:27.674 [Executor