Re: Review Request 64141: [ATLAS-2287]: Include lucene libraries when building atlas distribution with Janus profile

Madhan Neethiraj Thu, 30 Nov 2017 11:04:32 -0800


> On Nov. 30, 2017, 2:47 p.m., David Radley wrote:
> > pom.xml
> > Line 713 (original), 716 (patched)
> > <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716>
> >
> >     I could start the UI but had lots of Zookeeper exceptions and posts to 
> > create entities did not work.
> >     
> >     I got errors like this :
> >     2017-11-30 12:16:28,975 INFO  - [main:] ~ GraphTransaction intercept 
> > for 
> > org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications
> >  (GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,975 INFO  - [main:] ~ GraphTransaction intercept 
> > for 
> > org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications
> >  (GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept 
> > for 
> > org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification
> >  (GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept 
> > for 
> > org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications
> >  (GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:28,976 INFO  - [main:] ~ GraphTransaction intercept 
> > for 
> > org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification
> >  (GraphTransactionAdvisor$1:41)
> >     2017-11-30 12:16:29,021 WARN  - [main-SendThread(localhost:9026):] ~ 
> > Session 0x0 for server null, unexpected error, closing socket connection 
> > and attempting reconnect (ClientCnxn$SendThread:1102)
> >     java.net.ConnectException: Connection refused
> >             at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >             at 
> > sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> >             at 
> > org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >             at 
> > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> >     2017-11-30 12:16:29,029 INFO  - [main:] ~ Starting service 
> > org.apache.atlas.web.service.ActiveInstanceElectorService (Services:53)
> >     2017-11-30 12:16:29,030 INFO  - [main:] ~ HA is not enabled, no need to 
> > start leader election service (ActiveInstanceElectorService:96)
> >     2017-11-30 12:16:29,030 INFO  - [main:] ~ Starting service 
> > org.apache.atlas.kafka.KafkaNotification (Services:53)
> >     2017-11
> >     
> >     
> >     and 
> >     017-11-30 12:16:31,194 INFO  - [main:] ~ Adding cross-site request 
> > forgery (CSRF) protection (AtlasCSRFPreventionFilter:98)
> >     2017-11-30 12:16:31,646 INFO  - [main:] ~ AuditFilter initialization 
> > started (AuditFilter:57)
> >     2017-11-30 12:30:47,004 WARN  - 
> > [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream 
> > exception (NIOServerCnxn:357)
> >     EndOfStreamException: Unable to read additional data from client 
> > sessionid 0x1600cdb55840000, likely client has closed socket
> >             at 
> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >             at 
> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >             at java.lang.Thread.run(Thread.java:748)
> >     2017-11-30 12:30:47,109 WARN  - [zkCallback-3-thread-2:] ~ Watcher 
> > org.apache.solr.common.cloud.ConnectionManager@5232c71 name: 
> > ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent 
> > state:Disconnected type:None path:null path: null type: None 
> > (ConnectionManager:108)
> >     2017-11-30 12:30:47,111 WARN  - [zkCallback-3-thread-2:] ~ zkClient has 
> > disconnected (ConnectionManager:184)
> >     2017-11-30 12:30:48,060 WARN  - 
> > [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream 
> > exception (NIOServerCnxn:357)
> >     EndOfStreamException: Unable to read additional data from client 
> > sessionid 0x1600cdb55840001, likely client has closed socket
> >             at 
> > org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >             at 
> > org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >             at java.lang.Thread.run(Thread.java:748)
> >     2017-11-30 12:30:48,255 WARN  - [SyncThread:0:] ~ fsync-ing the write 
> > ahead log in SyncThread:0 took 1246ms which will adversely effect operation 
> > latency. See the ZooKeeper troubleshooting guide (FileTxnLog:334)
> >     2017-11-30 12:30:48,508 ERROR - 
> > [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 
> > initiated state change for partition [__consumer_offsets,19] from 
> > OfflinePartition to OnlinePartition failed (Logging$class:103)
> >     kafka.common.NoReplicaOnlineException: No replica for partition 
> > [__consumer_offsets,19] is alive. Live brokers are: [Set()], Assigned 
> > replicas are: [List(1)]
> >             at 
> > kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >             at 
> > kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >             at 
> > kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >             at 
> > kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >             at 
> > kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> >             at 
> > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >             at 
> > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >             at 
> > scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >             at 
> > scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> >             at 
> > scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> >             at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> >             at 
> > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >             at 
> > kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> >             at 
> > kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
> >             at 
> > kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
> >             at 
> > kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
> >             at 
> > kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
> >             at 
> > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
> >             at 
> > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >             at 
> > kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >             at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> >             at 
> > kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
> >             at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> >             at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> >     2017-11-30 12:30:48,512 ERROR - 
> > [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 
> > initiated state change for partition [__consumer_offsets,30] from 
> > OfflinePartition to OnlinePartition failed (Logging$class:103)
> >     kafka.common.NoReplicaOnlineException: No replica for partition 
> > [__consumer_offsets,30] is alive. Live brokers are: [Set()], Assigned 
> > replicas are: [List(1)]
> >             at 
> > kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >             at 
> > kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >             at 
> > kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >             at 
> > kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >             at kafka.controll
> 
> Madhan Neethiraj wrote:
>     2017-11-30 12:30:48,060 WARN  - 
> [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream 
> exception (NIOServerCnxn:357)
>     EndOfStreamException: Unable to read additional data from client 
> sessionid 0x1600cdb55840001, likely client has closed socket
>             at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>             at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>             at java.lang.Thread.run(Thread.java:748)
>             
>     This error was caused by incorrect port number configuration in 
> conf/atlas-application.properties:
>       atlas.kafka.zookeeper.connect=localhost:9026
>     
>     The fix is to replace 9026 with 2181 (i.e. use the zookeeper in 
> embedded-hbase).
>     
>     However, after this change startup of 
> org.apache.atlas.kafka.KafkaNotification service seems to hang. I will look 
> into this further.


Use of port 9026 is indeed correct. No need to update the configuration.

WARN messages about "connection refused" are specific to the stand-alone dev-env 
deployment, where Atlas uses an embedded Kafka & Zookeeper. Kafka & Zookeeper 
are started towards the end of initialization, but before that happens an 
attempt is made to connect to Zookeeper, resulting in this WARN. The warning 
goes away once the embedded Kafka and Zookeeper are started. I need to 
investigate where the connection attempt is being made from, but this issue 
shouldn't block Atlas from being functional.

Another error, "Unable to read additional data from client sessionid 
0x1600cdb55840001, likely client has closed socket" - I think this might be due 
to low connect/session timeout values in conf/atlas-application.properties:
  atlas.kafka.zookeeper.session.timeout.ms=400
  atlas.kafka.zookeeper.connection.timeout.ms=200

Can you increase both to 60000 and try again?
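
As a sketch, the two entries in conf/atlas-application.properties would then 
read as follows (60000 is only a value to try, on the assumption that the low 
timeouts are the cause; it is not a verified fix):

```properties
# Assumed trial values, in milliseconds; previously 400 and 200
atlas.kafka.zookeeper.session.timeout.ms=60000
atlas.kafka.zookeeper.connection.timeout.ms=60000
```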


- Madhan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64141/#review192267
-----------------------------------------------------------


On Nov. 29, 2017, 2:32 a.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64141/
> -----------------------------------------------------------
> 
> (Updated Nov. 29, 2017, 2:32 a.m.)
> 
> 
> Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
> 
> 
> Bugs: ATLAS-2287
>     https://issues.apache.org/jira/browse/ATLAS-2287
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> When Atlas is built using the -Pdist profile, lucene jars are excluded during 
> packaging of the war file. Since we are not shading the graphdb module for the 
> janus profile, these jars are needed as a runtime dependency.
> Titan's shaded jar includes the lucene libraries, and hence they were excluded 
> during packaging of the war to avoid duplicate dependencies.
> 
> 
> Diffs
> -----
> 
>   distro/pom.xml eea256d8 
>   pom.xml 3720c1f5 
>   webapp/pom.xml b4a96d36 
> 
> 
> Diff: https://reviews.apache.org/r/64141/diff/1/
> 
> 
> Testing
> -------
> 
> Validated building the atlas distribution using both the janus and titan0 
> profiles. Atlas starts fine and basic functionality works.
> 
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
>
