> On Nov. 30, 2017, 6:47 a.m., David Radley wrote:
> > pom.xml
> > Line 713 (original), 716 (patched)
> > <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716>
> >
> > I could start the UI, but I had lots of Zookeeper exceptions and POSTs to create entities did not work.
> >
> > I got errors like this:
> >
> > 2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:29,021 WARN - [main-SendThread(localhost:9026):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
> > java.net.ConnectException: Connection refused
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> > 2017-11-30 12:16:29,029 INFO - [main:] ~ Starting service org.apache.atlas.web.service.ActiveInstanceElectorService (Services:53)
> > 2017-11-30 12:16:29,030 INFO - [main:] ~ HA is not enabled, no need to start leader election service (ActiveInstanceElectorService:96)
> > 2017-11-30 12:16:29,030 INFO - [main:] ~ Starting service org.apache.atlas.kafka.KafkaNotification (Services:53)
> > 2017-11
> >
> > and
> >
> > 2017-11-30 12:16:31,194 INFO - [main:] ~ Adding cross-site request forgery (CSRF) protection (AtlasCSRFPreventionFilter:98)
> > 2017-11-30 12:16:31,646 INFO - [main:] ~ AuditFilter initialization started (AuditFilter:57)
> > 2017-11-30 12:30:47,004 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
> > EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840000, likely client has closed socket
> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:748)
> > 2017-11-30 12:30:47,109 WARN - [zkCallback-3-thread-2:] ~ Watcher org.apache.solr.common.cloud.ConnectionManager@5232c71 name: ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None (ConnectionManager:108)
> > 2017-11-30 12:30:47,111 WARN - [zkCallback-3-thread-2:] ~ zkClient has disconnected (ConnectionManager:184)
> > 2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
> > EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:748)
> > 2017-11-30 12:30:48,255 WARN - [SyncThread:0:] ~ fsync-ing the write ahead log in SyncThread:0 took 1246ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (FileTxnLog:334)
> > 2017-11-30 12:30:48,508 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed (Logging$class:103)
> > kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,19] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >     at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >     at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >     at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> >     at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> >     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> >     at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> >     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >     at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> >     at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
> >     at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
> >     at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
> >     at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> >     at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
> >     at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> >     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> > 2017-11-30 12:30:48,512 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,30] from OfflinePartition to OnlinePartition failed (Logging$class:103)
> > kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,30] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >     at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >     at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >     at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >     at kafka.controll
> 
> Madhan Neethiraj wrote:
>     2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
>     EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:748)
>     
>     This error was caused by an incorrect port number configuration in conf/atlas-application.properties:
>         atlas.kafka.zookeeper.connect=localhost:9026
>     
>     The fix is to replace 9026 with 2181 (i.e. use the zookeeper in embedded-hbase).
>     
>     However, after this change, startup of the org.apache.atlas.kafka.KafkaNotification service seems to hang. I will look into this further.
> 
> Madhan Neethiraj wrote:
>     Use of port 9026 is indeed correct. No need to update the configuration.
>     
>     The WARN messages about "connection refused" are specific to the stand-alone dev-env deployment, where Atlas uses an embedded Kafka & Zookeeper. Kafka & Zookeeper are started towards the end of initialization, but before that happens an attempt is made to connect to Zookeeper - resulting in this WARN. The WARN goes away once the embedded Kafka and Zookeeper are started. We still need to investigate where the connection attempt is made from, but this issue shouldn't block Atlas from being functional.
>     
>     Another error, "Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket" - I think this might be due to the low connect/session timeout values in conf/atlas-application.properties:
>         atlas.kafka.zookeeper.session.timeout.ms=400
>         atlas.kafka.zookeeper.connection.timeout.ms=200
>     
>     Can you increase both to 60000 and try again?
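For anyone hitting the same timeouts, a minimal sketch of the change Madhan suggests above, assuming the two keys are already present in conf/atlas-application.properties (the 60000 ms values are the ones proposed in this thread, not validated defaults):

    # bump the embedded-Kafka Zookeeper client timeouts from 400/200 ms to 60000 ms
    # (GNU sed, as on CentOS; editing the file by hand works just as well)
    sed -i 's/^atlas.kafka.zookeeper.session.timeout.ms=.*/atlas.kafka.zookeeper.session.timeout.ms=60000/' conf/atlas-application.properties
    sed -i 's/^atlas.kafka.zookeeper.connection.timeout.ms=.*/atlas.kafka.zookeeper.connection.timeout.ms=60000/' conf/atlas-application.properties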
I validated my patch again on a bare CentOS machine and it works. These are the steps I followed:

# Environment
CentOS 6.8
Apache Maven 3.5.2
Java version: 1.8.0_151

1. Clear your m2 cache: rm -rf ~/.m2/repository
2. Make sure JAVA_HOME is set in your environment; echo $JAVA_HOME to confirm
3. Export the following (or add it to your bashrc and source it):
   export MANAGE_LOCAL_SOLR=true
   export MANAGE_LOCAL_HBASE=true

# Clone and build Atlas (for the Janus profile)
git clone https://github.com/apache/atlas.git -b master
mvn clean install -DskipTests -Pdist,embedded-hbase-solr

# Copy the distribution to the Atlas binary directory and extract it
mkdir /tmp/atlas_binary
cp -r distro/target/apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz /tmp/atlas_binary/
cd /tmp/atlas_binary/
tar -zxvf apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz
cd /tmp/atlas_binary/apache-atlas-1.0.0-SNAPSHOT
echo atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase >> conf/atlas-application.properties

# Start Atlas
./bin/atlas_start.py

It's going to take a while for Atlas to start up, but it eventually gets there. The slowness is in the indexing stage (which is a different issue altogether); the basic functionality works.

# Atlas UI
http://localhost:21000/
(a quick way to confirm the server is up is sketched at the end of this message)

- Sarath


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64141/#review192267
-----------------------------------------------------------


On Nov. 28, 2017, 6:32 p.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64141/
> -----------------------------------------------------------
> 
> (Updated Nov. 28, 2017, 6:32 p.m.)
> 
> 
> Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
> 
> 
> Bugs: ATLAS-2287
>     https://issues.apache.org/jira/browse/ATLAS-2287
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> When Atlas is built using the -Pdist profile, lucene jars are excluded during packaging of the war file. Since we are not shading the graphdb module for the janus profile, these jars are needed as a run-time dependency. Titan's shaded jar includes the lucene libraries, and hence they were excluded during packaging of the war to avoid duplicate dependencies.
> 
> 
> Diffs
> -----
> 
>   distro/pom.xml eea256d8 
>   pom.xml 3720c1f5 
>   webapp/pom.xml b4a96d36 
> 
> 
> Diff: https://reviews.apache.org/r/64141/diff/1/
> 
> 
> Testing
> -------
> 
> Validated building the atlas distribution using both the janus and titan0 profiles. Atlas starts fine and basic functionality works.
> 
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
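P.S. The "server is up" check referenced above - a sketch, assuming the default dev-build admin/admin login (adjust credentials and port to match your deployment):

    # returns the Atlas version JSON once startup (including indexing) has finished
    curl -u admin:admin http://localhost:21000/api/atlas/admin/version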
