(Review Board wouldn't post my comments to this review – probably because of other updates to the review while I was reviewing. Hence I am sending my comments by email.)
>> David: The titan0 build failed for me because it was looking for JanusGraph.
>> I think that this was because the configuration file was not generated for Titan 0.

The default configuration is set for JanusGraph. If the build command specifies a graph-provider, make sure to review and update the configuration to match the specified flavor (a sketch of this follows at the end of this message). If further auto-magic configuration setup is necessary for dev deployments, I would suggest taking this up in a separate JIRA.

>> David: Why are we excluding the titan jar file for a titan profile?

Because the Titan classes are included in the atlas-graphdb-titan0 jar file.

>> David: I suggest we remove this profile

I agree on removing 'atlas-graphdb-titan1'. Sarath - can you please?

I was able to bring up Atlas in a stand-alone dev deployment, with the default GRAPH_PROVIDER (janus). +1 from me for this patch.

Thanks,
Madhan
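(For reference, a minimal sketch of the configuration change mentioned above. The graph backend is selected via the atlas.graphdb.backend property in conf/atlas-application.properties; the janus value below is the one Sarath uses later in this thread, while the titan0 class name is an assumption inferred from the atlas-graphdb-titan0 module naming - verify it against the actual jar before relying on it.)

    # Default backend, as generated for the janus graph-provider:
    #   atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

    # For a titan0 build, point the backend at the Titan 0 implementation instead.
    # NOTE: the class name below is an assumption; confirm it against the
    # contents of the atlas-graphdb-titan0 jar.
    echo "atlas.graphdb.backend=org.apache.atlas.repository.graphdb.titan0.Titan0GraphDatabase" >> conf/atlas-application.properties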
From: Sarath Subramanian <nore...@reviews.apache.org> on behalf of Sarath Subramanian <sar...@apache.org>
Reply-To: Sarath Subramanian <sar...@apache.org>
Date: Thursday, November 30, 2017 at 11:25 AM
To: Apoorv Naik <naik.apo...@gmail.com>, Madhan Neethiraj <mad...@apache.org>, Ashutosh Mestry <ames...@hortonworks.com>
Cc: Nigel Jones <jon...@uk.ibm.com>, atlas <d...@atlas.incubator.apache.org>, David Radley <david...@apache.org>, Sarath Subramanian <sar...@apache.org>
Subject: Re: Review Request 64141: [ATLAS-2287]: Include lucene libraries when building atlas distribution with Janus profile

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64141/

On November 30th, 2017, 6:47 a.m. PST, David Radley wrote:

pom.xml <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716> (Diff revision 1)

    713    <graphArtifact>atlas-graphdb-janus</graphArtifact>
    716    <graphArtifact>atlas-graphdb-janus</graphArtifact>

I could start the UI, but I had lots of ZooKeeper exceptions and POSTs to create entities did not work. I got errors like this:

    2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:29,021 WARN - [main-SendThread(localhost:9026):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
    2017-11-30 12:16:29,029 INFO - [main:] ~ Starting service org.apache.atlas.web.service.ActiveInstanceElectorService (Services:53)
    2017-11-30 12:16:29,030 INFO - [main:] ~ HA is not enabled, no need to start leader election service (ActiveInstanceElectorService:96)
    2017-11-30 12:16:29,030 INFO - [main:] ~ Starting service org.apache.atlas.kafka.KafkaNotification (Services:53)
    ...
    2017-11-30 12:16:31,194 INFO - [main:] ~ Adding cross-site request forgery (CSRF) protection (AtlasCSRFPreventionFilter:98)
    2017-11-30 12:16:31,646 INFO - [main:] ~ AuditFilter initialization started (AuditFilter:57)
    2017-11-30 12:30:47,004 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840000, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
    2017-11-30 12:30:47,109 WARN - [zkCallback-3-thread-2:] ~ Watcher org.apache.solr.common.cloud.ConnectionManager@5232c71 name: ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None (ConnectionManager:108)
    2017-11-30 12:30:47,111 WARN - [zkCallback-3-thread-2:] ~ zkClient has disconnected (ConnectionManager:184)
    2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
    2017-11-30 12:30:48,255 WARN - [SyncThread:0:] ~ fsync-ing the write ahead log in SyncThread:0 took 1246ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (FileTxnLog:334)
    2017-11-30 12:30:48,508 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [consumer_offsets,19] from OfflinePartition to OnlinePartition failed (Logging$class:103)
    kafka.common.NoReplicaOnlineException: No replica for partition [consumer_offsets,19] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
        at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
    2017-11-30 12:30:48,512 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [consumer_offsets,30] from OfflinePartition to OnlinePartition failed (Logging$class:103)
    kafka.common.NoReplicaOnlineException: No replica for partition [consumer_offsets,30] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controll

On November 30th, 2017, 9:48 a.m. PST, Madhan Neethiraj wrote:

    2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)

This error was caused by an incorrect port number in conf/atlas-application.properties:

    atlas.kafka.zookeeper.connect=localhost:9026

The fix is to replace 9026 with 2181 (i.e. use the ZooKeeper in embedded-hbase). However, after this change the startup of the org.apache.atlas.kafka.KafkaNotification service seems to hang. I will look into this further.

On November 30th, 2017, 11:03 a.m. PST, Madhan Neethiraj wrote:

Use of port 9026 is indeed correct; no need to update the configuration. The WARN messages about "connection refused" are specific to the stand-alone dev-env deployment, where Atlas uses an embedded Kafka and ZooKeeper. Kafka and ZooKeeper are started towards the end of initialization, but before this is done an attempt is made to connect to ZooKeeper - resulting in this WARN. The WARNing goes away once the embedded Kafka and ZooKeeper are started. We need to investigate where the connection attempt is being made from, but this issue shouldn't block Atlas from being functional.

Another error - "Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket" - I think this might be due to the low connect/session timeout values in conf/atlas-application.properties:

    atlas.kafka.zookeeper.session.timeout.ms=400
    atlas.kafka.zookeeper.connection.timeout.ms=200

Can you increase both to 60000 and try again?
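(A minimal sketch of applying that suggestion; the property names and the 60000ms value come from the comment above, and sed-in-place is just one way to make the edit:)

    # Raise the low ZooKeeper client timeouts (400ms session / 200ms connection)
    # to the suggested 60000ms in conf/atlas-application.properties:
    sed -i 's/^atlas.kafka.zookeeper.session.timeout.ms=.*/atlas.kafka.zookeeper.session.timeout.ms=60000/' conf/atlas-application.properties
    sed -i 's/^atlas.kafka.zookeeper.connection.timeout.ms=.*/atlas.kafka.zookeeper.connection.timeout.ms=60000/' conf/atlas-application.properties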
I validated my patch again on a bare CentOS machine and it works. The steps I followed:

Environment:
- CentOS 6.8
- Apache Maven 3.5.2
- Java version: 1.8.0_151

1. Clear your m2 cache:

    rm -rf ~/.m2/repository

2. Make sure JAVA_HOME is set in your environment (echo $JAVA_HOME to confirm).

3. Export the following (or add it to your bashrc and source it):

    export MANAGE_LOCAL_SOLR=true
    export MANAGE_LOCAL_HBASE=true

Clone and build Atlas (for the janus profile):

    git clone https://github.com/apache/atlas.git -b master
    mvn clean install -DskipTests -Pdist,embedded-hbase-solr

Copy the distribution to an Atlas binary directory and extract it:

    mkdir /tmp/atlas_binary
    cp -r distro/target/apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz /tmp/atlas_binary/
    cd /tmp/atlas_binary/
    tar -zxvf apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz
    cd /tmp/atlas_binary/apache-atlas-1.0.0-SNAPSHOT
    echo atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase >> conf/atlas-application.properties

Start Atlas:

    ./bin/atlas_start.py

It's going to take a while for Atlas to start up, but it eventually gets there. The reason for the slowness is in the indexing stage (which is a different issue altogether), but the basic functionalities work.

Atlas UI: http://localhost:21000/

- Sarath

On November 28th, 2017, 6:32 p.m. PST, Sarath Subramanian wrote:

Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
By Sarath Subramanian.

Updated Nov. 28, 2017, 6:32 p.m.

Bugs: ATLAS-2287 <https://issues.apache.org/jira/browse/ATLAS-2287>

Repository: atlas

Description

When Atlas is built using the -Pdist profile, lucene jars are excluded during packaging of the war file. Since we are not shading the graphdb module for the janus profile, these jars are needed as a runtime dependency. Titan's shaded jar includes the lucene libraries, hence they were excluded during packaging of the war to avoid duplicate dependencies.

Testing

Validated building the Atlas distribution using both the janus and titan0 profiles. Atlas starts fine and basic functionalities are working.

    mvn clean install -DskipTests -Pdist,embedded-hbase-solr
    mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0

Diffs

* distro/pom.xml (eea256d8)
* pom.xml (3720c1f5)
* webapp/pom.xml (b4a96d36)

View Diff <https://reviews.apache.org/r/64141/diff/1/>
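(As a quick sanity check of the description above - i.e. that the lucene jars actually end up in the packaged war under the janus profile after this patch - listing the war contents should show them under WEB-INF/lib. A sketch; the war path and name below are assumptions, so adjust them to the actual build output:)

    # After a janus-profile build, lucene jars should be present in the packaged war.
    # NOTE: the war path/name is an assumption; adjust to the actual build output.
    jar tf webapp/target/atlas-webapp-*.war | grep -i 'WEB-INF/lib/lucene'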