(Review Board wouldn't post my comments to this review – probably because of other updates to the review while I was reviewing. Hence I am sending my comments by email.)
>> David: The titan0 build failed for me because it was looking for JanusGraph.
>> I think that this was because the configuration file was not generated for Titan 0.

The default configuration is set for JanusGraph. If the build command specifies a graph-provider, make sure to review and update the configuration to match the specified flavor (a sketch of this follows at the end of this message). If further auto-magic configuration setup is necessary for dev deployments, I would suggest taking this up in a separate JIRA.

>> David: Why are we excluding the titan jar file for a titan profile?

Because the Titan classes are included in the atlas-graphdb-titan0 jar file.

>> David: I suggest we remove this profile

I agree on removing 'atlas-graphdb-titan1'. Sarath - can you please?

I was able to bring up Atlas in a stand-alone dev deployment, with the default GRAPH_PROVIDER (janus). +1 from me for this patch.

Thanks,
Madhan
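(For reference, a minimal sketch of the configuration change mentioned above. The graph backend is selected via the atlas.graphdb.backend property in conf/atlas-application.properties; the janus value below is the one Sarath uses later in this thread, while the titan0 class name is an assumption inferred from the atlas-graphdb-titan0 module naming - verify it against the actual jar before relying on it.)

    # Default backend, as generated for the janus graph-provider:
    #   atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase

    # For a titan0 build, point the backend at the Titan 0 implementation instead.
    # NOTE: the class name below is an assumption; confirm it against the
    # contents of the atlas-graphdb-titan0 jar.
    echo "atlas.graphdb.backend=org.apache.atlas.repository.graphdb.titan0.Titan0GraphDatabase" >> conf/atlas-application.properties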
From: Sarath Subramanian <nore...@reviews.apache.org> on behalf of Sarath Subramanian <sar...@apache.org>
Reply-To: Sarath Subramanian <sar...@apache.org>
Date: Thursday, November 30, 2017 at 11:25 AM
To: Apoorv Naik <naik.apo...@gmail.com>, Madhan Neethiraj <mad...@apache.org>, Ashutosh Mestry <ames...@hortonworks.com>
Cc: Nigel Jones <jon...@uk.ibm.com>, atlas <d...@atlas.incubator.apache.org>, David Radley <david...@apache.org>, Sarath Subramanian <sar...@apache.org>
Subject: Re: Review Request 64141: [ATLAS-2287]: Include lucene libraries when building atlas distribution with Janus profile

This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/64141/

On November 30th, 2017, 6:47 a.m. PST, David Radley wrote:

pom.xml <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716> (Diff revision 1)

    713    <graphArtifact>atlas-graphdb-janus</graphArtifact>
    716    <graphArtifact>atlas-graphdb-janus</graphArtifact>

I could start the UI, but I had lots of ZooKeeper exceptions and POSTs to create entities did not work. I got errors like this:

    2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification (GraphTransactionAdvisor$1:41)
    2017-11-30 12:16:29,021 WARN - [main-SendThread(localhost:9026):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
    2017-11-30 12:16:29,029 INFO - [main:] ~ Starting service org.apache.atlas.web.service.ActiveInstanceElectorService (Services:53)
    2017-11-30 12:16:29,030 INFO - [main:] ~ HA is not enabled, no need to start leader election service (ActiveInstanceElectorService:96)
    2017-11-30 12:16:29,030 INFO - [main:] ~ Starting service org.apache.atlas.kafka.KafkaNotification (Services:53)
    ...
    2017-11-30 12:16:31,194 INFO - [main:] ~ Adding cross-site request forgery (CSRF) protection (AtlasCSRFPreventionFilter:98)
    2017-11-30 12:16:31,646 INFO - [main:] ~ AuditFilter initialization started (AuditFilter:57)
    2017-11-30 12:30:47,004 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840000, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
    2017-11-30 12:30:47,109 WARN - [zkCallback-3-thread-2:] ~ Watcher org.apache.solr.common.cloud.ConnectionManager@5232c71 name: ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None (ConnectionManager:108)
    2017-11-30 12:30:47,111 WARN - [zkCallback-3-thread-2:] ~ zkClient has disconnected (ConnectionManager:184)
    2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)
    2017-11-30 12:30:48,255 WARN - [SyncThread:0:] ~ fsync-ing the write ahead log in SyncThread:0 took 1246ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (FileTxnLog:334)
    2017-11-30 12:30:48,508 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [consumer_offsets,19] from OfflinePartition to OnlinePartition failed (Logging$class:103)
    kafka.common.NoReplicaOnlineException: No replica for partition [consumer_offsets,19] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
        at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
        at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
        at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
        at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
        at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
        at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
        at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
        at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
        at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
    2017-11-30 12:30:48,512 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [consumer_offsets,30] from OfflinePartition to OnlinePartition failed (Logging$class:103)
    kafka.common.NoReplicaOnlineException: No replica for partition [consumer_offsets,30] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
        at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
        at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
        at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
        at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
        at kafka.controll

On November 30th, 2017, 9:48 a.m. PST, Madhan Neethiraj wrote:

    2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
    EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
        at java.lang.Thread.run(Thread.java:748)

This error was caused by an incorrect port number in conf/atlas-application.properties:

    atlas.kafka.zookeeper.connect=localhost:9026

The fix is to replace 9026 with 2181 (i.e. use the ZooKeeper in embedded-hbase). However, after this change the startup of the org.apache.atlas.kafka.KafkaNotification service seems to hang. I will look into this further.

On November 30th, 2017, 11:03 a.m. PST, Madhan Neethiraj wrote:

Use of port 9026 is indeed correct; no need to update the configuration. The WARN messages about "connection refused" are specific to the stand-alone dev-env deployment, where Atlas uses an embedded Kafka and ZooKeeper. Kafka and ZooKeeper are started towards the end of initialization, but before this is done an attempt is made to connect to ZooKeeper - resulting in this WARN. The WARNing goes away once the embedded Kafka and ZooKeeper are started. We need to investigate where the connection attempt is being made from, but this issue shouldn't block Atlas from being functional.

Another error - "Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket" - I think this might be due to the low connect/session timeout values in conf/atlas-application.properties:

    atlas.kafka.zookeeper.session.timeout.ms=400
    atlas.kafka.zookeeper.connection.timeout.ms=200

Can you increase both to 60000 and try again?
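(A minimal sketch of applying that suggestion; the property names and the 60000ms value come from the comment above, and sed-in-place is just one way to make the edit:)

    # Raise the low ZooKeeper client timeouts (400ms session / 200ms connection)
    # to the suggested 60000ms in conf/atlas-application.properties:
    sed -i 's/^atlas.kafka.zookeeper.session.timeout.ms=.*/atlas.kafka.zookeeper.session.timeout.ms=60000/' conf/atlas-application.properties
    sed -i 's/^atlas.kafka.zookeeper.connection.timeout.ms=.*/atlas.kafka.zookeeper.connection.timeout.ms=60000/' conf/atlas-application.properties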
I validated my patch again on a bare CentOS machine and it works. The steps I followed:

Environment:
- CentOS 6.8
- Apache Maven 3.5.2
- Java version: 1.8.0_151

1. Clear your m2 cache:

    rm -rf ~/.m2/repository

2. Make sure JAVA_HOME is set in your environment (echo $JAVA_HOME to confirm).

3. Export the following (or add it to your bashrc and source it):

    export MANAGE_LOCAL_SOLR=true
    export MANAGE_LOCAL_HBASE=true

Clone and build Atlas (for the janus profile):

    git clone https://github.com/apache/atlas.git -b master
    mvn clean install -DskipTests -Pdist,embedded-hbase-solr

Copy the distribution to an Atlas binary directory and extract it:

    mkdir /tmp/atlas_binary
    cp -r distro/target/apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz /tmp/atlas_binary/
    cd /tmp/atlas_binary/
    tar -zxvf apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz
    cd /tmp/atlas_binary/apache-atlas-1.0.0-SNAPSHOT
    echo atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase >> conf/atlas-application.properties

Start Atlas:

    ./bin/atlas_start.py

It's going to take a while for Atlas to start up, but it eventually gets there. The reason for the slowness is in the indexing stage (which is a different issue altogether), but the basic functionalities work.

Atlas UI: http://localhost:21000/

- Sarath

On November 28th, 2017, 6:32 p.m. PST, Sarath Subramanian wrote:

Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
By Sarath Subramanian.

Updated Nov. 28, 2017, 6:32 p.m.

Bugs: ATLAS-2287 <https://issues.apache.org/jira/browse/ATLAS-2287>

Repository: atlas

Description

When Atlas is built using the -Pdist profile, lucene jars are excluded during packaging of the war file. Since we are not shading the graphdb module for the janus profile, these jars are needed as a runtime dependency. Titan's shaded jar includes the lucene libraries, hence they were excluded during packaging of the war to avoid duplicate dependencies.

Testing

Validated building the Atlas distribution using both the janus and titan0 profiles. Atlas starts fine and basic functionalities are working.

    mvn clean install -DskipTests -Pdist,embedded-hbase-solr
    mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0

Diffs

* distro/pom.xml (eea256d8)
* pom.xml (3720c1f5)
* webapp/pom.xml (b4a96d36)

View Diff <https://reviews.apache.org/r/64141/diff/1/>
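(As a quick sanity check of the description above - i.e. that the lucene jars actually end up in the packaged war under the janus profile after this patch - listing the war contents should show them under WEB-INF/lib. A sketch; the war path and name below are assumptions, so adjust them to the actual build output:)

    # After a janus-profile build, lucene jars should be present in the packaged war.
    # NOTE: the war path/name is an assumption; adjust to the actual build output.
    jar tf webapp/target/atlas-webapp-*.war | grep -i 'WEB-INF/lib/lucene'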