> On Nov. 30, 2017, 6:47 a.m., David Radley wrote:
> > pom.xml
> > Line 713 (original), 716 (patched)
> > <https://reviews.apache.org/r/64141/diff/1/?file=1903423#file1903423line716>
> >
> > I could start the UI, but I had lots of Zookeeper exceptions and POSTs to create entities did not work.
> >
> > I got errors like this:
> >
> > 2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,975 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.updateClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.addClassification (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.deleteClassifications (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:28,976 INFO - [main:] ~ GraphTransaction intercept for org.apache.atlas.repository.store.graph.v1.AtlasEntityStoreV1.getClassification (GraphTransactionAdvisor$1:41)
> > 2017-11-30 12:16:29,021 WARN - [main-SendThread(localhost:9026):] ~ Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect (ClientCnxn$SendThread:1102)
> > java.net.ConnectException: Connection refused
> >     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
> >     at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
> >     at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
> > 2017-11-30 12:16:29,029 INFO - [main:] ~ Starting service org.apache.atlas.web.service.ActiveInstanceElectorService (Services:53)
> > 2017-11-30 12:16:29,030 INFO - [main:] ~ HA is not enabled, no need to start leader election service (ActiveInstanceElectorService:96)
> > 2017-11-30 12:16:29,030 INFO - [main:] ~ Starting service org.apache.atlas.kafka.KafkaNotification (Services:53)
> > 2017-11
> >
> > and
> >
> > 2017-11-30 12:16:31,194 INFO - [main:] ~ Adding cross-site request forgery (CSRF) protection (AtlasCSRFPreventionFilter:98)
> > 2017-11-30 12:16:31,646 INFO - [main:] ~ AuditFilter initialization started (AuditFilter:57)
> > 2017-11-30 12:30:47,004 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
> > EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840000, likely client has closed socket
> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:748)
> > 2017-11-30 12:30:47,109 WARN - [zkCallback-3-thread-2:] ~ Watcher org.apache.solr.common.cloud.ConnectionManager@5232c71 name: ZooKeeperConnection Watcher:localhost:2181 got event WatchedEvent state:Disconnected type:None path:null path: null type: None (ConnectionManager:108)
> > 2017-11-30 12:30:47,111 WARN - [zkCallback-3-thread-2:] ~ zkClient has disconnected (ConnectionManager:184)
> > 2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
> > EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
> >     at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >     at java.lang.Thread.run(Thread.java:748)
> > 2017-11-30 12:30:48,255 WARN - [SyncThread:0:] ~ fsync-ing the write ahead log in SyncThread:0 took 1246ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide (FileTxnLog:334)
> > 2017-11-30 12:30:48,508 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,19] from OfflinePartition to OnlinePartition failed (Logging$class:103)
> > kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,19] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >     at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >     at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >     at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:117)
> >     at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
> >     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >     at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
> >     at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
> >     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
> >     at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
> >     at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
> >     at kafka.controller.PartitionStateMachine.triggerOnlinePartitionStateChange(PartitionStateMachine.scala:117)
> >     at kafka.controller.PartitionStateMachine.startup(PartitionStateMachine.scala:70)
> >     at kafka.controller.KafkaController.onControllerFailover(KafkaController.scala:335)
> >     at kafka.controller.KafkaController$$anonfun$1.apply$mcV$sp(KafkaController.scala:166)
> >     at kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:84)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply$mcZ$sp(KafkaController.scala:1175)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >     at kafka.controller.KafkaController$SessionExpirationListener$$anonfun$handleNewSession$1.apply(KafkaController.scala:1173)
> >     at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:231)
> >     at kafka.controller.KafkaController$SessionExpirationListener.handleNewSession(KafkaController.scala:1173)
> >     at org.I0Itec.zkclient.ZkClient$6.run(ZkClient.java:735)
> >     at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> > 2017-11-30 12:30:48,512 ERROR - [ZkClient-EventThread-1116-localhost:9026:] ~ Controller 1 epoch 2 initiated state change for partition [__consumer_offsets,30] from OfflinePartition to OnlinePartition failed (Logging$class:103)
> > kafka.common.NoReplicaOnlineException: No replica for partition [__consumer_offsets,30] is alive. Live brokers are: [Set()], Assigned replicas are: [List(1)]
> >     at kafka.controller.OfflinePartitionLeaderSelector.selectLeader(PartitionLeaderSelector.scala:75)
> >     at kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:345)
> >     at kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:205)
> >     at kafka.controller.PartitionStateMachine$$anonfun$triggerOnlinePartitionStateChange$3.apply(PartitionStateMachine.scala:120)
> >     at kafka.controll
> 
> Madhan Neethiraj wrote:
>     2017-11-30 12:30:48,060 WARN - [NIOServerCxn.Factory:localhost/127.0.0.1:9026:] ~ caught end of stream exception (NIOServerCnxn:357)
>     EndOfStreamException: Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket
>         at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
>         at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
>         at java.lang.Thread.run(Thread.java:748)
>     
>     This error was caused by an incorrect port number configuration in conf/atlas-application.properties:
>         atlas.kafka.zookeeper.connect=localhost:9026
>     
>     The fix is to replace 9026 with 2181 (i.e. use the zookeeper in embedded-hbase).
>     
>     However, after this change, startup of the org.apache.atlas.kafka.KafkaNotification service seems to hang. I will look into this further.
> 
> Madhan Neethiraj wrote:
>     Use of port 9026 is indeed correct. No need to update the configuration.
>     
>     The WARN messages about "connection refused" are specific to the stand-alone dev-env deployment, where Atlas uses an embedded Kafka & Zookeeper. Kafka & Zookeeper are started towards the end of initialization, but before that happens an attempt is made to connect to Zookeeper - resulting in this WARN. The WARN goes away once the embedded Kafka and Zookeeper are started. We still need to investigate where the connection attempt is made from, but this issue shouldn't block Atlas from being functional.
>     
>     Another error, "Unable to read additional data from client sessionid 0x1600cdb55840001, likely client has closed socket" - I think this might be due to the low connect/session timeout values in conf/atlas-application.properties:
>         atlas.kafka.zookeeper.session.timeout.ms=400
>         atlas.kafka.zookeeper.connection.timeout.ms=200
>     
>     Can you increase both to 60000 and try again?
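For anyone hitting the same timeouts, a minimal sketch of the change Madhan suggests above, assuming the two keys are already present in conf/atlas-application.properties (the 60000 ms values are the ones proposed in this thread, not validated defaults):

    # bump the embedded-Kafka Zookeeper client timeouts from 400/200 ms to 60000 ms
    # (GNU sed, as on CentOS; editing the file by hand works just as well)
    sed -i 's/^atlas.kafka.zookeeper.session.timeout.ms=.*/atlas.kafka.zookeeper.session.timeout.ms=60000/' conf/atlas-application.properties
    sed -i 's/^atlas.kafka.zookeeper.connection.timeout.ms=.*/atlas.kafka.zookeeper.connection.timeout.ms=60000/' conf/atlas-application.properties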
I validated my patch again on a bare CentOS machine and it works. These are the steps I followed:

# Environment
CentOS 6.8
Apache Maven 3.5.2
Java version: 1.8.0_151

1. Clear your m2 cache: rm -rf ~/.m2/repository
2. Make sure JAVA_HOME is set in your environment; echo $JAVA_HOME to confirm
3. Export the following (or add it to your bashrc and source it):
   export MANAGE_LOCAL_SOLR=true
   export MANAGE_LOCAL_HBASE=true

# Clone and build Atlas (for the Janus profile)
git clone https://github.com/apache/atlas.git -b master
mvn clean install -DskipTests -Pdist,embedded-hbase-solr

# Copy the distribution to the Atlas binary directory and extract it
mkdir /tmp/atlas_binary
cp -r distro/target/apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz /tmp/atlas_binary/
cd /tmp/atlas_binary/
tar -zxvf apache-atlas-1.0.0-SNAPSHOT-bin.tar.gz
cd /tmp/atlas_binary/apache-atlas-1.0.0-SNAPSHOT
echo atlas.graphdb.backend=org.apache.atlas.repository.graphdb.janus.AtlasJanusGraphDatabase >> conf/atlas-application.properties

# Start Atlas
./bin/atlas_start.py

It's going to take a while for Atlas to start up, but it eventually gets there. The slowness is in the indexing stage (which is a different issue altogether); the basic functionality works.

# Atlas UI
http://localhost:21000/
(a quick way to confirm the server is up is sketched at the end of this message)

- Sarath


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64141/#review192267
-----------------------------------------------------------


On Nov. 28, 2017, 6:32 p.m., Sarath Subramanian wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64141/
> -----------------------------------------------------------
> 
> (Updated Nov. 28, 2017, 6:32 p.m.)
> 
> 
> Review request for atlas, Apoorv Naik, Ashutosh Mestry, and Madhan Neethiraj.
> 
> 
> Bugs: ATLAS-2287
>     https://issues.apache.org/jira/browse/ATLAS-2287
> 
> 
> Repository: atlas
> 
> 
> Description
> -------
> 
> When Atlas is built using the -Pdist profile, lucene jars are excluded during packaging of the war file. Since we are not shading the graphdb module for the janus profile, these jars are needed as a run-time dependency. Titan's shaded jar includes the lucene libraries, and hence they were excluded during packaging of the war to avoid duplicate dependencies.
> 
> 
> Diffs
> -----
> 
>   distro/pom.xml eea256d8 
>   pom.xml 3720c1f5 
>   webapp/pom.xml b4a96d36 
> 
> 
> Diff: https://reviews.apache.org/r/64141/diff/1/
> 
> 
> Testing
> -------
> 
> Validated building the atlas distribution using both the janus and titan0 profiles. Atlas starts fine and basic functionality works.
> 
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr
> mvn clean install -DskipTests -Pdist,embedded-hbase-solr -DGRAPH-PROVIDER=titan0
> 
> 
> Thanks,
> 
> Sarath Subramanian
> 
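P.S. The "server is up" check referenced above - a sketch, assuming the default dev-build admin/admin login (adjust credentials and port to match your deployment):

    # returns the Atlas version JSON once startup (including indexing) has finished
    curl -u admin:admin http://localhost:21000/api/atlas/admin/version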
