Tested out this theory. I ran PR502 (Selva's memory fix for TEST018) against the hive test, and it still fails: https://jenkins.esgyn.com/job/Requested-Test/57/
Then I changed the jenkins config to use the previous VM image and ran it again, and it passed: https://jenkins.esgyn.com/job/Requested-Test/59/

The only intentional change between those VM images was limiting the range of ephemeral ports. Perhaps some unintentional change also got in; otherwise I'm stumped how that would cause this problem.

--Steve

> -----Original Message-----
> From: Steve Varnau [mailto:[email protected]]
> Sent: Thursday, May 26, 2016 9:11 AM
> To: '[email protected]' <[email protected]>
> Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still Failing
>
> I think the error usually looks like that, or more often it hangs and the test times out.
>
> The odd thing is that it started failing on both branches on the same day. There were changes on the master branch, but none on the release2.0 branch. That is what makes me think the trigger was environmental rather than a code change.
>
> I guess I could switch jenkins back to using the previous VM image to see if it goes away.
>
> --Steve
>
> > -----Original Message-----
> > From: Sandhya Sundaresan [mailto:[email protected]]
> > Sent: Thursday, May 26, 2016 9:04 AM
> > To: [email protected]
> > Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still Failing
> >
> > Hi Steve,
> >
> > The error today is this:
> >
> > *** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::scanOpen returned error HBASE_OPEN_ERROR(-704). Cause:
> > java.lang.Exception: Cannot create Table Snapshot Scanner
> >     org.TRAFODION.sql.HTableClient.startScan(HTableClient.java:1003)
> >
> > We have seen this when there is java memory pressure in the past.
> >
> > A few days back this same snapshot scan creation failed with the error below. I wonder if anyone can see a pattern here or knows the causes of either of these.
> >
> > >>--snapshot
> > >>execute snp;
> >
> > *** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::scanOpen returned error HBASE_OPEN_ERROR(-704). Cause:
> > java.io.IOException: java.util.concurrent.ExecutionException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /bulkload/20160520102824/TRAFODION.HBASE.CUSTOMER_ADDRESS_SNAP111/6695c6f9-4bb5-4ad5-893b-adf07fc8a4b9/data/default/TRAFODION.HBASE.CUSTOMER_ADDRESS/7143c21b40a7bef21768685f7dc18e1c/.regioninfo could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
> >     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1541)
> >     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3289)
> >     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:668)
> >     at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:212)
> >     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:483)
> >     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> >     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> >     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
> >     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:415)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> >     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
> >
> >   org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:162)
> >   org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.cloneHdfsRegions(RestoreSnapshotHelper.java:561)
> >   org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:237)
> >   org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.restoreHdfsRegions(RestoreSnapshotHelper.java:159)
> >   org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:812)
> >   org.apache.hadoop.hbase.client.TableSnapshotScanner.init(TableSnapshotScanner.java:156)
> >   org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:124)
> >   org.apache.hadoop.hbase.client.TableSnapshotScanner.<init>(TableSnapshotScanner.java:101)
> >   org.trafodion.sql.HTableClient$SnapshotScanHelper.createTableSnapshotScanner(HTableClient.java:222)
> >   org.trafodion.sql.HTableClient.startScan(HTableClient.java:1009)
> > .
> >
> > --- 0 row(s) selected.
> >
> > >>log;
> >
> > Sandhya
> >
> > -----Original Message-----
> > From: Steve Varnau [mailto:[email protected]]
> > Sent: Thursday, May 26, 2016 8:49 AM
> > To: [email protected]
> > Subject: RE: Trafodion release2.0 Daily Test Result - 23 - Still Failing
> >
> > This hive regression behavior is still puzzling, however, I just realized one thing that did change just before it started failing and is a test environment change common to both branches. The VM image for cloudera was updated to set a smaller ephemeral port range to reduce chance of port conflict that was occasionally impacting HBase.
> >
> > The range was set to 51000 - 59999, to avoid default port numbers that the Cloudera distro uses.
> >
> > So how could this possibly be causing disaster in hive/TEST018? I have no idea.
> >
> > --Steve
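(The quoted message above gives the new ephemeral port range, 51000 - 59999, but not how it was applied on the VM image. Assuming it was done through the usual Linux sysctl, net.ipv4.ip_local_port_range, a small stand-alone check like the sketch below could confirm on the test VM both the configured range and the port the kernel actually assigns when a process binds to port 0. The /proc path and the expectation that the result lands in 51000 - 59999 are assumptions carried over from that message, not something verified here.)

  import java.net.ServerSocket;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import java.util.List;

  public class EphemeralPortCheck {
      public static void main(String[] args) throws Exception {
          // On Linux the ephemeral range is exposed through this sysctl-backed file;
          // if the VM image narrowed it via net.ipv4.ip_local_port_range (assumption),
          // this should read "51000  59999".
          List<String> range =
              Files.readAllLines(Paths.get("/proc/sys/net/ipv4/ip_local_port_range"));
          System.out.println("Configured ephemeral range: " + range);

          // Binding to port 0 asks the kernel for an ephemeral port; if the
          // restriction is in effect, the assigned port should fall inside that window.
          try (ServerSocket probe = new ServerSocket(0)) {
              System.out.println("Kernel-assigned port: " + probe.getLocalPort());
          }
      }
  }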
> > > -----Original Message-----
> > > From: [email protected] [mailto:[email protected]]
> > > Sent: Thursday, May 26, 2016 1:36 AM
> > > To: [email protected]
> > > Subject: Trafodion release2.0 Daily Test Result - 23 - Still Failing
> > >
> > > Daily Automated Testing release2.0
> > >
> > > Jenkins Job:    https://jenkins.esgyn.com/job/Check-Daily-release2.0/23/
> > > Archived Logs:  http://traf-testlogs.esgyn.com/Daily-release2.0/23
> > > Bld Downloads:  http://traf-builds.esgyn.com
> > >
> > > Changes since previous daily build:
> > > No changes
> > >
> > > Test Job Results:
> > >
> > > FAILURE core-regress-hive-cdh (55 min)
> > > SUCCESS build-release2.0-debug (24 min)
> > > SUCCESS build-release2.0-release (28 min)
> > > SUCCESS core-regress-charsets-cdh (28 min)
> > > SUCCESS core-regress-charsets-hdp (41 min)
> > > SUCCESS core-regress-compGeneral-cdh (36 min)
> > > SUCCESS core-regress-compGeneral-hdp (45 min)
> > > SUCCESS core-regress-core-cdh (39 min)
> > > SUCCESS core-regress-core-hdp (1 hr 10 min)
> > > SUCCESS core-regress-executor-cdh (56 min)
> > > SUCCESS core-regress-executor-hdp (1 hr 25 min)
> > > SUCCESS core-regress-fullstack2-cdh (13 min)
> > > SUCCESS core-regress-fullstack2-hdp (14 min)
> > > SUCCESS core-regress-hive-hdp (53 min)
> > > SUCCESS core-regress-privs1-cdh (39 min)
> > > SUCCESS core-regress-privs1-hdp (59 min)
> > > SUCCESS core-regress-privs2-cdh (41 min)
> > > SUCCESS core-regress-privs2-hdp (54 min)
> > > SUCCESS core-regress-qat-cdh (16 min)
> > > SUCCESS core-regress-qat-hdp (21 min)
> > > SUCCESS core-regress-seabase-cdh (57 min)
> > > SUCCESS core-regress-seabase-hdp (1 hr 16 min)
> > > SUCCESS core-regress-udr-cdh (28 min)
> > > SUCCESS core-regress-udr-hdp (31 min)
> > > SUCCESS jdbc_test-cdh (22 min)
> > > SUCCESS jdbc_test-hdp (40 min)
> > > SUCCESS phoenix_part1_T2-cdh (56 min)
> > > SUCCESS phoenix_part1_T2-hdp (1 hr 17 min)
> > > SUCCESS phoenix_part1_T4-cdh (46 min)
> > > SUCCESS phoenix_part1_T4-hdp (57 min)
> > > SUCCESS phoenix_part2_T2-cdh (53 min)
> > > SUCCESS phoenix_part2_T2-hdp (1 hr 25 min)
> > > SUCCESS phoenix_part2_T4-cdh (44 min)
> > > SUCCESS phoenix_part2_T4-hdp (1 hr 0 min)
> > > SUCCESS pyodbc_test-cdh (11 min)
> > > SUCCESS pyodbc_test-hdp (23 min)
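(For context on the "Cannot create Table Snapshot Scanner" failures quoted above: the stack trace shows Trafodion's HTableClient opening the scan through HBase's TableSnapshotScanner, which first materializes the snapshot's region layout, including the .regioninfo files, into a scratch directory on HDFS and then reads the HFiles directly, bypassing the region servers. The write that failed with "could only be replicated to 0 nodes" happens during that restore step. The following is a minimal sketch of that call path using the public HBase 1.x client constructor; the restore directory and snapshot name are made-up values for illustration, not what the regression test uses.)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.TableSnapshotScanner;

  public class SnapshotScanSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();

          // Scratch directory on HDFS where the snapshot's region metadata
          // (.regioninfo files, HFile references) is written before the scan
          // starts; the failing write in the stack trace above happened under a
          // directory like this.  Both values here are hypothetical.
          Path restoreDir = new Path("/bulkload/snapshot-scan-tmp");
          String snapshotName = "CUSTOMER_ADDRESS_SNAP";

          Scan scan = new Scan();

          // If the restore step cannot write to HDFS (e.g. no datanode will
          // accept the block), this constructor throws, which Trafodion surfaces
          // as ERROR[8448] / HBASE_OPEN_ERROR(-704) "Cannot create Table Snapshot Scanner".
          try (TableSnapshotScanner scanner =
                  new TableSnapshotScanner(conf, restoreDir, snapshotName, scan)) {
              for (Result row; (row = scanner.next()) != null; ) {
                  System.out.println(row);
              }
          }
      }
  }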
