By the way, we need to do something about those OOMs and "unable to
create new native thread" errors in the ITs. It's quite strange to see
such failures in a 10-line test, especially when queries against a table
with fewer than 10 rows generate over 2500 threads. Does anybody know
whether it's a ZK-related issue?
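
To see where those threads come from, something like the following could be dropped into a failing IT (or run from a debugger) to group live threads by name. This is only a rough sketch: the class ThreadCensus and its methods are made up for illustration and are not part of Phoenix or HBase. It only assumes that pool threads carry a numeric suffix, which is stripped so that siblings land in one bucket.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Phoenix: counts live JVM threads per name prefix.
public class ThreadCensus {

    // Strip trailing digits and separators so that "pool-1-thread-7" and
    // "pool-1-thread-8" fall into the same bucket, "pool-1-thread".
    static String bucket(String threadName) {
        return threadName.replaceAll("[-_#:\\s\\d]+$", "");
    }

    // Snapshot all live threads in this JVM and count them per bucket.
    static Map<String, Integer> census() {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = bucket(t.getName());
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : census().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

If most of the threads turn out to share a ZooKeeper or RPC-related prefix, that would at least narrow down which component is creating them.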

On Fri, Apr 29, 2016 at 7:51 AM, James Taylor <[email protected]> wrote:
> A patch would be much appreciated, Sergey.
>
> On Fri, Apr 29, 2016 at 3:26 AM, Sergey Soldatov <[email protected]>
> wrote:
>
>> As for the flume module - flume-ng comes with commons-io 2.1, while
>> hadoop & hbase require org.apache.commons.io.Charsets, which was
>> introduced in 2.3. The easy way is to move the dependency on flume-ng
>> after the dependencies on hbase/hadoop.
>>
>> The last thing, about ConcurrentHashMap - it definitely means that the
>> code was compiled with 1.8, since in 1.7 keySet() returns a plain Set
>> while in 1.8 it returns KeySetView.
>>
>>
>>
>> On Thu, Apr 28, 2016 at 4:08 PM, Josh Elser <[email protected]> wrote:
>> > *tl;dr*
>> >
>> > * I'm removing ubuntu-us1 from all pools
>> > * Phoenix-Flume ITs look busted
>> > * UpsertValuesIT looks busted
>> > * Something is weirdly wrong with Phoenix-4.x-HBase-1.1 in its entirety.
>> >
>> > Details below...
>> >
>> > It looks like we have a bunch of different reasons for the failures.
>> > Starting with Phoenix-master:
>> >
>> >>>>
>> > org.apache.phoenix.schema.NewerTableAlreadyExistsException: ERROR 1013 (42M04): Table already exists. tableName=T
>> >         at org.apache.phoenix.end2end.UpsertValuesIT.testBatchedUpsert(UpsertValuesIT.java:476)
>> > <<<
>> >
>> > I've seen this coming out of a few different tests (I think I've also run
>> > into it on my own, but that's another thing)
>> >
>> > Some of them look like the Jenkins build host is just over-taxed:
>> >
>> >>>>
>> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007e7600000, 331350016, 0) failed; error='Cannot allocate memory' (errno=12)
>> > #
>> > # There is insufficient memory for the Java Runtime Environment to continue.
>> > # Native memory allocation (malloc) failed to allocate 331350016 bytes for committing reserved memory.
>> > # An error report file with more information is saved as:
>> > # /home/jenkins/jenkins-slave/workspace/Phoenix-master/phoenix-core/hs_err_pid26454.log
>> > Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007ea600000, 273678336, 0) failed; error='Cannot allocate memory' (errno=12)
>> > #
>> > <<<
>> >
>> > and
>> >
>> >>>>
>> > -------------------------------------------------------
>> >  T E S T S
>> > -------------------------------------------------------
>> > Build step 'Invoke top-level Maven targets' marked build as failure
>> > <<<
>> >
>> > Both of these issues are limited to the host "ubuntu-us1". Let me just
>> > remove him from the pool (on Phoenix-master) and see if that helps at all.
>> >
>> > I also see some sporadic failures of some Flume tests
>> >
>> >>>>
>> > Running org.apache.phoenix.flume.PhoenixSinkIT
>> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.004 sec <<< FAILURE! - in org.apache.phoenix.flume.PhoenixSinkIT
>> > org.apache.phoenix.flume.PhoenixSinkIT  Time elapsed: 0.004 sec  <<< ERROR!
>> > java.lang.RuntimeException: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > Caused by: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> >
>> > Running org.apache.phoenix.flume.RegexEventSerializerIT
>> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.005 sec <<< FAILURE! - in org.apache.phoenix.flume.RegexEventSerializerIT
>> > org.apache.phoenix.flume.RegexEventSerializerIT  Time elapsed: 0.004 sec <<< ERROR!
>> > java.lang.RuntimeException: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > Caused by: java.io.IOException: Failed to save in any storage directories while saving namespace.
>> > <<<
>> >
>> > I'm not sure what the error message means at a glance.
>> >
>> > For Phoenix-HBase-1.1:
>> >
>> >>>>
>> > org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >         ... 4 more
>> > 2016-04-28 22:54:35,497 WARN  [RS:0;hemera:41302] org.apache.hadoop.hbase.regionserver.HRegionServer(2279): error telling master we are up
>> > com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >         ... 4 more
>> >
>> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:318)
>> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerStartup(RegionServerStatusProtos.java:8982)
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:2269)
>> >         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:893)
>> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
>> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
>> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
>> >         at java.security.AccessController.doPrivileged(Native Method)
>> >         at javax.security.auth.Subject.doAs(Subject.java:356)
>> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
>> >         at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:307)
>> >         at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.DoNotRetryIOException): org.apache.hadoop.hbase.DoNotRetryIOException: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2156)
>> >         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:104)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>> >         at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>> >         at java.lang.Thread.run(Thread.java:745)
>> > Caused by: java.lang.NoSuchMethodError: java.util.concurrent.ConcurrentHashMap.keySet()Ljava/util/concurrent/ConcurrentHashMap$KeySetView;
>> >         at org.apache.hadoop.hbase.master.ServerManager.findServerWithSameHostnamePortWithLock(ServerManager.java:432)
>> >         at org.apache.hadoop.hbase.master.ServerManager.checkAndRecordNewServer(ServerManager.java:346)
>> >         at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:264)
>> >         at org.apache.hadoop.hbase.master.MasterRpcServices.regionServerStartup(MasterRpcServices.java:318)
>> >         at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:8615)
>> >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2117)
>> >         ... 4 more
>> >
>> >         at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1235)
>> >         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:217)
>> >         ... 13 more
>> > <<<
>> >
>> > We intermittently hit this error, which keeps hbase:namespace from
>> > being assigned (as the RS's can never report in to the hmaster). This is
>> > happening across a couple of the nodes (ubuntu-[3,4,6]). I had tried to
>> > look into this one over the weekend (and was led to a JDK8-built jar
>> > running on JDK7), but if I look at META-INF/MANIFEST.MF in the
>> > hbase-server-1.1.3.jar from central, I see it was built with 1.7.0_80
>> > (which I think means the JDK8 thought is a red herring). I'm really
>> > confused by this one, actually. Something must be amiss here.
>> >
>> > For Phoenix-HBase-1.0:
>> >
>> > We see the same Phoenix-Flume failures, UpsertValuesIT failure, and
>> > timeouts on ubuntu-us1. There is one crash on H10, but that might just
>> > be bad luck.
>> >
>> > For Phoenix-HBase-0.98:
>> >
>> > Same UpsertValuesIT failure and failures on ubuntu-us1.
>> >
>> >
>> > James Taylor wrote:
>> >>
>> >> Anyone know why our Jenkins builds keep failing? Is it environmental and
>> >> is there anything we can do about it?
>> >>
>> >> Thanks,
>> >> James
>> >>
>> >
>>
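
P.S. On the NoSuchMethodError for ConcurrentHashMap.keySet() quoted above: JDK 8 narrowed the return type of that method to KeySetView, and javac records the descriptor it sees at compile time. So a call made on a variable typed as ConcurrentHashMap and compiled against the JDK 8 class library cannot link on a JDK 7 runtime. Whether that is really what happens on the Jenkins nodes is still open, given the 1.7.0_80 manifest. The sketch below only illustrates the mechanism: KeySetLinkage and its methods are made-up names, not HBase code. Calling through the ConcurrentMap interface binds to keySet() returning java.util.Set, which exists on both JDKs.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration, not HBase code.
public class KeySetLinkage {

    // The receiver is typed as the interface, so the compiled call site refers
    // to keySet() with return type java.util.Set on any JDK.
    static Set<String> keysViaInterface(ConcurrentMap<String, Integer> servers) {
        return servers.keySet();
    }

    // Reports which return type the running JDK declares for
    // ConcurrentHashMap.keySet(); useful for checking a build host.
    static String declaredKeySetType() throws NoSuchMethodException {
        return ConcurrentHashMap.class.getMethod("keySet").getReturnType().getName();
    }

    public static void main(String[] args) throws Exception {
        ConcurrentMap<String, Integer> servers = new ConcurrentHashMap<String, Integer>();
        servers.put("hemera", 41302);
        System.out.println(keysViaInterface(servers));
        System.out.println(declaredKeySetType());
    }
}
```

Running main on each build host would show which class library the JVM there actually provides.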
