Based on these errors:

16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
remote block reader.
java.net.SocketException: Too many open files

16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
/127.0.0.1:31000 for block, add to deadNodes and continue.
java.net.SocketException: Too many open files

I'm guessing your HDFS instance might be overloaded (check the NN/DN logs).
The HMaster can't reach the NN while opening regions, which is why it throws
this error. The "Too many open files" messages also suggest the process is
hitting its file-descriptor limit.
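A quick way to sanity-check the file-descriptor theory on a typical Linux box (this is only a sketch: the 32768 value is an example, and you would substitute the HMaster or DataNode pid for `$$` to inspect the suspect daemon):

```shell
# Current soft limit on open files for this shell.
ulimit -n

# Number of descriptors actually held by a process; substitute the
# HMaster/DataNode pid for $$ to check the daemon from the logs.
ls /proc/$$/fd | wc -l

# Raise the soft limit for processes started from this shell (capped by
# the hard limit; persist the change via /etc/security/limits.conf).
ulimit -n 32768 || echo "hard limit too low; raise it in /etc/security/limits.conf"
```

If the fd count is close to the limit, that alone would explain the SocketException cascade.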

On Mon, Jul 25, 2016 at 8:05 AM, Jim Apple <jbap...@cloudera.com> wrote:

> Several thousand lines of things like
>
> WARN shortcircuit.ShortCircuitCache: ShortCircuitCache(0x419c7df4):
> failed to load 1073764575_BP-1490185442-127.0.0.1-1456935654337
>
> java.lang.NullPointerException at
>
> org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.<init>(ShortCircuitReplica.java:126)
> ...
>
> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory:
>
> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
> error creating ShortCircuitReplica.
>
> java.io.EOFException: unexpected EOF while reading metadata file header
>
> 16/07/24 18:36:08 WARN hdfs.BlockReaderFactory: I/O error constructing
> remote block reader.
> java.net.SocketException: Too many open files
>
> 16/07/24 18:36:08 WARN hdfs.DFSClient: Failed to connect to
> /127.0.0.1:31000 for block, add to deadNodes and continue.
> java.net.SocketException: Too many open files
>
> 16/07/24 18:36:08 INFO hdfs.DFSClient: Could not obtain
> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 from any
> node: java.io.IOException: No live nodes contain block
> BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805 after
> checking nodes =
> [DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]],
> ignoredNodes = null No live nodes contain current block Block
> locations:
> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK]
> Dead nodes:
> DatanodeInfoWithStorage[127.0.0.1:31000,DS-0232508a-5512-4827-bcaf-c922f1e65eb1,DISK].
> Will get new block locations from namenode and retry...
> 16/07/24 18:36:08 WARN hdfs.DFSClient: DFS chooseDataNode: got # 1
> IOException, will wait for 2772.7114628272548 msec.
> 16/07/24 18:36:11 WARN hdfs.BlockReaderFactory:
>
> BlockReaderFactory(fileName=/hbase/MasterProcWALs/state-00000000000000003172.log,
> block=BP-1490185442-127.0.0.1-1456935654337:blk_1073764629_23805):
> error creating ShortCircuitReplica.
> java.io.IOException: Illegal seek
>         at sun.nio.ch.FileDispatcherImpl.pread0(Native Method)
>
> On Sun, Jul 24, 2016 at 7:24 PM, Bharath Vissapragada
> <bhara...@cloudera.com> wrote:
> > Do you see anything in the HMaster log? From the error it looks like the
> > HBase master hasn't started properly for some reason.
> >
> > On Mon, Jul 25, 2016 at 6:08 AM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> >> I tried reloading the data with
> >>
> >> ./bin/load-data.py --workloads functional-query
> >>
> >> but that gave errors like
> >>
> >> Executing HBase Command: hbase shell
> >> load-functional-query-core-hbase-generated.create
> >> 16/07/24 17:19:39 INFO Configuration.deprecation: hadoop.native.lib is
> >> deprecated. Instead, use io.native.lib.available
> >> SLF4J: Class path contains multiple SLF4J bindings.
> >> SLF4J: Found binding in
> >>
> >>
> [jar:file:/opt/Impala-Toolchain/cdh_components/hbase-1.2.0-cdh5.9.0-SNAPSHOT/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: Found binding in
> >>
> >>
> [jar:file:/opt/Impala-Toolchain/cdh_components/hadoop-2.6.0-cdh5.9.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> >> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> >> explanation.
> >> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> >>
> >> ERROR: Can't get the locations
> >>
> >> Here is some help for this command:
> >> Start disable of named table:
> >>   hbase> disable 't1'
> >>   hbase> disable 'ns1:t1'
> >>
> >> ERROR: Can't get master address from ZooKeeper; znode data == null
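The "znode data == null" error means the client found no master address in ZooKeeper. One way to check that directly (assuming a local ZooKeeper on the default client port 2181 and the default /hbase parent znode; adjust both for your config):

```shell
# Confirm ZooKeeper itself is answering on the default client port
# (2181 is an assumption; use your hbase.zookeeper.property.clientPort).
if echo stat | nc -w 2 localhost 2181; then
  # Inspect the master znode the client failed to read; a missing or
  # empty /hbase/master node matches the "znode data == null" error.
  hbase zkcli get /hbase/master
else
  echo "ZooKeeper is not reachable on localhost:2181"
fi
```

An empty or absent /hbase/master node would confirm the master never registered, rather than a client-side problem.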
> >>
> >> On Sun, Jul 24, 2016 at 5:12 PM, Jim Apple <jbap...@cloudera.com>
> wrote:
> >> > I'm having trouble with my HBase environment, and it's preventing me
> >> > from running bin/run-all-tests.sh. I am on Ubuntu 14.04. I have tried
> >> > this with a clean build, and I have tried unset LD_LIBRARY_PATH &&
> >> > bin/impala-config.sh, and I have tried ./testdata/bin/run-all.sh
> >> >
> >> > Here is the error I get from compute stats:
> >> > (./testdata/bin/compute-table-stats.sh)
> >> >
> >> > Executing: compute stats functional_hbase.alltypessmall
> >> >   -> Error: ImpalaBeeswaxException:
> >> >  Query aborted:RuntimeException: couldn't retrieve HBase table
> >> > (functional_hbase.alltypessmall) info:
> >> > Unable to find region for  in functional_hbase.alltypessmall after 35
> >> tries.
> >> > CAUSED BY: NoServerForRegionException: Unable to find region for  in
> >> > functional_hbase.alltypessmall after 35 tries.
> >> >
> >> > Here is a snippet of the error in ./testdata/bin/split-hbase.sh
> >> >
> >> > Sun Jul 24 15:24:52 PDT 2016,
> >> > RpcRetryingCaller{globalStartTime=1469399003900, pause=100,
> >> > retries=31}, org.apache.hadoop.hbase.MasterNotRunningException:
> >> > com.google.protobuf.ServiceException:
> >> >
> >>
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.ipc.ServerNotRunningYetException):
> >> > org.apache.hadoop.hbase.ipc.ServerNotRunningYetException: Server is
> >> > not running yet
> >> >
> >> > I tried ./bin/create_testdata.sh, but that exited almost immediately
> >> > with no error.
> >> >
> >> > Has anyone else seen and solved this before?
> >>
> >
> >
> >
> > --
> > Thanks,
> > Bharath
>



-- 
Thanks,
Bharath
