Hi St.Ack:

I decided to start by trying to populate a small data set (33,000 rows) into a
single node. Here are the issues:

1. The "connection refused" exception in my original post happened again
right after 33,000 rows were uploaded. There is still plenty of disk space
(around 2.5T) left.
2. I could not see the data files under the default location
"/tmp/hbase-{user}/hbase/". The directory representing the table exists, with
an encoded region name as the subdirectory, but nothing is under that
subdirectory except ".regioninfo" and "..regioninfo.crc". However, I could
still scan the table in the hbase shell. I am wondering where the data files
are actually stored.
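
In case it is relevant, my guess is that the rows might simply still be
sitting in the memstore rather than flushed out to store files. If it would
help, I can force a flush from the client and re-check the directory;
something like this sketch is what I have in mind (assuming HBaseAdmin.flush
is the right call for this; "Genome" is my table name):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class FlushGenome {
      public static void main(String[] args) throws Exception {
        // Ask the region server to flush the table's memstore to store
        // files, then re-check the region directory on disk.
        HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
        admin.flush("Genome");
      }
    }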

Do I need to change any configuration? Here is my hbase-site.xml:

<configuration>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
    <description>How many tasks to run per jvm. If -1 then no limit at all.</description>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>2048</value>
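    <description>Upper bound on the number of xceiver (data transfer) threads
    an HDFS DataNode will serve at once; commonly raised for HBase.</description>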
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>1073741824</value>
    <description>
    Maximum HStoreFile size. If any one of a column families' HStoreFiles has
    grown to exceed this value, the hosting HRegion is split in two.
    Default: 256M.
    </description>
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>3</value>
    <description>
    Block updates if the memstore reaches this multiple of
    hbase.hregion.memstore.flush.size bytes.  Useful for preventing
    runaway memstore growth during spikes in update traffic.  Without an
    upper bound, the memstore fills such that when it flushes, the
    resultant flush files take a long time to compact or split, or,
    worse, we OOME.
    </description>
  </property>

</configuration>


Thanks,
Xueling


On Thu, Dec 17, 2009 at 5:41 PM, Xueling Shu <[email protected]> wrote:

> Hey St.Ack:
>
>
> Thank you for your reply.
>
> I chose to start with HBase after getting the answer for the original post
> on the hadoop list :)
>
> As of now I use two fields to form a composite key. Other fields are
> organized into one column family.
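>
> Roughly, the write path per record looks like the sketch below (trimmed; the
> field, qualifier, and family names here are placeholders, not our real ones):
>
>     import java.io.IOException;
>     import org.apache.hadoop.hbase.client.HTable;
>     import org.apache.hadoop.hbase.client.Put;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class RecordWriter {
>       // Composite row key built from two fields; every other field goes
>       // into the single column family ("d" here).
>       static void putRecord(HTable table, String field1, long field2,
>           String sequence) throws IOException {
>         byte[] row = Bytes.add(Bytes.toBytes(field1), Bytes.toBytes(field2));
>         Put put = new Put(row);
>         put.add(Bytes.toBytes("d"), Bytes.toBytes("sequence"),
>             Bytes.toBytes(sequence));
>         table.put(put);
>       }
>     }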
>
> I will discuss with my manager and see what she thinks about getting more
> nodes to continue the testing.
>
> Thanks!
> Xueling
>
>
> On Thu, Dec 17, 2009 at 4:40 PM, stack <[email protected]> wrote:
>
>> Hey Xueling:
>>
>> Now I notice that you are the fellow who recently wrote up on the hadoop
>> list.
>>
>> The scheme Todd described won't work for you then, I take it? There'd be
>> fewer moving parts for sure.
>>
>> Up on the hadoop list you gave a description of your records like so:
>>
>> "1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0 1 4 *103570835* F .. 23G
>> 24
>>
>> "The highlighted field is called "position of match" and the query we are
>> interested in is the # of sequences in a certain range of this "position of
>> match". For instance the range can be "position of match" > 200 and
>> "position of match" + 36 < 200,000."
>>
>> What are you thinking as regards the row key?  Will each of the fields above
>> be concatenated into the row key, or will they each be individual columns,
>> all in the one column family or in many?
>>
>> I'd suggest you get some subset of your dataset, say a million records or
>> so.  This should load into a single hbase node fine.  Use this small dataset
>> to figure the schema that best serves the way you'll be querying the data.
>>
>> If you can get away with a single family, work on writing an import that
>> writes hfiles directly:
>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
>> It'll run an order of magnitude or more faster than going via the API.
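>>
>> Roughly, the job wiring looks like the sketch below (untested; it uses the
>> classes from the 0.20.2 org.apache.hadoop.hbase.mapreduce package linked
>> above, the line parsing in the mapper is a placeholder, and the
>> total-ordering and load steps are the ones described at that link):
>>
>>   import java.io.IOException;
>>   import org.apache.hadoop.fs.Path;
>>   import org.apache.hadoop.hbase.HBaseConfiguration;
>>   import org.apache.hadoop.hbase.KeyValue;
>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>>   import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>   import org.apache.hadoop.io.LongWritable;
>>   import org.apache.hadoop.io.Text;
>>   import org.apache.hadoop.mapreduce.Job;
>>   import org.apache.hadoop.mapreduce.Mapper;
>>   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
>>   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>>
>>   public class BulkImport {
>>
>>     // Emits (row key, KeyValue) pairs for HFileOutputFormat to write as
>>     // hfiles.  Placeholder parse: first tab-separated field is the row
>>     // key, second is the value for a single column.
>>     static class HFileMapper
>>         extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
>>       protected void map(LongWritable offset, Text line, Context ctx)
>>           throws IOException, InterruptedException {
>>         String[] fields = line.toString().split("\t");
>>         byte[] row = Bytes.toBytes(fields[0]);
>>         KeyValue kv = new KeyValue(row, Bytes.toBytes("d"),
>>             Bytes.toBytes("seq"), Bytes.toBytes(fields[1]));
>>         ctx.write(new ImmutableBytesWritable(row), kv);
>>       }
>>     }
>>
>>     public static void main(String[] args) throws Exception {
>>       Job job = new Job(new HBaseConfiguration(), "bulk-import");
>>       job.setJarByClass(BulkImport.class);
>>       job.setMapperClass(HFileMapper.class);
>>       job.setMapOutputKeyClass(ImmutableBytesWritable.class);
>>       job.setMapOutputValueClass(KeyValue.class);
>>       job.setOutputKeyClass(ImmutableBytesWritable.class);
>>       job.setOutputValueClass(KeyValue.class);
>>       job.setOutputFormatClass(HFileOutputFormat.class);
>>       FileInputFormat.addInputPath(job, new Path(args[0]));
>>       FileOutputFormat.setOutputPath(job, new Path(args[1]));
>>       // Rows must reach HFileOutputFormat in total key order; see the
>>       // partitioning notes and the hfile load step at the link above.
>>       System.exit(job.waitForCompletion(true) ? 0 : 1);
>>     }
>>   }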
>>
>> Now, as to the size of the cluster, see the presentations section where Ryan
>> describes the hardware used loading up a 9B row table.  His hardware might be
>> more than you need.  I'd suggest you start with 4 or 5 nodes and see how
>> loading goes.  Check query latency.  If the numbers are not to your liking,
>> add more nodes.  HBase generally scales linearly.
>>
>> Hope this helps,
>> St.Ack
>>
>> On Thu, Dec 17, 2009 at 4:00 PM, Xueling Shu <[email protected]> wrote:
>>
>> > Hi St.Ack:
>> >
>> > Wondering how many nodes in a cluster you would recommend to hold 5B rows?
>> > Eventually we need to handle X times 5B rows, so I want to get an idea of
>> > how many resources we will need.
>> >
>> > Thanks,
>> > Xueling
>> >
>> >
>> > On Thu, Dec 17, 2009 at 3:45 PM, stack <[email protected]> wrote:
>> >
>> > > Hey Xueling, 5B into a single node ain't going to work.  Get yourself a
>> > > bit of a cluster somewhere.  Single node is for messing around.  Not for
>> > > doing 'real' stuff.
>> > >
>> > > St.Ack
>> > >
>> > >
>> > > On Thu, Dec 17, 2009 at 3:29 PM, stack <[email protected]> wrote:
>> > >
>> > > > On Thu, Dec 17, 2009 at 2:38 PM, Xueling Shu <[email protected]> wrote:
>> > > >
>> > > >>
>> > > >> Things started fine until 5 mins after the data population started.
>> > > >>
>> > > >> Here is the exception:
>> > > >> Exception in thread "main"
>> > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying to contact
>> > > >> region server 10.0.176.64:39045 for region Genome,,1261087437258, row
>> > > >> '\x00\x00\x00\x00\x0E\xB00\xAC\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00s\xAD',
>> > > >> but failed after 10 attempts.
>> > > >> Exceptions:
>> > > >> java.io.IOException: java.io.IOException: Server not running, aborting
>> > > >>
>> > > >
>> > > > See why it quit by looking in the regionserver log.
>> > > >
>> > > > Make sure you have the latest hbase and read the 'Getting Started' section.
>> > > >
>> > > > St.Ack
>> > > >
>> > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2347)
>> > > >>        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1826)
>> > > >>        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>> > > >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
>> > > >>        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
>> > > >>        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> > > >>
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >> java.net.ConnectException: Connection refused
>> > > >>
>> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
>> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.doCall(HConnectionManager.java:1193)
>> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1115)
>> > > >>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
>> > > >>        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
>> > > >>        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:470)
>> > > >>        at HadoopTrigger.populateData(HadoopTrigger.java:126)
>> > > >>        at HadoopTrigger.main(HadoopTrigger.java:52)
>> > > >>
>> > > >> Can anybody let me know how to fix it?
>> > > >> Thanks,
>> > > >> Xueling
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>>
>
>
