On Fri, Dec 18, 2009 at 12:40 PM, Xueling Shu <[email protected]> wrote:

> Hi St.Ack:
>
> I decided to start trying to populate a small data set (33,000 rows) into a
> single node. Here are the issues:
>
> 1. The "connection refused" exception in my original post happened again
> right after 33,000 rows were uploaded. There is still plenty of disk space
> (around 2.5T) left.
>

Look in the regionserver logs to see why the server quit.  It'll usually say.



> 2. I could not see the data file under the default location
> "/tmp/hbase-{user}/hbase/". The directory representing the table exists
> with an encoded region name as the subdirectory. But nothing was under the
> subdirectory except for ".regioninfo" and "..regioninfo.crc".  However I
> could do the scan over the table in hbase shell. Wondering where the data
> file is actually stored?
>


Probably hasn't done a flush yet.  The data is still up in the memstore, in
memory.  Sounds like you have not added enough data to trigger a flush; the
store files only show up under the region directory once a flush runs.
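
If you want to see store files on disk without waiting for the memstore to
fill, you can ask for a flush explicitly.  Below is a minimal sketch against
the 0.20 client API; it assumes HBaseAdmin.flush(String) is available in your
release, and it uses the 'Genome' table name from the stack trace further
down the thread.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class ForceFlush {
      public static void main(String[] args) throws Exception {
        // Picks up hbase-site.xml from the classpath.
        HBaseConfiguration conf = new HBaseConfiguration();
        HBaseAdmin admin = new HBaseAdmin(conf);
        // Ask the region server to flush the table's memstore to disk.
        // Afterwards, store files should appear under the region directory
        // next to .regioninfo.
        admin.flush("Genome");
      }
    }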



>
> Do I need to change any configuration? Here is my hbase-site.xml:
>
> <configuration>
>   <property>
>    <name>mapred.job.reuse.jvm.num.tasks</name>
>    <value>-1</value>
>    <description>How many tasks to run per jvm. If -1 then no limit at
> all.</description>
>  </property>
>  <property>
>    <name>dfs.datanode.max.xcievers</name>
>    <value>2048</value>
>  </property>
>  <property>
>    <name>hbase.hregion.max.filesize</name>
>    <value>1073741824</value>
>    <description>
>    Maximum HStoreFile size. If any one of a column family's HStoreFiles
>    has grown to exceed this value, the hosting HRegion is split in two.
>    Default: 256M.
>    </description>
>  </property>
>  <property>
>    <name>hbase.hregion.memstore.block.multiplier</name>
>    <value>3</value>
>    <description>
>    Block updates if the memstore reaches
>    hbase.hregion.memstore.block.multiplier times
>    hbase.hregion.memstore.flush.size bytes.  Useful for preventing
>    runaway memstore growth during spikes in update traffic.  Without an
>    upper-bound, memstore fills such that when it flushes the
>    resultant flush files take a long time to compact or split, or
>    worse, we OOME.
>    </description>
>  </property>
>
> </configuration>
>
>
>
Go with the defaults.  Remove hbase.hregion.max.filesize.  You are not
running hdfs, right?  The dfs.datanode.max.xcievers and
mapred.job.reuse.jvm.num.tasks settings are hadoop configs; they belong over
in the hadoop config files (hdfs-site.xml and mapred-site.xml respectively)
rather than up in hbase-site.xml.  Otherwise, the config is fine.

St.Ack



> Thanks,
> Xueling
>
>
> On Thu, Dec 17, 2009 at 5:41 PM, Xueling Shu <[email protected]
> >wrote:
>
> > Hey St.Ack:
> >
> >
> > Thank you for your reply.
> >
> > I chose to start with HBase after getting the answer for the original
> post
> > on the hadoop list :)
> >
> > As of now I use two fields to form a composite key. Other fields are
> > organized into one column family.
> >
> > I will discuss with my manager and see what she thinks about getting more
> > nodes to continue the testing.
> >
> > Thanks!
> > Xueling
> >
> >
> > On Thu, Dec 17, 2009 at 4:40 PM, stack <[email protected]> wrote:
> >
> >> Hey Xueling:
> >>
> >> Now I notice that you are the fellow who recently wrote up on the hadoop
> >> list.
> >>
> >> The scheme Todd described won't work for you then, I take it?  There'd
> >> be fewer moving parts for sure.
> >>
> >> Up on hadoop list you gave a description of your records as so:
> >>
> >> "1-1-174-418 TGTGTCCCTTTGTAATGAATCACTATC U2 0 0 1 4 *103570835* F .. 23G
> >> 24
> >>
> >> "The highlighted field is called "position of match" and the query we
> are
> >> interested in is the # of sequences in a certain range of this "position
> >> of
> >> match". For instance the range can be "position of match" > 200 and
> >> "position of match" + 36 < 200,000."
> >>
> >> What are you thinking as regards the row key?  Will each of the fields
> >> above be concatenated into the row key, or will they each be individual
> >> columns, all in the one column family or in many?
> >>
> >> I'd suggest you get some subset of your dataset, say a million records
> >> or so.  This should load into a single hbase node fine.  Use this small
> >> dataset to figure the schema that best serves the way you'll be
> >> querying the data.
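
For the range query described up-thread (counting sequences whose "position
of match" falls in a range), here is one possible shape of the schema, purely
as an illustrative sketch and not something settled in this thread: lead the
row key with the position of match encoded as a fixed-width big-endian long,
followed by a sequence id for uniqueness, so the range predicate turns into
plain scan bounds.  The 'Genome' table name is taken from the stack trace
below; the key layout and the client-side counting loop are assumptions.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RangeCount {
      // Sketch of a composite row key: 8-byte big-endian "position of match"
      // followed by a sequence id.  Rows then sort by position, so an import
      // would build keys with this helper when creating Puts.
      static byte[] rowKey(long positionOfMatch, long sequenceId) {
        return Bytes.add(Bytes.toBytes(positionOfMatch), Bytes.toBytes(sequenceId));
      }

      public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "Genome");
        // Count rows with position > 200 and position + 36 < 200,000.
        // Start row is inclusive and stop row exclusive, so scan [201, 199964).
        Scan scan = new Scan(Bytes.toBytes(201L), Bytes.toBytes(199964L));
        ResultScanner scanner = table.getScanner(scan);
        long count = 0;
        for (Result r : scanner) {
          count++;
        }
        scanner.close();
        System.out.println(count + " sequences in range");
      }
    }

Scanning the whole range client-side just to count will be slow over billions
of rows; the point of the sketch is only how the key ordering turns the range
predicate into scan start and stop rows.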
> >>
> >> If you can get away with a single family, work on writing an import that
> >> writes hfiles directly:
> >>
> >>
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
> >> .
> >>  It'll run an order of magnitude or more faster than going via the API.
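
In case it is useful, here is a rough sketch of the shape of such a job
against the 0.20.2 mapreduce package linked above.  The specifics in it (the
tab-separated input parsing, the "d" family, the "record" qualifier, the
single reducer) are illustrative assumptions, not anything from this thread;
the package documentation above is the authority on the exact steps,
including how the generated hfiles are then loaded into the table.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.KeyValueSortReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HFileImport {
      // Assumed single column family "d" and qualifier "record".
      private static final byte[] FAMILY = Bytes.toBytes("d");
      private static final byte[] QUALIFIER = Bytes.toBytes("record");

      public static class ImportMapper
          extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
          // Assumes tab-separated input whose first field is whatever
          // composite row key you settle on.
          String[] fields = line.toString().split("\t");
          byte[] row = Bytes.toBytes(fields[0]);
          ctx.write(new ImmutableBytesWritable(row),
              new KeyValue(row, FAMILY, QUALIFIER, Bytes.toBytes(line.toString())));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new HBaseConfiguration();
        Job job = new Job(conf, "hfile-import");
        job.setJarByClass(HFileImport.class);
        job.setMapperClass(ImportMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        // KeyValueSortReducer sorts each row's KeyValues the way the hfile
        // writer requires them.
        job.setReducerClass(KeyValueSortReducer.class);
        job.setOutputFormatClass(HFileOutputFormat.class);
        // One reducer keeps the sketch simple; multiple reducers need a
        // total-order partitioner so each gets a contiguous key range.
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }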
> >>
> >> Now, as to the size of the cluster, see the presentations section where
> >> Ryan
> >> describes the hardware used loading up a 9B row table.  His hardware
> might
> >> be more than you need.  I'd suggest you start with 4 or 5 nodes and see
> >> how
> >> loading goes.  Check query latency.  If the numbers are not to your
> >> liking,
> >> add more nodes.  HBase generally scales linearly.
> >>
> >> Hope this helps,
> >> St.Ack
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On Thu, Dec 17, 2009 at 4:00 PM, Xueling Shu <[email protected]
> >> >wrote:
> >>
> >> > Hi St.Ack:
> >> >
> >> > Wondering how many nodes in a cluster you would recommend to hold 5B
> >> > records?  Eventually we need to handle X times 5B records. I want to
> >> > get an idea of how many resources we need.
> >> >
> >> > Thanks,
> >> > Xueling
> >> >
> >> >
> >> > On Thu, Dec 17, 2009 at 3:45 PM, stack <[email protected]> wrote:
> >> >
> >> > > Hey Xueling, 5B into a single node ain't going to work.  Get
> yourself
> >> a
> >> > bit
> >> > > of a cluster somewhere.  Single node is for messing around.  Not for
> >> > doing
> >> > > 'real' stuff.
> >> > >
> >> > > St.Ack
> >> > >
> >> > >
> >> > > On Thu, Dec 17, 2009 at 3:29 PM, stack <[email protected]> wrote:
> >> > >
> >> > > > On Thu, Dec 17, 2009 at 2:38 PM, Xueling Shu <
> >> [email protected]
> >> > > >wrote:
> >> > > >
> >> > > >>
> >> > > >> Things started fine until 5 mins after the data population
> started.
> >> > > >>
> >> > > >> Here is the exception:
> >> > > >> Exception in thread "main"
> >> > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying
> to
> >> > > >> contact
> >> > > >> region server 10.0.176.64:39045 for region
> Genome,,1261087437258,
> >> row
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> '\x00\x00\x00\x00\x0E\xB00\xAC\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00s\xAD',
> >> > > >> but failed after 10 attempts.
> >> > > >> Exceptions:
> >> > > >> java.io.IOException: java.io.IOException: Server not running,
> >> aborting
> >> > > >>
> >> > > >
> >> > > > See why it quit by looking in the regionserver log.
> >> > > >
> >> > > > Make sure you have latest hbase and read the 'Getting Started'
> >> section.
> >> > > >
> >> > > > St.Ack
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.checkOpen(HRegionServer.java:2347)
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1826)
> >> > > >>        at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown
> >> Source)
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> > > >>        at java.lang.reflect.Method.invoke(Method.java:597)
> >> > > >>        at
> >> > > >>
> org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
> >> > > >>        at
> >> > > >>
> >> > >
> >>
> org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >> > > >>
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >> java.net.ConnectException: Connection refused
> >> > > >>
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1002)
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.doCall(HConnectionManager.java:1193)
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1115)
> >> > > >>        at
> >> > > >>
> >> > > >>
> >> > >
> >> >
> >>
> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
> >> > > >>        at
> >> > > >>
> org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
> >> > > >>        at
> >> org.apache.hadoop.hbase.client.HTable.put(HTable.java:470)
> >> > > >>        at HadoopTrigger.populateData(HadoopTrigger.java:126)
> >> > > >>        at HadoopTrigger.main(HadoopTrigger.java:52)
> >> > > >>
> >> > > >> Can anybody let me know how to fix it?
> >> > > >> Thanks,
> >> > > >> Xueling
> >> > > >>
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
