Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
In my experience it's better to keep the number of column families low. When 
flushes occur, they affect all column families in a table, so when the memstore 
fills you'll create an HFile per family. I haven't seen any performance impact 
from having two column families, though.


As for the number of columns, there are two extremes: 1) "narrow" - store the 
XML as a blob in a single cell; 2) "wide" - break it out into columns, of which 
you can have thousands.


  1.  In the case where you store XML as a blob, you always need to retrieve 
the entire document and must deserialise it to perform operations. You save 
space by not repeating the row key, and you save space on column and column 
family qualifiers.
  2.  When you break the XML out into columns, you can retrieve data at a 
per-attribute level, which might save IO by filtering unnecessary content, and 
you don't need to break open the XML to perform operations. You incur a cost in 
repeating the row key per tuple (this can add up and will affect read 
performance by limiting the number of rows that fit into the block cache), 
as well as the extra cost of column families. There is a practical limit to the 
number of columns because a row cannot be split across regions.
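To make the row-key repetition concrete, here is a rough back-of-envelope 
calculation in Java. The size formula below follows HBase's on-disk KeyValue 
cell layout (4-byte key length, 4-byte value length, then a key made of 2-byte 
row length, row, 1-byte family length, family, qualifier, 8-byte timestamp and 
1-byte type); the document size, attribute count and qualifier lengths are 
made-up illustration numbers, and the figures ignore tags, encodings and block 
overhead, so treat this as a sketch, not an exact accounting.

```java
// Approximate per-cell size per the HBase KeyValue layout (no tags/encoding).
public class CellOverhead {
    static long cellSize(int rowLen, int famLen, int qualLen, int valueLen) {
        // key = 2B row length + row + 1B family length + family + qualifier
        //     + 8B timestamp + 1B type
        int keyLen = 2 + rowLen + 1 + famLen + qualLen + 8 + 1;
        // cell = 4B key length + 4B value length + key + value
        return 4 + 4 + keyLen + valueLen;
    }

    public static void main(String[] args) {
        int rowLen = 16, famLen = 1;   // short family names pay off
        int docSize = 10 * 1024;       // a hypothetical 10 KB XML document

        // Narrow: one cell holding the whole blob (1-byte qualifier).
        long narrow = cellSize(rowLen, famLen, 1, docSize);

        // Wide: 200 attributes of ~40 bytes each, 12-byte qualifiers;
        // the row key, family and qualifier are repeated in every cell.
        long wide = 0;
        for (int i = 0; i < 200; i++) {
            wide += cellSize(rowLen, famLen, 12, 40);
        }
        System.out.println("narrow=" + narrow + " wide=" + wide);
    }
}
```

Under these invented numbers the wide layout stores less payload (8,000 vs 
10,240 bytes) yet occupies noticeably more space, which is exactly the block 
cache pressure described above.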

You may find optimal performance for your use case somewhere between the two 
extremes, and it's best to prototype and measure early.

Cheers,
Richard


https://richardstartin.com/



From: Mich Talebzadeh 
Sent: 28 November 2016 21:57
To: user@hbase.apache.org
Subject: Re: Storing XML file in Hbase

Thanks Richard.

How would one decide on the number of column families and columns?

Is there a ballpark approach?

Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 November 2016 at 16:04, Richard Startin 
wrote:

> Hi Mich,
>
> If you want to store the file whole, you'll need to enforce a 10MB limit
> to the file size, otherwise you will flush too often (each time the
> memstore fills up) which will slow down writes.
>
> Maybe you could deconstruct the xml by extracting columns from the xml
> using xpath?
>
> If the files are small there might be a tangible performance benefit by
> limiting the number of columns.
>
> Cheers,
> Richard
>
> Sent from my iPhone
>
> > On 28 Nov 2016, at 15:53, Dima Spivak  wrote:
> >
> > Hi Mich,
> >
> > How many files are you looking to store? How often do you need to read
> > them? What's the total size of all the files you need to serve?
> >
> > Cheers,
> > Dima
> >
> > On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Storing XML file in Big Data. Are there any strategies to create
> multiple
> >> column families or just one column family and in that case how many
> columns
> >> would be optional?
> >>
> >> thanks
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn *
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >> <
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >>> *
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> from
> >> such loss, damage or destruction.
> >>
>


Re: Using Hbase as a transactional table

2016-11-28 Thread John Leach
Mich,

Splice Machine (Open Source) can do this on top of Hbase and we have an example 
running a TPC-C benchmark.  Might be worth a look.

Regards,
John

> On Nov 28, 2016, at 4:36 PM, Ted Yu  wrote:
> 
> Not sure if Transactions (beta) | Apache Phoenix is up to date.
> Why not ask on Phoenix mailing list where you would get better answer(s) ?
> Cheers
> 
> Transactions (beta) | Apache Phoenix
> 
> 
> 
>On Monday, November 28, 2016 2:02 PM, Mich Talebzadeh 
>  wrote:
> 
> 
> Thanks Ted.
> 
> How does Phoenix provide transaction support?
> 
> I have read some docs but sounds like problematic. I need to be sure there
> is full commit and rollback if things go wrong!
> 
> Also it appears that Phoenix transactional support is in beta phase.
> 
> Cheers
> 
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
> 
> 
> 
> http://talebzadehmich.wordpress.com
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
> 
> 
> 
> On 23 November 2016 at 18:15, Ted Yu  wrote:
> 
>> Mich:
>> Even though related rows are on the same region server, there is no
>> intrinsic transaction support.
>> 
>> For #1 under design considerations, multi column family is one
>> possibility. You should consider how the queries from RDBMS access the
>> related data.
>> 
>> You can also evaluate Phoenix / Trafodion which provides transaction
>> support.
>> 
>> Cheers
>> 
>>> On Nov 23, 2016, at 9:19 AM, Mich Talebzadeh 
>> wrote:
>>> 
>>> Thanks all.
>>> 
>>> As I understand it, HBase does not support ACID-compliant transactions over
>>> multiple rows or across tables?
>>> 
>>> So this is not supported
>>> 
>>> 
>>>   1. Hbase can support multi-rows transactions if the rows are on the
>> same
>>>   table and in the same RegionServer?
>>>   2. Hbase does not support multi-rows transactions if the rows are in
>>>   different tables but happen to be in the same RegionServer?
>>>   3. If I migrated RDBMS transactional tables to the same HBase table
>> (big
>>>   if) with different column families, will that work?
>>> 
>>> 
>>> Design considerations
>>> 
>>> 
>>>   1. If I have 4 big tables in RDBMS, some having in excess of 200
>> columns
>>>   (I know this is a joke), can they all go one-to-one to Hbase tables.
>> Can
>>>   some of these RDBMS tables put into one Hbase schema  with different
>> column
>>>   families.
>>>   2. then another question. If I use hive tables on these hbase tables
>>>   with large number of family columns, will it work ok?
>>> 
>>> thanks
>>> 
>>>   1.
>>> 
>>> 
>>> Dr Mich Talebzadeh
>>> 
>>> 
>>> 
>>> LinkedIn * https://www.linkedin.com/profile/view?id=
>> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> > OABUrV8Pw>*
>>> 
>>> 
>>> 
>>> http://talebzadehmich.wordpress.com
>>> 
>>> 
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>>> loss, damage or destruction of data or any other property which may arise
>>> from relying on this email's technical content is explicitly disclaimed.
>>> The author will in no case be liable for any monetary damages arising
>> from
>>> such loss, damage or destruction.
>>> 
>>> 
>>> 
 On 23 November 2016 at 16:43, Denise Rogers  wrote:
 
 I would recommend MariaDB. HBase is not ACID compliant. MariaDB is.
 
 Regards,
 Denise
 
 
 Sent from mi iPad
 
>> On Nov 23, 2016, at 11:27 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com>
> wrote:
> 
> Hi,
> 
> I need to explore if anyone has used Hbase as a transactional table to
>> do
> the processing that historically one has done with RDBMSs.
> 
> A simple question dealing with a transaction as a unit of work (all or
> nothing). In that case if any part of statement in batch transaction
 fails,
> that transaction will be rolled back in its entirety.
> 
> Now how can HBase handle this? Specifically, at the theoretical
>> level
> if a standard transactional processing was migrated from RDBMS to Hbase
> tables, will that work.
> 
> Has anyone built  successful transaction processing in Hbase?
> 
> Thanks
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * https://www.linkedin.com/profile/view?id=
 AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > 

Re: Using Hbase as a transactional table

2016-11-28 Thread Mich Talebzadeh
Thanks Ted.

How does Phoenix provide transaction support?

I have read some docs, but it sounds problematic. I need to be sure there
is full commit and rollback if things go wrong!

Also it appears that Phoenix transactional support is in beta phase.

Cheers



Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 23 November 2016 at 18:15, Ted Yu  wrote:

> Mich:
> Even though related rows are on the same region server, there is no
> intrinsic transaction support.
>
> For #1 under design considerations, multi column family is one
> possibility. You should consider how the queries from RDBMS access the
> related data.
>
> You can also evaluate Phoenix / Trafodion which provides transaction
> support.
>
> Cheers
>
> > On Nov 23, 2016, at 9:19 AM, Mich Talebzadeh 
> wrote:
> >
> > Thanks all.
> >
> > As I understand it, HBase does not support ACID-compliant transactions over
> > multiple rows or across tables?
> >
> > So this is not supported
> >
> >
> >   1. Hbase can support multi-rows transactions if the rows are on the
> same
> >   table and in the same RegionServer?
> >   2. Hbase does not support multi-rows transactions if the rows are in
> >   different tables but happen to be in the same RegionServer?
> >   3. If I migrated RDBMS transactional tables to the same HBase table
> (big
> >   if) with different column families, will that work?
> >
> >
> > Design considerations
> >
> >
> >   1. If I have 4 big tables in RDBMS, some having in excess of 200
> columns
> >   (I know this is a joke), can they all go one-to-one to Hbase tables.
> Can
> >   some of these RDBMS tables put into one Hbase schema  with different
> column
> >   families.
> >   2. then another question. If I use hive tables on these hbase tables
> >   with large number of family columns, will it work ok?
> >
> > thanks
> >
> >   1.
> >
> >
> > Dr Mich Talebzadeh
> >
> >
> >
> > LinkedIn * https://www.linkedin.com/profile/view?id=
> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >  OABUrV8Pw>*
> >
> >
> >
> > http://talebzadehmich.wordpress.com
> >
> >
> > *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > loss, damage or destruction of data or any other property which may arise
> > from relying on this email's technical content is explicitly disclaimed.
> > The author will in no case be liable for any monetary damages arising
> from
> > such loss, damage or destruction.
> >
> >
> >
> >> On 23 November 2016 at 16:43, Denise Rogers  wrote:
> >>
> >> I would recommend MariaDB. HBase is not ACID compliant. MariaDB is.
> >>
> >> Regards,
> >> Denise
> >>
> >>
> >> Sent from mi iPad
> >>
>  On Nov 23, 2016, at 11:27 AM, Mich Talebzadeh <
> mich.talebza...@gmail.com>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I need to explore if anyone has used Hbase as a transactional table to
> do
> >>> the processing that historically one has done with RDBMSs.
> >>>
> >>> A simple question dealing with a transaction as a unit of work (all or
> >>> nothing). In that case if any part of statement in batch transaction
> >> fails,
> >>> that transaction will be rolled back in its entirety.
> >>>
> >>> Now how can HBase handle this? Specifically, at the theoretical
> level
> >>> if a standard transactional processing was migrated from RDBMS to Hbase
> >>> tables, will that work.
> >>>
> >>> Has anyone built  successful transaction processing in Hbase?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> Dr Mich Talebzadeh
> >>>
> >>>
> >>>
> >>> LinkedIn * https://www.linkedin.com/profile/view?id=
> >> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >>>  AAEWh2gBxianrbJd6zP6AcPCCd
> >> OABUrV8Pw>*
> >>>
> >>>
> >>>
> >>> http://talebzadehmich.wordpress.com
> >>>
> >>>
> >>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >>> loss, damage or destruction of data or any other property which may
> arise
> >>> from relying on this email's technical content is explicitly
> disclaimed.
> >>> The author will in no case be liable for any monetary damages arising
> >> from
> >>> such loss, damage or destruction.
> >>
> >>
>


Re: Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Thanks Richard.

How would one decide on the number of column families and columns?

Is there a ballpark approach?

Cheers

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 November 2016 at 16:04, Richard Startin 
wrote:

> Hi Mich,
>
> If you want to store the file whole, you'll need to enforce a 10MB limit
> to the file size, otherwise you will flush too often (each time the
> memstore fills up) which will slow down writes.
>
> Maybe you could deconstruct the xml by extracting columns from the xml
> using xpath?
>
> If the files are small there might be a tangible performance benefit by
> limiting the number of columns.
>
> Cheers,
> Richard
>
> Sent from my iPhone
>
> > On 28 Nov 2016, at 15:53, Dima Spivak  wrote:
> >
> > Hi Mich,
> >
> > How many files are you looking to store? How often do you need to read
> > them? What's the total size of all the files you need to serve?
> >
> > Cheers,
> > Dima
> >
> > On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh <
> mich.talebza...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Storing XML file in Big Data. Are there any strategies to create
> multiple
> >> column families or just one column family and in that case how many
> columns
> >> would be optional?
> >>
> >> thanks
> >>
> >> Dr Mich Talebzadeh
> >>
> >>
> >>
> >> LinkedIn *
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >> <
> >> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
> OABUrV8Pw
> >>> *
> >>
> >>
> >>
> >> http://talebzadehmich.wordpress.com
> >>
> >>
> >> *Disclaimer:* Use it at your own risk. Any and all responsibility for
> any
> >> loss, damage or destruction of data or any other property which may
> arise
> >> from relying on this email's technical content is explicitly disclaimed.
> >> The author will in no case be liable for any monetary damages arising
> from
> >> such loss, damage or destruction.
> >>
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Timothy Brown
Responses inlined.

On Mon, Nov 28, 2016 at 12:45 PM, Stack  wrote:

> On Sun, Nov 27, 2016 at 6:53 PM, Timothy Brown 
> wrote:
>
> > Hi Everyone,
> >
> > I apologize for starting an additional thread about this but I wasn't
> > subscribed to the users mailing list when I sent the original and can't
> > figure out how to respond to the original :(
> >
> > Original Message:
> >
> > We are seeing about 80% CPU utilization on the Region Server that solely
> > serves the meta table while other region servers typically have under 50%
> > CPU utilization. Is this expected?
> >
> > What is the difference when you compare servers? More requests? More i/o?
> Thread dump the metadata server and let us see a link in here? (What you
> attached below is cut-off... just as it is getting to the good part).
>
>
> There are more requests to the server containing meta. The network-in
bytes are greater for the meta regionserver than for the others, but the
network-out bytes are less.

Here's a dropbox link to the output https://dl.dropboxusercontent.com/u/
54494127/thread_dump.txt. I apologize for the cliffhanger.


>
> > Here's some more info about our cluster:
> > HBase version 1.2
> >
>
> Which 1.2?
>
> 1.2.0 which is bundled with CDH 5.8.0

>
>
> > Number of regions: 72
> > Number of tables: 97
> >
>
> On whole cluster? (Can't have more tables than regions...)
>
>
> An error on my part, I meant to put 72 region servers.


>
> > Approx. requests per second to meta region server: 3k
> >
>
> Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> logging on the server hosting meta for a minute or so and see where the
> requests are coming in from).
>
> What is your cache hit rate? Can you get it higher?
>
> Cache hit rate is above 99%. We see very little disk reads.


> Is there much writing going on against meta? Or is cluster stable regards
> region movement/creation?
>
> Writing is very infrequent. The cluster is stable with regards to region
movement and creation.

>
>
> > Approx. requests per second to entire HBase cluster: 90k
> >
> > Additional info:
> >
> >
> > From Storefile Metrics:
> > Stores Num: 1
> > Storefiles: 1
> > Storefile Size: 30m
> > Uncompressed Storefile Size: 30m
> > Index Size: 459k
> >
> >
> This from meta table? That is very small.
>
> Yes this is from the meta table.


>
> >
> > I/O for the region server with only meta on it:
> > 48M bytes in
> >
>
>
> Whats all the writing about?
>
> I'm not sure. According to the AWS dashboard there are no disk writes at
that time.

>
>
> > 5.9B bytes out
> >
> >
> This is disk or network? If network, is that 5.9 bytes?
>
> This is network and that's 5.9 billion bytes. (I'm using the AWS dashboard
for this)


> Thanks Tim,
> S
>
>
>
> > I used the debug dump on the region server's UI but it was too large
> > for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
> >
> >
> > Thanks for the help,
> >
> > Tim
> >
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Stack
On Sun, Nov 27, 2016 at 6:53 PM, Timothy Brown  wrote:

> Hi Everyone,
>
> I apologize for starting an additional thread about this but I wasn't
> subscribed to the users mailing list when I sent the original and can't
> figure out how to respond to the original :(
>
> Original Message:
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> What is the difference when you compare servers? More requests? More i/o?
Thread dump the metadata server and let us see a link in here? (What you
attached below is cut-off... just as it is getting to the good part).



> Here's some more info about our cluster:
> HBase version 1.2
>

Which 1.2?



> Number of regions: 72
> Number of tables: 97
>

On whole cluster? (Can't have more tables than regions...)



> Approx. requests per second to meta region server: 3k
>

Can you see who is hitting the meta region most? (Enable rpc-level TRACE
logging on the server hosting meta for a minute or so and see where the
requests are coming in from).
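For reference, the HBase reference guide's recipe for RPC-level logging is to 
raise the `org.apache.hadoop.ipc` logger (note: hadoop.ipc, not hbase.ipc) via 
the RegionServer web UI's Log Level page or in log4j.properties. Logger names 
shift between versions, so treat the name below as an assumption to verify 
against your release:

```properties
# Assumed logger name per the HBase reference guide ("hadoop.ipc, NOT hbase.ipc").
# TRACE here is extremely verbose - enable for a minute or so, then revert to INFO.
log4j.logger.org.apache.hadoop.ipc=TRACE
```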

What is your cache hit rate? Can you get it higher?

Is there much writing going on against meta? Or is cluster stable regards
region movement/creation?



> Approx. requests per second to entire HBase cluster: 90k
>
> Additional info:
>
>
> From Storefile Metrics:
> Stores Num: 1
> Storefiles: 1
> Storefile Size: 30m
> Uncompressed Storefile Size: 30m
> Index Size: 459k
>
>
This from meta table? That is very small.


>
> I/O for the region server with only meta on it:
> 48M bytes in
>


Whats all the writing about?



> 5.9B bytes out
>
>
This is disk or network? If network, is that 5.9 bytes?

Thanks Tim,
S



> I used the debug dump on the region server's UI but it was too large
> for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
>
>
> Thanks for the help,
>
> Tim
>


Re: Storing XML file in Hbase

2016-11-28 Thread Richard Startin
Hi Mich,

If you want to store the file whole, you'll need to enforce a 10MB limit to the 
file size, otherwise you will flush too often (each time the memstore fills up) 
which will slow down writes. 

Maybe you could deconstruct the xml by extracting columns from the xml using 
xpath?
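A minimal, JDK-only sketch of that xpath idea (no HBase dependency; the sample 
document, element names and expressions are invented for illustration — in a 
real loader each extracted value would become one column qualifier/value pair 
in a Put):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class XmlToColumns {
    // Evaluate one XPath expression against an XML string using only the JDK.
    public static String extract(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order id=\"42\"><customer>acme</customer>"
                   + "<total>9.99</total></order>";
        // Each expression maps to one would-be column, e.g. cf:customer.
        System.out.println(extract(xml, "/order/customer")); // prints "acme"
        System.out.println(extract(xml, "/order/@id"));      // prints "42"
    }
}
```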

If the files are small there might be a tangible performance benefit by 
limiting the number of columns.

Cheers,
Richard

Sent from my iPhone

> On 28 Nov 2016, at 15:53, Dima Spivak  wrote:
> 
> Hi Mich,
> 
> How many files are you looking to store? How often do you need to read
> them? What's the total size of all the files you need to serve?
> 
> Cheers,
> Dima
> 
> On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh 
> wrote:
> 
>> Hi,
>> 
>> Storing XML file in Big Data. Are there any strategies to create multiple
>> column families or just one column family and in that case how many columns
>> would be optional?
>> 
>> thanks
>> 
>> Dr Mich Talebzadeh
>> 
>> 
>> 
>> LinkedIn *
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <
>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> *
>> 
>> 
>> 
>> http://talebzadehmich.wordpress.com
>> 
>> 
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>> 


Re: Storing XML file in Hbase

2016-11-28 Thread Dima Spivak
Hi Mich,

How many files are you looking to store? How often do you need to read
them? What's the total size of all the files you need to serve?

Cheers,
Dima

On Mon, Nov 28, 2016 at 7:04 AM Mich Talebzadeh 
wrote:

> Hi,
>
> Storing XML file in Big Data. Are there any strategies to create multiple
> column families or just one column family and in that case how many columns
> would be optional?
>
> thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> >*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>


Re: Creating HBase table with presplits

2016-11-28 Thread Dave Latham
If you truly have no way to predict anything about the distribution of your
data across the row key space, then you are correct that there is no way to
presplit your regions in an effective way.  Either you need to make some
starting guess, such as a small number of uniform splits, or wait until you
have some information about what the data will look like.
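One way to make such a starting guess, assuming keys are spread roughly 
uniformly over the byte keyspace (e.g. hashed or random key prefixes), is to 
compute evenly spaced boundary keys, much as HBase's RegionSplitter 
UniformSplit algorithm does. A self-contained sketch of that idea (8-byte keys 
chosen arbitrarily for the example):

```java
import java.math.BigInteger;

public class UniformSplits {
    // Return numRegions-1 boundary keys spread evenly across an 8-byte keyspace.
    static byte[][] splitKeys(int numRegions) {
        BigInteger range = BigInteger.ONE.shiftLeft(64); // 2^64 possible keys
        byte[][] keys = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            BigInteger boundary = range.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(numRegions));
            // Right-align into 8 bytes (toByteArray may carry a leading 0x00).
            byte[] raw = boundary.toByteArray();
            byte[] key = new byte[8];
            int copy = Math.min(raw.length, 8);
            System.arraycopy(raw, raw.length - copy, key, 8 - copy, copy);
            keys[i - 1] = key;
        }
        return keys;
    }

    public static void main(String[] args) {
        // For 4 regions: boundaries at 1/4, 2/4 and 3/4 of the keyspace.
        for (byte[] k : splitKeys(4)) {
            StringBuilder sb = new StringBuilder();
            for (byte b : k) sb.append(String.format("%02x", b));
            System.out.println(sb); // 4000..., 8000..., c000...
        }
    }
}
```

If keys are not uniform (sequential IDs, timestamps), even splits like these 
would still produce the hot first/last region the question describes, which is 
why measuring real key distribution first is the safer path.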

Dave

On Mon, Nov 28, 2016 at 12:42 AM, Sachin Jain 
wrote:

> Hi,
>
> I was going through the pre-splitting a table article [0], and it is mentioned
> that it is generally best practice to presplit your table. But don't we
> need to know the data in advance in order to presplit it?
>
> Question: What should be the best practice when we don't know what data is
> going to be inserted into HBase. Essentially I don't know the key range so
> if I specify wrong splits, then either first or last split can be a hot
> region in my system.
>
> [0]: https://hbase.apache.org/book.html#rowkey.regionsplits
>
> Thanks
> -Sachin
>


Storing XML file in Hbase

2016-11-28 Thread Mich Talebzadeh
Hi,

Storing XML files in Big Data. Are there any strategies to create multiple
column families, or just one column family, and in that case how many columns
would be optimal?

thanks

Dr Mich Talebzadeh



LinkedIn * 
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
*



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.


Re: High CPU Utilization by meta region

2016-11-28 Thread Timothy Brown
Hi Ted,

The region server hosting hbase:meta only has the meta region on it so it
has 1 region while other region servers can have more than 100 regions on
them.
I didn't notice anything interesting in the logs in my opinion. Is there
anything in particular I should watch out for?
The hbase:meta table was major compacted yesterday and we're still
experiencing the issue.

Thanks for the quick response,
Tim


On Mon, Nov 28, 2016 at 5:45 AM, Ted Yu  wrote:

> Does the region server hosting hbase:meta have roughly the same number of
> regions as the other servers ?
> Did you find anything interesting in the server log (where hbase:meta is
> hosted) ?
> Have you tried major compacting the hbase:meta table ?
> In 1.2, DEFAULT_HBASE_META_VERSIONS is still 10. See HBASE-16832
>
>
> On Sunday, November 27, 2016 6:53 PM, Timothy Brown <
> t...@siftscience.com> wrote:
>
>
>  Hi Everyone,
>
> I apologize for starting an additional thread about this but I wasn't
> subscribed to the users mailing list when I sent the original and can't
> figure out how to respond to the original :(
>
> Original Message:
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> Here's some more info about our cluster:
> HBase version 1.2
> Number of regions: 72
> Number of tables: 97
> Approx. requests per second to meta region server: 3k
> Approx. requests per second to entire HBase cluster: 90k
>
> Additional info:
>
>
> From Storefile Metrics:
> Stores Num: 1
> Storefiles: 1
> Storefile Size: 30m
> Uncompressed Storefile Size: 30m
> Index Size: 459k
>
>
> I/O for the region server with only meta on it:
> 48M bytes in
> 5.9B bytes out
>
> I used the debug dump on the region server's UI but it was too large
> for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
>
>
> Thanks for the help,
>
> Tim
>
>
>
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Ted Yu
Does the region server hosting hbase:meta have roughly the same number of 
regions as the other servers ?
Did you find anything interesting in the server log (where hbase:meta is 
hosted) ?
Have you tried major compacting the hbase:meta table ?
In 1.2, DEFAULT_HBASE_META_VERSIONS is still 10. See HBASE-16832
 

On Sunday, November 27, 2016 6:53 PM, Timothy Brown  
wrote:
 

 Hi Everyone,

I apologize for starting an additional thread about this but I wasn't
subscribed to the users mailing list when I sent the original and can't
figure out how to respond to the original :(

Original Message:

We are seeing about 80% CPU utilization on the Region Server that solely
serves the meta table while other region servers typically have under 50%
CPU utilization. Is this expected?

Here's some more info about our cluster:
HBase version 1.2
Number of regions: 72
Number of tables: 97
Approx. requests per second to meta region server: 3k
Approx. requests per second to entire HBase cluster: 90k

Additional info:


From Storefile Metrics:
Stores Num: 1
Storefiles: 1
Storefile Size: 30m
Uncompressed Storefile Size: 30m
Index Size: 459k


I/O for the region server with only meta on it:
48M bytes in
5.9B bytes out

I used the debug dump on the region server's UI but it was too large
for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE


Thanks for the help,

Tim


   

About HBASE-12949 - Scanner can be stuck in infinite loop if the HFile is corrupted

2016-11-28 Thread Chang Chen
Hi Jerry,

we hit an issue similar to HBASE-12949. I guess the compaction thread is in
an endless loop, because the CPU load is much higher than usual.

However, I cannot understand why KeyValueHeap.generalizedSeek() has an
endless loop. From the code:

  boolean seekResult;
  if (isLazy && heap.size() > 0) {
// If there is only one scanner left, we don't do lazy seek.
seekResult = scanner.requestSeek(seekKey, forward, useBloom);
  } else {
seekResult = NonLazyKeyValueScanner.doRealSeek(
scanner, seekKey, forward);
  }

  if (!seekResult) {
scanner.close();
  } else {
heap.add(scanner);
  }

If there is an infinite loop, that means seekResult always returns true. Why
is that possible? What happens if the seek reaches the end?

Thanks
Chang


Creating HBase table with presplits

2016-11-28 Thread Sachin Jain
Hi,

I was going through the pre-splitting a table article [0], and it is mentioned
that it is generally best practice to presplit your table. But don't we
need to know the data in advance in order to presplit it?

Question: What should the best practice be when we don't know what data is
going to be inserted into HBase? Essentially, I don't know the key range, so
if I specify the wrong splits, either the first or the last split can become a
hot region in my system.

[0]: https://hbase.apache.org/book.html#rowkey.regionsplits

Thanks
-Sachin