Re: Hot Region Server With No Hot Region

2016-12-02 Thread John Leach
Here is what I see...


* Short Compaction Running on Heap
"regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547"
 - Thread t@242
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.internalEncode(FastDiffDeltaEncoder.java:245)
at org.apache.hadoop.hbase.io.encoding.BufferedDataBlockEncoder.encode(BufferedDataBlockEncoder.java:987)
at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.encode(FastDiffDeltaEncoder.java:58)
at org.apache.hadoop.hbase.io.hfile.HFileDataBlockEncoderImpl.encode(HFileDataBlockEncoderImpl.java:97)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(HFileBlock.java:866)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(HFileWriterV2.java:270)
at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(HFileWriterV3.java:87)
at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.append(StoreFile.java:949)
at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:282)
at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:105)
at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1233)
at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1770)
at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:520)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


* WAL Syncs waiting…   ALL 5
"sync.0" - Thread t@202
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
- waiting on <67ba892d> (a java.util.LinkedList)
at org.apache.hadoop.hdfs.DFSOutputStream.waitForAckedSeqno(DFSOutputStream.java:2337)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:2224)
at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:2116)
at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:130)
at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1379)
at java.lang.Thread.run(Thread.java:745)

* Mutations backing up very badly...

"B.defaultRpcServer.handler=103,queue=7,port=60020" - Thread t@155
   java.lang.Thread.State: TIMED_WAITING
at java.lang.Object.wait(Native Method)
- waiting on <6ab54ea3> (a org.apache.hadoop.hbase.regionserver.wal.SyncFuture)
at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:167)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.blockOnSync(FSHLog.java:1504)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.publishSyncThenBlockOnCompletion(FSHLog.java:1498)
at org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1632)
at org.apache.hadoop.hbase.regionserver.HRegion.syncOrDefer(HRegion.java:7737)
at org.apache.hadoop.hbase.regionserver.HRegion.processRowsWithLocks(HRegion.java:6504)
at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6352)
at org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:6334)
at org.apache.hadoop.hbase.regionserver.HRegion.mutateRow(HRegion.java:6325)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.mutateRows(RSRpcServices.java:418)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:1916)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)


Too many writers are being blocked attempting to write to the WAL.

What does your disk infrastructure look like? Can you get away with multi-WAL?
Ugh...

Regards,
John Leach


> On Dec 2, 2016, at 1:20 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> 
> Hi Ted,
> 
> Finally we have another hotspot going on, same symptoms as before, here is
> the pastebin for the stack trace from the region server that I obtained via
> VisualVM:
> 
> http://pastebin.com/qbXPP

Re: Hot Region Server With No Hot Region

2016-12-01 Thread John Leach
Saad,

Region move or split causes client connections to simultaneously refresh their 
meta.

The key word is "supposed". We have seen meta hotspotting from time to time, and on
different versions, at Splice Machine.

How confident are you in your hashing algorithm?
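
As a quick sanity check of the key salting described in the quoted message below, the following sketch (my own illustration, not code from this thread, and assuming the "4 digit" prefix means four hex characters of the MD5 digest; adjust if yours is decimal) builds the prefix and histograms a million synthetic keys into 256 coarse buckets. If the min and max bucket counts are close, the prefix itself is spreading keys evenly and the problem is elsewhere:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltCheck {
  // Prepend 4 hex chars of the MD5 of the natural key, as described in the thread.
  static String saltedKey(String naturalKey) throws Exception {
    MessageDigest md5 = MessageDigest.getInstance("MD5");
    byte[] d = md5.digest(naturalKey.getBytes(StandardCharsets.UTF_8));
    String prefix = String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff);
    return prefix + "-" + naturalKey;
  }

  public static void main(String[] args) throws Exception {
    int[] buckets = new int[256];
    for (int i = 0; i < 1_000_000; i++) {
      String k = saltedKey("row-" + i);                  // synthetic keys, just for the check
      buckets[Integer.parseInt(k.substring(0, 2), 16)]++; // 256-way histogram on the first byte
    }
    int min = Integer.MAX_VALUE, max = 0;
    for (int c : buckets) { min = Math.min(min, c); max = Math.max(max, c); }
    System.out.println("min bucket=" + min + ", max bucket=" + max);
  }
}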

Regards,
John Leach



> On Dec 1, 2016, at 2:25 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> 
> No never thought about that. I just figured out how to locate the server
> for that table after you mentioned it. We'll have to keep an eye on it next
> time we have a hotspot to see if it coincides with the hotspot server.
> 
> What would be the theory for how it could become a hotspot? Isn't the
> client supposed to cache it and only go back for a refresh if it hits a
> region that is not in its expected location?
> 
> 
> Saad
> 
> 
> On Thu, Dec 1, 2016 at 2:56 PM, John Leach <jle...@splicemachine.com> wrote:
> 
>> Saad,
>> 
>> Did you validate that Meta is not on the “Hot” region server?
>> 
>> Regards,
>> John Leach
>> 
>> 
>> 
>>> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We are using HBase 1.0 on CDH 5.5.2 . We have taken great care to avoid
>>> hotspotting due to inadvertent data patterns by prepending an MD5 based 4
>>> digit hash prefix to all our data keys. This works fine most of the
>> time,
>>> but more and more (as much as once or twice a day) recently we have
>>> occasions where one region server suddenly becomes "hot" (CPU above or
>>> around 95% in various monitoring tools). When it happens it lasts for
>>> hours, occasionally the hotspot might jump to another region server as
>> the
>>> master decides the region is unresponsive and gives its region to another
>>> server.
>>> 
>>> For the longest time, we thought this must be some single rogue key in
>> our
>>> input data that is being hammered. All attempts to track this down have
>>> failed though, and the following behavior argues against this being
>>> application based:
>>> 
>>> 1. plotted Get and Put rate by region on the "hot" region server in
>>> Cloudera Manager Charts, shows no single region is an outlier.
>>> 
>>> 2. cleanly restarting just the region server process causes its regions
>> to
>>> randomly migrate to other region servers, then it gets new ones from the
>>> HBase master, basically a sort of shuffling, then the hotspot goes away.
>> If
>>> it were application based, you'd expect the hotspot to just jump to
>> another
>>> region server.
>>> 
>>> 3. have pored through region server logs and can't see anything out of
>> the
>>> ordinary happening
>>> 
>>> The only other pertinent thing to mention might be that we have a special
>>> process of our own running outside the cluster that does cluster wide
>> major
>>> compaction in a rolling fashion, where each batch consists of one region
>>> from each region server, and it waits before one batch is completely done
>>> before starting another. We have seen no real impact on the hotspot from
>>> shutting this down and in normal times it doesn't impact our read or
>> write
>>> performance much.
>>> 
>>> We are at our wit's end, anyone have experience with a scenario like
>> this?
>>> Any help/guidance would be most appreciated.
>>> 
>>> -
>>> Saad
>> 
>> 



Re: Hot Region Server With No Hot Region

2016-12-01 Thread John Leach
Saad,

Did you validate that Meta is not on the “Hot” region server?  
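
One quick way to check is to ask the client where hbase:meta is currently hosted (the Master UI shows the same thing). A minimal sketch against the HBase 1.x client API; it reads whatever ZooKeeper quorum is in the hbase-site.xml on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public class WhereIsMeta {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml on the classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         RegionLocator locator = conn.getRegionLocator(TableName.META_TABLE_NAME)) {
      // reload=true forces a fresh lookup so a stale client cache doesn't mislead us.
      HRegionLocation loc = locator.getRegionLocation(HConstants.EMPTY_START_ROW, true);
      System.out.println("hbase:meta is served by " + loc.getServerName());
    }
  }
}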

Regards,
John Leach



> On Dec 1, 2016, at 1:50 PM, Saad Mufti <saad.mu...@gmail.com> wrote:
> 
> Hi,
> 
> We are using HBase 1.0 on CDH 5.5.2 . We have taken great care to avoid
> hotspotting due to inadvertent data patterns by prepending an MD5 based 4
> digit hash prefix to all our data keys. This works fine most of the time,
> but more and more (as much as once or twice a day) recently we have
> occasions where one region server suddenly becomes "hot" (CPU above or
> around 95% in various monitoring tools). When it happens it lasts for
> hours, occasionally the hotspot might jump to another region server as the
> master decides the region is unresponsive and gives its region to another
> server.
> 
> For the longest time, we thought this must be some single rogue key in our
> input data that is being hammered. All attempts to track this down have
> failed though, and the following behavior argues against this being
> application based:
> 
> 1. plotted Get and Put rate by region on the "hot" region server in
> Cloudera Manager Charts, shows no single region is an outlier.
> 
> 2. cleanly restarting just the region server process causes its regions to
> randomly migrate to other region servers, then it gets new ones from the
> HBase master, basically a sort of shuffling, then the hotspot goes away. If
> it were application based, you'd expect the hotspot to just jump to another
> region server.
> 
> 3. have pored through region server logs and can't see anything out of the
> ordinary happening
> 
> The only other pertinent thing to mention might be that we have a special
> process of our own running outside the cluster that does cluster wide major
> compaction in a rolling fashion, where each batch consists of one region
> from each region server, and it waits before one batch is completely done
> before starting another. We have seen no real impact on the hotspot from
> shutting this down and in normal times it doesn't impact our read or write
> performance much.
> 
> We are at our wit's end, anyone have experience with a scenario like this?
> Any help/guidance would be most appreciated.
> 
> -
> Saad



Re: Using Hbase as a transactional table

2016-11-28 Thread John Leach
Mich,

Splice Machine (Open Source) can do this on top of Hbase and we have an example 
running a TPC-C benchmark.  Might be worth a look.
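
For context on what plain HBase guarantees by itself: mutations to a single row are atomic, even across column families, and RowMutations groups several of them into one all-or-nothing call; anything spanning multiple rows or tables needs a layer such as the ones discussed in this thread. A minimal sketch against the HBase 1.x client API; the table, family, and row names are made up for illustration:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RowMutations;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleRowAtomicity {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("accounts"))) {  // hypothetical table
      byte[] row = Bytes.toBytes("cust-42");
      RowMutations rm = new RowMutations(row);
      rm.add(new Put(row).addColumn(Bytes.toBytes("balance"), Bytes.toBytes("amount"),
          Bytes.toBytes("100")));
      rm.add(new Delete(row).addColumns(Bytes.toBytes("holds"), Bytes.toBytes("pending")));
      // Applied atomically, but only because everything targets the same row.
      table.mutateRow(rm);
    }
  }
}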

Regards,
John

> On Nov 28, 2016, at 4:36 PM, Ted Yu  wrote:
> 
> Not sure if Transactions (beta) | Apache Phoenix is up to date.
> Why not ask on Phoenix mailing list where you would get better answer(s) ?
> Cheers
> 
>On Monday, November 28, 2016 2:02 PM, Mich Talebzadeh 
>  wrote:
> 
> 
> Thanks Ted.
> 
> How does Phoenix provide transaction support?
> 
> I have read some docs but sounds like problematic. I need to be sure there
> is full commit and rollback if things go wrong!
> 
> Also it appears that Phoenix transactional support is in beta phase.
> 
> Cheers
> 
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> *
> 
> 
> 
> http://talebzadehmich.wordpress.com
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
> 
> 
> 
> On 23 November 2016 at 18:15, Ted Yu  wrote:
> 
>> Mich:
>> Even though related rows are on the same region server, there is no
>> intrinsic transaction support.
>> 
>> For #1 under design considerations, multi column family is one
>> possibility. You should consider how the queries from RDBMS access the
>> related data.
>> 
>> You can also evaluate Phoenix / Trafodion which provides transaction
>> support.
>> 
>> Cheers
>> 
>>> On Nov 23, 2016, at 9:19 AM, Mich Talebzadeh 
>> wrote:
>>> 
>>> Thanks all.
>>> 
>>> As I understand it, HBase does not support ACID-compliant transactions over
>>> multiple rows or across tables?
>>> 
>>> So this is not supported
>>> 
>>> 
>>>   1. Hbase can support multi-rows transactions if the rows are on the
>> same
>>>   table and in the same RegionServer?
>>>   2. Hbase does not support multi-rows transactions if the rows are in
>>>   different tables but happen to be in the same RegionServer?
>>>   3. If I migrated RDBMS transactional tables to the same Hbase table
>> (big
>>>   if) with different column families, will that work?
>>> 
>>> 
>>> Design considerations
>>> 
>>> 
>>>   1. If I have 4 big tables in RDBMS, some having in excess of 200
>> columns
>>>   (I know this is a joke), can they all go one-to-one to Hbase tables.
>> Can
>>>   some of these RDBMS tables put into one Hbase schema  with different
>> column
>>>   families.
>>>   2. then another question. If I use hive tables on these hbase tables
>>>   with large number of family columns, will it work ok?
>>> 
>>> thanks
>>> 
>>> 
>>> 
>>> Dr Mich Talebzadeh
>>> 
>>> 
>>> 
>>> LinkedIn * https://www.linkedin.com/profile/view?id=
>> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>> > OABUrV8Pw>*
>>> 
>>> 
>>> 
>>> http://talebzadehmich.wordpress.com
>>> 
>>> 
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>>> loss, damage or destruction of data or any other property which may arise
>>> from relying on this email's technical content is explicitly disclaimed.
>>> The author will in no case be liable for any monetary damages arising
>> from
>>> such loss, damage or destruction.
>>> 
>>> 
>>> 
 On 23 November 2016 at 16:43, Denise Rogers  wrote:
 
 I would recommend MariaDB. HBase is not ACID compliant. MariaDB is.
 
 Regards,
 Denise
 
 
 Sent from mi iPad
 
>> On Nov 23, 2016, at 11:27 AM, Mich Talebzadeh <
>> mich.talebza...@gmail.com>
> wrote:
> 
> Hi,
> 
> I need to explore if anyone has used Hbase as a transactional table to
>> do
> the processing that historically one has done with RDBMSs.
> 
> A simple question dealing with a transaction as a unit of work (all or
> nothing). In that case if any part of statement in batch transaction
 fails,
> that transaction will be rolled back in its entirety.
> 
> Now how does Hbase can handle this? Specifically at the theoretical
>> level
> if a standard transactional processing was migrated from RDBMS to Hbase
> tables, will that work.
> 
> Has anyone built  successful transaction processing in Hbase?
> 
> Thanks
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * https://www.linkedin.com/profile/view?id=
 AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> > 

Re: Hive on Hbase

2016-11-17 Thread John Leach
Mich,

Please see slide 9 for architectural differences between Splice Machine, 
Trafodion, and Phoenix.

https://docs.google.com/presentation/d/111t2QSVaI-CPwE_ejPHZMFhKJVe5yghCMPLfR3zh9hQ/edit?ts=582def5b#slide=id.g5fcdef5a7_09
 
<https://docs.google.com/presentation/d/111t2QSVaI-CPwE_ejPHZMFhKJVe5yghCMPLfR3zh9hQ/edit?ts=582def5b#slide=id.g5fcdef5a7_09>

The performance differences are in the later slides.

Hope this helps.  

Regards,
John Leach

> On Nov 17, 2016, at 10:41 AM, Gunnar Tapper <tapper.gun...@gmail.com> wrote:
> 
> Hi,
> 
> Trafodion's native storage engine is HBase.
> 
> You can find its documentation at: trafodion.apache.org/documentation.html
> 
> Since this is an HBase user mailing list, I suggest that we discuss your
> other questions on u...@trafodion.incubator.apache.org.
> 
> Thanks,
> 
> Gunnar
> 
> 
> 
> On Thu, Nov 17, 2016 at 8:19 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
> 
>> thanks Gunnar.
>> 
>> have you tried the performance of this product on Hbase. There are a number
>> of options available. However, what makes this product better than hive on
>> hbase?
>> 
>> regards
>> 
>> Dr Mich Talebzadeh
>> 
>> 
>> 
>> LinkedIn * https://www.linkedin.com/profile/view?id=
>> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCd
>> OABUrV8Pw>*
>> 
>> 
>> 
>> http://talebzadehmich.wordpress.com
>> 
>> 
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
>> loss, damage or destruction of data or any other property which may arise
>> from relying on this email's technical content is explicitly disclaimed.
>> The author will in no case be liable for any monetary damages arising from
>> such loss, damage or destruction.
>> 
>> 
>> 
>> On 17 November 2016 at 15:04, Gunnar Tapper <tapper.gun...@gmail.com>
>> wrote:
>> 
>>> Apache Trafodion provides SQL on top of HBase.
>>> 
>>> On Thu, Nov 17, 2016 at 7:40 AM, Mich Talebzadeh <
>>> mich.talebza...@gmail.com>
>>> wrote:
>>> 
>>>> thanks John.
>>>> 
>>>> How about using Phoenix or using Spark RDDs on top of Hbase?
>>>> 
>>>> Many people think Phoenix is not a good choice?
>>>> 
>>>> 
>>>> 
>>>> Dr Mich Talebzadeh
>>>> 
>>>> 
>>>> 
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=
>>>> AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=
>> AAEWh2gBxianrbJd6zP6AcPCCd
>>>> OABUrV8Pw>*
>>>> 
>>>> 
>>>> 
>>>> http://talebzadehmich.wordpress.com
>>>> 
>>>> 
>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any
>>>> loss, damage or destruction of data or any other property which may
>> arise
>>>> from relying on this email's technical content is explicitly
>> disclaimed.
>>>> The author will in no case be liable for any monetary damages arising
>>> from
>>>> such loss, damage or destruction.
>>>> 
>>>> 
>>>> 
>>>> On 17 November 2016 at 14:24, John Leach <jle...@splicemachine.com>
>>> wrote:
>>>> 
>>>>> Mich,
>>>>> 
>>>>> I have not found too many happy users of Hive on top of HBase in my
>>>>> experience.  For every query in Hive, you will have to read the data
>>> from
>>>>> the filesystem into hbase and then serialize the data via an HBase
>>>> scanner
>>>>> into Hive.  The throughput through this mechanism is pretty poor and
>>> now
>>>>> when you read 1 million records you actually read 1 Million records
>> in
>>>>> HBase and 1 Million Records in Hive.  There are significant resource
>>>>> management issues with this approach as well.
>>>>> 
>>>>> At Splice Machine (open source), we have written an implementation to
>>>> read
>>>>> the store files directly from the file system (via embedded Spark)
>> and
>>>> then
>>>>> we do incremental deltas with HBase to maintain consistency.  When we
>>>> read
>>>>> 1 million records, Spark reads most of them directly from the
>>> filesystem.
>>>>

Re: Hive on Hbase

2016-11-17 Thread John Leach
Mich,

I have not found too many happy users of Hive on top of HBase in my experience.
For every query in Hive, you have to read the data from the filesystem into HBase
and then serialize it via an HBase scanner into Hive. The throughput through this
mechanism is pretty poor, and when you read 1 million records you actually read
1 million records in HBase and 1 million records in Hive. There are significant
resource management issues with this approach as well.

At Splice Machine (open source), we have written an implementation to read the 
store files directly from the file system (via embedded Spark) and then we do 
incremental deltas with HBase to maintain consistency.  When we read 1 million 
records, Spark reads most of them directly from the filesystem.  Spark provides 
resource management and fair scheduling of those queries as well.  

We released some of our performance results at HBaseCon East in NYC.  Here is 
the video.  https://www.youtube.com/watch?v=cgIz-cjehJ0 
<https://www.youtube.com/watch?v=cgIz-cjehJ0> .

Regards,
John Leach

> On Nov 17, 2016, at 6:09 AM, Mich Talebzadeh <mich.talebza...@gmail.com> 
> wrote:
> 
> Hi,
> 
> My approach to have a SQL engine on top of Hbase has been (excluding Spark
> & Phoenix for now) is to create Hbase table as is, then create an EXTERNAL
> Hive table on Hbase using Hadoop.hive.HbaseStorageHandler to interface with
> Hbase table.
> 
> My reasoning with creating Hive external table is to avoid accidentally
> dropping Hbase table etc. Is this a reasonable approach?
> 
> Then that Hive table can be used by a variety of tools like Spark, Tableau,
> Zeppelin.
> 
> Is this a viable solution as Hive seems to be preferred on top of Hbase
> compared to Phoenix etc.
> 
> Thanks
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
> 
> 
> 
> http://talebzadehmich.wordpress.com
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.



Re: Scan a region in parallel

2016-10-22 Thread John Leach
Anil,

You could also try Splice Machine (Open Source).  
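
For the range-slicing idea discussed in the quoted thread below (Anil's option #2), a purely client-side version is possible with the stock HBase 1.x API: split each region's [startKey, endKey) into a few sub-ranges and run one scanner per sub-range. This is a rough sketch, not production code; it assumes Bytes.split() can divide the boundary keys, falls back to a single slice for the open-ended first/last regions, and the per-row work is just a count you would replace with the real load into the in-memory DB:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SlicedRegionScan {
  public static void main(String[] args) throws Exception {
    TableName tn = TableName.valueOf("my_table");        // hypothetical table name
    int slicesPerRegion = 4;
    ExecutorService pool = Executors.newFixedThreadPool(16);
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         RegionLocator locator = conn.getRegionLocator(tn)) {
      List<Callable<Long>> tasks = new ArrayList<>();
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        HRegionInfo ri = loc.getRegionInfo();
        byte[][] points;
        if (ri.getStartKey().length == 0 || ri.getEndKey().length == 0) {
          // First/last region has an open-ended boundary; scan it as a single slice.
          points = new byte[][] { ri.getStartKey(), ri.getEndKey() };
        } else {
          // Bytes.split returns the two endpoints plus the intermediate split points.
          points = Bytes.split(ri.getStartKey(), ri.getEndKey(), slicesPerRegion - 1);
          if (points == null) points = new byte[][] { ri.getStartKey(), ri.getEndKey() };
        }
        for (int i = 0; i < points.length - 1; i++) {
          final byte[] start = points[i], stop = points[i + 1];
          tasks.add(() -> {
            long count = 0;
            try (Table t = conn.getTable(tn);
                 ResultScanner rs = t.getScanner(new Scan(start, stop))) {
              for (Result ignored : rs) count++;  // replace with the real load into the in-memory DB
            }
            return count;
          });
        }
      }
      long total = 0;
      for (Future<Long> f : pool.invokeAll(tasks)) total += f.get();
      System.out.println("rows scanned: " + total);
    } finally {
      pool.shutdown();
    }
  }
}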

Regards,
John Leach

> On Oct 21, 2016, at 4:05 AM, Anil <anilk...@gmail.com> wrote:
> 
> Thank you Ram. Now its clear. i will take a look at it.
> 
> Thanks again.
> 
> On 21 October 2016 at 14:25, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
> 
>> Phoenix does support intelligent ways when you query using columns since it
>> is a SQL engine.
>> 
>> There the parallelism happens by using guideposts - those are fixed spaced
>> row keys stored in a seperate stats table. So when you do a query the
>> Phoenix internally spawns parallels scan queries using those guide posts
>> and thus making querying faster.
>> 
>> Regards
>> Ram
>> 
>> On Fri, Oct 21, 2016 at 1:26 PM, Anil <anilk...@gmail.com> wrote:
>> 
>>> Thank you Ram.
>>> 
>>> "So now  you are spawning those many scan threads equal to the number of
>>> regions " - YES
>>> 
>>> There are two ways of scanning region in parallel
>>> 
>>> 1. scan a region with start row and stop row in parallel with single scan
>>> operation on server side and hbase take care of parallelism internally.
>>> 2. transform a start row and stop row of a region into number of start
>> and
>>> stop rows (by some criteria) and span scan query for each start and stop
>>> row.
>>> 
>>> #1 is not supported (as you also said).
>>> 
>>> i am looking for #2. i checked the phoenix documentation and code. it
>> seems
>>> to me that phoenix is doing #2. i looked into phoenix code and could not
>>> understand it completely.
>>> 
>>> The use case is very simple. HBase is not good (at least in terms of
>>> OLTP performance) at querying by arbitrary columns (other than the row key)
>>> or sorting by all columns of a row; even Phoenix struggles there.
>>> 
>>> So i am planning load the hbase/phoenix table into in-memory data base
>> for
>>> faster access.
>>> 
>>> scanning of big region sequentially will lead to larger load time. so
>>> finding ways to minimize the load time.
>>> 
>>> Hope this helps.
>>> 
>>> Thanks.
>>> 
>>> 
>>> On 21 October 2016 at 09:30, ramkrishna vasudevan <
>>> ramkrishna.s.vasude...@gmail.com> wrote:
>>> 
>>>> Hi Anil
>>>> 
>>>> So now  you are spawning those many scan threads equal to the number of
>>>> regions.
>>>> bq.Is there any way to scan a region in parallel ?
>>>> You mean with in a region you want to scan parallely? Which means that
>> a
>>>> single query you want to split up into N number of small scans and read
>>> and
>>>> aggregate on the client side/server side?
>>>> 
>>>> Currently you cannot do that. Once you set a start and stoprow the scan
>>>> will determine which region it belongs to and retrieves the data
>>>> sequentially in that region (it applies the filtering that you do
>> during
>>>> the course of the scan).
>>>> 
>>>> Have you tried Apache Phoenix?  Its a SQL wrapper over HBase and there
>>> you
>>>> could do parallel scans for a given SQL query if there are some guide
>>> posts
>>>> collected. Such things cannot be an integral part of HBase. But I fear
>>> as I
>>>> am not aware of your usecase we cannot suggest on this.
>>>> 
>>>> REgards
>>>> Ram
>>>> 
>>>> 
>>>> On Fri, Oct 21, 2016 at 8:40 AM, Anil <anilk...@gmail.com> wrote:
>>>> 
>>>>> Any pointers ?
>>>>> 
>>>>> On 20 October 2016 at 18:15, Anil <anilk...@gmail.com> wrote:
>>>>> 
>>>>>> HI,
>>>>>> 
>>>>>> I am loading hbase table into an in-memory db to support filter,
>>>> ordering
>>>>>> and pagination.
>>>>>> 
>>>>>> I am scanning region and inserting data into in-memory db. each
>>> region
>>>>>> scan is done in single thread so each region is scanned in
>> parallel.
>>>>>> 
>>>>>> Is there any way to scan a region in parallel ? any pointers would
>> be
>>>>>> helpful.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 



Re: where clause on Phoenix view built on Hbase table throws error

2016-10-05 Thread John Leach

Remove the double quotes around the literal and use single quotes instead, i.e.
where "Date" = '1-Apr-08'. In SQL (and Phoenix), double quotes denote a case-sensitive
identifier, which is why "1-Apr-08" is being parsed as a column name.
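
If it helps, here is the same query issued through Phoenix's JDBC driver with a bind parameter, which sidesteps the quoting question entirely. The connection URL is copied from the sqlline prompt below; the rest is a hedged sketch that assumes the Phoenix client jar is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class TscoQuery {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:rhes564:2181");
         PreparedStatement ps = conn.prepareStatement(
             "select \"Date\", \"volume\" from \"tsco\" where \"Date\" = ?")) {
      ps.setString(1, "1-Apr-08");   // the string literal goes in as a bind value
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          System.out.println(rs.getString(1) + "  " + rs.getString(2));
        }
      }
    }
  }
}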

Cheers,
John Leach

> On Oct 5, 2016, at 9:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> Hi,
> 
> I have this Hbase table already populated
> 
> create 'tsco','stock_daily'
> 
> and populated using
> $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.ImportTsv
> -Dimporttsv.separator=',' -Dimporttsv.columns="HBASE_ROW_KEY,
> stock_info:stock,stock_info:ticker,stock_daily:Date,stock_daily:open,stock_daily:high,stock_daily:low,stock_daily:close,stock_daily:volume"
> tsco hdfs://rhes564:9000/data/stocks/tsco.csv
> This works OK. In Hbase I have
> 
> hbase(main):176:0> scan 'tsco', LIMIT => 1
> ROW                  COLUMN+CELL
> TSCO-1-Apr-08        column=stock_daily:Date, timestamp=1475525222488, value=1-Apr-08
> TSCO-1-Apr-08        column=stock_daily:close, timestamp=1475525222488, value=405.25
> TSCO-1-Apr-08        column=stock_daily:high, timestamp=1475525222488, value=406.75
> TSCO-1-Apr-08        column=stock_daily:low, timestamp=1475525222488, value=379.25
> TSCO-1-Apr-08        column=stock_daily:open, timestamp=1475525222488, value=380.00
> TSCO-1-Apr-08        column=stock_daily:stock, timestamp=1475525222488, value=TESCO PLC
> TSCO-1-Apr-08        column=stock_daily:ticker, timestamp=1475525222488, value=TSCO
> TSCO-1-Apr-08        column=stock_daily:volume, timestamp=1475525222488, value=49664486
> 
> In Phoenix I have a view "tsco" created on Hbase table as follows:
> 
> 0: jdbc:phoenix:rhes564:2181> create view "tsco" (PK VARCHAR PRIMARY KEY,
> "stock_daily"."Date" VARCHAR, "stock_daily"."close" VARCHAR,
> "stock_daily"."high" VARCHAR, "stock_daily"."low" VARCHAR,
> "stock_daily"."open" VARCHAR, "stock_daily"."ticker" VARCHAR,
> "stock_daily"."stock" VARCHAR, "stock_daily"."volume" VARCHAR)
> 
> So all good.
> 
> This works
> 
> 0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" limit 2;
> +---+---+
> |   Date|  volume   |
> +---+---+
> | 1-Apr-08  | 49664486  |
> | 1-Apr-09  | 24877341  |
> +---+---+
> 2 rows selected (0.011 seconds)
> 
> However, I don't seem to be able to use where clause!
> 
> 0: jdbc:phoenix:rhes564:2181> select "Date","volume" from "tsco" where
> "Date" = "1-Apr-08";
> Error: ERROR 504 (42703): Undefined column. columnName=1-Apr-08
> (state=42703,code=504)
> org.apache.phoenix.schema.ColumnNotFoundException: ERROR 504 (42703):
> Undefined column. columnName=1-Apr-08
> 
> Why does it think a predicate "1-Apr-08" is a column.
> 
> Any ideas?
> 
> Thanks
> 
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn * 
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
> 
> 
> 
> http://talebzadehmich.wordpress.com
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.



Re: Options to Import Data from MySql DB to HBase

2016-08-30 Thread John Leach
I would suggest using an open source tool on top of HBase (Splice Machine, 
Trafodion, or Phoenix) if you are wanting to map from an RDBMS.
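
If you do go the plain-HBase route, the second option in the question below (a custom MapReduce job that writes HFiles) is normally built from HFileOutputFormat2 plus LoadIncrementalHFiles rather than hand-rolled HFile code. A hedged sketch assuming HBase 1.x (the exact LoadIncrementalHFiles overload varies a bit across 1.x releases); the table name, column families, CSV layout, and paths are all illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CsvBulkLoad {
  // Parses one CSV line into a Put that spans three (illustrative) column families.
  public static class CsvMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text line, Context ctx)
        throws java.io.IOException, InterruptedException {
      String[] f = line.toString().split(",");
      byte[] row = Bytes.toBytes(f[0]);
      Put put = new Put(row);
      put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("colA"), Bytes.toBytes(f[1]));
      put.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("colB"), Bytes.toBytes(f[2]));
      put.addColumn(Bytes.toBytes("cf3"), Bytes.toBytes("colC"), Bytes.toBytes(f[3]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName tn = TableName.valueOf("my_table");                 // hypothetical target table
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(tn);
         RegionLocator locator = conn.getRegionLocator(tn)) {
      Job job = Job.getInstance(conf, "csv-bulk-load");
      job.setJarByClass(CsvBulkLoad.class);
      job.setMapperClass(CsvMapper.class);
      job.setMapOutputKeyClass(ImmutableBytesWritable.class);
      job.setMapOutputValueClass(Put.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));       // CSV dir Sqoop wrote into HDFS
      FileOutputFormat.setOutputPath(job, new Path(args[1]));     // staging dir for the HFiles
      // Sets total-order partitioning to the table's region boundaries and PutSortReducer.
      HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
      if (!job.waitForCompletion(true)) System.exit(1);
      // Move the generated HFiles into the live table.
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), conn.getAdmin(), table, locator);
    }
  }
}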

Regards,
John Leach

> On Aug 30, 2016, at 1:09 AM, Soumya Nayan Kar <soumyanayan@gmail.com> 
> wrote:
> 
> I have a single table in MySql which contains around 2400 records. I
> need a way to import this data into a table in HBase with multiple column
> families. I initially chose Sqoop as the tool to import the data but later
> found that I cannot use Sqoop to directly import the data as Sqoop does not
> support multiple column family import as yet. I have populated the data in
> HDFS using Sqoop from the MySql database. What are my choices to import
> this data from HDFS to an HBase table with 3 column families? It seems for
> bulk import, I have two choices:
> 
>   - ImportTSV tool: this probably requires the source data to be in TSV
>   format. But the data that I have imported in HDFS from MySql using Sqoop
>   seems to be in the CSV format. What is the standard solution for this
>   approach?
>   - Write a custom Map Reduce program to translate the data in HDFS to
>   HFile and load it into HBase.
> 
> I just wanted to ensure that are these the only two choices available to
> load the data. This seems to be a bit restrictive given the fact that such
> a requirement is a very basic one in any system. If custom Map Reduce is
> the way to go, an example or working sample would be really helpful.



Re: Hbase regionserver.MultiVersionConcurrencyControl Warning

2016-08-11 Thread John Leach
We saw this as well at Splice Machine.  This led us to run compactions in 
Spark.  Once we did this, we saw the compaction effects go away almost entirely.

Here is a link to our code.

https://github.com/splicemachine/spliceengine/blob/73640a81972ef5831c1ea834ac9ac22f5b3428db/hbase_sql/src/main/java/com/splicemachine/olap/CompactionJob.java
 
<https://github.com/splicemachine/spliceengine/blob/73640a81972ef5831c1ea834ac9ac22f5b3428db/hbase_sql/src/main/java/com/splicemachine/olap/CompactionJob.java>

We have a todo to get this back in the community.  

Regards,
John Leach

> On Aug 11, 2016, at 8:03 AM, Sterfield <sterfi...@gmail.com> wrote:
> 
> And it's gone [1]. No more spikes in the writes / read, no more OpenTSDB
> error. So I think it's safe to assume that OpenTSDB compaction is
> generating some additional load that is not very well handled by the HBase,
> and therefore, generating the issues I'm mentioning.
> 
> It seems also that the MVCC error are gone (to be checked).
> 
> I don't know how to manage Hbase in order to make it possible to handle
> compaction without any issues, but at least, I know where it comes from
> 
> [1] :
> https://www.dropbox.com/s/d6l2lngr6mpizh9/Without%20OpenTSDB%20compaction.png?dl=0
> 
> 2016-08-11 13:18 GMT+02:00 Sterfield <sterfi...@gmail.com>:
> 
>> Hello,
>> 
>> 
>>>> Hi,
>>>> 
>>>> Thanks for your answer.
>>>> 
>>>> I'm currently testing OpenTSDB + HBase, so I'm generating thousands of
>>> HTTP
>>>> POST on OpenTSDB in order to write data points (currently up to 300k/s).
>>>> OpenTSDB is only doing increment / append (AFAIK)
>>>> 
>>>> How many nodes or is that 300k/s on a single machine?
>> 
>> 
>> 1 master node, 4 slaves, colo HDFS + RS.
>> 
>> Master : m4.2xlarge (8CPU, 32GB RAM)
>> Slave : d2.2xlarge (8CPU, 61GB RAM, 6x2T disk)
>> 
>> 
>>>> If I have understood your answer correctly, some write ops are queued,
>>> and
>>>> some younger ops in the queue are "done" while some older are not.
>>>> 
>>>> 
>>> What Anoop said plus, we'll see the STUCK notice when it is taking a long
>>> time for the MVCC read point to come up to the write point of the
>>> currently
>>> ongoing transaction. We will hold the updating thread until the readpoint
>>> is equal or greater than the current transactions write point. We do this
>>> to ensure a client can read its own writes. The MVCC is region wide. If
>>> many ongoing updates, a slightly slower one may drag down other
>>> outstanding
>>> transactions completing. The STUCK message goes away after some time? It
>>> happens frequently? A thread dump while this is going on would be
>>> interesting if possible or what else is going on on the server around this
>>> time (see in logs?)
>> 
>> 
>> Yes, the STUCK message happens for quite some time (a dozen of minutes,
>> each hour.). It happens every hour.
>> 
>>> Few additional questions :
>>>> 
>>>>   - Is it a problem regarding the data or is it "safe" ? In other
>>> words,
>>>>   the old data not been written yet will be dropped or they will be
>>>> written
>>>>   correctly, just later ?
>>>> 
>>> No data is going to be dropped. The STUCK message is just flagging you
>>> that
>>> a write is taking a while to complete while we wait on MVCC. You backed up
>>> on disk or another resource or a bunch of writers have all happened to
>>> arrive at one particular region (MVCC is by region)?
>> 
>> 
>> I've pre-splitted my "tsdb" region, to be managed by all 4 servers, so I
>> think I'm ok on this side. All information are stored locally on the EC2
>> disks.
>> 
>> 
>>>>   - How can I debug this and if possible, fix it ?
>>>> 
>>>> 
>>> See above. Your writes are well distributed. Disks are healthy?
>> 
>> 
>> 
>> So here's the result of my investigation + my assumptions :
>> 
>> 
>>   - Every hour, my RS have a peak of Load / CPU. I was looking at a RS
>>   when it happened (that's easy, it's at the beginning of each hour), and the
>>   RS java process was taking all the CPU available on the machine, hence the
>>   load. You can see the load of all my servers on those images, see [1] and
>>   [2].
>>   - Disk are fine IMO. Write IO is OK on average, peak to 300 / 400
>>   IOPS, in range of a correct mechani

Re: help, try to use HBase's checkAndPut() to implement distribution lock

2016-08-10 Thread John Leach
Ming,

One challenge with locking mechanisms is that you need to account for node failure
after the lock is acquired. If a region were to be moved, split, etc., the lock needs
to survive those operations. Most databases put the locks inline with the data they
store and use their transaction resolution mechanism to enforce or release them.
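
If you do stay with the checkAndPut approach from the quoted message below, one common way to handle owner death is to embed a lease deadline in the lock cell and let a later caller steal an expired lock with another checkAndPut. A rough sketch against the HBase 1.x client API; the reserved row, column names, and token format are my own illustration, and it still has the usual caveats (clock skew, a lock stolen just as a slow owner resumes) that a ZooKeeper recipe or inline transactional locks handle more robustly:

import java.io.IOException;
import java.util.UUID;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class LeaseLock {
  // Reserved row (echoing the "row 0" idea in the quoted pseudocode) and column names are illustrative.
  private static final byte[] LOCK_ROW = new byte[] { 0 };
  private static final byte[] CF = Bytes.toBytes("l");
  private static final byte[] OWNER = Bytes.toBytes("owner");

  /** Try to acquire; returns a token to pass to release(), or null if held by a live owner. */
  public static String tryLock(Table table, long leaseMillis) throws IOException {
    String token = UUID.randomUUID() + ":" + (System.currentTimeMillis() + leaseMillis);
    Put put = new Put(LOCK_ROW).addColumn(CF, OWNER, Bytes.toBytes(token));
    // checkAndPut with a null expected value succeeds only if the owner cell does not exist yet.
    if (table.checkAndPut(LOCK_ROW, CF, OWNER, null, put)) {
      return token;
    }
    // An owner exists: steal the lock only if its embedded lease deadline has passed.
    Result r = table.get(new Get(LOCK_ROW).addColumn(CF, OWNER));
    byte[] cur = r.getValue(CF, OWNER);
    if (cur != null) {
      long deadline = Long.parseLong(Bytes.toString(cur).split(":")[1]);
      if (System.currentTimeMillis() > deadline
          && table.checkAndPut(LOCK_ROW, CF, OWNER, cur, put)) {
        return token;
      }
    }
    return null;
  }

  /** Release only if we still own the lock (compare-and-delete on our own token). */
  public static boolean release(Table table, String token) throws IOException {
    return table.checkAndDelete(LOCK_ROW, CF, OWNER, Bytes.toBytes(token), new Delete(LOCK_ROW));
  }
}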

Good luck.

Regards,
John Leach
 
> On Aug 10, 2016, at 9:54 AM, Liu, Ming (Ming) <ming@esgyn.cn> wrote:
> 
> Thanks Ted!
> 
> -Original Message-
> From: Ted Yu [mailto:yuzhih...@gmail.com] 
> Sent: Wednesday, August 10, 2016 10:17 PM
> To: user@hbase.apache.org
> Subject: Re: help, try to use HBase's checkAndPut() to implement distribution 
> lock
> 
> Please take a look at EnableTableHandler where you can find example
> - prepare() and process().
> 
> On Tue, Aug 9, 2016 at 5:04 PM, Liu, Ming (Ming) <ming@esgyn.cn> wrote:
> 
>> Thanks Ted for pointing out this. Can this TableLockManager be used 
>> from a client? I am fine to migrate if this API change for each release.
>> I am writing a client application, and need to lock a hbase table, if 
>> this can be used directly, that will be super great!
>> 
>> Thanks,
>> Ming
>> 
>> -Original Message-
>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>> Sent: Wednesday, August 10, 2016 1:04 AM
>> To: user@hbase.apache.org
>> Subject: Re: help, try to use HBase's checkAndPut() to implement 
>> distribution lock
>> 
>> Please take a look at TableLockManager and its subclasses.
>> 
>> It is marked @InterfaceAudience.Private Meaning the API may change 
>> across releases.
>> 
>> Cheers
>> 
>> On Tue, Aug 9, 2016 at 9:58 AM, Liu, Ming (Ming) <ming@esgyn.cn>
>> wrote:
>> 
>>> Thanks Ted for the questions.
>>> 
>>> 1. what if the process of owner of the lock dies?
>>> I didn't think of this... This is really an issue. I don't have a 
>>> good answer. One possibility is to have a lease of each lock, owner 
>>> must renew it periodically until release it. The getLock can check 
>>> the last timestamp to see if it is expired. But I have no idea how 
>>> the lock owner can periodic renew the lock lease, spawn a new thread 
>>> to do that? So, this makes me trying to abandon this idea.
>>> 
>>> 2. How can other processes obtain the lock?
>>> The getLock() has input param of table, so any process can give the 
>>> tablename and invoke getLock(), it check the row 0 value in an 
>>> atomic check and put operation. So if the 'table lock' is free, 
>>> anyone should be able to get it I think.
>>> 
>>> Maybe I have to study the Zookeeper's distributed lock recipes?
>>> 
>>> Thanks,
>>> Ming
>>> 
>>> -Original Message-
>>> From: Ted Yu [mailto:yuzhih...@gmail.com]
>>> Sent: Wednesday, August 10, 2016 12:26 AM
>>> To: user@hbase.apache.org
>>> Subject: Re: help, try to use HBase's checkAndPut() to implement 
>>> distribution lock
>>> 
>>> What if the process of owner of the lock dies ?
>>> How can other processes obtain the lock ?
>>> 
>>> Cheers
>>> 
>>> On Tue, Aug 9, 2016 at 8:19 AM, Liu, Ming (Ming) <ming@esgyn.cn>
>>> wrote:
>>> 
>>>> Hi, all,
>>>> 
>>>> I want to implement a simple 'table lock' in HBase. My current 
>>>> idea is for each table, I choose a special rowkey which NEVER be 
>>>> used in real data, and then use this row as a 'table level lock'.
>>>> 
>>>> The getLock() will be:
>>>> 
>>>>  getLock(table, cf, cl)
>>>>  {
>>>> rowkey = 0 ; //which never used in real data
>>>> 
>>>>//generate a UUID, thread ID + an atomic increamental sequence 
>>>> number for example. Call it myid
>>>> genMyid(myid);
>>>> 
>>>> while(TRUE)
>>>> {
>>>> ret = table.checkAndPut ( rowkey, cf , cl , '0' , myid);
>>>> if (ret == true ) get the lock
>>>> return myid;
>>>> else
>>>> sleep and continue; //retry, maybe can retry for 
>>>> timeout here, or wait forever here
>>>>  }
>>>>   }
>>>> 
>>>> The releaseLock() may be:
>>>> 
>>>>   releaseLock(table, cf, cl, myid)
>>>>   {
>>>>   Rowkey = 0;
>>>>ret = checkAndPut ( rowkey , cf , cl , myid , '0' );  //If 
>>>> I am holding the lock
>>>>if (ret == TRUE) return TRUE;
>>>>else  return FALSE;
>>>> }
>>>> 
>>>> So one caller get lock, and others must wait until the caller 
>>>> release it, and only the lock holder can release the lock. So if 
>>>> one getLock(), it can then modify the table, others during this 
>>>> period must
>>> wait.
>>>> 
>>>> I am very new in lock implementation, so I fear there are basic 
>>>> problems in this 'design'.
>>>> So please help me to review if there are any big issues about this
>> idea?
>>>> Any help will be very appreciated.
>>>> 
>>>> Thanks a lot,
>>>> Ming
>>>> 
>>> 
>>