Re: New Apache HBase blog post: Introduction "hbtop", a real-time monitoring tool for HBase modeled after Unix's 'top' command

2019-09-10 Thread Jean-Marc Spaggiari
Impressive and interesting work! Thanks for doing that!

JMS

On Tue, Sep 10, 2019 at 19:53, Toshihiro Suzuki wrote:

> Hi folks,
>
> Thanks to Stack, a new blog post about hbtop has been published on the
> Apache HBase site.
>
> Check it out at
> https://blogs.apache.org/hbase/entry/introduction-hbtop-a-real-time
>
> Regards,
> Toshihiro Suzuki
>


Re: Region in RIT (CLOSING) , How to fix it ?

2019-09-02 Thread Jean-Marc Spaggiari
Hi Syni,

Have you tried using HBCK2?

JMS

On Mon, Sep 2, 2019 at 07:57, Syni Guo wrote:

>
>
> HBase version: 2.1.3
>
>
> There are 2 regions in RIT (CLOSING). How can I fix this? I tried to
> unassign one of them, but the call timed out.
>
>
> hbase(main):032:0> unassign '05785869685e6ec948c03c2076b8',true
>
> ERROR: Call id=13259, waitTime=10009, rpcTimeout=1
>
> For usage try 'help "unassign"'
>
>
>
> Logs:
>
> 2019-09-02 19:50:54,453 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING,
> location=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074,
> table=alpha_daas:device_data_details,
> region=05785869685e6ec948c03c2076b8
> 2019-09-02 19:50:54,453 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: STUCK Region-In-Transition rit=CLOSING,
> location=tx-220-70-27.h.chinabank.com.cn,60020,1567410218074,
> table=alpha_daas:poi_unicom_stat, region=42de1052551760e45cb7ba2684d586f8
>
>
>
> 2019-09-02 19:51:29,365 INFO  [PEWorker-1]
> procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097026,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
> table=alpha_daas:device_data_details,
> region=05785869685e6ec948c03c2076b8, server=
> tx-220-70-27.h.chinabank.com.cn,60020,1567410218074 held by pid=21075720
> 2019-09-02 19:51:39,581 INFO  [PEWorker-11]
> procedure.MasterProcedureScheduler: Waiting on xlock for pid=21097027,
> state=RUNNABLE:REGION_TRANSITION_DISPATCH; UnassignProcedure
> table=alpha_daas:device_data_details,
> region=05785869685e6ec948c03c2076b8, server=
> tx-220-70-27.h.chinabank.com.cn,60020,1567410218074 held by pid=21075720
>
>
>
>
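When several regions are stuck, it can help to pull the affected regions out of the master log before deciding what to pass to HBCK2. The following is only a minimal sketch in Python: the regular expression is derived from the two WARN lines quoted above, the function name `stuck_regions` is made up for illustration, and other HBase versions may format this message differently.

```python
import re

# Pattern for the "STUCK Region-In-Transition" warning as it appears in the
# log excerpt quoted above (fields: rit, location, table, region).
# Assumption: the field order and names match that excerpt.
STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<location>\S+?),\s*"
    r"table=(?P<table>\S+?),\s*"
    r"region=(?P<region>[0-9a-f]+)"
)


def stuck_regions(log_text):
    """Return one dict per STUCK warning found in log_text.

    Line breaks inside a warning (as produced by mail wrapping) are
    tolerated because the pattern allows whitespace between fields.
    """
    return [match.groupdict() for match in STUCK_RIT.finditer(log_text)]
```

The result is a list of dictionaries with the keys `state`, `location`, `table` and `region`, which can then be grouped by table or by server.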


Re: does failure of write to memstore has any impact on response code from hbase

2019-08-08 Thread Jean-Marc Spaggiari
Don't we first write to WAL and then memstore? Getting confused here, sorry.

JMS

On Wed, Aug 7, 2019 at 13:22, Udai Bhan Kashyap (BLOOMBERG/ PRINCETON) <
ukashy...@bloomberg.net> wrote:

> memstore is written first and if write(s) to WAL fails, they are rolled
> back.
>
> From: dev@hbase.apache.org At: 08/07/19 13:15:27 To: dev@hbase.apache.org
> Subject: Re: does failure of write to memstore has any impact on response
> code from hbase
>
> On Mon, Aug 5, 2019 at 9:36 PM Maneesh Bhunwal 
> wrote:
>
> > Hi Team,
> >
> > First of all, thanks for the awesome product.
> >
> > Can you please help me understand how the application will behave when the
> > write to the memstore fails but the write to the WAL has already succeeded?
> > Will we return success to the user, or failure?
> >
> Failure. The client will get an exception that varies depending on the failure
> type.
>
>
> > Also, what if the DB crashes after writing to the WAL? When the DB comes back
> > up after the crash, it will assume that whatever is in the WAL has been
> > replicated, but that may not be the case.
> >
> >
> DB does not make progress until the write to the DB has been sync'd (which
> means the edit has been replicated). So on crash, the WAL will be replayed.
> The edit that was written to the WAL but not to the memstore, and for which
> the client received an exception, will persist.
>
> S
>
>
> > Can you please help me with the above?
> >
> > Regards
> > Maneesh Bhunwal
> >
>
>
>


Re: Adding a new balancer to HBase

2019-06-20 Thread Jean-Marc Spaggiari
Bonjour Pierre,

Some time ago I built (for my own purposes) something similar that I called
"LoadBasedLoadBalancer", which moves the regions based on my servers' load and
capacity. The load balancer queries the region servers to get the number of
cores, the allocated heap, the 5-minute load average, etc., and balances the
regions based on that.

I already felt that need years ago. What you are proposing is a simplified
version that will most probably be more stable and easier to implement. I
will be happy to assist you in the process of getting that into HBase.

Have you already opened the JIRA to support that?

Thanks,

JMS

On Thu, Jun 20, 2019 at 01:11, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:

> Seems a very good idea for cloud servers. Please feel free to raise a JIRA and
> contribute your patch.
>
> Regards
> Ram
>
> On Tue, Jun 18, 2019 at 8:09 AM 刘新星  wrote:
>
> >
> >
> > I'm interested in this. It sounds like a weighted load balancer and is
> > valuable for users who deploy their HBase clusters on cloud servers.
> > You can create a JIRA and make a patch for better discussion.
> >
> >
> >
> >
> >
> >
> >
> > At 2019-06-18 05:00:54, "Pierre Zemb" 
> wrote:
> > >Hi!
> > >
> > > >My name is Pierre, I'm working at OVH, a European cloud provider. Our
> > > >team, Observability, relies heavily on HBase to store telemetry. We
> > > >would like to open the discussion about adding a new Balancer to 1.4.x
> > > >and 2.x.
> > > >
> > > >Our situation
> > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#our-situation>
> > > >The Observability team at OVH is responsible for handling logs and metrics
> > > >from all servers, applications, and equipment within OVH. HBase is used as
> > > >the datastore for metrics. We are using open-source software called Warp10
> > > >to handle all the metrics coming from OVH's infrastructure. We are operating
> > > >three HBase 1.4 clusters, including one with 218 RegionServers which is
> > > >growing every month.
> > >
> > > >We found out that *in our use case* (single table, dedicated HBase and
> > > >Hadoop tuned for our use case, good key distribution), *the number of
> > > >regions per RS was the real limit for us*.
> > >
> > > >Over the years, due to historical reasons and also the need to benchmark
> > > >new machines, we ended up with different groups of hardware: some servers
> > > >can handle only 180 regions, whereas the biggest can handle more than 900.
> > > >Because of such a difference, we had to disable the LoadBalancing to avoid
> > > >the roundRobinAssignment. We developed some internal tooling which is
> > > >responsible for load balancing regions across RegionServers. That was 1.5
> > > >years ago.
> > >
> > > >Today, we are thinking about fully integrating it within HBase, using the
> > > >LoadBalancer interface. We started working on a new Balancer called
> > > >HeterogeneousBalancer, that will be able to fulfill our need.
> > > >
> > > >How does it work?
> > > ><https://gist.github.com/PierreZ/15560e12c147e661e5c1b5f0edeb9282#how-does-it-works>
> > >
> > > >A rule file is loaded before balancing. It contains lines of rules. A rule
> > > >is composed of a regexp for the hostname, and a limit. For example, we could
> > > >have:
> > >
> > >rs[0-9] 200
> > >rs1[0-9] 50
> > >
> > > >RegionServers with a hostname matching the first rule will have a limit of
> > > >200, and the others 50. If there's no match, a default is set.
> > > >
> > > >Thanks to the rules, we have two pieces of information: the max number of
> > > >regions for this cluster, and the rule for each server.
> > > >HeterogeneousBalancer will try to balance regions according to their
> > > >capacity.
> > >
> > >Let's take an example. Let's say that we have 20 RS:
> > >
> > > >   - 10 RS, named rs0 through rs9, loaded with 60 regions each, and each
> > > >   can handle 200 regions.
> > > >   - 10 RS, named rs10 through rs19, loaded with 60 regions each, and
> > > >   each can support 50 regions.
> > >
> > >Based on the following rules:
> > >
> > >rs[0-9] 200
> > >rs1[0-9] 50
> > >
> > > >The second group is overloaded, whereas the first group has plenty of
> > > >space.
> > > >
> > > >We know that we can handle at maximum *2500 regions* (200*10 + 50*10) and
> > > >we currently have *1200 regions* (60*20). HeterogeneousBalancer will
> > > >understand that the cluster is *full at 48.0%* (1200/2500). Based on this
> > > >information, we will then *try to put all the RegionServers at ~48% of
> > > >load according to the rules.* In this case, it will move regions from the
> > > >second group to the first.
> > >
> > >The balancer will:
> > >
> > > >   - compute how many regions need to be moved. In our example, by moving
> > > >   36 regions from rs10, we could go from 120.0% to 48.0%
> > > >   - select regions with lowest data-locality
> > > >   - try to find an appropriate RS for the region. We will take the lowest
> > > >   available RS.
> > >
> > ><
> >
> 
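The arithmetic in Pierre's example (2500 regions of capacity, 1200 regions hosted, a 48% target) can be reproduced with a short Python sketch. This is not the HeterogeneousBalancer code: the names `RULES`, `DEFAULT_LIMIT`, `limit_for` and `plan` are invented for illustration, the default limit of 100 is an arbitrary assumption, and treating each rule as a full-hostname match is also an assumption (an unanchored search for rs[0-9] would match rs10 as well).

```python
import re

# Rules from the example: a hostname regexp and a region limit.
RULES = [
    (re.compile(r"rs[0-9]"), 200),
    (re.compile(r"rs1[0-9]"), 50),
]
DEFAULT_LIMIT = 100  # hypothetical default used when no rule matches


def limit_for(hostname):
    """Return the region limit of the first rule matching the whole hostname."""
    for pattern, limit in RULES:
        if pattern.fullmatch(hostname):
            return limit
    return DEFAULT_LIMIT


def plan(region_counts):
    """Compute the cluster fill ratio and how many regions each server sheds.

    region_counts maps hostname -> number of regions currently hosted.
    A positive value in the returned dict means regions to move away,
    a negative value means room to receive regions.
    """
    capacity = {host: limit_for(host) for host in region_counts}
    total_regions = sum(region_counts.values())
    total_capacity = sum(capacity.values())
    fill_ratio = total_regions / total_capacity
    moves = {}
    for host, count in region_counts.items():
        # Target count puts every server at the same fraction of its limit.
        target = round(total_regions * capacity[host] / total_capacity)
        moves[host] = count - target
    return fill_ratio, moves
```

With 20 servers named rs0 to rs19 holding 60 regions each, the fill ratio is 1200/2500 and each server of the second group has 36 regions to give away, which leaves it with 24 of its 50 regions.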

Re: [VOTE] Merge branch HBASE-21512 back to master

2019-06-13 Thread Jean-Marc Spaggiari
Hi,

Is this going to change the way the client should be called? Or will it be
a mostly transparent replacement?

Thanks,

JMS

On Thu, Jun 13, 2019 at 02:13, 张铎(Duo Zhang) wrote:

> On Wed, Jun 12, 2019 at 10:00 PM Josh Elser wrote:
>
> > Nice perf results!
> >
> > https://issues.apache.org/jira/browse/HBASE-22237 looks like it's also
> > good to be resolved, given
> >
> >
> https://builds.apache.org/job/HBASE%20Nightly/job/HBASE-21512/279/testReport/
> > (TestLogLevel will be fixed on your rebase/merge).
> >
> > Poking through the PR, it looks like the big change is that we're also
> > defaulting over to use the [sync]ConnectionOverAsyncConnection. Good to
> > do it now to help iron things out more. Calling it out to make sure
> > others see this. Is it still possible to use the old Connection impl? (I
> > think the answer is "no").
> >
> No, all the code has been purged...
>
> >
> > Only other question: are there updates for the book that should happen
> > before you move past this? What about "knobs" for configuring retries,
> > internal thread pool(s)? Anything like that you think would be important
> > for people to tweak?
> >
> Will fill in a 'fat' release note soon. I think there will be fewer parameters
> to tune, as we do not need any thread pools unless you are using
> coprocessor-related methods (which are deprecated; we recommend that users
> use the ones in the async client interface). The retry config is still the
> same as with the old sync client.
>
> >
> > +1
> >
> > On 6/11/19 5:48 AM, 张铎(Duo Zhang) wrote:
> > > Filed  https://issues.apache.org/jira/browse/HBASE-22564
> > >
> > > > On Tue, Jun 11, 2019 at 3:53 PM 张铎(Duo Zhang) wrote:
> > >
> > >> Let me do a YCSB test about the performance.
> > >>
> > > >> On Tue, Jun 11, 2019 at 1:15 PM Stack wrote:
> > >>
> > >>> +1 on merge from me.
> > >>>
> > > >>> It removes the complicated multi-threaded edifice we'd built client-side
> > > >>> to fake an async behavior, replacing it with an actual async
> > > >>> implementation. Users will immediately notice a radical plummet in
> > > >>> working thread count on the client side.
> > > >>>
> > > >>> For the cleanup of old idioms alone, in test code in particular, the
> > > >>> patch is worth merging.
> > >>>
> > >>> Any perf numbers to share comparing old sync and async?
> > >>>
> > >>> What about difference in operation? Is there any commentary or doc or
> > >>> release note to point at?
> > >>>
> > >>> Thanks,
> > >>> S
> > >>>
> > >>>
> > >>>
> > > >>> On Mon, Jun 10, 2019 at 6:59 PM 张铎(Duo Zhang) wrote:
> > > >>>
> > > >>>> https://issues.apache.org/jira/browse/HBASE-21512
> > > >>>>
> > > >>>> "Reimplement sync client based on async client"
> > > >>>>
> > > >>>> The jira title tells everything. This is what I promised when I first
> > > >>>> introduced the async client in HBase, about three years ago: that the
> > > >>>> sync client can be implemented on top of the async client, so we can
> > > >>>> remove the old sync client implementation, which can reduce our client
> > > >>>> code base a lot.
> > > >>>>
> > > >>>> I've already opened a PR here, and received some feedback (thanks
> > > >>>> stack!)
> > > >>>>
> > > >>>> https://github.com/apache/hbase/pull/287
> > > >>>>
> > > >>>> It shows that we add 8,663 lines and remove 31,386 lines.
> > > >>>>
> > > >>>> This is the flaky dashboard for this branch
> > > >>>>
> > > >>>> https://builds.apache.org/job/HBASE-Find-Flaky-Tests/job/HBASE-21512/lastSuccessfulBuild/artifact/dashboard.html
> > > >>>>
> > > >>>> With the recent efforts I think it is getting better.
> > > >>>>
> > > >>>> Will fill in the release note soon, it will be a fat one.
> > > >>>>
> > > >>>> Please vote
> > > >>>>
> > > >>>> [] +1
> > > >>>> [] +0/-0
> > > >>>> [] -1 Do not merge the branch back because ...
> > > >>>>
> > > >>>> Thanks. Any suggestions are welcome.
> > 
> > >>>
> > >>
> > >
> >
>


Re: DISCUSS: Is there any idea about ARM CI for Hbase?

2019-06-10 Thread Jean-Marc Spaggiari
Hi,

As long as you have a Java VM, it should work, no? Did you give it a try?
If you did, what kind of issues did you face? Were you looking to get the
client running? Or the entire HBase cluster? I saw on the web someone
running an HDFS cluster on Raspberry Pi. So I guess HBase should work fine
too?

Thanks,

JMS

On Mon, Jun 10, 2019 at 04:17, bo zhaobo wrote:

> Hi guys,
>
> My name is ZhaoBo, a member of the OpenLab CI Team. I posted an issue [1]
> several days ago.
> The reason for doing this is to support the ARM ecosystem and make HBase
> run on more devices.
> So I hope our HBase core team members can leave some comments or
> suggestions on it.
>
> Thank you very much.
>
> Best regards
>
> [1] https://issues.apache.org/jira/browse/HBASE-22468
>


Re: Flame Graphs

2019-06-03 Thread Jean-Marc Spaggiari
That's absolutely awesome and super useful! Thanks a lot for sharing!

JMS

On Mon, Jun 3, 2019 at 09:34, OpenInx wrote:

> There's also a doc here: http://hbase.apache.org/book.html#profiler
> Thanks.
>
> On Mon, Jun 3, 2019 at 9:25 PM OpenInx  wrote:
>
> > Hi :
> >
> > In HBASE-21926, Andrew introduced a great profiling tool (async-profiler)
> > into HBase.
> > It's very easy to use:
> > 1. You need to download the async-profiler binary from here:
> > https://github.com/jvm-profiling-tools/async-profiler
> > 2. Then you need to set a Java option which points to the home of the
> > async-profiler lib. Here I set:
> >  -Dasync.profiler.home=/home/work/soft
> > because my async-profiler is under here:
> >
> > [work@soft]$ pwd
> > /home/work/soft
> > [work@soft]$ tree
> > .
> > ├── async-profiler-1.5-linux-x64.tar.gz
> > ├── build
> > │   ├── async-profiler.jar
> > │   ├── jattach
> > │   └── libasyncProfiler.so
> > ├── CHANGELOG.md
> > ├── LICENSE
> > ├── profiler.sh
> > └── README.md
> >
> > 1 directory, 8 files
> >
> > 3. Finally, you need to restart the RS and click the [profiler] tab in the RS
> > web UI, then you will see the flame graph.
> >
> > You can also see the Javadoc for more details:
> >
> >
> https://github.com/apache/hbase/blob/858d30dd30159de70ac62d44c0ee0278708f2dec/hbase-http/src/main/java/org/apache/hadoop/hbase/http/ProfileServlet.java
> >
> > Thanks.
> >
> > On Mon, Jun 3, 2019 at 9:15 PM Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org> wrote:
> >
> >> Hi all,
> >>
> >> Not sure it's the right platform, but can someone explain to me how flame
> >> graphs (like the one here)
> >> https://issues.apache.org/jira/browse/HBASE-22532 are
> >> generated?
> >>
> >> Thanks,
> >>
> >> JMS
> >>
> >
>


Flame Graphs

2019-06-03 Thread Jean-Marc Spaggiari
Hi all,

Not sure it's the right platform, but can someone explain to me how flame
graphs (like the one here)
https://issues.apache.org/jira/browse/HBASE-22532 are
generated?

Thanks,

JMS


Re: [DISCUSS] Direction of HBCK2

2019-05-29 Thread Jean-Marc Spaggiari
Personally, when I tried to upgrade from 1.4.x to 2.2.x I ended up in a
situation where my meta was empty and had to be repaired, but I lacked
OfflineMetaRepair for 2.2.x, so I just had to delete all my tables, get a
brand new installation, recreate the tables and bulkload the data back into
them. I would have been happy to have an OfflineMetaRepair.

But it's more like an experimental cluster than a production one...

JMS

On Wed, May 29, 2019 at 06:36, Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> Interesting, I haven't seen any cases where OfflineMetaRepair was really
> required among our customer base (running cdh6.1.x/hbase2.1.1,
> cdh6.2/hbase2.1.2). The majority of RIT issues I came across on HBase 2.x
> were related to AP/SCP failures, most of which could be sorted out with the
> hbck2 commands available by then (in some cases, this required some CLI
> scripting to build up a "bulk" assign command).
>
> On Wed, May 29, 2019 at 00:55, Toshihiro Suzuki wrote:
>
> > Hi Josh,
> >
> > Thank you for the explanation. I agree with the direction for HBCK2.
> >
> > The problem I wanted to tell you about in the Jira is that until we implement
> > the features you mentioned, we don't have any direct way to fix holes and
> > overlaps. Holes and overlaps can be created by bugs or operational errors,
> > so I think we should be able to fix these issues.
> >
> > I thought OfflineMetaRepair could be a workaround for the issues until we
> > implement the features of HBCK2.
> >
> > Regards,
> > Toshi
> >
> >
> > On Tue, May 28, 2019 at 9:12 AM Josh Elser  wrote:
> >
> > > Context: https://issues.apache.org/jira/browse/HBASE-21665
> > >
> > > I left a comment on the above issue about what I thought good things to
> > > build into HBCK2 would be -- a focus on specific "primitive" operations
> > > that an admin/operator could use to help repair an otherwise broken
> > > HBase installation. Some examples I had in my head were:
> > >
> > > * Create an empty region (to plug a hole)
> > > * Report holes in a region chain
> > >
> > > In my head, the difference for HBCK2 was that we want to give folks the
> > > tools to fix their cluster, but we did not want to own the "just fix
> > > everything" kind of tool that HBCK1 had become. That problem with HBCK1
> > > was that it was often difficult/problematic for us to know how to
> > > correctly fix a problem (the same problem could be corrected in
> > > different ways).
> > >
> > > Andrew had some confusion about this, so I'm not sure if I'm off-base or
> > > if we're all in agreement on direction and we just need to do a better
> > > job documenting things. Thanks for keeping me honest either way :)
> > >
> > > And just in case it doesn't go without saying, HBCK2 would be something
> > > that helps fix a system, while we want to always understand the root
> > > cause of how/why we got into a situation where we needed HBCK2 and also
> > > address that.
> > >
> > > - Josh
> > >
> >
>
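The "report holes in a region chain" primitive mentioned in Josh's message can be illustrated with a small Python sketch. It is not HBCK2 code. It assumes each region is given as a (start key, end key) pair of byte strings, with an empty byte string standing for an open start or end as in HBase; the function name `check_chain` and the tuple layout of the report are made up for this example.

```python
def check_chain(regions):
    """Report holes and overlaps in a list of (start_key, end_key) regions.

    Keys are byte strings; b"" as a start key means "from the beginning of
    the table" and b"" as an end key means "to the end of the table".
    Returns a list of (kind, key_a, key_b) tuples, kind being "hole" or
    "overlap".
    """
    problems = []
    ordered = sorted(regions, key=lambda region: region[0])
    if not ordered:
        return problems

    # The chain must start at the empty key.
    if ordered[0][0] != b"":
        problems.append(("hole", b"", ordered[0][0]))

    covered_to = ordered[0][1]  # furthest end key seen so far
    for start, end in ordered[1:]:
        if covered_to == b"" or start < covered_to:
            # This region starts inside the range that is already covered.
            problems.append(("overlap", start, covered_to))
        elif start > covered_to:
            # Nothing covers the keys between covered_to and start.
            problems.append(("hole", covered_to, start))
        # Extend the covered range if this region reaches further.
        if covered_to != b"" and (end == b"" or end > covered_to):
            covered_to = end

    # The chain must end with an open end key.
    if covered_to != b"":
        problems.append(("hole", covered_to, b""))
    return problems
```

A first region that spans the whole key space, followed by narrower regions, is reported as one overlap per narrower region, which is the same shape as hbck output where one region appears in every overlap error.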


Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-13 Thread Jean-Marc Spaggiari
So.

About one hour after I started to run the tests, one of my servers crashed
(not surprising, it was overheating) and the table got corrupted. I restored it
again, and have re-run everything for the last 20 hours without any
issue. I'm now re-activating the normalizer. It seems that when regions are
reassigned to a different server, some writes still go to the previous
server? Don't know yet :(

JM

On Fri, Apr 12, 2019 at 10:23, Jean-Marc Spaggiari wrote:

> Performed those steps and got a brand new 2.2.1-SNAPSHOT running
> instance. One master, 7 workers. Disabled the splits and the normalizer.
> Will report later in the day when (if) it fails.
>
> git clone https://github.com/apache/hbase.git hbase-2.2.1
> export JAVA_HOME=/usr/local/jdk1.8.0_151/
> mvn -DskipTests -Dhadoop-two.version=2.7.5 clean install && mvn -Dhadoop-two.version=2.7.5 -DskipTests package assembly:single
> for i in {1..8}; do echo node$i; scp hbase-assembly/target/hbase-2.2.1-SNAPSHOT-bin.tar.gz node$i:/tmp; done
> for i in {1..8}; do ssh node$i cp /home/hbase/hbase-2.2.0/conf/* /home/hbase/hbase-2.2.1-SNAPSHOT/conf/; done
> for i in {1..8}; do ssh node$i tar -xzf /tmp/hbase-2.2.1-SNAPSHOT-bin.tar.gz --directory /home/hbase; done
> copy native libs
> start HBase
>
> To restore my table:
> from shell: disable 'stones5'
> from bash: hdfs dfs -mv /hbase/data/default/stones5/*/A/* /temp/A/
> from shell: enable 'stones5'; truncate 'stones5'; disable 'stones5'
> from bash: hdfs dfs -ls /hbase/data/default/stones5/ (capture region name)
> from bash: hdfs dfs -mv /temp/A/* /hbase/data/default/stones5/12546c6eac93c2c9caccddf57e1c020b/A
> from shell: enable 'stones5'
> from shell: major_compact 'stones5'
> Changed the split policy to avoid splitting, disabled normalizer.
>
> Validate that there is just a single store file in the region, validate
> HBCK:
> 0 inconsistencies detected.
> Status: OK
>
> Table ok in UI.
> Split the table a few times (split, balancer, compact, janitor,
> repeat). Got 16 regions, HBCK still ok.
>
> Restarting client application.
>
> On Fri, Apr 12, 2019 at 09:43, Jean-Marc Spaggiari wrote:
>
>> Overlaps:
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x00\x00\x00\x01\x04\x00\x01\x02\x03\x00\x00\x00\x03\x02\x03\x00\x02\x00\x02,1555035157984.1b1493533a4140732b5a3cfc3da52176.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x00\x00\x00\x03\x02\x01\x02\x05\x02\x05\x02\x03\x01\x01\x01\x01\x01\x00\x02,1555046849488.ba3c45f7bb5c9134ba78f35e4454e79f.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x00\x01\x01\x01\x01\x00\x00\x00\x03\x01\x03\x00\x02\x01\x01\x02\x00\x00\x02,1555030949303.446399be95af2396c0ed5dfe705a90dc.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x00\x01\x03\x02\x05\x00\x02\x01\x00\x01\x00\x01\x05\x02\x01\x01\x00\x00\x01\x01,1555031867671.2ecb6a00a4c3df38b9e6e02546e8f09a.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x00\x02\x01\x03\x00\x01\x03\x01\x03\x05\x00\x04\x00\x00\x00\x01,1555031584820.76cbd1d32c6874f522061a8c20cbb792.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x01\x00\x00\x03\x01\x00\x04\x02\x03\x03\x00\x04\x00\x03\x04\x00\x01,1555027357980.3792e707669986039cefda8781c31cdf.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x01\x01\x03\x03\x00\x00\x00\x00\x01\x01\x02\x02\x01\x00\x04\x00\x00\x00\x02,1555027357980.26ab4bfd50ca39a82227f03254a13ea0.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x01\x02\x00\x01\x01\x00\x00\x00\x02\x03\x01\x01\x01\x01\x01\x01\x02,1555054349272.c220877e57ecadf55925a3c927b62e12.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
>> and
>> stones5,\x00\x01\x03\x01\x01\x02\x02\x01\x04\x04\x01\x00\x02\x01\x02\x04\x00\x02\x02\x00\x01,1555031249434.e919ba9eafac6f1ac9a42fcd3cbd9bc5.)
>> There is an overlap in the region chain.
>> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db04

Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-12 Thread Jean-Marc Spaggiari
Performed those steps and got a brand new 2.2.1-SNAPSHOT running instance.
One master, 7 workers. Disabled the splits and the normalizer. Will report
later in the day when (if) it fails.

git clone https://github.com/apache/hbase.git hbase-2.2.1
export JAVA_HOME=/usr/local/jdk1.8.0_151/
mvn -DskipTests -Dhadoop-two.version=2.7.5 clean install && mvn -Dhadoop-two.version=2.7.5 -DskipTests package assembly:single
for i in {1..8}; do echo node$i; scp hbase-assembly/target/hbase-2.2.1-SNAPSHOT-bin.tar.gz node$i:/tmp; done
for i in {1..8}; do ssh node$i cp /home/hbase/hbase-2.2.0/conf/* /home/hbase/hbase-2.2.1-SNAPSHOT/conf/; done
for i in {1..8}; do ssh node$i tar -xzf /tmp/hbase-2.2.1-SNAPSHOT-bin.tar.gz --directory /home/hbase; done
copy native libs
start HBase

To restore my table:
from shell: disable 'stones5'
from bash: hdfs dfs -mv /hbase/data/default/stones5/*/A/* /temp/A/
from shell: enable 'stones5'; truncate 'stones5'; disable 'stones5'
from bash: hdfs dfs -ls /hbase/data/default/stones5/ (capture region name)
from bash: hdfs dfs -mv /temp/A/* /hbase/data/default/stones5/12546c6eac93c2c9caccddf57e1c020b/A
from shell: enable 'stones5'
from shell: major_compact 'stones5'
Changed the split policy to avoid splitting, disabled normalizer.

Validate that there is just a single store file in the region, validate
HBCK:
0 inconsistencies detected.
Status: OK

Table ok in UI.
Split the table a few times (split, balancer, compact, janitor,
repeat). Got 16 regions, HBCK still ok.

Restarting client application.

On Fri, Apr 12, 2019 at 09:43, Jean-Marc Spaggiari wrote:

> Overlaps:
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x00\x00\x00\x01\x04\x00\x01\x02\x03\x00\x00\x00\x03\x02\x03\x00\x02\x00\x02,1555035157984.1b1493533a4140732b5a3cfc3da52176.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x00\x00\x00\x03\x02\x01\x02\x05\x02\x05\x02\x03\x01\x01\x01\x01\x01\x00\x02,1555046849488.ba3c45f7bb5c9134ba78f35e4454e79f.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x00\x01\x01\x01\x01\x00\x00\x00\x03\x01\x03\x00\x02\x01\x01\x02\x00\x00\x02,1555030949303.446399be95af2396c0ed5dfe705a90dc.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x00\x01\x03\x02\x05\x00\x02\x01\x00\x01\x00\x01\x05\x02\x01\x01\x00\x00\x01\x01,1555031867671.2ecb6a00a4c3df38b9e6e02546e8f09a.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x00\x02\x01\x03\x00\x01\x03\x01\x03\x05\x00\x04\x00\x00\x00\x01,1555031584820.76cbd1d32c6874f522061a8c20cbb792.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x01\x00\x00\x03\x01\x00\x04\x02\x03\x03\x00\x04\x00\x03\x04\x00\x01,1555027357980.3792e707669986039cefda8781c31cdf.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x01\x01\x03\x03\x00\x00\x00\x00\x01\x01\x02\x02\x01\x00\x04\x00\x00\x00\x02,1555027357980.26ab4bfd50ca39a82227f03254a13ea0.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x01\x02\x00\x01\x01\x00\x00\x00\x02\x03\x01\x01\x01\x01\x01\x01\x02,1555054349272.c220877e57ecadf55925a3c927b62e12.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x01\x03\x01\x01\x02\x02\x01\x04\x04\x01\x00\x02\x01\x02\x04\x00\x02\x02\x00\x01,1555031249434.e919ba9eafac6f1ac9a42fcd3cbd9bc5.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x02\x00\x02\x02\x01\x03\x00\x00\x03\x02\x03\x03\x00\x01\x03\x02\x02,1555041449331.d8d72f9649d8c347cd0daefd42fb478f.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x00\x02\x03\x02\x00\x00\x01\x05\x00\x03\x02\x03\x01\x00\x02\x01\x02\x02,1555037257863.2eb4c29b2b87ce9bca3c9e850b378894.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x01\x00\x01\x01\x00\x04\x00\x01\x06\x02\x05\x02\x05\x03\x02\x01\x00\x01,1555027668004.9372857902cba5ed444c82c295588ba6.)
> There is an overlap in the region chain.
> ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
> and
> stones5,\x0

Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-12 Thread Jean-Marc Spaggiari
\x01,1555027366401.8b5b5fea4c95b7bb9e95b99360425d81.)
There is an overlap in the region chain.
ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
and
stones5,\x03\x01\x03\x02\x02\x00\x00\x02\x05\x02\x00\x01\x00\x02\x02\x02,1555034249345.373ef8d16692e7e39ebb81787f138b0f.)
There is an overlap in the region chain.
ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
and
stones5,\x03\x02\x00\x01\x00\x02\x03\x01\x02\x00\x01\x04\x03\x02\x00\x00\x00\x02,1555034249345.cf6fbfdbbb541ca22dfb1de337b3eb42.)
There is an overlap in the region chain.
ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
and
stones5,\x03\x02\x01\x01\x01\x04\x01\x03\x03\x02\x00\x03\x00\x04\x03\x01\x01,1555030049511.d260897d0c8c5d156edc7c3ef220fd9c.)
There is an overlap in the region chain.
ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
and
stones5,\x03\x02\x02\x00\x04\x05\x07\x04\x04\x02\x03\x01\x02\x01\x04\x02\x02,1555023149297.1ecd9a495183397ea8cce6e34805e0b8.)
There is an overlap in the region chain.
ERROR: (regions stones5,,1555031586547.96b7f40ded8459db044cdb447d9a12c8.
and
stones5,\x03\x02\x02\x01\x01\x03\x05\x01\x00\x01\x03\x02\x00\x02\x01\x02,1555030349356.b04b4fd705b3c348c58a91b1670d4fee.)
There is an overlap in the region chain.


I'm rebuilding the environment. I will start with a brand new clean
directory and will check the logs. The normalizer is running and has split
some regions.
2019-04-12 05:07:29,256 INFO  [master/node2:6.Chore.2]
normalizer.SimpleRegionNormalizer: Table stones5, large region
stones5,\x02\x00\x02\x00\x02\x02\x00\x00\x04\x01\x03\x00\x02\x03\x01,1555054054003.aa5fbcf8a100aee7f784c4e32bcdbc0f.
has size 9, more than twice avg size, splitting

2019-04-12 05:12:30,332 INFO  [master/node2:6.Chore.1]
normalizer.MergeNormalizationPlan: Executing merging normalization plan:
MergeNormalizationPlan{firstRegion={ENCODED =>
3d498e6d2748d9ef7cd1a477f1334a1c, NAME =>
'stones5,\x02\x00\x02\x00\x02\x02\x00\x00\x04\x01\x03\x00\x02\x03\x01,1555060049267.3d498e6d2748d9ef7cd1a477f1334a1c.',
STARTKEY => '\x02\x00\x02\x00\x02\x02\x00\x00\x04\x01\x03\x00\x02\x03\x01',
ENDKEY =>
'\x02\x00\x02\x01\x01\x02\x06\x01\x06\x04\x01\x04\x04\x03\x01\x02\x00\x01'},
secondRegion={ENCODED => 8c3e3d8542b405546a8f076fa4311172, NAME =>
'stones5,\x02\x00\x02\x01\x01\x02\x06\x01\x06\x04\x01\x04\x04\x03\x01\x02\x00\x01,1555060049267.8c3e3d8542b405546a8f076fa4311172.',
STARTKEY =>
'\x02\x00\x02\x01\x01\x02\x06\x01\x06\x04\x01\x04\x04\x03\x01\x02\x00\x01',
ENDKEY =>
'\x02\x00\x03\x00\x02\x00\x02\x01\x04\x01\x01\x00\x00\x01\x03\x00\x01\x00\x00\x01'}}


I will try running with and without the normalizer. But I suspect that
splits/merges are having issues... It seems that all errors are related to
the first region.
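The observation that every error involves the first region can be checked mechanically by counting how often each encoded region name appears in the overlap errors. The sketch below is in Python and rests on assumptions: it relies only on the message shape shown in the hbck output above (an "ERROR:" prefix, the text "overlap in the region chain", and region names ending in a 32-character hex encoded name between dots), and the function name `overlap_counts` is invented for illustration.

```python
import re
from collections import Counter

# An encoded region name is the 32-hex-character part between the last two
# dots of a full region name, e.g. "...1555031586547.<encoded name>."
ENCODED_NAME = re.compile(r"\.([0-9a-f]{32})\.")


def overlap_counts(hbck_output):
    """Count how many overlap errors mention each encoded region name."""
    counts = Counter()
    # Everything before the first "ERROR:" is not an error entry.
    for chunk in hbck_output.split("ERROR:")[1:]:
        if "overlap in the region chain" in chunk:
            counts.update(ENCODED_NAME.findall(chunk))
    return counts
```

Calling `overlap_counts(text).most_common(1)` on the full hbck output returns the region that appears in the most overlap errors. If its count equals the number of overlap errors, every overlap involves that one region.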

JMS

On Fri, Apr 12, 2019 at 09:29, 张铎(Duo Zhang) wrote:

> What is the 'inconsistency'? Overlapped regions? Or holes between regions?
> Have you checked the SPLIT/MERGE related logs on master?
>
> On Fri, Apr 12, 2019 at 9:23 PM Jean-Marc Spaggiari wrote:
>
> > Same error.
> >
> > I will re-pull, rebuild, and redeploy. I will clean the table again,
> > sanitize everything, document all the steps and report here... What is
> > interesting is that I have 2 tables but only one gets corrupted.
> >
> > JMS
> >
> > On Thu, Apr 11, 2019 at 15:55, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> > wrote:
> >
> > > Ok. Let me pull the last 2.2 branch, build that, deploy and retry... I
> > > might be able to let you know tomorrow morning.
> > >
> > > On Thu, Apr 11, 2019 at 14:19, Sean Busbey wrote:
> > >
> > >> Maybe worth trying out the HEAD of branch-2.2 prior to the next RC? That
> > >> way if you hit a blocker like your current ones we can fix them before RC
> > >> generation happens.
> > >>
> > >> On Thu, Apr 11, 2019, 13:08 Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> > >> wrote:
> > >>
> > >> > Thanks Guanghao.
> > >> >
> > >> > I'm having consistant issues with RC0. I had some issues with a
> table
> > >> this
> > >> > morning. Dropped it, recreated it, made it clean, restart my
> > >> application,
> > >> > and I have again some inconsistencies on the regions. There is
> > something
> > >> > wrong somewhere. I'm not able to identify where, but I can easily
> get
> > my
> > >> > tables corrupted by running my client application. So I'm eager to
> try
> > >> the
> > >> > next RC to see if it helps.
> > >> >
> > >> > JMS
> > >> >
> > >> > JMS
> > >> >
> > >>

Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-12 Thread Jean-Marc Spaggiari
Same error.

I will re-pull, re-build, and re-deploy. I will clean the table again,
sanitize everything, document all the steps and report here... What is
interesting is that I have 2 tables but only one gets corrupted.

JMS

On Thu, Apr 11, 2019 at 15:55, Jean-Marc Spaggiari wrote:

> Ok. Let me pull the last 2.2 branch, build that, deploy and retry... I
> might be able to let you know tomorrow morning.
>
> On Thu, Apr 11, 2019 at 14:19, Sean Busbey wrote:
>
>> Maybe worth trying out the HEAD of branch-2.2 prior to the next RC? That
>> way if you hit a blocker like your current ones we can fix them before RC
>> generation happens.
>>
>> On Thu, Apr 11, 2019, 13:08 Jean-Marc Spaggiari 
>> wrote:
>>
>> > Thanks Guanghao.
>> >
>> > I'm having consistent issues with RC0. I had some issues with a table
>> > this morning. Dropped it, recreated it, made it clean, restarted my
>> > application, and I again have some inconsistencies on the regions. There
>> > is something wrong somewhere. I'm not able to identify where, but I can
>> > easily get my tables corrupted by running my client application. So I'm
>> > eager to try the next RC to see if it helps.
>> >
>> > JMS
>> >
>> > On Thu, Apr 11, 2019 at 13:33, Guanghao Zhang wrote:
>> >
>> > > Sorry for the delay... I am testing the ITBLL for 2.2.0 and it passed
>> > > yesterday. See https://issues.apache.org/jira/browse/HBASE-21886. I
>> > > will generate RC1 later. Thanks.
>> > >
>> > > Jean-Marc Spaggiari wrote on Thu, Apr 11, 2019 at 10:52 PM:
>> > >
>> > > > Hi all,
>> > > >
>> > > > Any chance to get an updated RC soon?
>> > > >
>> > > > JM
>> > > >
>> > > > On Tue, Mar 12, 2019 at 12:23, Sean Busbey wrote:
>> > > >
>> > > > > quick follow-up here. the full version info is in fact missing the
>> > > > > revision:
>> > > > >
>> > > > > hbase-2.2.0 busbey$ ./bin/hbase version
>> > > > > HBase 2.2.0
>> > > > > Source code repository
>> > > > > git://hao-OptiPlex-7050/home/hao/open_source/hbase revision=
>> > > > > Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
>> > > > > From source with checksum 783fee467bb1b28666f0d904437862c4
>> > > > >
>> > > > > I think this is the issue stack ran into on HBASE-21935/HBASE-21999
>> > > > > where HBASE-20764 introduced a git cli option that isn't supported
>> > > > > on older versions of git.
>> > > > >
>> > > > > Guanghao for the next RC would it be possible to update your local
>> > > > > git version?
>> > > > >
>> > > > > On Tue, Mar 12, 2019 at 9:37 AM Sean Busbey wrote:
>> > > > > >
>> > > > > > locale of the build is up to the RM (this is why, for example,
>> > > > > > the 2.1 release line javadocs have Chinese for the boilerplate
>> > > > > > text[1])
>> > > > > >
>> > > > > > however, it does look like that shell output might be missing
>> > > > > > the build revision information from git or we might not be
>> > > > > > properly parsing the output from git when a non-English locale
>> > > > > > is used.
>> > > > > >
>> > > > > > [1]: http://hbase.apache.org/2.1/apidocs/index.html
>> > > > > >
>> > > > > >
>> > > > > > On Tue, Mar 12, 2019 at 8:54 AM Jean-Marc Spaggiari
>> > > > > >  wrote:
>> > > > > > >
>> > > > > > > Also, in the shell, it displays Asian text:
>> > > > > > >
>> > > > > > > "Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"
>> > > > > > >
>> > > > > > > Not sure if that's what we want.
>> > > > > > >
>> > > > > > > JMS
>> > > > > > >
>> > > > > > > On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang <zghao...@gmail.com>
>> > > > > > > wrote:
>> > >

[jira] [Resolved] (HBASE-7297) Allow load balancer to accommodate different region server configurations

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-7297.

Resolution: Duplicate

Duplicate of HBASE-11780.

> Allow load balancer to accommodate different region server configurations
> -
>
> Key: HBASE-7297
> URL: https://issues.apache.org/jira/browse/HBASE-7297
> Project: HBase
>  Issue Type: New Feature
>  Components: Balancer
>Reporter: Ted Yu
>Priority: Major
>
> Robert Dyer raised the following scenario under the thread of 'Multiple 
> regionservers on a single node':
> {quote}
> I have a very small cluster where all nodes are identical.  However, I was
> just given a very powerful node to add into this cluster which effectively
> doubles the total CPUs, RAM, and HDDs in the cluster.
> As such, when I run a MR job half the jobs go to this single, new node yet
> most of the data is not local due to HBase balancing the regions.
> {quote}
> Load balancer should take region server config (total heap in the above case) 
> into account when allocating regions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-11 Thread Jean-Marc Spaggiari
Ok. Let me pull the last 2.2 branch, build that, deploy and retry... I
might be able to let you know tomorrow morning.

On Thu, Apr 11, 2019 at 14:19, Sean Busbey wrote:

> Maybe worth trying out the HEAD of branch-2.2 prior to the next RC? That
> way if you hit a blocker like your current ones we can fix them before RC
> generation happens.
>
> On Thu, Apr 11, 2019, 13:08 Jean-Marc Spaggiari 
> wrote:
>
> > Thanks Guanghao.
> >
> > I'm having consistent issues with RC0. I had some issues with a table
> > this morning. Dropped it, recreated it, made it clean, restarted my
> > application, and I again have some inconsistencies on the regions. There
> > is something wrong somewhere. I'm not able to identify where, but I can
> > easily get my tables corrupted by running my client application. So I'm
> > eager to try the next RC to see if it helps.
> >
> > JMS
> >
> > On Thu, Apr 11, 2019 at 13:33, Guanghao Zhang wrote:
> >
> > > Sorry for the delay... I am testing the ITBLL for 2.2.0 and it passed
> > > yesterday. See https://issues.apache.org/jira/browse/HBASE-21886. I
> > > will generate RC1 later. Thanks.
> > >
> > > Jean-Marc Spaggiari wrote on Thu, Apr 11, 2019 at 10:52 PM:
> > >
> > > > Hi all,
> > > >
> > > > Any chance to get an updated RC soon?
> > > >
> > > > JM
> > > >
> > > > On Tue, Mar 12, 2019 at 12:23, Sean Busbey wrote:
> > > >
> > > > > quick follow-up here. the full version info is in fact missing the
> > > > > revision:
> > > > >
> > > > > hbase-2.2.0 busbey$ ./bin/hbase version
> > > > > HBase 2.2.0
> > > > > Source code repository
> > > > > git://hao-OptiPlex-7050/home/hao/open_source/hbase revision=
> > > > > Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
> > > > > From source with checksum 783fee467bb1b28666f0d904437862c4
> > > > >
> > > > > > I think this is the issue stack ran into on HBASE-21935/HBASE-21999
> > > > > > where HBASE-20764 introduced a git cli option that isn't supported
> > > > > > on older versions of git.
> > > > > >
> > > > > > Guanghao for the next RC would it be possible to update your local
> > > > > > git version?
> > > > > >
> > > > > > On Tue, Mar 12, 2019 at 9:37 AM Sean Busbey wrote:
> > > > > >
> > > > > > > locale of the build is up to the RM (this is why, for example,
> > > > > > > the 2.1 release line javadocs have Chinese for the boilerplate
> > > > > > > text[1])
> > > > > > >
> > > > > > > however, it does look like that shell output might be missing the
> > > > > > > build revision information from git or we might not be properly
> > > > > > > parsing the output from git when a non-English locale is used.
> > > > > >
> > > > > > [1]: http://hbase.apache.org/2.1/apidocs/index.html
> > > > > >
> > > > > >
> > > > > > On Tue, Mar 12, 2019 at 8:54 AM Jean-Marc Spaggiari
> > > > > >  wrote:
> > > > > > >
> > > > > > > > Also, in the shell, it displays Asian text:
> > > > > > > >
> > > > > > > > "Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"
> > > > > > > >
> > > > > > > > Not sure if that's what we want.
> > > > > > > >
> > > > > > > > JMS
> > > > > > > >
> > > > > > > > On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang <zghao...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Let me start a new RC1. HBASE-21970 should be included and
> > > > > > > > > needs a release note.
> > > > > > > >
> > > > > > > > > Sean Busbey wrote on Tue, Mar 12, 2019 at 8:35 AM:
> > > > > > > >
> > > > > > > > > I'm -1 on RC0 as it is.
> > > > > > > > >
> > > > > > > > > The current release notes don't include any call out about
> > the
> > > > > upgrade
> > > > > > > > > steps needed. Since we don't usually have minor-v

[jira] [Resolved] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-22213.
-
Resolution: Later

> Create a Java based BulkLoadPartitioner
> ---
>
> Key: HBASE-22213
> URL: https://issues.apache.org/jira/browse/HBASE-22213
> Project: HBase
>  Issue Type: New Feature
>Affects Versions: 2.1.4
>    Reporter: Jean-Marc Spaggiari
>    Assignee: Jean-Marc Spaggiari
>Priority: Minor
>
> We have a Scala-based partitioner, but not all projects are built in Scala.
> We should provide a Java-based version of it.





[jira] [Created] (HBASE-22213) Create a Java based BulkLoadPartitioner

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-22213:
---

 Summary: Create a Java based BulkLoadPartitioner
 Key: HBASE-22213
 URL: https://issues.apache.org/jira/browse/HBASE-22213
 Project: HBase
  Issue Type: New Feature
Affects Versions: 2.1.4
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari


We have a Scala-based partitioner, but not all projects are built in Scala. We
should provide a Java-based version of it.





Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-11 Thread Jean-Marc Spaggiari
Thanks Guanghao.

I'm having consistent issues with RC0. I had some issues with a table this
morning. Dropped it, recreated it, made it clean, restarted my application,
and I again have some inconsistencies on the regions. There is something
wrong somewhere. I'm not able to identify where, but I can easily get my
tables corrupted by running my client application. So I'm eager to try the
next RC to see if it helps.

JMS

On Thu, Apr 11, 2019 at 13:33, Guanghao Zhang wrote:

> Sorry for the delay... I am testing the ITBLL for 2.2.0 and it passed
> yesterday. See https://issues.apache.org/jira/browse/HBASE-21886. I will
> generate RC1 later. Thanks.
>
> Jean-Marc Spaggiari wrote on Thu, Apr 11, 2019 at 10:52 PM:
>
> > Hi all,
> >
> > Any chance to get an updated RC soon?
> >
> > JM
> >
> > On Tue, Mar 12, 2019 at 12:23, Sean Busbey wrote:
> >
> > > quick follow-up here. the full version info is in fact missing the
> > > revision:
> > >
> > > hbase-2.2.0 busbey$ ./bin/hbase version
> > > HBase 2.2.0
> > > Source code repository
> > > git://hao-OptiPlex-7050/home/hao/open_source/hbase revision=
> > > Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
> > > From source with checksum 783fee467bb1b28666f0d904437862c4
> > >
> > > I think this is the issue stack ran into on HBASE-21935/HBASE-21999
> > > where HBASE-20764 introduced a git cli option that isn't supported on
> > > older versions of git.
> > >
> > > Guanghao for the next RC would it be possible to update your local git
> > > version?
> > >
> > > On Tue, Mar 12, 2019 at 9:37 AM Sean Busbey  wrote:
> > > >
> > > > locale of the build is up to the RM (this is why, for example, the
> > > > 2.1 release line javadocs have Chinese for the boilerplate text[1])
> > > >
> > > > however, it does look like that shell output might be missing the
> > > > build revision information from git or we might not be properly
> > > > parsing the output from git when a non-English locale is used.
> > > >
> > > > [1]: http://hbase.apache.org/2.1/apidocs/index.html
> > > >
> > > >
> > > > On Tue, Mar 12, 2019 at 8:54 AM Jean-Marc Spaggiari
> > > >  wrote:
> > > > >
> > > > > Also, in the shell, it displays Asian text:
> > > > >
> > > > > "Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"
> > > > >
> > > > > Not sure if that's what we want.
> > > > >
> > > > > JMS
> > > > >
> > > > > On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang wrote:
> > > > >
> > > > > > Let me start a new RC1. HBASE-21970 should be included and needs a
> > > > > > release note.
> > > > > >
> > > > > > Sean Busbey wrote on Tue, Mar 12, 2019 at 8:35 AM:
> > > > > >
> > > > > > > I'm -1 on RC0 as it is.
> > > > > > >
> > > > > > > The current release notes don't include any call out about the
> > > > > > > upgrade steps needed. Since we don't usually have minor-version
> > > > > > > specific upgrade steps and especially since there are things
> > > > > > > folks need to do before installing 2.2.0, it's important that
> > > > > > > they be front and center. Possibly that should mean a link to
> > > > > > > the ref guide section from the RC instructions and eventual
> > > > > > > announcement.
> > > > > > >
> > > > > > > I think either HBASE-21075 needs to have 2.2.0 included in its
> > > > > > > fix version or the release note from that issue needs to be
> > > > > > > copied over to HBASE-21970 and it needs to have 2.2.0 included
> > > > > > > in its fix version(s). In either case the release notes should
> > > > > > > link to the ref guide section.
> > > > > > >
> > > > > > > On Thu, Mar 7, 2019 at 3:44 AM Guanghao Zhang <zghao...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Please vote on this release candidate (RC) for Apache HBase
> > > 2.2.0.
> > > > > > > > This is the first release of the branch-2

Any bulkload issue with 2.2.0?

2019-04-11 Thread Jean-Marc Spaggiari
Trying to bulkload a single HFile into an empty, single-region table on an
idle 2.2.0 cluster, I get this:

2019-04-11 11:22:40,594 INFO  [LoadIncrementalHFiles-0] compress.CodecPool:
Got brand-new decompressor [.snappy]
2019-04-11 11:22:40,632 INFO  [LoadIncrementalHFiles-0]
tool.LoadIncrementalHFiles: Trying to load hfile=hdfs://
node2.distparser.com:8020/source/A/fd8dd05fbee84a8688733de648eb2a23
first=Optional[\x02\x02\x01\x02\x03\x01\x02\x03\x03\x00\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F]
last=Optional[\x02\x02\x02\x01\x08\x01\x05\x01\x00\x00\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F]
2019-04-11 11:22:40,737 WARN  [LoadIncrementalHFiles-1]
tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into
table stones5 with files [family:A path:hdfs://
node2.distparser.com:8020/source/A/fd8dd05fbee84a8688733de648eb2a23]
failed.  This is recoverable and they will be retried.
2019-04-11 11:22:40,746 INFO  [main] tool.LoadIncrementalHFiles: Split
occurred while grouping HFiles, retry attempt 1 with 1 files remaining to
group or split
2019-04-11 11:22:40,758 INFO  [LoadIncrementalHFiles-2]
tool.LoadIncrementalHFiles: Trying to load hfile=hdfs://
node2.distparser.com:8020/source/A/fd8dd05fbee84a8688733de648eb2a23
first=Optional[\x02\x02\x01\x02\x03\x01\x02\x03\x03\x00\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F]
last=Optional[\x02\x02\x02\x01\x08\x01\x05\x01\x00\x00\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F\x7F]
2019-04-11 11:22:40,816 WARN  [LoadIncrementalHFiles-3]
tool.LoadIncrementalHFiles: Attempt to bulk load region containing  into
table stones5 with files [family:A path:hdfs://
node2.distparser.com:8020/source/A/fd8dd05fbee84a8688733de648eb2a23]
failed.  This is recoverable and they will be retried.
2019-04-11 11:22:40,824 INFO  [main] tool.LoadIncrementalHFiles: Split
occurred while grouping HFiles, retry attempt 2 with 1 files remaining to
group or split

Bulkload seems to think that the destination table is splitting while
loading.
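
To make the "split occurred" message easier to reason about, here is a
minimal sketch of the kind of boundary check a bulk loader has to make: an
HFile can be loaded as-is only if its first and last row keys fall inside
one region. This is an illustration under assumptions, not the
LoadIncrementalHFiles code; the function names and the shortened keys are
hypothetical.

```python
from bisect import bisect_right


def find_region(regions, row):
    """Index of the region whose start key is the greatest one <= row.

    `regions` is a list of (start_key, end_key) byte strings sorted by start
    key; an empty end key means "up to the end of the table".
    """
    start_keys = [start for start, _ in regions]
    return bisect_right(start_keys, row) - 1


def hfile_fits_one_region(regions, first_key, last_key):
    """True when [first_key, last_key] lies inside a single region."""
    index = find_region(regions, first_key)
    if index < 0:
        return False  # no region starts at or before first_key
    _, end_key = regions[index]
    return end_key == b"" or last_key < end_key


# Keys shortened from the log above; the single region spans the whole table.
first = b"\x02\x02\x01\x02\x03"
last = b"\x02\x02\x02\x01\x08"
single_region = [(b"", b"")]
two_regions = [(b"", b"\x02\x02\x02"), (b"\x02\x02\x02", b"")]
```

Under this check, a table whose only region has empty start and end keys
accepts any HFile, so a retry loop reporting a split against such a table
would point at stale or mismatched region boundaries rather than at the
file.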

Destination table:
 jmspaggi@node8:~/Othello$ hdfs dfs -ls /hbase/data/default/stones5/*
Found 1 items
-rw-r--r--   3 hbase supergroup  507 2019-04-11 11:22 /hbase/data/default/stones5/.tabledesc/.tableinfo.01
Found 2 items
-rw-r--r--   3 hbase supergroup   42 2019-04-11 11:22 /hbase/data/default/stones5/052f1fdf28ae75754d28f2ed7fafd6c6/.regioninfo
drwxr-xr-x   - hbase supergroup    0 2019-04-11 11:22 /hbase/data/default/stones5/052f1fdf28ae75754d28f2ed7fafd6c6/A


And meta seems to be clean:
 stones5 column=table:state, timestamp=1554996138033, value=\x08\x00
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.
column=info:regioninfo, timestamp=1554996137061, value={ENCODED =>
052f1fdf28ae75754d28f2ed7fafd6c6, NAME =>
'stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.', STARTKEY => '',
ENDKEY => ''}
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.
column=info:seqnumDuringOpen, timestamp=1554996137061,
value=\x00\x00\x00\x00\x00\x00\x00\x02
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.
column=info:server, timestamp=1554996137061, value=
node6.distparser.com:16020
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.
column=info:serverstartcode, timestamp=1554996137061, value=1554893230076
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6. column=info:sn,
timestamp=1554996136579, value=node6.distparser.com,16020,1554893230076
 stones5,,1554996135582.052f1fdf28ae75754d28f2ed7fafd6c6.
column=info:state, timestamp=1554996137061, value=OPEN
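
As a side note on the table:state value \x08\x00 in the scan above: it reads
like a two-byte protobuf message, with a tag byte for field 1 (varint) and
the value 0. The sketch below decodes it under that assumption. The mapping
of numbers to state names is an assumption as well, and the helper name is
hypothetical.

```python
# Assumed numbering of table states; only 0 is seen in the output above.
TABLE_STATES = {0: "ENABLED", 1: "DISABLED", 2: "DISABLING", 3: "ENABLING"}


def decode_table_state(value):
    """Decode a two-byte, protobuf-style table state value.

    Only handles the simple case of one tag byte plus a one-byte varint.
    """
    if len(value) != 2:
        raise ValueError("expected exactly two bytes")
    tag, number = value[0], value[1]
    field_number, wire_type = tag >> 3, tag & 0x07
    if field_number != 1 or wire_type != 0 or number & 0x80:
        raise ValueError("not a single-byte varint in field 1")
    return TABLE_STATES.get(number, "UNKNOWN(%d)" % number)


state = decode_table_state(b"\x08\x00")
```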

Looking at the code, it uses some deprecated functions and types.

I went the dirty way and just manually moved all my files into the table
region, but I think there is something to be looked at here :-/

JMS


Re: HBase 2.2.0 overlapping regions

2019-04-11 Thread Jean-Marc Spaggiari
Scanning the tables from the shell returns this:
ERROR Java::JavaIo::IOException: Unable to find region for
\x03\x02\x03\x01\x01\x02\x01\x01\x01\x01\x04\x00\x04\x02\x00\x00\x02\x01\x00\x00\x01
in stones5

Scanning the meta table returns 22 regions (which seems to be normal).
Output is here: https://pastebin.com/axbG5C87

I will remove all HFiles and bulkload them back, then I will try to
reproduce the issue...
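
An "Unable to find region" error for a row key is what a gap in the region
chain would produce, while overlaps are two regions claiming the same keys.
The sketch below shows one way to look for both in a list of region
boundaries, for example one copied from a meta scan. It is an illustration
only, not what HBCK does; the function name and the sample keys are
hypothetical.

```python
def find_chain_problems(regions):
    """Report holes and overlaps in a list of (start_key, end_key) regions.

    Keys are byte strings; an empty end key means "end of table". Returns a
    list of (kind, from_key, to_key) tuples, where kind is "hole" or
    "overlap".
    """
    problems = []
    covered_to = b""   # every key below this one is covered so far
    unbounded = False  # True once a region with an empty end key was seen
    for start, end in sorted(regions):
        if unbounded or start < covered_to:
            problems.append(("overlap", start, b"" if unbounded else covered_to))
        elif start > covered_to:
            problems.append(("hole", covered_to, start))
        if end == b"":
            unbounded = True
        elif not unbounded and end > covered_to:
            covered_to = end
    if not unbounded:
        problems.append(("hole", covered_to, b""))
    return problems


# Made-up boundaries: (b"c", b"e") overlaps its predecessor, and nothing
# covers the keys between b"e" and b"f".
sample = [(b"", b"b"), (b"b", b"d"), (b"c", b"e"), (b"f", b"")]
problems = find_chain_problems(sample)
```

Running a check like this on the boundaries before and after each split
could help narrow down which operation introduces the inconsistency.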

On Thu, Apr 11, 2019 at 10:44, Jean-Marc Spaggiari wrote:

> Hi all,
>
> I don't know how I ended up in this state, but I cleared /hbase in both ZK
> and HDFS 3 days ago, created 3 brand new tables, and started to put some
> data in them. Twice a day I run a split command, to get a better spread.
>
> After 3 days I have a lot of region overlaps. HBCK output is here:
> https://pastebin.com/y4awxmN5
>
> Now, how can this be repaired? If I understand correctly, -fixHdfsOverlaps
> can't be used anymore. I can move the HFiles, drop the table, re-create,
> reload and split again, but I have a job running that's supposed to last 6
> days. I would have liked to keep it running.
>
> Any idea why there are such overlaps?
>
> JMS
>


Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-04-11 Thread Jean-Marc Spaggiari
Hi all,

Any chance to get an updated RC soon?

JM

On Tue, Mar 12, 2019 at 12:23, Sean Busbey wrote:

> quick follow-up here. the full version info is in fact missing the
> revision:
>
> hbase-2.2.0 busbey$ ./bin/hbase version
> HBase 2.2.0
> Source code repository
> git://hao-OptiPlex-7050/home/hao/open_source/hbase revision=
> Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
> From source with checksum 783fee467bb1b28666f0d904437862c4
>
> I think this is the issue stack ran into on HBASE-21935/HBASE-21999
> where HBASE-20764 introduced a git cli option that isn't supported on
> older versions of git.
>
> Guanghao for the next RC would it be possible to update your local git
> version?
>
> On Tue, Mar 12, 2019 at 9:37 AM Sean Busbey  wrote:
> >
> > locale of the build is up to the RM (this is why, for example, the 2.1
> > release line javadocs have chinese for the boilerplate text[1])
> >
> > however, it does look like that shell output might be missing the
> > build revision information from git or we might not be properly
> > parsing the output from git when a non-english locale is used.
> >
> > [1]: http://hbase.apache.org/2.1/apidocs/index.html
> >
> >
> > On Tue, Mar 12, 2019 at 8:54 AM Jean-Marc Spaggiari
> >  wrote:
> > >
> > > Also, in the shell, it displays Asian text:
> > >
> > > "Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"
> > >
> > > Not sure if that's what we want.
> > >
> > > JMS
> > >
> > > On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang wrote:
> > >
> > > > Let me start a new RC1. HBASE-21970 should be included and needs a
> > > > release note.
> > > >
> > > > Sean Busbey wrote on Tue, Mar 12, 2019 at 8:35 AM:
> > > >
> > > > > I'm -1 on RC0 as it is.
> > > > >
> > > > > The current release notes don't include any call out about the
> > > > > upgrade steps needed. Since we don't usually have minor-version
> > > > > specific upgrade steps and especially since there are things folks
> > > > > need to do before installing 2.2.0, it's important that they be
> > > > > front and center. Possibly that should mean a link to the ref guide
> > > > > section from the RC instructions and eventual announcement.
> > > > >
> > > > > I think either HBASE-21075 needs to have 2.2.0 included in its fix
> > > > > version or the release note from that issue needs to be copied over
> > > > > to HBASE-21970 and it needs to have 2.2.0 included in its fix
> > > > > version(s). In either case the release notes should link to the ref
> > > > > guide section.
> > > > >
> > > > > On Thu, Mar 7, 2019 at 3:44 AM Guanghao Zhang wrote:
> > > > > >
> > > > > > Please vote on this release candidate (RC) for Apache HBase 2.2.0.
> > > > > > This is the first release of the branch-2.2 line.
> > > > > >
> > > > > > The VOTE will remain open for at least 72 hours.
> > > > > >
> > > > > > [ ] +1 Release this package as Apache HBase 2.2.0
> > > > > > [ ] -1 Do not release this package because ...
> > > > > >
> > > > > > The tag to be voted on is 2.2.0-RC0 (commit
> > > > > > 4ab2dc20f15e9b59477de4bd971c367f3ce342cb):
> > > > > >
> > > > > >  https://github.com/apache/hbase/tree/2.2.0-RC0
> > > > > >
> > > > > > The release files, including signatures, digests, etc. can be found
> > > > > > at:
> > > > > >
> > > > > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/
> > > > > >
> > > > > > Maven artifacts are available in a staging repository at:
> > > > > >
> > > > > > https://repository.apache.org/content/repositories/orgapachehbase-1286
> > > > > >
> > > > > > Signatures used for HBase RCs can be found in this file:
> > > > > >
> > > > > > https://dist.apache.org/repos/dist/release/hbase/KEYS
> > > > > >
> > > > > > The list of bug fixes going into 2.2.0 can be found in included
> > > > > > CHANGES.md and RELEASENOTES.md available here:
> > > > > >
> > > > > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/CHANGES.md
> > > > > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/RELEASENOTES.md
> > > > > >
> > > > > > To learn more about Apache HBase, please see http://hbase.apache.org/
> > > > > >
> > > > > > Thanks,
> > > > > > Guanghao Zhang
> > > > >
> > > >
>


HBase 2.2.0 overlapping regions

2019-04-11 Thread Jean-Marc Spaggiari
Hi all,

I don't know how I ended up in this state, but I cleared /hbase in both ZK
and HDFS 3 days ago, created 3 brand new tables, and started to put some
data in them. Twice a day I run a split command, to get a better spread.

After 3 days I have a lot of region overlaps. HBCK output is here:
https://pastebin.com/y4awxmN5

Now, how can this be repaired? If I understand correctly, -fixHdfsOverlaps
can't be used anymore. I can move the HFiles, drop the table, re-create,
reload and split again, but I have a job running that's supposed to last 6
days. I would have liked to keep it running.

Any idea why there are such overlaps?

JMS


[jira] [Resolved] (HBASE-22209) sdf

2019-04-11 Thread Jean-Marc Spaggiari (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-22209.
-
Resolution: Invalid

> sdf
> ---
>
> Key: HBASE-22209
> URL: https://issues.apache.org/jira/browse/HBASE-22209
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Affects Versions: 2.1.4
>Reporter: leonjoe
>Priority: Major
> Fix For: hbase-6055
>
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>






Re: 2.0.4 to 2.2.0 testing

2019-04-04 Thread Jean-Marc Spaggiari
Hi Sean,

Thanks for spending some time on that.

This option was one of my plans ;) But I might end up just wiping
everything and trying an export snapshot from my other HBase 1.2.0
cluster to see what it does...

JMS

On Fri, Mar 29, 2019 at 19:32, Sean Busbey wrote:

> Putting aside for now speculation on how you got to this state, I think
> with current tooling your best option for recovery is to sideline the
> /hbase directory, start with a fresh install, create your namespaces &
> tables, and bulkload the sidelined hfiles
>
> JIRAs that aim to improve this situation, I'm sure feedback or help
> welcome:
>
> * HBASE-21665 "OfflineMetaRepair tool fails with NPE"
> * HBASE-18840 "Add functionality to refresh meta table at master
> startup" (as an alternative to making OfflineMetaRepairTool work
> again; busted according to HBASE-21665)
> * HBASE-21966 "Fix region holes, overlaps, and other region related errors"
>
> On Fri, Mar 29, 2019 at 1:08 PM Jean-Marc Spaggiari
>  wrote:
> >
> > Hi Sean,
> >
> > Here is the hdfs content: https://pastebin.com/EqK1zhEe
> >
> > I unfortunately don't have HDFS audit logs :( And I cleaned HBase logs
> > before the last upgrade test, so RCA will be difficult :-/
> >
> > JMS
> >
> > On Fri, Mar 29, 2019 at 16:01, Sean Busbey wrote:
> >
> > > So all we have in hbase:meta is an entry for each table that claims
> > > they're all in enabled state.
> > >
> > > And the info column family is totally empty?  I believe this is a
> > > failure state we don't have tooling for yet. Can you upload and link
> > > the results of running hdfs dfs -ls -R on the /hbase directory?
> > >
> > > Do you happen to have HDFS auditing turned on and logs that go back a
> > > few weeks? I'd be curious about how we got into this state. The only
> > > way I've seen it happen thus far is when folks disabled the safety
> > > that keeps hbck1 from running.
> > >
> > > On Fri, Mar 29, 2019 at 9:40 AM Jean-Marc Spaggiari
> > >  wrote:
> > > >
> > > > Hi Sean,
> > > >
> > > > Thanks again for keeping an eye on that.
> > > >
> > > > I think the META content has been lost somewhere in the process.
> > > >
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > > > /hbase/data/hbase/meta/.tabledesc
> > > > -rw-r--r--   3 hbase supergroup   1447 2019-03-12 15:42
> > > > /hbase/data/hbase/meta/.tabledesc/.tableinfo.01
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > > > /hbase/data/hbase/meta/.tmp
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:49
> > > > /hbase/data/hbase/meta/1588230740
> > > > -rw-r--r--   3 hbase supergroup 32 2019-03-12 15:40
> > > > /hbase/data/hbase/meta/1588230740/.regioninfo
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
> > > > /hbase/data/hbase/meta/1588230740/info
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
> > > > /hbase/data/hbase/meta/1588230740/recovered.edits
> > > > -rw-r--r--   3 hbase supergroup  0 2019-03-12 15:40
> > > > /hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > > > /hbase/data/hbase/meta/1588230740/rep_barrier
> > > > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:47
> > > > /hbase/data/hbase/meta/1588230740/table
> > > > -rw-r--r--   3 hbase supergroup   5454 2019-03-12 15:47
> > > >
> /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> > > >
> > > > And this is the content of the file:
> > > > hbase@node2:~$ hbase hfile -p -f
> > > >
> /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> > > > 2019-03-29 12:38:36,028 INFO  [main] metrics.MetricRegistries: Loaded
> > > > MetricRegistries class
> > > > org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> > > > K: customers/table:state/1552419727646/Put/vlen=2/seqid=903258414 V:
> > > > \x08\x00
> > > > K: dns/table:state/1552419727462/Put/vlen=2/seqid=903258404 V:
> \x08\x00
> > > > K: email/table:state/1552419727691/Put/vlen=2/seqid=903258416 V:
> \x08\x00
> > > > K:
> email_proposed/table:state/1552419727602/Put/vlen=2/seqid=903258410 V:
> > > > \x08\x

Re: 2.0.4 to 2.2.0 testing

2019-03-29 Thread Jean-Marc Spaggiari
Hi Sean,

Here is the hdfs content: https://pastebin.com/EqK1zhEe

I unfortunately don't have HDFS audit logs :( And I cleaned HBase logs
before the last upgrade test, so RCA will be difficult :-/

JMS

On Fri, Mar 29, 2019 at 16:01, Sean Busbey wrote:

> So all we have in hbase:meta is an entry for each table that claims
> they're all in enabled state.
>
> And the info column family is totally empty?  I believe this is a
> failure state we don't have tooling for yet. Can you upload and link
> the results of running hdfs dfs -ls -R on the /hbase directory?
>
> Do you happen to have HDFS auditing turned on and logs that go back a
> few weeks? I'd be curious about how we got into this state. The only
> way I've seen it happen thus far is when folks disabled the safety
> that keeps hbck1 from running.
>
> On Fri, Mar 29, 2019 at 9:40 AM Jean-Marc Spaggiari
>  wrote:
> >
> > Hi Sean,
> >
> > Thanks again for keeping an eye on that.
> >
> > I think the META content has been lost somewhere in the process.
> >
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tabledesc
> > -rw-r--r--   3 hbase supergroup   1447 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tabledesc/.tableinfo.01
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > /hbase/data/hbase/meta/.tmp
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:49
> > /hbase/data/hbase/meta/1588230740
> > -rw-r--r--   3 hbase supergroup 32 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/.regioninfo
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/info
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/recovered.edits
> > -rw-r--r--   3 hbase supergroup  0 2019-03-12 15:40
> > /hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > /hbase/data/hbase/meta/1588230740/rep_barrier
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:47
> > /hbase/data/hbase/meta/1588230740/table
> > -rw-r--r--   3 hbase supergroup   5454 2019-03-12 15:47
> > /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> >
> > And this is the content of the file:
> > hbase@node2:~$ hbase hfile -p -f
> > /hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
> > 2019-03-29 12:38:36,028 INFO  [main] metrics.MetricRegistries: Loaded
> > MetricRegistries class
> > org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
> > K: customers/table:state/1552419727646/Put/vlen=2/seqid=903258414 V:
> > \x08\x00
> > K: dns/table:state/1552419727462/Put/vlen=2/seqid=903258404 V: \x08\x00
> > K: email/table:state/1552419727691/Put/vlen=2/seqid=903258416 V: \x08\x00
> > K: email_proposed/table:state/1552419727602/Put/vlen=2/seqid=903258410 V:
> > \x08\x00
> > K: ew_table/table:state/1552419727527/Put/vlen=2/seqid=903258406 V:
> > \x08\x00
> > K: hbase:acl/table:state/1552419727547/Put/vlen=2/seqid=903258407 V:
> > \x08\x00
> > K: hbase:namespace/table:state/1552419727382/Put/vlen=2/seqid=903258402 V:
> > \x08\x00
> > K: page/table:state/1552419727669/Put/vlen=2/seqid=903258415 V: \x08\x00
> > K: pageAvro/table:state/1552419727572/Put/vlen=2/seqid=903258408 V:
> > \x08\x00
> > K: pageMini/table:state/1552419727591/Put/vlen=2/seqid=903258409 V:
> > \x08\x00
> > K: pageSpark/table:state/1552419727867/Put/vlen=2/seqid=903258417 V:
> > \x08\x00
> > K: page_crc/table:state/1552419727635/Put/vlen=2/seqid=903258413 V:
> > \x08\x00
> > K: page_duplicate/table:state/1552419727613/Put/vlen=2/seqid=903258411 V:
> > \x08\x00
> > K: page_proposed/table:state/1552419727175/Put/vlen=2/seqid=903258401 V:
> > \x08\x00
> > K: tree/table:state/1552419727502/Put/vlen=2/seqid=903258405 V: \x08\x00
> > K: work_proposed/table:state/1552419727402/Put/vlen=2/seqid=903258403 V:
> > \x08\x00
> > K: work_sent/table:state/1552419727624/Put/vlen=2/seqid=903258412 V:
> > \x08\x00
> > Scanned kv count -> 17
> >
> > Seems that it's still aware of the tables. But I don't see any reference
> > to any server...
> >
> > JMS
> >
> >
> > On Fri, Mar 29, 2019 at 12:25, Sean Busbey wrote:
> >
> > > Okay I read the logs again and we're in a weird failure state.
> > >
> > > 1) Master comes up
> > > 2) Master schedules SCP for all RS
> > > 3) Master recovers meta
> > > 4) SCP for every se

Re: 2.0.4 to 2.2.0 testing

2019-03-29 Thread Jean-Marc Spaggiari
Hi Sean,

Thanks again for keeping an eye on that.

I think the META content has been lost somewhere in the process.

drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/.tabledesc
-rw-r--r--   3 hbase supergroup   1447 2019-03-12 15:42
/hbase/data/hbase/meta/.tabledesc/.tableinfo.01
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/.tmp
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:49
/hbase/data/hbase/meta/1588230740
-rw-r--r--   3 hbase supergroup 32 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/.regioninfo
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/info
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/recovered.edits
-rw-r--r--   3 hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/1588230740/rep_barrier
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:47
/hbase/data/hbase/meta/1588230740/table
-rw-r--r--   3 hbase supergroup   5454 2019-03-12 15:47
/hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc

And this is the content of the file:
hbase@node2:~$ hbase hfile -p -f
/hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc
2019-03-29 12:38:36,028 INFO  [main] metrics.MetricRegistries: Loaded
MetricRegistries class
org.apache.hadoop.hbase.metrics.impl.MetricRegistriesImpl
K: customers/table:state/1552419727646/Put/vlen=2/seqid=903258414 V:
\x08\x00
K: dns/table:state/1552419727462/Put/vlen=2/seqid=903258404 V: \x08\x00
K: email/table:state/1552419727691/Put/vlen=2/seqid=903258416 V: \x08\x00
K: email_proposed/table:state/1552419727602/Put/vlen=2/seqid=903258410 V:
\x08\x00
K: ew_table/table:state/1552419727527/Put/vlen=2/seqid=903258406 V: \x08\x00
K: hbase:acl/table:state/1552419727547/Put/vlen=2/seqid=903258407 V:
\x08\x00
K: hbase:namespace/table:state/1552419727382/Put/vlen=2/seqid=903258402 V:
\x08\x00
K: page/table:state/1552419727669/Put/vlen=2/seqid=903258415 V: \x08\x00
K: pageAvro/table:state/1552419727572/Put/vlen=2/seqid=903258408 V: \x08\x00
K: pageMini/table:state/1552419727591/Put/vlen=2/seqid=903258409 V: \x08\x00
K: pageSpark/table:state/1552419727867/Put/vlen=2/seqid=903258417 V:
\x08\x00
K: page_crc/table:state/1552419727635/Put/vlen=2/seqid=903258413 V: \x08\x00
K: page_duplicate/table:state/1552419727613/Put/vlen=2/seqid=903258411 V:
\x08\x00
K: page_proposed/table:state/1552419727175/Put/vlen=2/seqid=903258401 V:
\x08\x00
K: tree/table:state/1552419727502/Put/vlen=2/seqid=903258405 V: \x08\x00
K: work_proposed/table:state/1552419727402/Put/vlen=2/seqid=903258403 V:
\x08\x00
K: work_sent/table:state/1552419727624/Put/vlen=2/seqid=903258412 V:
\x08\x00
Scanned kv count -> 17

Seems that it's still aware of the tables. But I don't see any reference to
any server...
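
For reference, a minimal sketch of how one of the printed cells above could be read. It assumes the wrapped "K: ... V:" line and its value have been rejoined onto one line, and it assumes the two-byte value is a protobuf message with a single varint in field 1 whose values map to ENABLED, DISABLED, DISABLING, ENABLING in that order. The class and method names are made up for illustration; this is not part of any HBase tool.

```java
import java.io.ByteArrayOutputStream;
import java.time.Instant;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TableStateLine {
    // Assumption: ordinal-to-name mapping of the table state enum.
    private static final String[] STATES = {"ENABLED", "DISABLED", "DISABLING", "ENABLING"};

    // K: <row>/<family>:<qualifier>/<timestamp>/<type>/vlen=<n>/seqid=<n> V: <value>
    private static final Pattern LINE = Pattern.compile(
        "K: (.+)/(\\w+):(\\w+)/(\\d+)/(\\w+)/vlen=(\\d+)/seqid=(\\d+) V: (.*)");

    /** Turns the printed escapes (e.g. \x08\x00) back into raw bytes. */
    static byte[] unescape(String printed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < printed.length(); ) {
            if (printed.startsWith("\\x", i) && i + 4 <= printed.length()) {
                out.write(Integer.parseInt(printed.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(printed.charAt(i));
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Decodes a two-byte value: tag 0x08 (field 1, varint) followed by the state ordinal. */
    static String decodeState(byte[] value) {
        if (value.length != 2 || value[0] != 0x08) {
            throw new IllegalArgumentException("expected a single varint in field 1");
        }
        int ordinal = value[1];
        if (ordinal < 0 || ordinal >= STATES.length) {
            throw new IllegalArgumentException("unknown state ordinal: " + ordinal);
        }
        return STATES[ordinal];
    }

    /** Returns "<row> <state> <write time>" for one printed cell. */
    static String describe(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("unexpected line: " + line);
        }
        String row = m.group(1);
        Instant written = Instant.ofEpochMilli(Long.parseLong(m.group(4)));
        return row + " " + decodeState(unescape(m.group(8))) + " " + written;
    }

    public static void main(String[] args) {
        System.out.println(describe(
            "K: hbase:namespace/table:state/1552419727382/Put/vlen=2/seqid=903258402 V: \\x08\\x00"));
    }
}
```

Under those assumptions every row in the dump decodes to the same state, which matches the observation that the file only carries per-table state and no server or region information.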

JMS


On Fri, Mar 29, 2019 at 12:25, Sean Busbey wrote:

> Okay I read the logs again and we're in a weird failure state.
>
> 1) Master comes up
> 2) Master schedules SCP for all RS
> 3) Master recovers meta
> 4) SCP for every server claims AM currently thinks 0 regions were
> assigned to each server.
> 5) Master successfully finishes WAL splitting from dead RS and works
> through prior split attempts that died?
> 6) WAL recovery from every RS says there are no edits for any region
> 7) No Assignments are scheduled out of the SCP because each believes
> there were no regions hosted on the server that's being processed
> 8) Master reports all SCP have completed successfully
> 9) Master times out at initializing
>
> Could you link to a scan of meta? it'll include server names, table
> names, and region information, so I'm not sure if any of those are too
> sensitive?
>
> On Thu, Mar 14, 2019 at 11:36 AM Jean-Marc Spaggiari
>  wrote:
> >
> > Updated logs are there: https://pastebin.com/1UrTA8JS
> >
> > They really look exactly the same as the previous version :-/
> >
> > There is no warning, no error, nothing :(
> >
> > JMS
> >
> > On Thu, Mar 14, 2019 at 13:38, Sean Busbey wrote:
> >
> > > We still need to find out why hbase:namespace is not online. Did the
> > > logs complaining about being unable to assign regions not include any
> > > thing about the region(s) for the namespace table?
> > >
> > > Can you upload updated logs?
> > >
> > > If there's no mention of it then that sounds like we need an hbck2
> > > command to output the current assignment state of a region.
> > >
> > > On Thu, Mar 14, 2019 at 11:57 AM Jean-Marc Spaggiari
> > >  wrote:
> > > >
> > > > I stopped all t

Re: GetAndPut

2019-03-25 Thread Jean-Marc Spaggiari
Well, I just don't want it to fail. I want to put a value that will replace
the previous one and just return it (the previous one).

It can be something like:
repeat until success {
  Get previous value
  CheckAndPut(new value, previous value)
}
Then I know what I replaced by what.
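
A minimal sketch of that loop, using an in-memory map as a stand-in for the table. Nothing here is the HBase client API: GetAndPutSketch, get and checkAndPut are hypothetical names, and in a real client the conditional write would be whatever check-and-put style call the table offers.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class GetAndPutSketch {
    // Stand-in for the table: one value per row key.
    private final ConcurrentMap<String, String> cells = new ConcurrentHashMap<>();

    String get(String row) {
        return cells.get(row);
    }

    /** Writes newValue only if the cell still holds expected (null means "absent"). */
    boolean checkAndPut(String row, String expected, String newValue) {
        if (expected == null) {
            return cells.putIfAbsent(row, newValue) == null;
        }
        return cells.replace(row, expected, newValue);
    }

    /** Repeats Get + CheckAndPut until it succeeds; returns the value that was replaced. */
    Optional<String> getAndPut(String row, String newValue) {
        while (true) {
            String previous = get(row);
            if (checkAndPut(row, previous, newValue)) {
                return Optional.ofNullable(previous);
            }
            // Another writer changed the cell between the Get and the CheckAndPut: retry.
        }
    }

    public static void main(String[] args) {
        GetAndPutSketch table = new GetAndPutSketch();
        System.out.println(table.getAndPut("row1", "v1")); // new insert, nothing replaced
        System.out.println(table.getAndPut("row1", "v2")); // update, previous value returned
    }
}
```

An empty result means the put was a plain insert, so only non-empty results would need to be recorded. Each attempt still costs two round trips, which is exactly what a single atomic GetAndPut would avoid.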

The use case is one where someone wants to keep track of what has been
modified, a bit like a client-side WAL. But they want that ONLY for updates;
they don't care about new inserts. And since 99% of the operations are inserts
and only 1% are updates, they don't want to just keep all puts.

JMS

On Mon, Mar 25, 2019 at 15:16, Vladimir Rodionov wrote:

> Interesting. If CheckAndPut succeeds, then you know the value and there is
> no need for a Get, right?
> So you only want to know the current value if CheckAndPut fails?
> Can you elaborate on your use case, Jean-Marc?
>
> -Vlad
>
> On Mon, Mar 25, 2019 at 11:54 AM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi all,
> >
> > We have all CheckAndxxx operations, where we verify something and if the
> > condition is true we perform the operation (Put, Delete, Mutation, etc.).
> >
> > I'm looking for a GetAndPut operation. Where in a single atomic call, I can
> > get the actual value of a cell (if any), and perform the put. Working on a
> > usecase where this might help.
> >
> > Do we have anything like that? I can simulate by doing a Get then a
> > CheckAndPut, but that's 2 calls. Trying to save one call ;)
> >
> > Do we have anything like that?
> >
> > Thanks
> >
> > JMS
> >
>


GetAndPut

2019-03-25 Thread Jean-Marc Spaggiari
Hi all,

We have all CheckAndxxx operations, where we verify something and if the
condition is true we perform the operation (Put, Delete, Mutation, etc.).

I'm looking for a GetAndPut operation. Where in a single atomic call, I can
get the actual value of a cell (if any), and perform the put. Working on a
usecase where this might help.

Do we have anything like that? I can simulate by doing a Get then a
CheckAndPut, but that's 2 calls. Trying to save one call ;)

Thanks

JMS


Re: 2.0.4 to 2.2.0 testing

2019-03-14 Thread Jean-Marc Spaggiari
Updated logs are there: https://pastebin.com/1UrTA8JS

They really look exactly the same as the previous version :-/

There is no warning, no error, nothing :(

JMS

On Thu, Mar 14, 2019 at 13:38, Sean Busbey wrote:

> We still need to find out why hbase:namespace is not online. Did the
> logs complaining about being unable to assign regions not include any
> thing about the region(s) for the namespace table?
>
> Can you upload updated logs?
>
> If there's no mention of it then that sounds like we need an hbck2
> command to output the current assignment state of a region.
>
> On Thu, Mar 14, 2019 at 11:57 AM Jean-Marc Spaggiari
>  wrote:
> >
> > I stopped all the region servers, started the master. It was complaining
> > about not being able to assign regions. Then started region servers, but
> > after 5 minutes got the same error :-/
> >
> > 2019-03-14 12:46:38,586 ERROR [master/node2:6:becomeActiveMaster]
> > master.HMaster: Failed to become active master
> > java.lang.IllegalStateException: Expected the service
> > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has
> FAILED
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
> > at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.IOException: Timedout 30ms waiting for namespace
> > table to be assigned and enabled: tableName=hbase:namespace,
> state=ENABLED
> > at
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > at
> >
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1339)
> > ... 4 more
> >
> > Then stopped all, configured the maintenance mode, started all, get the
> > same error. I tried to bounce the RS within those 5 minutes without any
> > difference. I still get the same exception after 5 minutes:
> > 2019-03-14 12:55:35,167 ERROR [master/node2:6:becomeActiveMaster]
> > master.HMaster: * ABORTING master node2.distparser.com
> ,6,1552582220013:
> > Unhandled exception. Starting shutdown. *
> > java.lang.IllegalStateException: Expected the service
> > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has
> FAILED
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
> > at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.IOException: Timedout 30ms waiting for namespace
> > table to be assigned and enabled: tableName=hbase:namespace,
> state=ENABLED
> > at
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > at
> >
> org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1339)
> > ... 4 more
> >
> > I validated the maintenance mode:
> > 2019-03-14 12:50:25,421 INFO  [main] master.HMaster: Detected
> > hbase.master.maintenance_m

Re: 2.0.4 to 2.2.0 testing

2019-03-14 Thread Jean-Marc Spaggiari
Yeah, as I suspected in my previous comment, for this type of timeout, the
> maintenance mode wouldn't give any help. It's weird that AM starts but
> apparently does nothing until the namespace 5 mins timeout is reached:
> ...
> 2019-03-12 20:53:45,942 INFO  [master/node2:6:becomeActiveMaster]
> assignment.AssignmentManager: Joined the cluster in 308msec
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
> zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
> zookeeper.ClientCnxn: EventThread shut down for session: 0x16911bd542a02a2
> 2019-03-12 20:58:46,603 ERROR [master/node2:6:becomeActiveMaster]
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
> ...
>
> I would expect namespace region to be processed by this call
> <https://github.com/apache/hbase/blob/2.2.0-RC0/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1088>,
> if namespace region is offline, as the timeout suggests. Also odd is that we
> don't see any logs suggesting offlined regions are getting assigned. Maybe
> all regions are already online on RSes? But then master should have figured
> that out. Have you already tried restarting all RSes? That could kick some
> reassignments.
>
> On Thu, Mar 14, 2019 at 02:21, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi Wellington,
> >
> > Indeed, the META is now deployed.  I found the namespace region encoded
> > name using hdfs dfs -ls -R /hbase/data/hbase/namespace and it gives
> > me 7f4a480f47f98300185d1ae2ff663295. But here again, HBCK doesn't want to
> > do anything because the master is initializing :( I tried with and without
> > the maintenance flag and I get the same result.
> >
> > On HBCK2 side: PleaseHoldException: Master is initializing
> > On the master side, it just stopped after 5 minutes trying to assign
> > namespace :(
> >
> > JMS
> >
> >
> > On Wed, Mar 13, 2019 at 12:04, Wellington Chevreuil <
> > wellington.chevre...@gmail.com> wrote:
> >
> > > "1588230740" would be the meta region name, not namespace. It seems
> meta
> > is
> > > already online, per below log:
> > > ...
> > > 2019-03-12 20:53:41,037 INFO  [master/node2:6:becomeActiveMaster]
> > > master.HMaster: hbase:meta {1588230740 state=OPEN, ts=1552438420570,
> > > server=
> > > node7.distparser.com,16020,1552421510124}
> > > ...
> > >
> > > The maintenance mode I suggested before was to have master doing
> minimum
> > > required stuff while attempting to getting meta/namespace online, but I
> > > guess it wouldn't be able to avoid such timeouts. Below message also
> > means
> > > AM could read meta table, giving another indication meta is fine:
> > > ...
> > > 2019-03-12 20:53:45,942 INFO  [master/node2:6:becomeActiveMaster]
> > > assignment.AssignmentManager: Joined the cluster in 308msec
> > > ...
> > >
> > > Now issue is namespace table. For some reason, AM is not able to kick
> APs
> > > before the 5 minutes timeout exceeds, and that's probably why namespace
> > > table never comes available:
> > > ...
> > > 2019-03-12 20:53:45,942 INFO  [master/node2:6:becomeActiveMaster]
> > > assignment.AssignmentManager: Joined the cluster in 308msec
> > > 2019-03-12 20:54:45,725 INFO
> > > [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
> > > zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
> > > 2019-03-12 20:54:45,725 INFO
> > > [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
> > > zookeeper.ClientCnxn: EventThread shut down for session:
> > 0x16911bd542a02a2
> > > 2019-03-12 20:58:46,603 ERROR [master/node2:6:becomeActiveMaster]
> > > master.HMaster: Failed to become active master
> > > java.lang.IllegalStateException: Expected the service
> > > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has
> > FAILED
> > > ...
> > >
> > > You may be able to force namespace region coming online with hbck2
> > assigns
> > > command. You would need to find out the namespace region name first,
> you
> > > can either scan meta table or check the region dir name in hdfs with
> > "hdfs
> > > dfs -ls -R /hb

Re: 2.0.4 to 2.2.0 testing

2019-03-13 Thread Jean-Marc Spaggiari
Hi Wellington,

Indeed, the META is now deployed.  I found the namespace region encoded
name using hdfs dfs -ls -R /hbase/data/hbase/namespace and it gives
me 7f4a480f47f98300185d1ae2ff663295. But here again, HBCK doesn't want to
do anything because the master is initializing :( I tried with and without
the maintenance flag and I get the same result.

On HBCK2 side: PleaseHoldException: Master is initializing
On the master side, it just stopped after 5 minutes trying to assign
namespace :(
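
As a rough sketch of the lookup described above: given the lines of a recursive HDFS listing, pull out the encoded region directory for one table. It assumes the layout /hbase/data/<namespace>/<table>/<encoded region name> and a 32-character hex encoded name (so the meta region directory 1588230740 is deliberately not matched). The listing lines in main are made up for illustration; only the encoded name is the one quoted in this message.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegionDirNames {
    // Assumed layout: /hbase/data/<namespace>/<table>/<32 hex chars>[/...]
    private static final Pattern REGION_DIR = Pattern.compile(
        "/hbase/data/([^/\\s]+)/([^/\\s]+)/([0-9a-f]{32})(?:/|\\s|$)");

    /** Collects the distinct encoded region names found under the given table. */
    static Set<String> encodedRegionNames(List<String> lsLines, String namespace, String table) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : lsLines) {
            Matcher m = REGION_DIR.matcher(line);
            if (m.find() && m.group(1).equals(namespace) && m.group(2).equals(table)) {
                names.add(m.group(3));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical listing lines (permissions and dates are invented).
        List<String> listing = Arrays.asList(
            "drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42 "
                + "/hbase/data/hbase/namespace/7f4a480f47f98300185d1ae2ff663295",
            "drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42 "
                + "/hbase/data/hbase/namespace/7f4a480f47f98300185d1ae2ff663295/info",
            "drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:49 "
                + "/hbase/data/hbase/meta/1588230740");
        System.out.println(encodedRegionNames(listing, "hbase", "namespace"));
    }
}
```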

JMS


On Wed, Mar 13, 2019 at 12:04, Wellington Chevreuil <
wellington.chevre...@gmail.com> wrote:

> "1588230740" would be the meta region name, not namespace. It seems meta is
> already online, per below log:
> ...
> 2019-03-12 20:53:41,037 INFO  [master/node2:6:becomeActiveMaster]
> master.HMaster: hbase:meta {1588230740 state=OPEN, ts=1552438420570,
> server=
> node7.distparser.com,16020,1552421510124}
> ...
>
> The maintenance mode I suggested before was to have master doing minimum
> required stuff while attempting to getting meta/namespace online, but I
> guess it wouldn't be able to avoid such timeouts. Below message also means
> AM could read meta table, giving another indication meta is fine:
> ...
> 2019-03-12 20:53:45,942 INFO  [master/node2:6:becomeActiveMaster]
> assignment.AssignmentManager: Joined the cluster in 308msec
> ...
>
> Now issue is namespace table. For some reason, AM is not able to kick APs
> before the 5 minutes timeout exceeds, and that's probably why namespace
> table never comes available:
> ...
> 2019-03-12 20:53:45,942 INFO  [master/node2:6:becomeActiveMaster]
> assignment.AssignmentManager: Joined the cluster in 308msec
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0]
> zookeeper.ZooKeeper: Session: 0x16911bd542a02a2 closed
> 2019-03-12 20:54:45,725 INFO
> [ReadOnlyZKClient-latitude.distparser.com:2181@0x7ea9b2c0-EventThread]
> zookeeper.ClientCnxn: EventThread shut down for session: 0x16911bd542a02a2
> 2019-03-12 20:58:46,603 ERROR [master/node2:6:becomeActiveMaster]
> master.HMaster: Failed to become active master
> java.lang.IllegalStateException: Expected the service
> ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
> ...
>
> You may be able to force namespace region coming online with hbck2 assigns
> command. You would need to find out the namespace region name first, you
> can either scan meta table or check the region dir name in hdfs with "hdfs
> dfs -ls -R /hbase | grep namespace", in order to pass it as a param for
>
> On Wed, Mar 13, 2019 at 13:00, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi Sean,
> >
> > I tried. I looked up the region name for hbase:namespace like this:
> >
> > hdfs dfs -ls /hbase/data/hbase/meta/
> >
> > And found the region to be 1588230740.
> >
> > The master dies after 5 minutes, so I start the master, wait 2 minutes to
> > be sure it's up, and run the following command:
> >
> > bin/hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 1588230740
> >
> > But HBCK2 doesn't like it:
> > 08:57:35.273 [main] INFO
> > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception,
> > tries=9, retries=16, started=29322 ms ago, cancelled=false,
> > msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3057)
> > at
> >
> >
> org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:942)
> > at
> >
> >
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
> > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
> > at
> > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
> > at
> > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
> >
> >
> > It keeps retrying and after 16 times it stopped saying the master is not
> > initialized.
> >
> > On the WebUI I can see that there is a single region assigned, the META
> > region.
> >
> > Also, here is the HDFS structure of my META table. Sounds like some parts
> > got lost in the process (The info content).
> >
> > hbase@node2:~/hbase-2.2.0$ hdfs dfs -ls -R /hbase/data/hbase/meta/
> > drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
> > /hbase/data/hbase/me

Re: 2.0.4 to 2.2.0 testing

2019-03-13 Thread Jean-Marc Spaggiari
Hi Sean,

I tried. I looked up the region name for hbase:namespace like this:

hdfs dfs -ls /hbase/data/hbase/meta/

And found the region to be 1588230740.

The master dies after 5 minutes, so I start the master, wait 2 minutes to
be sure it's up, and run the following command:

bin/hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 1588230740

But HBCK2 doesn't like it:
08:57:35.273 [main] INFO
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl - Call exception,
tries=9, retries=16, started=29322 ms ago, cancelled=false,
msg=org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
at
org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3057)
at
org.apache.hadoop.hbase.master.MasterRpcServices.getClusterStatus(MasterRpcServices.java:942)
at
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)


It keeps retrying, and after 16 tries it stops, saying the master is not
initialized.
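
The behaviour seen here (retry a fixed number of times, then give up with the last error) can be sketched generically as below. This is not how the HBase client schedules its retries; BoundedRetry, callWithRetries and the fixed pause are assumptions made for illustration only.

```java
import java.util.function.Supplier;

final class BoundedRetry {
    /** Calls the supplier up to maxTries times, pausing between failed attempts. */
    static <T> T callWithRetries(Supplier<T> call, int maxTries, long pauseMillis) {
        if (maxTries < 1) {
            throw new IllegalArgumentException("maxTries must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again if attempts remain
            }
            if (attempt < maxTries) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting to retry", ie);
                }
            }
        }
        throw last; // every attempt failed: surface the last error
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Hypothetical call that fails twice with an "initializing" error, then succeeds.
        String result = callWithRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("Master is initializing");
            }
            return "assigned";
        }, 16, 10L);
        System.out.println(result + " after " + calls[0] + " tries");
    }
}
```

With a master that never finishes initializing, a loop like this simply runs out of attempts, which is consistent with what the command reported.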

On the WebUI I can see that there is a single region assigned, the META
region.

Also, here is the HDFS structure of my META table. Sounds like some parts
got lost in the process (The info content).

hbase@node2:~/hbase-2.2.0$ hdfs dfs -ls -R /hbase/data/hbase/meta/
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/.tabledesc
-rw-r--r--   3 hbase supergroup   1447 2019-03-12 15:42
/hbase/data/hbase/meta/.tabledesc/.tableinfo.01
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/.tmp
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:49
/hbase/data/hbase/meta/1588230740
-rw-r--r--   3 hbase supergroup 32 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/.regioninfo
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/info
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/recovered.edits
-rw-r--r--   3 hbase supergroup  0 2019-03-12 15:40
/hbase/data/hbase/meta/1588230740/recovered.edits/2.seqid
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:42
/hbase/data/hbase/meta/1588230740/rep_barrier
drwxr-xr-x   - hbase supergroup  0 2019-03-12 15:47
/hbase/data/hbase/meta/1588230740/table
-rw-r--r--   3 hbase supergroup   5454 2019-03-12 15:47
/hbase/data/hbase/meta/1588230740/table/b65e8774ff284e77bf22641de36110cc

What will be the next best step?

Thanks,

JMS



On Wed, Mar 13, 2019 at 08:45, Sean Busbey wrote:

> Okay so master thinks hbase:namespace is already enabled, but no RS
> believes it should be hosting the regions.
>
> Can you find the region name for the hbase:namespace region and issue
> an hbck2 assigns command for it?
>
> On Tue, Mar 12, 2019 at 8:26 PM Jean-Marc Spaggiari
>  wrote:
> >
> > It doesn't say that much :(
> >
> > hbase@node2:~/hbase-2.2.0$ cat logs/hbase-hbase-master-node2.log | grep -i namespace
> > Caused by: java.io.IOException: Timedout 30ms waiting for namespace
> > table to be assigned and enabled: tableName=hbase:namespace,
> state=ENABLED
> > at
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> > Caused by: java.io.IOException: Timedout 30ms waiting for namespace
> > table to be assigned and enabled: tableName=hbase:namespace,
> state=ENABLED
> > at
> >
> org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
> >
> > I cleared the logs before restarting the instance. That's all it says
> > about namespace.
> >
> > Full logs are available there: https://pastebin.com/9j2Rzdcg
> >
> > On Tue, Mar 12, 2019 at 20:47, Sean Busbey wrote:
> >
> > > okay so the master spent ~5 minutes waiting to see if it could get the
> > > namespace table working. when it couldn't it aborted.
> > >
> > > can you look back over that 5 minutes and see what the master had to
> > > say about the namespace table? did the master think some particular
> > > server should have it open already? was it waiting for someone to
> > > finish opening or closing it?
> > >
> > > On Tue, Mar 12, 2019 at 6:39 PM Jean-Marc Spaggiari
> > >  wrote:
> > > >
> > > > On Tue, Mar 12, 2019 at 19:25, Sean Busbey wrote:
> > > >
> > > > > your command above points at the wrong jar from the hbck2 repo.
> it's
&

Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
It doesn't say that much :(

hbase@node2:~/hbase-2.2.0$ cat logs/hbase-hbase-master-node2.log | grep -i namespace
Caused by: java.io.IOException: Timedout 30ms waiting for namespace
table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED
at
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
Caused by: java.io.IOException: Timedout 30ms waiting for namespace
table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED
at
org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)

I cleared the logs before restarting the instance. That's all it says
about namespace.

Full logs are available there: https://pastebin.com/9j2Rzdcg

On Tue, Mar 12, 2019 at 20:47, Sean Busbey wrote:

> okay so the master spent ~5 minutes waiting to see if it could get the
> namespace table working. when it couldn't it aborted.
>
> can you look back over that 5 minutes and see what the master had to
> say about the namespace table? did the master think some particular
> server should have it open already? was it waiting for someone to
> finish opening or closing it?
>
> On Tue, Mar 12, 2019 at 6:39 PM Jean-Marc Spaggiari
>  wrote:
> >
> > On Tue, Mar 12, 2019 at 19:25, Sean Busbey wrote:
> >
> > > your command above points at the wrong jar from the hbck2 repo. it's
> > > pointing at the one where you need to manually assemble all the
> > > dependencies it has.
> > >
> > > You want the one that does not say "original" in the name.
> > >
> > >
> >
> > Ha!!! That's why! Way easier ;)
> >
> > indeed, this works even without removing all environment variables:
> >  bin/hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar
> >
> >
> >
> >
> >
> > >
> > > >
> > > > > Can you confirm it's the one in
> > > > > the bin tarball? what does the version command output? What does
> the
> > > > > mapredcp command output? What does the cli help for the hbase
> command
> > > > > show?
> > > > >
> > > >
> > > >  hbase@node2:~/hbase-2.2.0$ hbase mapredcp
> > > >
> > >
> /home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar
> > > >
> > >
> > > that looks great now.
> > >
> > >
> > > Once you correct the hbck2 jar above I think you'll be good for
> invoking
> > > HBCK2.
> > >
> > > Next, what does the initializing master say it's doing? It should be
> > > on the master UI near the bottom. If it hasn't made progress since
> > > your last update it'll be waiting for the hbase:namespace table. If it
> > > is, find the region and see what the last few messages in the master
> > > log are about that region.
> > >
> >
> > The master died some time ago. It dies after 5 minutes.
> >
> > 2019-03-12 19:35:58,568 ERROR [master/node2:6:becomeActiveMaster]
> > master.HMaster: Failed to become active master
> > java.lang.IllegalStateException: Expected the service
> > ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has
> FAILED
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
> > at
> >
> org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
> > at
> >
> org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
> > at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: java.io.IOException: Timedout 30ms waiting for namespace
> > table to be assigned and enabled: tableName=

Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
On Tue, Mar 12, 2019 at 19:25, Sean Busbey wrote:

> your command above points at the wrong jar from the hbck2 repo. it's
> pointing at the one where you need to manually assemble all the
> dependencies it has.
>
> You want the one that does not say "original" in the name.
>
>

Ha!!! That's why! Way easier ;)

indeed, this works even without removing all environment variables:
 bin/hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar





>
> >
> > > Can you confirm it's the one in
> > > the bin tarball? what does the version command output? What does the
> > > mapredcp command output? What does the cli help for the hbase command
> > > show?
> > >
> >
> >  hbase@node2:~/hbase-2.2.0$ hbase mapredcp
> >
> /home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar
> >
>
> that looks great now.
>
>
> Once you correct the hbck2 jar above I think you'll be good for invoking
> HBCK2.
>
> Next, what does the initializing master say it's doing? It should be
> on the master UI near the bottom. If it hasn't made progress since
> your last update it'll be waiting for the hbase:namespace table. If it
> is, find the region and see what the last few messages in the master
> log are about that region.
>

The master died some time ago. It dies after 5 minutes.

2019-03-12 19:35:58,568 ERROR [master/node2:6:becomeActiveMaster] master.HMaster: Failed to become active master
java.lang.IllegalStateException: Expected the service ClusterSchemaServiceImpl [FAILED] to be RUNNING, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:345)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:291)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1341)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1119)
    at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2347)
    at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:595)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Timedout 30ms waiting for namespace table to be assigned and enabled: tableName=hbase:namespace, state=ENABLED
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:108)
    at org.apache.hadoop.hbase.master.ClusterSchemaServiceImpl.doStart(ClusterSchemaServiceImpl.java:63)
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:226)
    at org.apache.hadoop.hbase.master.HMaster.initClusterSchemaService(HMaster.java:1339)
    ... 4 more


I just restarted it. I can see the meta table being assigned. I can access
the WebUI and I don't see any initializing information. In the tables
section, I don't see anything, in any tab. However, when doing "list" in
the shell, I can see my tables, but I cannot scan them. Scanning any table
gives:
hbase(main):001:0> scan 'hbase:namespace'
ROW   COLUMN+CELL


ERROR: Unknown table hbase:namespace!

For usage try 'help "scan"'

Took 1.0395 seconds



JMS


Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
I got HBCK2 working by setting
export HBASE_CLASSPATH=test/hbase-operator-tools/hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar


 bin/hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/original-hbase-hbck2-1.0.0-SNAPSHOT.jar
usage: HBCK2 [OPTIONS] COMMAND 

Options:
 -d,--debug                                 run with debug output
 -h,--help                                  output this help message
 -p,--hbase.zookeeper.property.clientPort   port of target hbase ensemble
 -q,--hbase.zookeeper.quorum                ensemble of target hbase
 -s,--skip                                  skip hbase version check
 -v,--version                               this hbck2 version
 -z,--zookeeper.znode.parent                parent znode of target hbase


I will wait for your instruction on which command I should run now.

Thanks,

JMS

On Tue, Mar 12, 2019 at 18:38, Jean-Marc Spaggiari wrote:

> Thanks for looking at it. Results are below.
>
> JMS
>
> On Tue, Mar 12, 2019 at 18:16, Sean Busbey wrote:
>
>>
>> > The hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar unassigns 1588230740
>> > command doesn't seem to be correct. "-j" is not a valid parameter, it
>> > should be -jar. And fixing it displays the previous HBCK version help.
>> >
>>
>> Okay, something is still wrong then. The cli option is definitely "-j"
>> and not "-jar".
>>
>> Which "hbase" command is being run?
>
>
> Ha, good point! I had hbase 2.0.0-beta in my path. So sometimes I was
> typing bin/hbase (which made sure hbase 2.2.0 was running) and
> sometimes hbase. I changed the path and it seems to like the -j parameter
> better...
>
> hbase@node2:~/hbase-2.2.0$ hbase hbck -j
> test/hbase-operator-tools/hbase-hbck2/target/original-hbase-hbck2-1.0.0-SNAPSHOT.jar
> Error: A JNI error has occurred, please check your installation and try
> again
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/commons/cli/ParseException
>
> Trying to figure why...
>
>
>> Can you confirm it's the one in
>> the bin tarball? what does the version command output? What does the
>> mapredcp command output? What does the cli help for the hbase command
>> show?
>>
>
>  hbase@node2:~/hbase-2.2.0$ hbase mapredcp
>
> /home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar
>
>
> hbase@node2:~/hbase-2.2.0$ hbase version
> HBase 2.2.0
> Source code repository git://hao-OptiPlex-7050/home/hao/open_source/hbase
> revision=
> Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
> From source with checksum 783fee467bb1b28666f0d904437862c4
>
> hbase@node2:~/hbase-2.2.0$ hbase
> Usage: hbase []  []
> Options:
>   --config DIR Configuration direction to use. Default: ./conf
>   --hosts HOSTSOverride the list in 'regionservers' file
>   --auth-as-server Authenticate to ZooKeeper using servers
> configuration
>   --internal-classpath Skip attempting to use client facing jars (WARNING:
> unstable results between versions)
>
> Commands:
> Some commands take arguments. Pass no args or -h for usage.
>   shell   Run the HBase shell
>   hbckRun the HBase 'fsck' tool. Defaults read-only hbck1.
>   Pass '-j /path/to/HBCK2.jar' to run hbase-2.x HBCK2.
>   snapshotTool for managing snapshots
>   wal Write-ahead-log analyzer
>   hfile   Store file analyzer
>   zkcli   Run the ZooKeeper shell
>   master  Run an HBase HMaster node
>   regionserverRun an HBase HRegionServer node
>   zookeeper   Run a ZooKeeper server
>   restRun an HBase REST server
>   thrift  Run the HBase Thrift server
>   thrift2 Run the HBase Thrift2 server
>   clean   Run the HBase clean up script
>   classpath   Dump hbase CLASSPATH
>   mapredcpDump CLASSPATH entries required by mapreduce
>   pe  Run PerformanceEvaluation
>   ltt Run LoadTestTool
>   canary  Run the Canary tool
>   version Print the version
>   regionsplitter  Run RegionSplitter tool
>   rowcounter  Run RowCounter tool
>   

Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
Thanks for looking at it. Results are below.

JMS

On Tue, Mar 12, 2019 at 18:16, Sean Busbey wrote:

>
> > The hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar unassigns 1588230740
> > command doesn't seem to be correct. "-j" is not a valid parameter, it
> > should be -jar. And fixing it displays the previous HBCK version help.
> >
>
> Okay, something is still wrong then. The cli option is definitely "-j"
> and not "-jar".
>
> Which "hbase" command is being run?


Ha, good point! I had hbase 2.0.0-beta in my path. So sometimes I was
typing bin/hbase (which made sure hbase 2.2.0 was running) and
sometimes hbase. I changed the path and it seems to like the -j parameter
better...

hbase@node2:~/hbase-2.2.0$ hbase hbck -j test/hbase-operator-tools/hbase-hbck2/target/original-hbase-hbck2-1.0.0-SNAPSHOT.jar
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/cli/ParseException

Trying to figure out why...
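
For the PATH mix-up mentioned above (an older hbase earlier in PATH than the 2.2.0 one), a quick check like this could catch it before running hbck. This is a sketch only: the function name hbase_on_path_matches is hypothetical, and it assumes the install is laid out as HOME_DIR/bin/hbase, like ~/hbase-2.2.0 here.

```shell
# hbase_on_path_matches HOME_DIR
# Succeed only if the first "hbase" found on PATH is HOME_DIR/bin/hbase.
hbase_on_path_matches() {
  expected="$1/bin/hbase"
  # command -v prints the path the shell would actually execute.
  actual=$(command -v hbase) || { echo "hbase is not on PATH" >&2; return 2; }
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual"
  else
    echo "mismatch: PATH resolves hbase to $actual, expected $expected" >&2
    return 1
  fi
}
```

For example: hbase_on_path_matches /home/hbase/hbase-2.2.0 || echo "use bin/hbase or fix PATH first"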


> Can you confirm it's the one in
> the bin tarball? what does the version command output? What does the
> mapredcp command output? What does the cli help for the hbase command
> show?
>

 hbase@node2:~/hbase-2.2.0$ hbase mapredcp
/home/hbase/hbase-2.2.0/bin/../lib/shaded-clients/hbase-shaded-mapreduce-2.2.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/audience-annotations-0.5.0.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/commons-logging-1.2.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/findbugs-annotations-1.3.9-1.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/htrace-core4-4.2.0-incubating.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/log4j-1.2.17.jar:/home/hbase/hbase-2.2.0/bin/../lib/client-facing-thirdparty/slf4j-api-1.7.25.jar


hbase@node2:~/hbase-2.2.0$ hbase version
HBase 2.2.0
Source code repository git://hao-OptiPlex-7050/home/hao/open_source/hbase
revision=
Compiled by hao on 2019年 03月 07日 星期四 14:05:34 CST
From source with checksum 783fee467bb1b28666f0d904437862c4

hbase@node2:~/hbase-2.2.0$ hbase
Usage: hbase [<options>] <command> [<args>]
Options:
  --config DIR         Configuration direction to use. Default: ./conf
  --hosts HOSTS        Override the list in 'regionservers' file
  --auth-as-server     Authenticate to ZooKeeper using servers configuration
  --internal-classpath Skip attempting to use client facing jars (WARNING: unstable results between versions)

Commands:
Some commands take arguments. Pass no args or -h for usage.
  shell           Run the HBase shell
  hbck            Run the HBase 'fsck' tool. Defaults read-only hbck1.
                  Pass '-j /path/to/HBCK2.jar' to run hbase-2.x HBCK2.
  snapshot        Tool for managing snapshots
  wal             Write-ahead-log analyzer
  hfile           Store file analyzer
  zkcli           Run the ZooKeeper shell
  master          Run an HBase HMaster node
  regionserver    Run an HBase HRegionServer node
  zookeeper       Run a ZooKeeper server
  rest            Run an HBase REST server
  thrift          Run the HBase Thrift server
  thrift2         Run the HBase Thrift2 server
  clean           Run the HBase clean up script
  classpath       Dump hbase CLASSPATH
  mapredcp        Dump CLASSPATH entries required by mapreduce
  pe              Run PerformanceEvaluation
  ltt             Run LoadTestTool
  canary          Run the Canary tool
  version         Print the version
  regionsplitter  Run RegionSplitter tool
  rowcounter      Run RowCounter tool
  cellcounter     Run CellCounter tool
  pre-upgrade     Run Pre-Upgrade validator tool
  CLASSNAME       Run the class named CLASSNAME




>
>
>
> > Because the META is now assigned, I changed 1588230740 to 1431292037213
> to
> > get the namespace table assigned. No idea how the meta got assigned... I
> > removed /hbase from ZK, maybe it helped.
> >
>
> It's not good that we're bouncing around states without knowing why
> those states are changing. Let's focus on getting hbck2 invocations to
> work as expected before continuing with cluster health.
>

Good for me.


>
> > Cluster is still running in maintenance mode:
> >   
> > hbase.master.maintenance_mode
> > true
> >   
> >
>
> I don't recall suggesting maintenance mode and would suggest not being
> in it for now.
>

Wellington suggested it earlier. Removing it for now.


Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
> The important bit will be that you get a currently working HBCK2 instance

I did. I downloaded all 2.2.0 artifacts locally, and built with the
parameters you gave. All went well. However, HBCK2 can't talk to my HBase
server...

I tried this:
java org.apache.hbase.HBCK2 assigns 1431292037213

But I got this:
PleaseHoldException: Master is initializing

The hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar unassigns 1588230740 command
doesn't seem to be correct. "-j" is not a valid parameter, it should be -jar.
And fixing it displays the previous HBCK version help.

Because the META is now assigned, I changed 1588230740 to 1431292037213 to
get the namespace table assigned. No idea how the meta got assigned... I
removed /hbase from ZK, maybe it helped.

Cluster is still running in maintenance mode:
  <property>
    <name>hbase.master.maintenance_mode</name>
    <value>true</value>
  </property>

That's why I'm very surprised it tries to still assign some regions to
other servers (master is node2)
2019-03-12 10:59:12,286 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster:
hbase:namespace,,1431292037213.7f4a480f47f98300185d1ae2ff663295. is NOT
online; state={7f4a480f47f98300185d1ae2ff663295 state=OPENING,
ts=1552402749342, server=node3,16020,1551984565705};
ServerCrashProcedures=true. Master startup cannot progress, in
holding-pattern until region onlined.


I looked at node3 and there is absolutely nothing wrong there. Logs are
just showing LruBlockCacheStatsExecutor. There is nothing there but INFO
logs.

Thanks again for all your suggestions!

JMS

On Tue, Mar 12, 2019 at 13:23, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:

> >
> > server=node3,6,-1
>
>
> That's an unusual server start code (-1).  Maybe that's what's causing meta
> to never come online, as there will be no SCP for this RS instance. The
> hbck2 unassigns command Busbey suggested will probably clean assignment for
> meta, than you could try the "assigns" command to bring it online again.
>
> On Tue, Mar 12, 2019 at 15:30, Sean Busbey wrote:
>
> > On Tue, Mar 12, 2019 at 9:38 AM Jean-Marc Spaggiari
> >  wrote:
> > >
> > > Ok. First, thanks to all of you for taking a look at that!
> > >
> > > Sean, I didn't follow the steps. 2.0.4 was failing and not able to
> > > start. I got the recommendation to remove the master WALs, which I
> > > did. So I didn't set the procedure upgrade tag, because there was no
> > > longer any data on the procedure side. Should I still set it even if
> > > I wiped everything?
> > >
> >
> > Probably not? Let's see if we can recover. :)
> >
> > I've already pushed to get the "here is how you upgrade" doc made more
> > visible, so hopefully others don't end up here.
> >
> >
> >
> > >
> > > "Can you tell me where in this high level things fell down? Or where I
> > > should drill in more?"
> > >
> > > It's hard to say. I tried way too many things I think, to be able to
> > point
> > > to something specific. I think HBCK2 is definitely somewhere where I
> > > struggled. So the instructions you sent are very helpful.
> > >
> >
> >
> > The important bit will be that you get a currently working HBCK2
> instance.
> >
> >
> > >
> > > 2019-03-12 10:37:08,385 WARN  [master/node2:6:becomeActiveMaster]
> > > master.HMaster: hbase:meta,,1.1588230740 is NOT online;
> state={1588230740
> > > state=OPEN, ts=1552400910746, server=node3,6,-1};
> > > ServerCrashProcedures=true. Master startup cannot progress, in
> > > holding-pattern until region onlined.
> > >
> >
> > excellent! this is great info. This says that the master has no RS
> > report that shows meta as happy, but also that internally the master
> > thinks meta is open on RS "node3" so it isn't trying to recover it.
> >
> > Is node3 up? If it is, does the web UI for node3 show meta as a table
> > it's serving? does the RS log for node3 say anything about meta?
> >
> > If node3 isn't currently struggling with some issue, let's assume you
> > are just in a bad state from when you were running 2.0. In that case
> > we can use hbck2 and tell master to unassign meta:
> >
> > $ hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar unassigns 1588230740
> >
>


Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-03-12 Thread Jean-Marc Spaggiari
Interesting. It makes things a bit difficult to read and understand. Should
we not have the same locale for all the releases?

JM

On Tue, Mar 12, 2019 at 10:35, Sean Busbey wrote:

> locale of the build is up to the RM (this is why, for example, the 2.1
> release line javadocs have chinese for the boilerplate text[1])
>
> however, it does look like that shell output might be missing the
> build revision information from git or we might not be properly
> parsing the output from git when a non-english locale is used.
>
> [1]: http://hbase.apache.org/2.1/apidocs/index.html
>
>
> On Tue, Mar 12, 2019 at 8:54 AM Jean-Marc Spaggiari
>  wrote:
> >
> > Also, in the shell, it displays Asian text:
> >
> > "Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"
> >
> > Not sure if that's what we want.
> >
> > JMS
> >
> > On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang wrote:
> >
> > > Let me start a new RC1. HBASE-21970 should be included and need a
> release
> > > note.
> > >
> > > On Tue, Mar 12, 2019 at 8:35 AM, Sean Busbey wrote:
> > >
> > > > I'm -1 on RC0 as it is.
> > > >
> > > > The current release notes don't include any call out about the
> upgrade
> > > > steps needed. Since we don't usually have minor-version specific
> > > > upgrade steps and especially since there are things folks need to do
> > > > before installing 2.2.0, it's important that they be front and
> center.
> > > > Possibly that should mean a link to the ref guide section from the RC
> > > > instructions and eventual announcement.
> > > >
> > > > I think either HBASE-21075 needs to have 2.2.0 included in its fix
> > > > version or the release note from that issue needs to be copied over
> to
> > > > HBASE-21970 and it needs to have 2.2.0 included in its fix
> version(s).
> > > > In either case the release notes should link to the ref guide
> section.
> > > >
> > > > On Thu, Mar 7, 2019 at 3:44 AM Guanghao Zhang 
> > > wrote:
> > > > >
> > > > > Please vote on this release candidate (RC) for Apache HBase 2.2.0.
> > > > > This is the first release of the branch-2.2 line.
> > > > >
> > > > > The VOTE will remain open for at least 72 hours.
> > > > >
> > > > > [ ] +1 Release this package as Apache HBase 2.2.0
> > > > > [ ] -1 Do not release this package because ...
> > > > >
> > > > > The tag to be voted on is 2.2.0-RC0 (commit
> > > > > 4ab2dc20f15e9b59477de4bd971c367f3ce342cb):
> > > > >
> > > > >  https://github.com/apache/hbase/tree/2.2.0-RC0
> > > > >
> > > > > The release files, including signatures, digests, etc. can be
> found at:
> > > > >
> > > > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/
> > > > >
> > > > > Maven artifacts are available in a staging repository at:
> > > > >
> > > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1286
> > > > >
> > > > > Signatures used for HBase RCs can be found in this file:
> > > > >
> > > > > https://dist.apache.org/repos/dist/release/hbase/KEYS
> > > > >
> > > > > The list of bug fixes going into 2.2.0 can be found in included
> > > > > CHANGES.md and RELEASENOTES.md available here:
> > > > >
> > > > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/CHANGES.md
> > > > >
> https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/RELEASENOTES.md
> > > > >
> > > > > To learn more about Apache HBase, please see
> http://hbase.apache.org/
> > > > >
> > > > > Thanks,
> > > > > Guanghao Zhang
> > > >
> > >
>


Re: 2.0.4 to 2.2.0 testing

2019-03-12 Thread Jean-Marc Spaggiari
Ok. First, thanks to all of you for taking a look at that!

Sean, I didn't follow the steps. 2.0.4 was failing and not able to start. I
got the recommendation to remove the master WALs, which I did. So I didn't
set the procedure upgrade tag, because there was no longer any data on the
procedure side. Should I still set it even if I wiped everything?


"Can you tell me where in this high level things fell down? Or where I
should drill in more?"

It's hard to say. I tried way too many things I think, to be able to point
to something specific. I think HBCK2 is definitely somewhere where I
struggled. So the instructions you sent are very helpful.

Right now, I still have a 2.2.0 instance that doesn't want to start. I can
wipe /hbase in both ZK and HDFS and I'm sure it will run, but I'm
interested in figuring out how to get it back to a stable state, instead of
taking the easy path.



@Allan: I removed both the znode for meta and namespace and they still
can't be assigned. Thanks for the suggestion.

@Wellington: I followed your proposed steps, but I still get
PleaseHoldException: Master is initializing. I don't see the difference
with and without this parameter.

2019-03-12 10:37:08,385 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1552400910746, server=node3,6,-1};
ServerCrashProcedures=true. Master startup cannot progress, in
holding-pattern until region onlined.
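
Note that the server field in the log line above is node3,6,-1. Assuming the usual host,port,startcode layout of a server name, the last field can be pulled out and sanity-checked with a small helper like the one below. The function names are made up for illustration, and treating any non-numeric start code as suspicious is my assumption, not something the master itself reports.

```shell
# startcode_of SERVERNAME
# Print the part after the last comma of a "host,port,startcode" string.
startcode_of() {
  printf '%s\n' "${1##*,}"
}

# has_valid_startcode SERVERNAME
# Succeed only if the start code is a non-empty string of digits.
has_valid_startcode() {
  code=$(startcode_of "$1")
  case "$code" in
    ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit such as "-"
    *) return 0 ;;
  esac
}
```

For example: has_valid_startcode "node3,6,-1" || echo "suspicious start code: $(startcode_of "node3,6,-1")"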


I will keep trying to get this cluster started. It helps to understand the
new constraints...

Thanks again all,

JM


On Fri, Mar 8, 2019 at 06:36, Wellington Chevreuil <wellington.chevre...@gmail.com> wrote:

> JMS, if u are still getting stuck to assign meta table, assuming u managed
> to get an hbck2 jar built, u can try set master to maintenance mode by
> setting "hbase.master.maintenance_mode" to "true" on master's
> hbase-site.xml, restart master, then manually bring meta online with hbck2
> below command:
>
> $ hbase hbck -j /path/to/hbase-hbck2-1.0.0-SNAPSHOT.jar assigns 1588230740
>
>
> On Fri, Mar 8, 2019 at 08:07, Allan Yang wrote:
>
> > Try to delete meta Znode from Zookeeper, and restart master.
> > Best Regards
> > Allan Yang
> >
> >
> > On Fri, Mar 8, 2019 at 4:37 AM, Sean Busbey wrote:
> >
> > > In HBase 2 you should never delete master proc wals. unlike in earlier
> > > releases, it will almost certainly damage the cluster. Probably now
> > > you are in a known-bad state independent of whatever your earlier
> > > issue was. I think though we can fix it.
> > >
> > > Some baseline info:
> > >
> > > 1) Did you follow the upgrade process to go from 2.0.z to 2.2.0?
> > >
> > > I can't link directly to the section due to HBASE-22010, but it's the
> > > first one here:
> > >
> > > http://hbase.apache.org/book.html#_upgrade_paths
> > >
> > > 2) I think your meta issue is something we'll need HBCK2 to fix. so
> > > I'd like to work out what's not working for you there.
> > >
> > > We have not done a release of HBCK2 yet, so unfortunately you'll have
> > > to build it yourself. I think you've already realized that's
> > > non-trivial. We have, however, successfully gone through using it with
> > > prior releases.
> > >
> > > Can you tell me where in this high level things fell down? Or where I
> > > should drill in more?
> > >
> > > 2a) Get the code from the git repo:
> > > https://github.com/apache/hbase-operator-tools
> > > 2b) Build for use with the RC. It is important that you specify your
> > > hbase version
> > >
> > > mvn -Dhbase.version=2.2.0 package
> > >
> > > Note that since 2.2.0 hasn't been released yet, you'll need to tell
> > > maven to point at the staged repository posted in the VOTE. e.g. save
> > > this gist
> > >
> > > https://gist.github.com/busbey/ce2293e78440f060fa60aa2dcf1333f1
> > >
> > >  as ~/hbase-2.2.0rc0.settings.xml and then do
> > >
> > > mvn --settings ~/hbase-2.2.0rc0.settings.xml -Dhbase.version=2.2.0
> > package
> > >
> > > 2c) grab the jar from
> > > hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar and put it where you
> > > can access it on the cluster. let's call it in
> > > ~/hbase-hbck2-for-2.2.0.jar
> > >
> > > 2d) run hbck2 on the cluster to verify that you get the correct help
> > >
> > > hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar
> > >
> > > 3) are there outstanding procedures? when master isn't finishing
> > > initialization,

Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-03-12 Thread Jean-Marc Spaggiari
Also, in the shell, it displays Asian text:

"Version 2.2.0, r, 2019年 03月 07日 星期四 14:05:34 CST"

Not sure if that's what we want.

JMS

On Mon, Mar 11, 2019 at 21:53, Guanghao Zhang wrote:

> Let me start a new RC1. HBASE-21970 should be included and need a release
> note.
>
> On Tue, Mar 12, 2019 at 8:35 AM, Sean Busbey wrote:
>
> > I'm -1 on RC0 as it is.
> >
> > The current release notes don't include any call out about the upgrade
> > steps needed. Since we don't usually have minor-version specific
> > upgrade steps and especially since there are things folks need to do
> > before installing 2.2.0, it's important that they be front and center.
> > Possibly that should mean a link to the ref guide section from the RC
> > instructions and eventual announcement.
> >
> > I think either HBASE-21075 needs to have 2.2.0 included in its fix
> > version or the release note from that issue needs to be copied over to
> > HBASE-21970 and it needs to have 2.2.0 included in its fix version(s).
> > In either case the release notes should link to the ref guide section.
> >
> > On Thu, Mar 7, 2019 at 3:44 AM Guanghao Zhang 
> wrote:
> > >
> > > Please vote on this release candidate (RC) for Apache HBase 2.2.0.
> > > This is the first release of the branch-2.2 line.
> > >
> > > The VOTE will remain open for at least 72 hours.
> > >
> > > [ ] +1 Release this package as Apache HBase 2.2.0
> > > [ ] -1 Do not release this package because ...
> > >
> > > The tag to be voted on is 2.2.0-RC0 (commit
> > > 4ab2dc20f15e9b59477de4bd971c367f3ce342cb):
> > >
> > >  https://github.com/apache/hbase/tree/2.2.0-RC0
> > >
> > > The release files, including signatures, digests, etc. can be found at:
> > >
> > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/
> > >
> > > Maven artifacts are available in a staging repository at:
> > >
> > > https://repository.apache.org/content/repositories/orgapachehbase-1286
> > >
> > > Signatures used for HBase RCs can be found in this file:
> > >
> > > https://dist.apache.org/repos/dist/release/hbase/KEYS
> > >
> > > The list of bug fixes going into 2.2.0 can be found in included
> > > CHANGES.md and RELEASENOTES.md available here:
> > >
> > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/CHANGES.md
> > > https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/RELEASENOTES.md
> > >
> > > To learn more about Apache HBase, please see http://hbase.apache.org/
> > >
> > > Thanks,
> > > Guanghao Zhang
> >
>


2.0.4 to 2.2.0 testing

2019-03-07 Thread Jean-Marc Spaggiari
Sure! here it is!

I cleaned all the WALs (old, master, etc.) and it seems to be a bit cleaner
now, but it's still stuck trying to assign the META table.

2019-03-07 14:50:35,286 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:50:36,287 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:50:38,287 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:50:42,288 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:50:50,289 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:51:06,290 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:51:29,765 INFO
[ReadOnlyZKClient-latitude.distparser.com:2181@0x71707c27]
zookeeper.ZooKeeper: Session: 0x16911bd542a00fa closed
2019-03-07 14:51:29,766 INFO
[ReadOnlyZKClient-latitude.distparser.com:2181@0x71707c27-EventThread]
zookeeper.ClientCnxn: EventThread shut down for session: 0x16911bd542a00fa
2019-03-07 14:51:38,292 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:52:42,292 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region onlined.
2019-03-07 14:54:50,293 WARN  [master/node2:6:becomeActiveMaster]
master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
state=OPEN, ts=1551988229980, server=node5.distparser.com,16020,1551986838747};
ServerCrashProcedures=false. Master startup cannot progress, in
holding-pattern until region
onlined.
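
Since the same warning repeats over and over, it can be easier to reduce the master log to one line per stuck region. A rough sketch, assuming the message format stays exactly as in the lines above and that each warning is a single line in the real log file (they are only wrapped here by the mail client). The function name not_online_regions is made up for illustration.

```shell
# not_online_regions [FILE...]
# Read master log lines (from FILEs or stdin) and print
# "<encoded region> <state> <server>" once per distinct combination.
not_online_regions() {
  sed -n -E \
    's/.* is NOT online; state=[{]([0-9a-f]+) state=([A-Z_]+), ts=[0-9]+, server=([^}]*)[}];.*/\1 \2 \3/p' \
    "$@" | sort -u
}
```

It could be run as not_online_regions against the master log file; lines that do not contain the "is NOT online" warning are ignored.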

I will keep it running for some time and see if it ends up doing
something...

JMS


On Thu, Mar 7, 2019 at 14:54, Sean Busbey wrote:

> JMS, could you start a new thread with your upgrade issue so we can go
> through some things without pinging the VOTE thread?
>
>
> On Thu, Mar 7, 2019 at 1:48 PM Jean-Marc Spaggiari
>  wrote:
> >
> > Downloaded the version and checked the MD5SUM.
> > Checked documentation and README
> > Checked license => FAILED. *Too many files with unapproved license*
> > Ran in standalone, checked logs and UI, ran some load, went well.
> >
> > I tried to deploy on top of 2.0.4 and it doesn't start.
> > 2019-03-07 14:38:14,848 WARN  [master/node2:6.Chore.1]
> > master.CatalogJanitor: CatalogJanitor is disabled! Enabled=true,
> > maintenanceMode=false,
> > am=org.apache.hadoop.hbase.master.assignment.AssignmentManager@7deaf821,
> > metaLoaded=true, hasRIT=true clusterShutDown=false
> > 2019-03-07 14:39:13,869 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING,
> > location=node1.distparser.com,16020,1551986838653, table=dns,
> > region=bb65f685cdefc4f2491d246f376fc1f0
> >
> > Tried to disable the tables but I'm not able. Tried to move the regions
> but
> > HBCK2 doesn't want.
> > Caused by:
> >
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException):
> > org.apache.hadoop.hbase.PleaseHoldExcep

Re: Comments on the blog

2019-03-07 Thread Jean-Marc Spaggiari
Got it. So I will forget about it ;)

Thanks,

JMS

On Thu, Mar 7, 2019 at 14:04, Kevin Risden wrote:

> Based on [1], Sean Busbey is right that it is linked to ASF LDAP.
>
> [1] http://www.apache.org/dev/project-blogs#grantrights
>
> Kevin Risden
>
>
> On Thu, Mar 7, 2019 at 2:00 PM Sean Busbey  wrote:
>
> > I think blogs.a.o accounts are limited to committers. IIRC this has
> > been a contributing factor to other communities moving away from it to
> > a blog they host on their website.
> >
> > On Thu, Mar 7, 2019 at 10:54 AM Jean-Marc Spaggiari
> >  wrote:
> > >
> > > Is there any place where I should create an account? I have not been
> able
> > > to find any link for that :(
> > >
> > > On Sat, Mar 2, 2019 at 18:18, Stack wrote:
> > >
> > > > Thanks JMS. I cleaned up the comments and changed the setting for
> > comments
> > > > to be moderated. Per Josh, I'd have given you Admin but trying to add
> > you
> > > > email as-is failed for me sir.
> > > > S
> > > >
> > > > On Sat, Mar 2, 2019 at 1:49 PM Jean-Marc Spaggiari <
> > > > jean-m...@spaggiari.org>
> > > > wrote:
> > > >
> > > > > Hum. I didn't find any option to create an account. I found the
> login
> > > > form,
> > > > > > but as you said, I don't think I have any account, so tried few
> > without
> > > > any
> > > > > luck :( Any idea where to go to create one?
> > > > >
> > > > > JM
> > > > >
> > > > > Le sam. 2 mars 2019 à 16:36, Josh Elser  a
> écrit
> > :
> > > > >
> > > > > > Hey JMS,
> > > > > >
> > > > > > Thanks for the keen eye. If you create an account on blogs.a.o,
> we
> > can
> > > > > > give you the appropriate karma to delete the spam and prevent new
> > > > > comments.
> > > > > >
> > > > > > I don't see an account for you on the system, else I'd have added
> > you
> > > > > > already :)
> > > > > >
> > > > > > On 3/2/19 10:18 AM, Jean-Marc Spaggiari wrote:
> > > > > > > All,
> > > > > > >
> > > > > > > Comments here
> > > > > > https://blogs.apache.org/hbase/entry/coprocessor_introduction
> > > > > > > look more like spam than anything else. What does it take to
> > clean
> > > > > that?
> > > > > > > > I can help with it.
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > JMS
> > > > > > >
> > > > > >
> > > > >
> > > >
> >
>


Re: [VOTE] First release candidate for HBASE 2.2.0 is available

2019-03-07 Thread Jean-Marc Spaggiari
Downloaded the version and checked the MD5SUM.
Checked documentation and README
Checked license => FAILED. *Too many files with unapproved license*
Ran in standalone, checked logs and UI, ran some load, went well.
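
For the checksum step in the list above, here is a minimal sketch of how a downloaded artifact could be compared against a published digest. The helper names are hypothetical, and the algorithm to use (md5, sha512, ...) is an assumption that depends on which digest files the release directory actually ships.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_matches(path, expected_hex, algorithm="sha512"):
    """Compare against a published digest, ignoring case and whitespace."""
    # Published digest files often group the hex characters with spaces.
    normalized = "".join(expected_hex.split()).lower()
    return file_digest(path, algorithm) == normalized
```

A call would look like `digest_matches("hbase-bin.tar.gz", published_value, "sha512")`, where both the file name and `published_value` are placeholders.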

I tried to deploy on top of 2.0.4 and it doesn't start.
2019-03-07 14:38:14,848 WARN  [master/node2:6.Chore.1]
master.CatalogJanitor: CatalogJanitor is disabled! Enabled=true,
maintenanceMode=false,
am=org.apache.hadoop.hbase.master.assignment.AssignmentManager@7deaf821,
metaLoaded=true, hasRIT=true clusterShutDown=false
2019-03-07 14:39:13,869 WARN  [ProcExecTimeout]
assignment.AssignmentManager: STUCK Region-In-Transition rit=OPENING,
location=node1.distparser.com,16020,1551986838653, table=dns,
region=bb65f685cdefc4f2491d246f376fc1f0

Tried to disable the tables but was not able to. Tried to move the regions,
but HBCK2 refuses.
Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.PleaseHoldException):
org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
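
To see which regions are affected, the STUCK Region-In-Transition warnings quoted above can be pulled out of a master log with a small script. This is only a sketch: the field layout (rit, location, table, region) is taken from the log lines in this message, the function name is hypothetical, and other HBase versions may format the warning differently.

```python
import re

# Matches the key=value fields of a "STUCK Region-In-Transition" warning.
# \s* between fields tolerates the line wrapping seen in mailed logs.
STUCK_RIT = re.compile(
    r"STUCK Region-In-Transition\s+rit=(?P<state>\w+),\s*"
    r"location=(?P<location>\S+?),\s*"
    r"table=(?P<table>[^,\s]+),\s*"
    r"region=(?P<region>[0-9a-f]+)"
)


def stuck_regions(log_text):
    """Return one dict per STUCK Region-In-Transition warning in log_text."""
    return [m.groupdict() for m in STUCK_RIT.finditer(log_text)]
```

Each returned dict carries the RIT state, the server location, the table and the encoded region name, which is the kind of information one needs when asking HBCK2 to act on a region.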


A few things I found:

Command line says:

  hbck            Run the HBase 'fsck' tool. Defaults read-only hbck1.
                  Pass '-j /path/to/HBCK2.jar' to run hbase-2.x HBCK2.

However, there is no HBCK2.jar. Google gave me
https://github.com/apache/hbase-operator-tools but that's not trivial.
I downloaded and built it, but the -j option doesn't seem to exist. I found
that it should be -jar, but that was still not working. After fighting with it,
I finally got it working by calling java org.apache.hbase.HBCK2 directly.


I got this in the logs at some point:
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.master.assignment.TransitRegionStateProcedure
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.hbase.procedure2.ProcedureUtil.newProcedure(ProcedureUtil.java:50)
    ... 17 more

I will keep fighting with it to try to get something working. I can of
course rm -rf /hbase and get it running, but I would like to see if it can
recover...

Thanks,

JMS

On Thu, Mar 7, 2019 at 04:44, Guanghao Zhang wrote:

> Please vote on this release candidate (RC) for Apache HBase 2.2.0.
> This is the first release of the branch-2.2 line.
>
> The VOTE will remain open for at least 72 hours.
>
> [ ] +1 Release this package as Apache HBase 2.2.0
> [ ] -1 Do not release this package because ...
>
> The tag to be voted on is 2.2.0-RC0 (commit
> 4ab2dc20f15e9b59477de4bd971c367f3ce342cb):
>
>  https://github.com/apache/hbase/tree/2.2.0-RC0
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/
>
> Maven artifacts are available in a staging repository at:
>
> https://repository.apache.org/content/repositories/orgapachehbase-1286
>
> Signatures used for HBase RCs can be found in this file:
>
> https://dist.apache.org/repos/dist/release/hbase/KEYS
>
> The list of bug fixes going into 2.2.0 can be found in included
> CHANGES.md and RELEASENOTES.md available here:
>
> https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/CHANGES.md
> https://dist.apache.org/repos/dist/dev/hbase/2.2.0RC0/RELEASENOTES.md
>
> To learn more about Apache HBase, please see http://hbase.apache.org/
>
> Thanks,
> Guanghao Zhang
>


Re: Comments on the blog

2019-03-07 Thread Jean-Marc Spaggiari
Is there any place where I should create an account? I have not been able
to find any link for that :(

On Sat, Mar 2, 2019 at 18:18, Stack wrote:

> Thanks JMS. I cleaned up the comments and changed the setting for comments
> to be moderated. Per Josh, I'd have given you Admin but trying to add your
> email as-is failed for me sir.
> S
>
> On Sat, Mar 2, 2019 at 1:49 PM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Hum. I didn't find any option to create an account. I found the login
> form,
> > but as you said, I don't think I have any account, so tried few without
> any
> > luck :( Any idea where to go to create one?
> >
> > JM
> >
> > Le sam. 2 mars 2019 à 16:36, Josh Elser  a écrit :
> >
> > > Hey JMS,
> > >
> > > Thanks for the keen eye. If you create an account on blogs.a.o, we can
> > > give you the appropriate karma to delete the spam and prevent new
> > comments.
> > >
> > > I don't see an account for you on the system, else I'd have added you
> > > already :)
> > >
> > > On 3/2/19 10:18 AM, Jean-Marc Spaggiari wrote:
> > > > All,
> > > >
> > > > Comments here
> > > https://blogs.apache.org/hbase/entry/coprocessor_introduction
> > > > look more like spam than anything else. What does it take to clean
> > that?
> > > > > I can help with it.
> > > >
> > > > Thanks,
> > > >
> > > > JMS
> > > >
> > >
> >
>


Re: Release 2.2.0

2019-03-07 Thread Jean-Marc Spaggiari
Wow, that was quick! Thanks a lot. My 2.0.4 doesn't start anymore, so I
will most probably give 2.2.0 a try today! Will comment on the other thread.

Thanks,

JMS

On Thu, Mar 7, 2019 at 06:23, Guanghao Zhang wrote:

> The first release candidate of 2.2.0 is ready and you can give it a
> try. Thanks.
>
> On Thu, Mar 7, 2019 at 3:24 AM, Jean-Marc Spaggiari wrote:
>
> > thanks for the update. I just built 2.2.0-SNAPSHOT locally. Built well. I
> > will see if I can deploy it and test it. Else I will wait for Guanghao's
> > communication.
> >
> > Le mar. 5 mars 2019 à 20:59, 张铎(Duo Zhang)  a
> > écrit :
> >
> > > Yes, Guanghao is still working on it. It usually spends more time for a
> > > minor release than a patch release, as we need to align the git commits
> > and
> > > the jira issues. And usually there will be a lot of differences...
> > >
> > > Sean Busbey  于2019年3月6日周三 上午9:40写道:
> > >
> > > > I believe work on the RC is being tracked in
> > > >
> > > > https://issues.apache.org/jira/browse/HBASE-21747
> > > >
> > > > Looks like an RC is imminent.
> > > >
> > > > On Tue, Mar 5, 2019, 14:39 Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org>
> > > > wrote:
> > > >
> > > > > Bump ;)
> > > > >
> > > > > Le mar. 5 févr. 2019 à 18:43, Jean-Marc Spaggiari <
> > > > jean-m...@spaggiari.org
> > > > > >
> > > > > a écrit :
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > When we will have a 2.2.0 RC I will give a try of the upgrade
> path
> > > from
> > > > > > 2.0.x... do we have any idea when this will be out?
> > > > > >
> > > > > > JMS
> > > > > >
> > > > > > Le mar. 29 janv. 2019 02 h 09, Guanghao Zhang <
> zghao...@gmail.com>
> > a
> > > > > > écrit :
> > > > > >
> > > > > >> Cut a new branch-2.2 at this commit.
> > > > > >>
> > > > > >> commit e736d78362253936492fb3bd16e614d14859281d
> > > > > >> Author: Duo Zhang 
> > > > > >> Date: Mon Jan 28 18:21:51 2019 +0800
> > > > > >>
> > > > > >> HBASE-21792 Mark HTableMultiplexer as deprecated and remove it
> in
> > > > 3.0.0
> > > > > >>
> > > > > >> Signed-off-by: Michael Stack 
> > > > > >>
> > > > > >> Sean Busbey  于2019年1月24日周四 上午9:45写道:
> > > > > >>
> > > > > >> > Okay it sounds like we definitely need a better doc about this
> > as
> > > a
> > > > > >> > starting point. I have some additional questions about failure
> > > > > handling;
> > > > > >> > should we go through them here or in a jira about improving
> how
> > > > > upgrade
> > > > > >> > from 2.0/2.1 gets handled?
> > > > > >> >
> > > > > >> > On Wed, Jan 23, 2019, 07:33 张铎(Duo Zhang) <
> > palomino...@gmail.com
> > > > > wrote:
> > > > > >> >
> > > > > >> > > Oh, you misunderstood me. It is just for master, you need
> make
> > > > sure
> > > > > >> that
> > > > > >> > > before you killing the old version master, there is no RITs.
> > And
> > > > > once
> > > > > >> the
> > > > > >> > > new master is up, everything is fine., as the new master
> will
> > > > detect
> > > > > >> if
> > > > > >> > > there are old style AssignProcedure/UnassignProcedures, if
> so
> > it
> > > > > will
> > > > > >> > quit
> > > > > >> > > immediately.
> > > > > >> > >
> > > > > >> > > Sean Busbey  于2019年1月23日周三 下午9:21写道:
> > > > > >> > >
> > > > > >> > > > Yes, please. I don't think it's reasonable to expect no
> > region
> > > > > >> > > transitions
> > > > > >> > > > during a rolling upgrade window; that would imply no
> servers
> > > can
> > > > > >> crash

Re: Release 2.2.0

2019-03-06 Thread Jean-Marc Spaggiari
Thanks for the update. I just built 2.2.0-SNAPSHOT locally. It built fine. I
will see if I can deploy it and test it. Otherwise I will wait for Guanghao's
communication.

On Tue, Mar 5, 2019 at 20:59, 张铎(Duo Zhang) wrote:

> Yes, Guanghao is still working on it. A minor release usually takes more
> time than a patch release, as we need to align the git commits and
> the jira issues. And usually there will be a lot of differences...
>
> On Wed, Mar 6, 2019 at 9:40 AM, Sean Busbey wrote:
>
> > I believe work on the RC is being tracked in
> >
> > https://issues.apache.org/jira/browse/HBASE-21747
> >
> > Looks like an RC is imminent.
> >
> > On Tue, Mar 5, 2019, 14:39 Jean-Marc Spaggiari 
> > wrote:
> >
> > > Bump ;)
> > >
> > > Le mar. 5 févr. 2019 à 18:43, Jean-Marc Spaggiari <
> > jean-m...@spaggiari.org
> > > >
> > > a écrit :
> > >
> > > > Hi all,
> > > >
> > > > When we will have a 2.2.0 RC I will give a try of the upgrade path
> from
> > > > 2.0.x... do we have any idea when this will be out?
> > > >
> > > > JMS
> > > >
> > > > Le mar. 29 janv. 2019 02 h 09, Guanghao Zhang  a
> > > > écrit :
> > > >
> > > >> Cut a new branch-2.2 at this commit.
> > > >>
> > > >> commit e736d78362253936492fb3bd16e614d14859281d
> > > >> Author: Duo Zhang 
> > > >> Date: Mon Jan 28 18:21:51 2019 +0800
> > > >>
> > > >> HBASE-21792 Mark HTableMultiplexer as deprecated and remove it in
> > 3.0.0
> > > >>
> > > >> Signed-off-by: Michael Stack 
> > > >>
> > > >> Sean Busbey  于2019年1月24日周四 上午9:45写道:
> > > >>
> > > >> > Okay it sounds like we definitely need a better doc about this as
> a
> > > >> > starting point. I have some additional questions about failure
> > > handling;
> > > >> > should we go through them here or in a jira about improving how
> > > upgrade
> > > >> > from 2.0/2.1 gets handled?
> > > >> >
> > > > >> > On Wed, Jan 23, 2019, 07:33 张铎(Duo Zhang) wrote:
> > > >> >
> > > >> > > Oh, you misunderstood me. It is just for master, you need make
> > sure
> > > >> that
> > > >> > > before you killing the old version master, there is no RITs. And
> > > once
> > > >> the
> > > >> > > new master is up, everything is fine., as the new master will
> > detect
> > > >> if
> > > >> > > there are old style AssignProcedure/UnassignProcedures, if so it
> > > will
> > > >> > quit
> > > >> > > immediately.
> > > >> > >
> > > >> > > Sean Busbey  于2019年1月23日周三 下午9:21写道:
> > > >> > >
> > > >> > > > Yes, please. I don't think it's reasonable to expect no region
> > > >> > > transitions
> > > >> > > > during a rolling upgrade window; that would imply no servers
> can
> > > >> crash
> > > >> > > > during the upgrade.
> > > >> > > >
> > > >> > > > On Wed, Jan 23, 2019, 07:04 张铎(Duo Zhang) <
> > palomino...@gmail.com
> > > >> > wrote:
> > > >> > > >
> > > >> > > > > From 1.x it is OK, but from 2.0 or 2.1, we need make sure
> that
> > > >> there
> > > >> > > are
> > > >> > > > no
> > > >> > > > > RITs ongoing. We could see if we can make the upgrading more
> > > >> > smoothly.
> > > >> > > > >
> > > >> > > > > Guanghao Zhang  于2019年1月23日周三 下午8:56写道:
> > > >> > > > >
> > > >> > > > > > Our new internal branch is based on branch-2. And we
> already
> > > >> > rolling
> > > >> > > > > > upgrade our staging cluster from our internal 0.98 branch
> to
> > > >> it...
> > > >> > I
> > > >> > > > will
> > > >> > > > > > take a try for 2.0.* to 2.2.0.
> > > >> > > > > >
> > > >> > > > > > Are you saying we need to make sure it can do this? Or are
> > you

Re: Release 2.2.0

2019-03-05 Thread Jean-Marc Spaggiari
Bump ;)

On Tue, Feb 5, 2019 at 18:43, Jean-Marc Spaggiari wrote:

> Hi all,
>
> When we will have a 2.2.0 RC I will give a try of the upgrade path from
> 2.0.x... do we have any idea when this will be out?
>
> JMS
>
> Le mar. 29 janv. 2019 02 h 09, Guanghao Zhang  a
> écrit :
>
>> Cut a new branch-2.2 at this commit.
>>
>> commit e736d78362253936492fb3bd16e614d14859281d
>> Author: Duo Zhang 
>> Date: Mon Jan 28 18:21:51 2019 +0800
>>
>> HBASE-21792 Mark HTableMultiplexer as deprecated and remove it in 3.0.0
>>
>> Signed-off-by: Michael Stack 
>>
>> Sean Busbey  于2019年1月24日周四 上午9:45写道:
>>
>> > Okay it sounds like we definitely need a better doc about this as a
>> > starting point. I have some additional questions about failure handling;
>> > should we go through them here or in a jira about improving how upgrade
>> > from 2.0/2.1 gets handled?
>> >
>> > On Wed, Jan 23, 2019, 07:33 张铎(Duo Zhang) wrote:
>> >
>> > > Oh, you misunderstood me. It is just for the master: you need to make
>> > > sure that there are no RITs before you kill the old version master.
>> > > Once the new master is up, everything is fine, because the new master
>> > > checks for old-style AssignProcedures/UnassignProcedures and quits
>> > > immediately if it finds any.
>> > >
>> > > Sean Busbey  于2019年1月23日周三 下午9:21写道:
>> > >
>> > > > Yes, please. I don't think it's reasonable to expect no region
>> > > transitions
>> > > > during a rolling upgrade window; that would imply no servers can
>> crash
>> > > > during the upgrade.
>> > > >
>> > > > On Wed, Jan 23, 2019, 07:04 张铎(Duo Zhang) wrote:
>> > > >
>> > > > > From 1.x it is OK, but from 2.0 or 2.1, we need make sure that
>> there
>> > > are
>> > > > no
>> > > > > RITs ongoing. We could see if we can make the upgrading more
>> > smoothly.
>> > > > >
>> > > > > Guanghao Zhang  于2019年1月23日周三 下午8:56写道:
>> > > > >
>> > > > > > Our new internal branch is based on branch-2. And we already
>> > rolling
>> > > > > > upgrade our staging cluster from our internal 0.98 branch to
>> it...
>> > I
>> > > > will
>> > > > > > take a try for 2.0.* to 2.2.0.
>> > > > > >
>> > > > > > Are you saying we need to make sure it can do this? Or are you
>> > > > > > > asserting that it already does?
>> > > > > > >
>> > > > > > It already does.
>> > > > > >
>> > > > > >
>> > > > > > Sean Busbey  于2019年1月22日周二 下午9:48写道:
>> > > > > >
>> > > > > > > excellent! thanks for volunteering to get the 2.2 release line
>> > > going
>> > > > > > > Guanghao!
>> > > > > > >
>> > > > > > > > Need to add more document about how to rolling upgrade from
>> > > > > > > 2.0.* or 2.1.* to 2.2.*. Meanwhile, need document about how
>> > rolling
>> > > > > > upgrade
>> > > > > > > from 1.* to 2.2.*.
>> > > > > > >
>> > > > > > > Has anyone confirmed that rolling upgrade to 2.2.0 works? I
>> > thought
>> > > > it
>> > > > > > > couldn't because of the change to assignment handling classes?
>> > > > > > >
>> > > > > > > > Now HBCK2 tool support branch-2's region assignments,
>> > > > > > > too.
>> > > > > > >
>> > > > > > > Are you saying we need to make sure it can do this? Or are you
>> > > > > > > asserting that it already does?
>> > > > > > >
>> > > > > > > On Sun, Jan 20, 2019 at 10:25 PM Guanghao Zhang <
>> > > zghao...@gmail.com>
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > Hi, all, there has been six months since we released 2.1.0.
>> And
>> > > > there
>> > > > > > are
>> > > > > > > > 429 issues which fixed version is 2.2

Re: Branch-2.0: EOL and a 2.0.5RC

2019-03-05 Thread Jean-Marc Spaggiari
Did as you described. Removed Master WALs, restarted, still getting 80% of
my regions stuck. I will keep playing a bit but I think I will "just"
upgrade to the latest 2.2 and retry... Right now 2.0.4 seems a bit unstable
for me :(

On Sat, Feb 16, 2019 at 01:38, Stack wrote:

> Remove the Master WALs at least. The beta is damaged.
>
> If testing, trying to help out, I'd suggest move to branch-2.2. It is the
> next minor release. 2.1.x has had some love and is in a good state. It is
> superior to 2.0.x which we are trying to EOL (too many branches) so your
> time would be better spent elsewhere.
>
> Thanks for the help JMS,
> S
>
>
> On Wed, Feb 13, 2019 at 5:03 AM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Hi Stack,
> >
> > Thanks for looking at it. So what's next? Do you want me to try to put the
> > latest 2.1 RC on top of it and see if it behaves well? Or just remove the
> > Master WALs and stay on 2.0.4 to try the 2.2 upgrade path? I don't have an
> > urgent need for this cluster, so I can play a bit with it if it helps.
> >
> > Any JIRA that I should open or update based on that?
> >
> > JMS
> >
> > Le lun. 11 févr. 2019 à 21:16, Stack  a écrit :
> >
> > > Took a quick look.
> > >
> > > On startup, it has hundreds of AMv2 WALs to recover. Its backed up
> unable
> > > to let go of the early files because some old procedures have not
> > > 'completed'.. This sort of backup problem has been addressed in later
> > > hbase-2.0s.
> > >
> > > (Looks like oldest WAL is hdfs://
> > >
> >
> node2.distparser.com:8020/hbase/MasterProcWALs/pv2-0016.log
> > > )
> > >
> > > It then finds corrupt procedures which is probably why we are retaining
> > > logs in first place. Corruption was fixed in later hbase-2.0s (may be a
> > new
> > > instance of corruption raising its head on master branch... but that
> > > doesn't apply here).
> > >
> > > There is then an issue assigning namespace -- it has a mangled hostname
> > --
> > > which is probably why the fail... Would be interesting if we saw
> similar
> > in
> > > a later hbase... but probably a result of the mess above.
> > >
> > > Thanks for letting me take a look JMS.
> > >
> > > S
> > >
> > >
> > >
> > > On Mon, Feb 11, 2019 at 4:24 PM Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org>
> > > wrote:
> > >
> > > > Doesn't seems to work very well.
> > > >
> > > > Can you try this one?
> > > >
> > > >
> > > >
> > >
> >
> https://drive.google.com/file/d/1PMqJz4LjkEx0U2jYVVSLXaSEqcdHq7w4/view?usp=sharing
> > > >
> > > > JMS
> > > >
> > > > Le lun. 11 févr. 2019 à 16:27, Stack  a écrit :
> > > >
> > > > > On Fri, Feb 8, 2019 at 11:47 AM Jean-Marc Spaggiari <
> > > > > jean-m...@spaggiari.org>
> > > > > wrote:
> > > > >
> > > > > > Probably not the best way to host a file, but logs are there:
> > > > > > https://pastebin.com/dl/JaETSZfh
> > > > > > I had a running 2.0.0-beta but 2.0.4 can't start.
> > > > > > Let me know if you can't see the file (.tar.bz2) and I will try
> > > > something
> > > > > > different.
> > > > > > JMS
> > > > > >
> > > > > >
> > > > > Says "Your request is blocked due to invalid referrer. If you are
> > > trying
> > > > to
> > > > > > hotlink this page, please use: https://pastebin.com/raw/JaETSZfh".
> > > > >
> > > > > I'm doing something wrong?
> > > > >
> > > > > Thanks JMS,
> > > > > S
> > > > >
> > > > >
> > > > > > Le ven. 8 févr. 2019 à 14:15, Stack  a écrit :
> > > > > >
> > > > > > > On Wed, Feb 6, 2019 at 1:42 PM Jean-Marc Spaggiari <
> > > > > > > jean-m...@spaggiari.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Quick question here. Just upgraded from 2.0.0 BETA 2 to 2.0.4
> > and
> > > > I'm
> > > > > > > > having some issues to start HBase. I will probably be able to
> > fix
> > > >

Re: Comments on the blog

2019-03-02 Thread Jean-Marc Spaggiari
Hum. I didn't find any option to create an account. I found the login form,
but as you said, I don't think I have any account, so I tried a few without any
luck :( Any idea where to go to create one?

JM

On Sat, Mar 2, 2019 at 16:36, Josh Elser wrote:

> Hey JMS,
>
> Thanks for the keen eye. If you create an account on blogs.a.o, we can
> give you the appropriate karma to delete the spam and prevent new comments.
>
> I don't see an account for you on the system, else I'd have added you
> already :)
>
> On 3/2/19 10:18 AM, Jean-Marc Spaggiari wrote:
> > All,
> >
> > Comments here
> https://blogs.apache.org/hbase/entry/coprocessor_introduction
> > look more like spam than anything else. What does it take to clean that?
> > I can help with it.
> >
> > Thanks,
> >
> > JMS
> >
>


Comments on the blog

2019-03-02 Thread Jean-Marc Spaggiari
All,

Comments here https://blogs.apache.org/hbase/entry/coprocessor_introduction
look more like spam than anything else. What does it take to clean that up?
I can help with it.

Thanks,

JMS


Re: Branch-2.0: EOL and a 2.0.5RC

2019-02-13 Thread Jean-Marc Spaggiari
Hi Stack,

Thanks for looking at it. So what's next? Do you want me to try to put the
latest 2.1 RC on top of it and see if it behaves well? Or just remove the
Master WALs and stay on 2.0.4 to try the 2.2 upgrade path? I don't have an
urgent need for this cluster, so I can play a bit with it if it helps.

Any JIRA that I should open or update based on that?

JMS

On Mon, Feb 11, 2019 at 21:16, Stack wrote:

> Took a quick look.
>
> On startup, it has hundreds of AMv2 WALs to recover. It's backed up, unable
> to let go of the early files because some old procedures have not
> 'completed'. This sort of backup problem has been addressed in later
> hbase-2.0s.
>
> (Looks like the oldest WAL is
> hdfs://node2.distparser.com:8020/hbase/MasterProcWALs/pv2-0016.log)
>
> It then finds corrupt procedures which is probably why we are retaining
> logs in the first place. Corruption was fixed in later hbase-2.0s (may be a new
> instance of corruption raising its head on master branch... but that
> doesn't apply here).
>
> There is then an issue assigning namespace -- it has a mangled hostname  --
> which is probably why it fails... Would be interesting if we saw something similar in
> a later hbase... but probably a result of the mess above.
>
> Thanks for letting me take a look JMS.
>
> S
>
>
>
> On Mon, Feb 11, 2019 at 4:24 PM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Doesn't seems to work very well.
> >
> > Can you try this one?
> >
> >
> >
> https://drive.google.com/file/d/1PMqJz4LjkEx0U2jYVVSLXaSEqcdHq7w4/view?usp=sharing
> >
> > JMS
> >
> > Le lun. 11 févr. 2019 à 16:27, Stack  a écrit :
> >
> > > On Fri, Feb 8, 2019 at 11:47 AM Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org>
> > > wrote:
> > >
> > > > Probably not the best way to host a file, but logs are there:
> > > > https://pastebin.com/dl/JaETSZfh
> > > > I had a running 2.0.0-beta but 2.0.4 can't start.
> > > > Let me know if you can't see the file (.tar.bz2) and I will try
> > something
> > > > different.
> > > > JMS
> > > >
> > > >
> > > Says "Your request is blocked due to invalid referrer. If you are
> trying
> > to
> > > > hotlink this page, please use: https://pastebin.com/raw/JaETSZfh".
> > >
> > > I'm doing something wrong?
> > >
> > > Thanks JMS,
> > > S
> > >
> > >
> > > > Le ven. 8 févr. 2019 à 14:15, Stack  a écrit :
> > > >
> > > > > On Wed, Feb 6, 2019 at 1:42 PM Jean-Marc Spaggiari <
> > > > > jean-m...@spaggiari.org>
> > > > > wrote:
> > > > >
> > > > > > Quick question here. Just upgraded from 2.0.0 BETA 2 to 2.0.4 and
> > I'm
> > > > > > having some issues to start HBase. I will probably be able to fix
> > > that,
> > > > > but
> > > > > > I'm wondering if you want me to document those issues or if 2.0.x
> > is
> > > > too
> > > > > > old so it's not needed?
> > > > > >
> > > > > >
> > > > > Hey JMS.
> > > > >
> > > > > Would be interested in the issues you are seeing. Perhaps they
> > pertain
> > > to
> > > > > branch-2.1+?
> > > > >
> > > > > Thanks,
> > > > > S
> > > > >
> > > > >
> > > > >
> > > > > > Le lun. 4 févr. 2019 à 21:11, 张铎(Duo Zhang) <
> palomino...@gmail.com
> > >
> > > a
> > > > > > écrit :
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Andrew Purtell  于2019年2月5日周二 上午3:24写道:
> > > > > > >
> > > > > > > > +1
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, Feb 4, 2019 at 10:29 AM Stack 
> > wrote:
> > > > > > > >
> > > > > > > > > I was going to put up a 2.0.5 RC to address
> > > > > > > > > > https://issues.apache.org/jira/browse/HBASE-21791. It's been
> > > > about a
> > > > > > > month
> > > > > > > > > since 2.0.4. There are 50 odd fixes in 2.0.5 currently [1]
> > > mostly
> > > > > > > > > perfunctory backports.
> > > > > > > > >
> > > > > > > > > Stepping back though, I'd like to entertain our letting the
> > > > > > > branch-2.0.x
> > > > > > > > > releases go. After 2.0.5, let's not backport going forward
> > and
> > > > > > > encourage
> > > > > > > > > users to move to branch-2.1. I can make a release should a
> > > > > > > super-critical
> > > > > > > > > arise but otherwise, will let the branch go to seed. We
> have
> > > > enough
> > > > > > > > > branches as it is (smile).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > S
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 1.
> > > > https://issues.apache.org/jira/projects/HBASE/versions/12344604
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Best regards,
> > > > > > > > Andrew
> > > > > > > >
> > > > > > > > Words like orphans lost among the crosstalk, meaning torn
> from
> > > > > truth's
> > > > > > > > decrepit hands
> > > > > > > >- A23, Crosstalk
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Branch-2.0: EOL and a 2.0.5RC

2019-02-11 Thread Jean-Marc Spaggiari
Doesn't seem to work very well.

Can you try this one?

https://drive.google.com/file/d/1PMqJz4LjkEx0U2jYVVSLXaSEqcdHq7w4/view?usp=sharing

JMS

On Mon, Feb 11, 2019 at 16:27, Stack wrote:

> On Fri, Feb 8, 2019 at 11:47 AM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Probably not the best way to host a file, but logs are there:
> > https://pastebin.com/dl/JaETSZfh
> > I had a running 2.0.0-beta but 2.0.4 can't start.
> > Let me know if you can't see the file (.tar.bz2) and I will try something
> > different.
> > JMS
> >
> >
> Says "Your request is blocked due to invalid referrer. If you are trying to
> hotlink this page, please use: https://pastebin.com/raw/JaETSZfh".
>
> Am I doing something wrong?
>
> Thanks JMS,
> S
>
>
> > Le ven. 8 févr. 2019 à 14:15, Stack  a écrit :
> >
> > > On Wed, Feb 6, 2019 at 1:42 PM Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org>
> > > wrote:
> > >
> > > > Quick question here. Just upgraded from 2.0.0 BETA 2 to 2.0.4 and I'm
> > > > having some issues to start HBase. I will probably be able to fix
> that,
> > > but
> > > > I'm wondering if you want me to document those issues or if 2.0.x is
> > too
> > > > old so it's not needed?
> > > >
> > > >
> > > Hey JMS.
> > >
> > > Would be interested in the issues you are seeing. Perhaps they pertain
> to
> > > branch-2.1+?
> > >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > > Le lun. 4 févr. 2019 à 21:11, 张铎(Duo Zhang) 
> a
> > > > écrit :
> > > >
> > > > > +1
> > > > >
> > > > > Andrew Purtell  于2019年2月5日周二 上午3:24写道:
> > > > >
> > > > > > +1
> > > > > >
> > > > > >
> > > > > > On Mon, Feb 4, 2019 at 10:29 AM Stack  wrote:
> > > > > >
> > > > > > > I was going to put up a 2.0.5 RC to address
> > > > > > > > https://issues.apache.org/jira/browse/HBASE-21791. It's been
> > about a
> > > > > month
> > > > > > > since 2.0.4. There are 50 odd fixes in 2.0.5 currently [1]
> mostly
> > > > > > > perfunctory backports.
> > > > > > >
> > > > > > > Stepping back though, I'd like to entertain our letting the
> > > > > branch-2.0.x
> > > > > > > releases go. After 2.0.5, let's not backport going forward and
> > > > > encourage
> > > > > > > users to move to branch-2.1. I can make a release should a
> > > > > super-critical
> > > > > > > arise but otherwise, will let the branch go to seed. We have
> > enough
> > > > > > > branches as it is (smile).
> > > > > > >
> > > > > > > Thanks,
> > > > > > > S
> > > > > > >
> > > > > > >
> > > > > > > 1.
> > https://issues.apache.org/jira/projects/HBASE/versions/12344604
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew
> > > > > >
> > > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > truth's
> > > > > > decrepit hands
> > > > > >- A23, Crosstalk
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Branch-2.0: EOL and a 2.0.5RC

2019-02-08 Thread Jean-Marc Spaggiari
Probably not the best way to host a file, but logs are there:
https://pastebin.com/dl/JaETSZfh
I had a running 2.0.0-beta but 2.0.4 can't start.
Let me know if you can't see the file (.tar.bz2) and I will try something
different.
JMS

On Fri, Feb 8, 2019 at 14:15, Stack wrote:

> On Wed, Feb 6, 2019 at 1:42 PM Jean-Marc Spaggiari <
> jean-m...@spaggiari.org>
> wrote:
>
> > Quick question here. Just upgraded from 2.0.0 BETA 2 to 2.0.4 and I'm
> > having some issues to start HBase. I will probably be able to fix that,
> but
> > I'm wondering if you want me to document those issues or if 2.0.x is too
> > old so it's not needed?
> >
> >
> Hey JMS.
>
> Would be interested in the issues you are seeing. Perhaps they pertain to
> branch-2.1+?
>
> Thanks,
> S
>
>
>
> > Le lun. 4 févr. 2019 à 21:11, 张铎(Duo Zhang)  a
> > écrit :
> >
> > > +1
> > >
> > > Andrew Purtell  于2019年2月5日周二 上午3:24写道:
> > >
> > > > +1
> > > >
> > > >
> > > > On Mon, Feb 4, 2019 at 10:29 AM Stack  wrote:
> > > >
> > > > > I was going to put up a 2.0.5 RC to address
> > > > > > https://issues.apache.org/jira/browse/HBASE-21791. It's been about a
> > > month
> > > > > since 2.0.4. There are 50 odd fixes in 2.0.5 currently [1] mostly
> > > > > perfunctory backports.
> > > > >
> > > > > Stepping back though, I'd like to entertain our letting the
> > > branch-2.0.x
> > > > > releases go. After 2.0.5, let's not backport going forward and
> > > encourage
> > > > > users to move to branch-2.1. I can make a release should a
> > > super-critical
> > > > > arise but otherwise, will let the branch go to seed. We have enough
> > > > > branches as it is (smile).
> > > > >
> > > > > Thanks,
> > > > > S
> > > > >
> > > > >
> > > > > 1. https://issues.apache.org/jira/projects/HBASE/versions/12344604
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> > > > decrepit hands
> > > >- A23, Crosstalk
> > > >
> > >
> >
>


Re: Branch-2.0: EOL and a 2.0.5RC

2019-02-06 Thread Jean-Marc Spaggiari
Quick question here. I just upgraded from 2.0.0 BETA 2 to 2.0.4 and I'm
having some issues starting HBase. I will probably be able to fix that, but
I'm wondering if you want me to document those issues, or if 2.0.x is too
old for that to be needed?

On Mon, Feb 4, 2019 at 21:11, 张铎(Duo Zhang) wrote:

> +1
>
> On Tue, Feb 5, 2019 at 3:24 AM, Andrew Purtell wrote:
>
> > +1
> >
> >
> > On Mon, Feb 4, 2019 at 10:29 AM Stack  wrote:
> >
> > > I was going to put up a 2.0.5 RC to address
> > > https://issues.apache.org/jira/browse/HBASE-21791. It's been about a
> month
> > > since 2.0.4. There are 50 odd fixes in 2.0.5 currently [1] mostly
> > > perfunctory backports.
> > >
> > > Stepping back though, I'd like to entertain our letting the
> branch-2.0.x
> > > releases go. After 2.0.5, let's not backport going forward and
> encourage
> > > users to move to branch-2.1. I can make a release should a
> super-critical
> > > arise but otherwise, will let the branch go to seed. We have enough
> > > branches as it is (smile).
> > >
> > > Thanks,
> > > S
> > >
> > >
> > > 1. https://issues.apache.org/jira/projects/HBASE/versions/12344604
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: Release 2.2.0

2019-02-05 Thread Jean-Marc Spaggiari
Hi all,

Once we have a 2.2.0 RC, I will try the upgrade path from 2.0.x.
Do we have any idea when it will be out?

JMS

On Tue, Jan 29, 2019 at 02:09, Guanghao Zhang wrote:

> Cut a new branch-2.2 at this commit.
>
> commit e736d78362253936492fb3bd16e614d14859281d
> Author: Duo Zhang 
> Date: Mon Jan 28 18:21:51 2019 +0800
>
> HBASE-21792 Mark HTableMultiplexer as deprecated and remove it in 3.0.0
>
> Signed-off-by: Michael Stack 
>
> On Thu, Jan 24, 2019 at 9:45 AM, Sean Busbey wrote:
>
> > Okay it sounds like we definitely need a better doc about this as a
> > starting point. I have some additional questions about failure handling;
> > should we go through them here or in a jira about improving how upgrade
> > from 2.0/2.1 gets handled?
> >
> > On Wed, Jan 23, 2019, 07:33 张铎(Duo Zhang)  >
> > > Oh, you misunderstood me. It is just for master, you need make sure
> that
> > > before you killing the old version master, there is no RITs. And once
> the
> > > new master is up, everything is fine., as the new master will detect if
> > > there are old style AssignProcedure/UnassignProcedures, if so it will
> > quit
> > > immediately.
> > >
> > > On Wed, Jan 23, 2019 at 9:21 PM, Sean Busbey wrote:
> > >
> > > > Yes, please. I don't think it's reasonable to expect no region
> > > transitions
> > > > during a rolling upgrade window; that would imply no servers can
> crash
> > > > during the upgrade.
> > > >
> > > > On Wed, Jan 23, 2019, 07:04 张铎(Duo Zhang)  > wrote:
> > > >
> > > > > From 1.x it is OK, but from 2.0 or 2.1, we need make sure that
> there
> > > are
> > > > no
> > > > > RITs ongoing. We could see if we can make the upgrading more
> > smoothly.
> > > > >
> > > > > On Wed, Jan 23, 2019 at 8:56 PM, Guanghao Zhang wrote:
> > > > >
> > > > > > Our new internal branch is based on branch-2. And we already
> > rolling
> > > > > > upgrade our staging cluster from our internal 0.98 branch to
> it...
> > I
> > > > will
> > > > > > take a try for 2.0.* to 2.2.0.
> > > > > >
> > > > > > Are you saying we need to make sure it can do this? Or are you
> > > > > > > asserting that it already does?
> > > > > > >
> > > > > > It already does.
> > > > > >
> > > > > >
> > > > > > On Tue, Jan 22, 2019 at 9:48 PM, Sean Busbey wrote:
> > > > > >
> > > > > > > excellent! thanks for volunteering to get the 2.2 release line
> > > going
> > > > > > > Guanghao!
> > > > > > >
> > > > > > > > Need to add more document about how to rolling upgrade from
> > > > > > > 2.0.* or 2.1.* to 2.2.*. Meanwhile, need document about how
> > rolling
> > > > > > upgrade
> > > > > > > from 1.* to 2.2.*.
> > > > > > >
> > > > > > > Has anyone confirmed that rolling upgrade to 2.2.0 works? I
> > thought
> > > > it
> > > > > > > couldn't because of the change to assignment handling classes?
> > > > > > >
> > > > > > > > Now HBCK2 tool support branch-2's region assignments,
> > > > > > > too.
> > > > > > >
> > > > > > > Are you saying we need to make sure it can do this? Or are you
> > > > > > > asserting that it already does?
> > > > > > >
> > > > > > > On Sun, Jan 20, 2019 at 10:25 PM Guanghao Zhang <
> > > zghao...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi, all, there has been six months since we released 2.1.0.
> And
> > > > there
> > > > > > are
> > > > > > > > 429 issues which fixed version is 2.2.0[1]. Our internal
> branch
> > > > which
> > > > > > > based
> > > > > > > > branch-2 run ITBLL successfully recently. branch-2 is stable
> > now
> > > > and
> > > > > it
> > > > > > > is
> > > > > > > > time to release 2.2.0. I volunteered to be the release
> manager
> > > for
> > > > > the
> > > > > > > 2.2
> > > > > > > > release line. And plan to cut branch-2.2 from branch-2.
> > > > > > > >
> > > > > > > > For 2.2.0, the biggest change is about AMV2[2]: HBASE-20881
> is
> > an
> > > > > > > > incompatible change and different implemenation with
> branch-2.0
> > > and
> > > > > > > > branch-2.1. Need to add more document about how to rolling
> > > upgrade
> > > > > from
> > > > > > > > 2.0.* or 2.1.* to 2.2.*. Meanwhile, need document about how
> > > rolling
> > > > > > > upgrade
> > > > > > > > from 1.* to 2.2.*. Now HBCK2 tool support branch-2's region
> > > > > > assignments,
> > > > > > > > too.
> > > > > > > >
> > > > > > > > Another features will be included:
> > > > > > > > 1. HBASE-20610 Procedure V2 - Distributed Log Splitting[3]
> > > > > > > > 2. HBASE-21649 Complete Thrift2[4]
> > > > > > > > 3. HBASE-16707 Improve throttling feature for production
> > usage[5]
> > > > > > > > 4. HBASE-20886 [Auth] Support keytab login in hbase client[6]
> > > > > > > > 5. HBASE-20636 Introduce two bloom filter type :
> > > > > ROWPREFIX_FIXED_LENGTH
> > > > > > > and
> > > > > > > > ROWPREFIX_DELIMITED[7]
> > > > > > > >
> > > > > > > > Open a issue HBASE-21747 to release 2.2.0[8]. Suggestions are
> > > > > welcomed.
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Guanghao
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >

Re: [VOTE] First release candidate for HBase 2.0.0 (RC0) is available

2018-04-13 Thread Jean-Marc Spaggiari
Exactly what I was looking for! Thanks Sean!

2018-04-13 9:37 GMT-04:00 Sean Busbey <bus...@apache.org>:

> Hi JMS!
>
> Current flaky results for branch-2.0, generated today AFAICT:
>
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-
> Find-Flaky-Tests-branch2.0/lastSuccessfulBuild/artifact/
>
> you can look at the "dashboard.html" artifact for a human usable idea
> of what we think is broken.
>
> you can fetch the excludes file to get a pattern you can pass to Maven.
>
>
> On Fri, Apr 13, 2018 at 7:40 AM, Jean-Marc Spaggiari
> <jean-m...@spaggiari.org> wrote:
> > Do we have a list of tests that we know will not pass this release?
> >
> > I got those failures so far, but since I want to run multiple runs, I
> want
> > to make sure to exclude the un-stable tests.
> >
> > TestAssignmentManagerMetrics.testRITAssignmentManagerMetrics:152 Metrics
> > Should be equal expected:<1> but was:<0>
> > TestReplicationSmallTests.testDisableEnable:198 Replication wasn't
> disabled
> > TestReplicationSmallTests.testSimplePutDelete:168->TestReplicationBase.
> runSimplePutDeleteTest:266
> > Waited too much time for put replication
> > TestClientOperationTimeout.setUp:99 » SocketTimeout callTimeout=500,
> > callDurat...
> >
> > Thanks,
> >
> > JMS
> >
> >
> > 2018-04-13 0:19 GMT-04:00 Yu Li <car...@gmail.com>:
> >
> >> bq. I'd imagine that if the difference were large, then yes, it should
> >> be a blocker
> >> -- or we as a community can decide to work on it in a follow-on release
> >> making perf a priority (say, 2.1.0).
> >> I see, thanks for the clarification boss, makes sense.
> >>
> >> Best Regards,
> >> Yu
> >>
> >> On 13 April 2018 at 06:02, Stack <st...@duboce.net> wrote:
> >>
> >> > On Thu, Apr 12, 2018 at 1:47 PM, Stack <st...@duboce.net> wrote:
> >> >
> >> > > On Tue, Apr 10, 2018 at 2:50 PM, Sean Busbey <bus...@apache.org>
> >> wrote:
> >> > >
> >> > >> no compat report in the RC directory. does that mean we won't have
> one
> >> > >> in the dist area?
> >> > >>
> >> > >>
> >> > >> not a blocker; we've been inconsistent on it in prior releases, but
> >> > >> the trend seemed to be towards including it.
> >> > >>
> >> > >>
> >> > >
> >> > > HBASE-18622 has current state of compatibility compare. Let me do a
> new
> >> > > run and add it to the RC dir.
> >> > >
> >> >
> >> > I just added
> >> > https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0RC0/
> >> > compat-check-v1.2.6-v2.0.0.report.html
> >> >
> >> > Thanks,
> >> > S
> >> >
> >> >
> >> >
> >> > > St.Ack
> >> > >
> >> > >
> >> > >
> >> > >>
> >> > >> On Tue, Apr 10, 2018 at 3:47 PM, Stack <st...@duboce.net> wrote:
> >> > >> > The first release candidate for Apache HBase 2.0.0 is available
> for
> >> > >> > downloading and testing.
> >> > >> >
> >> > >> > Artifacts are available here:
> >> > >> >
> >> > >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0RC0/
> >> > >> >
> >> > >> > Maven artifacts are available in the staging repository at:
> >> > >> >
> >> > >> >  https://repository.apache.org/content/repositories/
> >> > orgapachehbase-1209
> >> > >> >
> >> > >> > All artifacts are signed with my signing key 8ACC93D2, which is
> also
> >> > >> > in the project KEYS file at
> >> > >> >
> >> > >> >  http://www.apache.org/dist/hbase/KEYS
> >> > >> >
> >> > >> > These artifacts were tagged 2.0.0RC0 at
> >> > >> > hash 011dd2dae33456b3a2bcc2513e9fdd29de23be46
> >> > >> >
> >> > >> > Please review 'Upgrading from 1.x to 2.x' in the bundled HBase
> 2.0.0
> >> > >> > Reference Guide before installing or upgrading for a list of
> >> > >> > incompatibilities, major changes, and notable new features. Be
> aware

Re: [VOTE] First release candidate for HBase 2.0.0 (RC0) is available

2018-04-13 Thread Jean-Marc Spaggiari
Do we have a list of tests that we know will not pass this release?

I have seen the failures below so far, but since I want to do multiple runs,
I want to make sure I exclude the unstable tests.

TestAssignmentManagerMetrics.testRITAssignmentManagerMetrics:152 Metrics
Should be equal expected:<1> but was:<0>
TestReplicationSmallTests.testDisableEnable:198 Replication wasn't disabled
TestReplicationSmallTests.testSimplePutDelete:168->TestReplicationBase.runSimplePutDeleteTest:266
Waited too much time for put replication
TestClientOperationTimeout.setUp:99 » SocketTimeout callTimeout=500,
callDurat...
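
For what it's worth, here is a minimal sketch of how such an exclusion could be assembled for repeated runs. It assumes a Surefire version that accepts "!" exclusions in -Dtest (2.19 or later, as far as I know); the class name FlakyTestExcludes and the helper are made up for illustration, and the HBase build may well expect a different property or a published excludes file instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlakyTestExcludes {
    // Builds a value for Surefire's -Dtest property that excludes each class.
    // Assumption: Surefire treats a leading '!' as "do not run this test".
    static String surefireExcludePattern(List<String> flakyClasses) {
        return flakyClasses.stream()
                .map(name -> "!" + name)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // The three test classes that failed in the run reported above.
        List<String> flaky = Arrays.asList(
                "TestAssignmentManagerMetrics",
                "TestReplicationSmallTests",
                "TestClientOperationTimeout");
        System.out.println("mvn test -Dtest='" + surefireExcludePattern(flaky) + "'");
    }
}
```

Running the main method only prints a candidate command line; it does not invoke Maven itself.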

Thanks,

JMS


2018-04-13 0:19 GMT-04:00 Yu Li :

> bq. I'd imagine that if the difference were large, then yes, it should
> be a blocker
> -- or we as a community can decide to work on it in a follow-on release
> making perf a priority (say, 2.1.0).
> I see, thanks for the clarification boss, makes sense.
>
> Best Regards,
> Yu
>
> On 13 April 2018 at 06:02, Stack  wrote:
>
> > On Thu, Apr 12, 2018 at 1:47 PM, Stack  wrote:
> >
> > > On Tue, Apr 10, 2018 at 2:50 PM, Sean Busbey 
> wrote:
> > >
> > >> no compat report in the RC directory. does that mean we won't have one
> > >> in the dist area?
> > >>
> > >>
> > >> not a blocker; we've been inconsistent on it in prior releases, but
> > >> the trend seemed to be towards including it.
> > >>
> > >>
> > >
> > > HBASE-18622 has current state of compatibility compare. Let me do a new
> > > run and add it to the RC dir.
> > >
> >
> > I just added
> > https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0RC0/
> > compat-check-v1.2.6-v2.0.0.report.html
> >
> > Thanks,
> > S
> >
> >
> >
> > > St.Ack
> > >
> > >
> > >
> > >>
> > >> On Tue, Apr 10, 2018 at 3:47 PM, Stack  wrote:
> > >> > The first release candidate for Apache HBase 2.0.0 is available for
> > >> > downloading and testing.
> > >> >
> > >> > Artifacts are available here:
> > >> >
> > >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0RC0/
> > >> >
> > >> > Maven artifacts are available in the staging repository at:
> > >> >
> > >> >  https://repository.apache.org/content/repositories/
> > orgapachehbase-1209
> > >> >
> > >> > All artifacts are signed with my signing key 8ACC93D2, which is also
> > >> > in the project KEYS file at
> > >> >
> > >> >  http://www.apache.org/dist/hbase/KEYS
> > >> >
> > >> > These artifacts were tagged 2.0.0RC0 at
> > >> > hash 011dd2dae33456b3a2bcc2513e9fdd29de23be46
> > >> >
> > >> > Please review 'Upgrading from 1.x to 2.x' in the bundled HBase 2.0.0
> > >> > Reference Guide before installing or upgrading for a list of
> > >> > incompatibilities, major changes, and notable new features. Be aware
> > >> that
> > >> > according to our adopted Semantic Versioning guidelines[1], we've
> > allow
> > >> > ourselves to make breaking changes in this major version release.
> For
> > >> > example, Coprocessors will need to be recast to fit more constrained
> > CP
> > >> > APIs and a rolling upgrade of an hbase-1.x install to hbase-2.x
> > without
> > >> > downtime is (currently) not possible. That said, a bunch of effort
> has
> > >> been
> > >> > expended mitigating differences; a hbase-1.x client can perform DML
> > >> against
> > >> > an hbase-2 cluster.
> > >> >
> > >> > For the full list of ~6k issues addressed, see [2]. There are also
> > >> > CHANGES.md and RELEASENOTES.md in the root directory of the source
> > >> tarball.
> > >> >
> > >> > Please take a few minutes to verify the release and vote on
> releasing
> > >> it:
> > >> >
> > >> > [ ] +1 Release this package as Apache HBase 2.0.0
> > >> > [ ] +0 no opinion
> > >> > [ ] -1 Do not release this package because...
> > >> >
> > >> > This VOTE will run for one week and close Tuesday, April 17, 2018 @
> > >> 13:00
> > >> > PST.
> > >> >
> > >> > Thanks to the myriad who have helped out with this release,
> > >> > Your 2.0.0 Release Manager
> > >> >
> > >> > 1. http://hbase.apache.org/2.0/book.html#hbase.versioning.post10
> > >> > 2.  https://s.apache.org/zwS9
> > >>
> > >
> > >
> >
>


Re: Building Trunk

2018-03-08 Thread Jean-Marc Spaggiari
Hmm. After an mvn clean, the rebuild passed. Strange. Forget about that. Thanks.


2018-03-08 9:51 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Hi all,
>
> Trying to build Trunk I'm getting the following error:
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] /stock/hbase-releases/hbase-master/hbase-client/src/main/
> java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java:[2626,51]
> cannot access ForeignExceptionMessage
>   class file for ForeignExceptionMessage not found
> [ERROR] /stock/hbase-releases/hbase-master/hbase-client/src/main/
> java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3508,71]
> incompatible types: org.apache.hadoop.hbase.shaded.protobuf.generated.
> ErrorHandlingProtos.ForeignExceptionMessage cannot be converted to
> ForeignExceptionMessage
> [INFO] 2 errors
> [INFO] -
>
> Am I missing something? Mailing doesn't show any recent message about
> that...
>
> Thanks,
>
> JMS
>
>


Building Trunk

2018-03-08 Thread Jean-Marc Spaggiari
Hi all,

Trying to build Trunk I'm getting the following error:
[INFO] -
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR]
/stock/hbase-releases/hbase-master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/RawAsyncHBaseAdmin.java:[2626,51]
cannot access ForeignExceptionMessage
  class file for ForeignExceptionMessage not found
[ERROR]
/stock/hbase-releases/hbase-master/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java:[3508,71]
incompatible types:
org.apache.hadoop.hbase.shaded.protobuf.generated.ErrorHandlingProtos.ForeignExceptionMessage
cannot be converted to ForeignExceptionMessage
[INFO] 2 errors
[INFO] -

Am I missing something? The mailing list doesn't show any recent message about
that...

Thanks,

JMS


Re: [VOTE] The first hbase-2.0.0-beta-2 Release Candidate is available for download

2018-03-06 Thread Jean-Marc Spaggiari
I deployed it on 8 nodes, running many different things including MR,
RowCounts, compactions, etc. Nothing new. So far so good...

2018-03-06 17:29 GMT-05:00 Stack :

> On Tue, Mar 6, 2018 at 12:52 PM, Peter Somogyi 
> wrote:
>
> > +1 (non-binding)
> >
> > - Signature, checksum OK
> > - Test suite using 1.8.0_161 OK
> > - Build and run from source OK
> > - Run from bin tarball OK
> > - PE 1M rows OK
> > - LTT 1M rows OK
> > - Basic operations from shell and Java client OK
> >
> > One thing I noticed: CHANGES.txt isn't updated, latest information is
> about
> > 0.93.0 - Unreleased. The 1.4.2 release contains up-to-date CHANGES.txt.
> > Will the CHANGES.txt be updated only for the final 2.0.0 release?
> >
> > Thanks Peter for trying the RC. Yeah, CHANGES.txt needs to be up-to-date
> by RC (I tried to note that this had not been done in the head of this
> thread 'Note CHANGES has not yet been updated').
>
> S
>
>
>
> >
> > On Tue, Mar 6, 2018 at 10:58 AM, Artem Ervits 
> > wrote:
> >
> > > +1 (non-binding)
> > >
> > > Hadoop Pseudo-distributed: 2.7.5
> > > $M2_HOME from scratch
> > > Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d;
> > > 2017-10-18T07:58:13Z)
> > > Maven home: /opt/maven/apache-maven-3.5.2
> > > Java version: 1.8.0_161, vendor: Oracle Corporation
> > > Java home: /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.161-0.b14.el7_4.
> > > x86_64/jre
> > > Default locale: en_US, platform encoding: UTF-8
> > > OS name: "linux", version: "3.10.0-693.11.6.el7.x86_64", arch: "amd64",
> > > family: "unix"
> > >
> > > Binary Release:
> > > Java 1M rows: OK
> > > LTT 1M rows: OK
> > > PE 1M rows: OK
> > > MD5: OK
> > >
> > > hbase shell: OK
> > > create, list, scan, count, truncate, disable, drop
> > > snapshot, restore_snapshot
> > > UI:
> > >   split: OK
> > >   merge: OK
> > >
> > > Source Release:
> > > Build with: mvn clean -DskipTests -Dhadoop-two.version=2.7.5
> > > install && mvn clean -DskipTests -Dhadoop-two.version=2.7.5 package
> > > assembly:single   OK
> > > MD5: OK
> > > installed and ran from src: OK
> > >
> > > mvn test -P runSmallTests: NOK (this can be my own environment and I've
> > > surfaced this in votes for 1.4.1 and 1.4.2 but failing class is the
> same
> > > but failure is different.
> > >
> > > [INFO] Results:
> > > [INFO]
> > > [ERROR] Failures:
> > > [ERROR] TestSpnegoHttpServer.testAllowedClient:243->Assert.assertEquals:631->Assert.assertEquals:645->Assert.failNotEquals:834->Assert.fail:88
> > > expected:<200> but was:<401>
> > > [INFO]
> > >
> > > On Tue, Mar 6, 2018 at 10:57 AM, Josh Elser  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > * src release OK
> > > > * xsums/sigs OK
> > > > * Can build and run from src OK
> > > > * Loaded some data locally
> > > >
> > > >
> > > > On 3/2/18 6:40 PM, Stack wrote:
> > > >
> > > >> The first release candidate for HBase 2.0.0-beta-2 is up at
> > > >>
> > > >>   https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-
> > beta-2.RC0/
> > > >>
> > > >> Maven artifacts are available from a staging directory here:
> > > >>
> > > >>https://repository.apache.org/content/repositories/
> > > orgapachehbase-1199
> > > >>
> > > >> All was signed with my key at 8ACC93D2 [1]
> > > >>
> > > >> I tagged the RC as 2.0.0-beta-2RC0.2 at
> > > >> 9e9b347d667e1fc6165c9f8ae5ae7052147e8895
> > > >>
> > > >> hbase-2.0.0-beta-2 is a not-for-production preview of hbase-2.0.0.
> It
> > is
> > > >> meant for devs and downstreamers to test drive and flag us if we
> > messed
> > > up
> > > >> on anything ahead of our rolling
> > > >> actual 2.0.0 release candidates ("GAs").
> > > >>
> > > >> hbase-2.0.0-beta-2 is our second beta release. More than 200 fixes
> > have
> > > >> gone in since
> > > >> beta-1. Unit tests generally pass when run against hadoop2 and
> > > >> hadoop3[5].
> > > >> It includes
> > > >> all that was in previous alphas and beta (new assignment manager,
> > > offheap
> > > >> read/write
> > > >> path, in-memory compactions, etc).The list of features addressed in
> > > 2.0.0
> > > >> so far can be
> > > >> found here [3]. There are thousands. The list of ~3k+ fixes in 2.0.0
> > > >> exclusively can be
> > > >> found here [4]. Our overview doc. on the state of 2.0.0 is at [6].
> > > >>
> > > >> This beta was supposed to have as its focus rolling upgrade from
> > > hbase-1.x
> > > >> versions but
> > > >> this is work not complete (At this late stage, it is looking like it
> > > will
> > > >> be a post-2.0.0 project).
> > > >>
> > > >> This is our last hbase-2.0.0 beta release. Next up, we'll be rolling
> > an
> > > >> actual 2.0.0 release
> > > >> candidate. Look for this in a week or two after beta-2 goes out,
> after
> > > >> we've done more
> > > >> testing and documentation (and we fix issues raised by you all
> against

Re: [VOTE] The first hbase-2.0.0-beta-2 Release Candidate is available for download

2018-03-05 Thread Jean-Marc Spaggiari
Weird. I recompiled with the new JARs and all, and now it works. Sorry for
the spam. Running some tests.

JMS

2018-03-05 20:48 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Just curious, has something changed in the LoadBalancer?
>
> My own personal balancer was working fine with 2.0.0-b1 but doesn't seem
> to work anymore with 2.0.0-b2:
> 2018-03-05 20:33:55,073 ERROR [master/node2:6] master.HMaster: Failed
> to become active master
> java.lang.NoSuchFieldError: clusterStatus
> at c.c.s.hbase.LoadBasedBalancer.initialize(LoadBasedBalancer.java:42)
> at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:869)
> at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2020)
> at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
> at java.lang.Thread.run(Thread.java:748)
>
> Has the visibility of the clusterStatus field from StochasticLoadBalancer
> changed between the 2 versions? I'm not good enough at GIT to be able to
> compare them :(
>
> JMS
>
>
> 2018-03-05 16:15 GMT-05:00 Umesh Agashe <uaga...@cloudera.com>:
>
>> Thanks Stack! Created HBASE-20135.
>>
>> On Mon, Mar 5, 2018 at 11:31 AM, Stack <st...@duboce.net> wrote:
>>
>> > Thanks  Umesh. File an issue though? Stuff Anoops comment in it. While
>> > 'harmless', it will scare users; we should fix it.
>> > Thanks for trying the RC,
>> > S
>> >
>> > On Sun, Mar 4, 2018 at 8:18 PM, Umesh Agashe <uaga...@cloudera.com>
>> wrote:
>> >
>> > > Yes, I started hbase2 over hbase1 when I got the exception. I also
>> tried
>> > > running TestReplicationAdmin a couple of times and can not reliably
>> > > reproduce the problem.
>> > >
>> > > I am changing my vote to +1.
>> > >
>> > > Thanks,
>> > > Umesh
>> > >
>> > >
>> > > On Sun, Mar 4, 2018 at 2:43 PM, Stack <st...@duboce.net> wrote:
>> > >
>> > > > Thank Umesh for giving it a run.
>> > > >
>> > > > Did you start the hbase2 over an hbase1 dataset. I've seen the
>> > exception
>> > > > you note when I've done this. The startup keeps going over this
>> > > exception,
>> > > > right? (IIRC, its a complaint reading a file written w/ hbase1... We
>> > fail
>> > > > to read in the bloom filter which is not the end-of-the-world).
>> > > >
>> > > > On the replication failures, you can reproduce reliably?
>> > > >
>> > > > Thanks,
>> > > > St.Ack
>> > > >
>> > > >
>> > > >
>> > > > On Sun, Mar 4, 2018 at 9:04 AM, Umesh Agashe <uaga...@cloudera.com>
>> > > wrote:
>> > > >
>> > > > > -1 non-binding (concerns noted below)
>> > > > >
>> > > > > download src & bin tar ball                       - OK
>> > > > > signatures & sums                                 - OK
>> > > > > build from source (openjdk version "1.8.0_151")  - OK
>> > > > > rat check                                        - OK
>> > > > > unit tests                                       - NOT OK
>> > > > > start local instance from bin & CRUD from shell  - OK
>> > > > > LTT write, read 1 million rows, 2 cols/row       - OK
>> > > > > Upgrade from 1.4.2                               - OK
>> > > > > check logs                                       - NOT OK
>> > > > >
>> > > > > NOTE:
>> > > > > * Error message in the log is concerning.
>> > > > >
>> > > > > Found following exception logged multiple times (~11) in the log:
>> > > > > ERROR [StoreFileOpenerThread-test_cf-1]
>> > regionserver.StoreFileReader:
>> > > > > Error
>> > > > > reading bloom filter meta for GENERAL_BLOOM_META -- proceeding
>> > without
>> > > > > java.io.IOException: Comparator class
>> > > > > org.apache.hadoop.hbase.KeyValue$RawBytesComparator
>> > > > > is not instantiable
>> > > > > at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.
>> > > createComp
>> > > > > arator(FixedFileTrailer.java:628)
>> > > > > at org.apache.hadoop.hbase.io.hfile.

Re: [VOTE] The first hbase-2.0.0-beta-2 Release Candidate is available for download

2018-03-05 Thread Jean-Marc Spaggiari
Just curious, has something changed in the LoadBalancer?

My own personal balancer was working fine with 2.0.0-b1 but doesn't seem
to work anymore with 2.0.0-b2:
2018-03-05 20:33:55,073 ERROR [master/node2:6] master.HMaster: Failed
to become active master
java.lang.NoSuchFieldError: clusterStatus
at c.c.s.hbase.LoadBasedBalancer.initialize(LoadBasedBalancer.java:42)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:869)
at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2020)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:553)
at java.lang.Thread.run(Thread.java:748)

Has the visibility of the clusterStatus field from StochasticLoadBalancer
changed between the 2 versions? I'm not good enough at Git to be able to
compare them :(
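
As a general JVM note (not a statement about what actually changed in HBase): a NoSuchFieldError is raised at link time when a class compiled against one version of a library references a field that the version on the runtime classpath no longer has, and the JVM matches fields by both name and declared type. The sketch below uses reflection to check for a field by name and type; BaseV1 and BaseV2 are hypothetical stand-ins for two versions of a base class, not HBase code.

```java
import java.lang.reflect.Field;

class FieldLinkageSketch {
    // Hypothetical stand-ins for two versions of a library base class; not HBase code.
    static class BaseV1 { protected String clusterStatus; }
    static class BaseV2 { protected Integer clusterStatus; }

    // True if type (or one of its superclasses) declares a field with this
    // exact name and declared type.
    static boolean hasField(Class<?> type, String name, Class<?> fieldType) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getName().equals(name) && f.getType() == fieldType) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Same field name in both versions, but only V1 declares it as a String.
        System.out.println(hasField(BaseV1.class, "clusterStatus", String.class));
        System.out.println(hasField(BaseV2.class, "clusterStatus", String.class));
    }
}
```

If the cause is of this kind, recompiling the subclass against the new JARs is usually enough, because the compiler then emits a reference that matches the field as it exists now.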

JMS


2018-03-05 16:15 GMT-05:00 Umesh Agashe :

> Thanks Stack! Created HBASE-20135.
>
> On Mon, Mar 5, 2018 at 11:31 AM, Stack  wrote:
>
> > Thanks  Umesh. File an issue though? Stuff Anoops comment in it. While
> > 'harmless', it will scare users; we should fix it.
> > Thanks for trying the RC,
> > S
> >
> > On Sun, Mar 4, 2018 at 8:18 PM, Umesh Agashe 
> wrote:
> >
> > > Yes, I started hbase2 over hbase1 when I got the exception. I also
> tried
> > > running TestReplicationAdmin a couple of times and can not reliably
> > > reproduce the problem.
> > >
> > > I am changing my vote to +1.
> > >
> > > Thanks,
> > > Umesh
> > >
> > >
> > > On Sun, Mar 4, 2018 at 2:43 PM, Stack  wrote:
> > >
> > > > Thank Umesh for giving it a run.
> > > >
> > > > Did you start the hbase2 over an hbase1 dataset. I've seen the
> > exception
> > > > you note when I've done this. The startup keeps going over this
> > > exception,
> > > > right? (IIRC, its a complaint reading a file written w/ hbase1... We
> > fail
> > > > to read in the bloom filter which is not the end-of-the-world).
> > > >
> > > > On the replication failures, you can reproduce reliably?
> > > >
> > > > Thanks,
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > On Sun, Mar 4, 2018 at 9:04 AM, Umesh Agashe 
> > > wrote:
> > > >
> > > > > -1 non-binding (concerns noted below)
> > > > >
> > > > > download src & bin tar ball                       - OK
> > > > > signatures & sums                                 - OK
> > > > > build from source (openjdk version "1.8.0_151")  - OK
> > > > > rat check                                        - OK
> > > > > unit tests                                       - NOT OK
> > > > > start local instance from bin & CRUD from shell  - OK
> > > > > LTT write, read 1 million rows, 2 cols/row       - OK
> > > > > Upgrade from 1.4.2                               - OK
> > > > > check logs                                       - NOT OK
> > > > >
> > > > > NOTE:
> > > > > * Error message in the log is concerning.
> > > > >
> > > > > Found following exception logged multiple times (~11) in the log:
> > > > > ERROR [StoreFileOpenerThread-test_cf-1] regionserver.StoreFileReader: Error reading bloom filter meta for GENERAL_BLOOM_META -- proceeding without
> > > > > java.io.IOException: Comparator class org.apache.hadoop.hbase.KeyValue$RawBytesComparator is not instantiable
> > > > > at org.apache.hadoop.hbase.io.hfile.FixedFileTrailer.createComparator(FixedFileTrailer.java:628)
> > > > > at org.apache.hadoop.hbase.io.hfile.CompoundBloomFilter.<init>(CompoundBloomFilter.java:79)
> > > > > at org.apache.hadoop.hbase.util.BloomFilterFactory.createFromMeta(BloomFilterFactory.java:104)
> > > > > at org.apache.hadoop.hbase.regionserver.StoreFileReader.loadBloomfilter(StoreFileReader.java:479)
> > > > > at org.apache.hadoop.hbase.regionserver.HStoreFile.open(HStoreFile.java:425)
> > > > > at org.apache.hadoop.hbase.regionserver.HStoreFile.initReader(HStoreFile.java:460)
> > > > > at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:671)
> > > > > at org.apache.hadoop.hbase.regionserver.HStore.lambda$openStoreFiles$0(HStore.java:537)
> > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > > > at java.lang.Thread.run(Thread.java:748)
> > > > > Caused by: java.lang.NullPointerException
> > > > >
> > > > >
> > > > > 

[jira] [Created] (HBASE-20105) Allow flushes to target SSD storage

2018-02-28 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20105:
---

 Summary: Allow flushes to target SSD storage
 Key: HBASE-20105
 URL: https://issues.apache.org/jira/browse/HBASE-20105
 Project: HBase
  Issue Type: New Feature
Affects Versions: hbase-2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


In write-heavy use cases, flushed files are compacted together pretty quickly.
Allowing flushes to go to SSD allows faster flushes and faster first compactions,
with subsequent compactions going to regular storage.

It would be interesting to have an option to target SSD for flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-20101) HBase should provide a way to re-validate locality

2018-02-27 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20101:
---

 Summary: HBase should provide a way to re-validate locality
 Key: HBASE-20101
 URL: https://issues.apache.org/jira/browse/HBASE-20101
 Project: HBase
  Issue Type: New Feature
Reporter: Jean-Marc Spaggiari


HDFS blocks can move for many reasons: HDFS balancing, loss of a disk or of a
node, etc. However, today, locality seems to be calculated when the files are
opened for the first time. Even disabling and re-enabling the regions doesn't 
trigger a re-calculation of the locality. 

We should provide a way to let the user ask for this number to be re-calculated.
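
To make the request concrete, here is a minimal sketch of what a recalculation amounts to, under the assumption that locality means the fraction of a region's store file bytes that have a replica on the hosting server. The Block class and the localityIndex helper are hypothetical and do not reflect the HBase or HDFS APIs; a real implementation would have to fetch current block locations from the NameNode.

```java
import java.util.Arrays;
import java.util.List;

class LocalitySketch {
    // One HDFS-style block: its size and the hosts holding a replica (hypothetical model).
    static class Block {
        final long sizeBytes;
        final List<String> hosts;

        Block(long sizeBytes, String... hosts) {
            this.sizeBytes = sizeBytes;
            this.hosts = Arrays.asList(hosts);
        }
    }

    // Fraction of bytes that have a replica on the given host; 0.0 when there is no data.
    static double localityIndex(List<Block> blocks, String host) {
        long total = 0;
        long local = 0;
        for (Block b : blocks) {
            total += b.sizeBytes;
            if (b.hosts.contains(host)) {
                local += b.sizeBytes;
            }
        }
        return total == 0 ? 0.0 : (double) local / total;
    }

    public static void main(String[] args) {
        // Two equally sized blocks; node1 holds a replica of only the first one.
        List<Block> blocks = Arrays.asList(
                new Block(128, "node1", "node2", "node3"),
                new Block(128, "node2", "node3", "node4"));
        System.out.println(localityIndex(blocks, "node1"));
    }
}
```

The point of the feature request is that this computation should be repeatable on demand with fresh block locations, rather than only when the files are first opened.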



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-02-26 Thread Jean-Marc Spaggiari
Hmm. "Broke" my cluster again...

2018-02-26 13:54:44,053 WARN  [ProcExecWrkr-14]
assignment.RegionTransitionProcedure: Retryable error trying to transition:
pid=409, ppid=344, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
UnassignProcedure table=page_crc, region=6d459de812e7ff0a3aff9a6285979a4c,
server=node3.distparser.com,16020,1519665621427; rit=OPENING, location=
node3.distparser.com,16020,1519665621427
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
current state=OPENING
at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
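
For readers following the trace, the check that fails can be pictured with the small sketch below. It only mirrors what the exception text says (the list of expected states and the target state are copied from the log); the function and exception names are hypothetical, and this is not the actual RegionStates code.

```python
class UnexpectedStateError(Exception):
    """Stand-in for the UnexpectedStateException seen in the log (hypothetical)."""


# States from which a move to CLOSING is accepted, copied from the log message.
EXPECTED_BEFORE_CLOSING = ("SPLITTING", "SPLIT", "MERGING", "OPEN", "CLOSING")


def transition_to_closing(current_state):
    """Return the new state, or raise if current_state is not an expected one."""
    if current_state not in EXPECTED_BEFORE_CLOSING:
        raise UnexpectedStateError(
            "Expected [%s] so could move to CLOSING but current state=%s"
            % (", ".join(EXPECTED_BEFORE_CLOSING), current_state))
    return "CLOSING"
```

With the region sitting in OPENING, as in the log, the unassign cannot mark it CLOSING, so the procedure keeps hitting the same retryable error.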

Is there an easy way to recover from that? Can I just drop the procedure WAL?
Or do I have to wipe the table again and transfer it back from the source? :-/

JMS

2018-01-11 18:16 GMT-05:00 Stack <st...@duboce.net>:

> Thanks JMS.
> S
>
> On Thu, Jan 11, 2018 at 9:36 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Opened HBASE-19767 <https://issues.apache.org/jira/browse/HBASE-19767>
> > and HBASE-19768.
> > Regarding the issue to create the log writer, it fails even if the DN is
> > already declared dead on the NN side...
> >
> > 2018-01-10 21:37 GMT-05:00 Apekshit Sharma <a...@cloudera.com>:
> >
> > > On Wed, Jan 10, 2018 at 11:25 AM, Zach York <
> > zyork.contribut...@gmail.com>
> > > wrote:
> > >
> > > > What is the expectation for flaky tests? I was going to post some
> test
> > > > failures, but saw that they were included in the excludes for flaky
> > > tests.
> > > >
> > > > I understand we might be okay with having flaky tests for this beta-1
> > > (and
> > > > obviously for dev), but I would assume that we want consistent test
> > > results
> > > > for the official 2.0.0 release.
> > > >
> > >
> > > Yeah, that's the goal, but sadly not many hands on deck are working on
> > > that, so doesn't seem in reach.
> > >
> > >
> > > > Do we have JIRAs created for all the flaky tests so that we can start
> > > > fixing them before the beta-2/official RCs get put up?
> > > >
> > >
> > > Whenever i start working on one, i search for it in jira first in case
> > > someone's already working on it, if not I create a new one. (treating
> > jira
> > > as a lock to avoid redundant work).
> > > Creating just the jiras doesn't really help until someone takes them
> and
> > so
> > > most just remain open. But chicken and egg problem maybe? Might be good
> > to
> > > create jira for a few simple ones to see if peeps starting contributing
> > on
> > > this front?
> > >
> > >
> > > > I'd be happy to help try to track down the root causes of the
> flakiness
> > > and
> > > > try to fix these problematic tests.
> > > >
> > > Any help here would be great!
> > > Here's a personal thank you :
> > > http://calmcoolcollective.net/wp-content/uploads/2016/08/
> > chocolatechip.jpg
> > > :)
> > >
> > >
> > > >
> > > > Thanks,
> > > > Zach
> > > >
> > > > On Wed, Jan 10, 2018 at 9:37 AM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Put up a JIRA and dump this stuff in JMS. Sounds like we need a bit
> > > more
> > > > > test coverage at least. Thanks sir.
> > > > > St.Ack
> > > > >
> > > > > On Wed, Jan 10, 2018 at 2:52 AM, Jean-Marc Spaggiari <
> > > > > jean-m...@spaggiari.org> wrote:
>

Re: client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
To be 100% honest, I really like the admin throwing an error! It draws
attention.

I did a distcp from CDH 5.12 to a brand new HBase 2.0 cluster. I wasn't
100% sure whether I had the same issue on the source side.

I just ran HBCK on the source and it reports the same issue, so at least it's
consistent. Only HBaseAdmin is emitting this new message. I like it
;)

JMS

2018-02-23 15:11 GMT-05:00 Stack <st...@duboce.net>:

> On Fri, Feb 23, 2018 at 11:04 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi guys,
> >
> > In HBase 2.0.0-beta-1, getting this:
> >
> > 2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
> > (HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
> > keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:server/1384198854149/Put/vlen=11/seqid=0,
> > page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}
> >
> >
> > I'm not getting that on other versions. And I'm not sure what it means...
> >
> > Any idea?
> >
> >
> The Admin client is getting detail from hbase:meta. It found a row where
> there was no serialized regioninfo and is WARNing you of this. This should
> never happen. It is dumping out all it did find in the meta table.
>
> Why no regioninfo in hbase:meta? This a fresh hbase2 or are you starting an
> hbase2 over an old hbase1 dataset? If latter, could it have been null
> before the update?
>
> Thanks,
> S
>
>
>
> > Thanks,
> >
> > JMS
> >
>


Re: client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
Oh! That might explain it ;)

Thanks Sir. Will fix that manually then...

JMS

Le 23 févr. 2018 15 h 04, "Stack" <st...@duboce.net> a écrit :

On Fri, Feb 23, 2018 at 11:10 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> Can you check whether region c15b13946fa4318a0a956e067d59ebd4 is healthy
> (via hbck) ?
>
>

hbck does not work against hbase2 [1].
S

1.
https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/edit#heading=h.3jy69yxm0gxm



> You may find some clue in master log.
>
> Cheers
>
>
>
> On Fri, Feb 23, 2018 at 11:04 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi guys,
> >
> > In HBase 2.0.0-beta-1, getting this:
> >
> > 2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
> > (HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
> > keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:server/1384198854149/Put/vlen=11/seqid=0,
> > page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}
> >
> >
> > I'm not getting that on other versions. And I'm not sure what it
means...
> >
> > Any idea?
> >
> > Thanks,
> >
> > JMS
> >
>


Re: client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
Tried -fixMeta and -repair and nothing seems to make any difference...  Not
even sure it's taken into consideration... Strange...

2018-02-23 14:37 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> BTW, the "Empty REGIONINFO_QUALIFIER found in hbase:meta" should tell
> more details about the "issue"...
>
> 2018-02-23 14:29 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
>> 1 inconsistencies detected.
>> Status: INCONSISTENT
>>
>> But this table is fine:
>> Table page_proposed is okay.
>> Number of regions: 16
>> Deployed on:  node1.distparser.com,16020,1519412120531
>> node3.distparser.com,16020,1519412121079 
>> node4.distparser.com,16020,1519412125106
>> node5.distparser.com,16020,1519412125539 
>> node6.distparser.com,16020,1519412120498
>> node7.distparser.com,16020,1519412124131 node8.distparser.com,16020,151
>> 9412120794
>>
>> The inconsistency is due to this:
>> 2018-02-23 14:25:18,393 INFO  [main] util.HBaseFsck: Loading regionsinfo
>> from the hbase:meta table
>> ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
>>
>> Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 1
>>
>> What is interesting is this:
>> hbase@node2:~/hbase-2.0.0-beta-1$ hdfs dfs -ls
>> /hbase/data/default/page_proposed/c15b13946fa4318a0a956e067d59ebd4
>> ls: `/hbase/data/default/page_proposed/c15b13946fa4318a0a956e067d59ebd4':
>> No such file or directory
>>
>> So sounds like this is referenced into the META but not into HDFS. Should
>> not HBCK report that instead of saying the table is Okay?
>>
>> JMS
>>
>>
>> 2018-02-23 14:10 GMT-05:00 Ted Yu <yuzhih...@gmail.com>:
>>
>>> Can you check whether region c15b13946fa4318a0a956e067d59ebd4 is healthy
>>> (via hbck) ?
>>>
>>> You may find some clue in master log.
>>>
>>> Cheers
>>>
>>>
>>>
>>> On Fri, Feb 23, 2018 at 11:04 AM, Jean-Marc Spaggiari <
>>> jean-m...@spaggiari.org> wrote:
>>>
>>> > Hi guys,
>>> >
>>> > In HBase 2.0.0-beta-1, getting this:
>>> >
>>> > 2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
>>> > (HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
>>> > keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
>>> > d4./info:server/1384198854149/Put/vlen=11/seqid=0,
>>> > page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
>>> > d4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}
>>> >
>>> >
>>> > I'm not getting that on other versions. And I'm not sure what it
>>> means...
>>> >
>>> > Any idea?
>>> >
>>> > Thanks,
>>> >
>>> > JMS
>>> >
>>>
>>
>>
>


Re: client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
BTW, the "Empty REGIONINFO_QUALIFIER found in hbase:meta" error should give more
details about the "issue"...

2018-02-23 14:29 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> 1 inconsistencies detected.
> Status: INCONSISTENT
>
> But this table is fine:
> Table page_proposed is okay.
> Number of regions: 16
> Deployed on:  node1.distparser.com,16020,1519412120531
> node3.distparser.com,16020,1519412121079 
> node4.distparser.com,16020,1519412125106
> node5.distparser.com,16020,1519412125539 
> node6.distparser.com,16020,1519412120498
> node7.distparser.com,16020,1519412124131 node8.distparser.com,16020,
> 1519412120794
>
> The inconsistency is due to this:
> 2018-02-23 14:25:18,393 INFO  [main] util.HBaseFsck: Loading regionsinfo
> from the hbase:meta table
> ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta
>
> Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 1
>
> What is interesting is this:
> hbase@node2:~/hbase-2.0.0-beta-1$ hdfs dfs -ls /hbase/data/default/page_
> proposed/c15b13946fa4318a0a956e067d59ebd4
> ls: `/hbase/data/default/page_proposed/c15b13946fa4318a0a956e067d59ebd4':
> No such file or directory
>
> So sounds like this is referenced into the META but not into HDFS. Should
> not HBCK report that instead of saying the table is Okay?
>
> JMS
>
>
> 2018-02-23 14:10 GMT-05:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Can you check whether region c15b13946fa4318a0a956e067d59ebd4 is healthy
>> (via hbck) ?
>>
>> You may find some clue in master log.
>>
>> Cheers
>>
>>
>>
>> On Fri, Feb 23, 2018 at 11:04 AM, Jean-Marc Spaggiari <
>> jean-m...@spaggiari.org> wrote:
>>
>> > Hi guys,
>> >
>> > In HBase 2.0.0-beta-1, getting this:
>> >
>> > 2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
>> > (HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
>> > keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
>> > d4./info:server/1384198854149/Put/vlen=11/seqid=0,
>> > page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
>> > d4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}
>> >
>> >
>> > I'm not getting that on other versions. And I'm not sure what it
>> means...
>> >
>> > Any idea?
>> >
>> > Thanks,
>> >
>> > JMS
>> >
>>
>
>


Re: client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
1 inconsistencies detected.
Status: INCONSISTENT

But this table is fine:
Table page_proposed is okay.
Number of regions: 16
Deployed on:  node1.distparser.com,16020,1519412120531
node3.distparser.com,16020,1519412121079
node4.distparser.com,16020,1519412125106
node5.distparser.com,16020,1519412125539
node6.distparser.com,16020,1519412120498
node7.distparser.com,16020,1519412124131
node8.distparser.com,16020,1519412120794

The inconsistency is due to this:
2018-02-23 14:25:18,393 INFO  [main] util.HBaseFsck: Loading regionsinfo
from the hbase:meta table
ERROR: Empty REGIONINFO_QUALIFIER found in hbase:meta

Number of empty REGIONINFO_QUALIFIER rows in hbase:meta: 1

What is interesting is this:
hbase@node2:~/hbase-2.0.0-beta-1$ hdfs dfs -ls
/hbase/data/default/page_proposed/c15b13946fa4318a0a956e067d59ebd4
ls: `/hbase/data/default/page_proposed/c15b13946fa4318a0a956e067d59ebd4':
No such file or directory

So it sounds like this region is referenced in META but is not present in HDFS.
Shouldn't HBCK report that instead of saying the table is okay?
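
The kind of cross-check I have in mind is roughly the following. This is only a sketch, not what HBCK does: the function name is hypothetical, the two input lists are assumed to have been collected by hand (encoded region names from hbase:meta rows and from the table directory listing), and the second region name is made up for the example.

```python
def regions_missing_on_fs(meta_regions, fs_region_dirs):
    """Encoded region names referenced in meta that have no directory on the filesystem."""
    return sorted(set(meta_regions) - set(fs_region_dirs))


# Encoded names taken from hbase:meta rows for the table (second one is made up).
in_meta = ["c15b13946fa4318a0a956e067d59ebd4", "0123456789abcdef0123456789abcdef"]
# Directory names found under /hbase/data/default/page_proposed.
on_fs = ["0123456789abcdef0123456789abcdef"]

missing = regions_missing_on_fs(in_meta, on_fs)
```

Anything left in `missing` is a region that meta still points to but that has no data directory, which is the situation shown by the `hdfs dfs -ls` output above.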

JMS


2018-02-23 14:10 GMT-05:00 Ted Yu <yuzhih...@gmail.com>:

> Can you check whether region c15b13946fa4318a0a956e067d59ebd4 is healthy
> (via hbck) ?
>
> You may find some clue in master log.
>
> Cheers
>
>
>
> On Fri, Feb 23, 2018 at 11:04 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Hi guys,
> >
> > In HBase 2.0.0-beta-1, getting this:
> >
> > 2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
> > (HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
> > keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:server/1384198854149/Put/vlen=11/seqid=0,
> > page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59eb
> > d4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}
> >
> >
> > I'm not getting that on other versions. And I'm not sure what it means...
> >
> > Any idea?
> >
> > Thanks,
> >
> > JMS
> >
>


client.HBaseAdmin No serialized HRegionInfo

2018-02-23 Thread Jean-Marc Spaggiari
Hi guys,

In HBase 2.0.0-beta-1, getting this:

2018-02-23 14:01:55,499 WARN  [main] client.HBaseAdmin
(HBaseAdmin.java:visit(1948)) - No serialized HRegionInfo in
keyvalues={page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59ebd4./info:server/1384198854149/Put/vlen=11/seqid=0,
page_proposed,,1384198043055.c15b13946fa4318a0a956e067d59ebd4./info:serverstartcode/1384198854149/Put/vlen=8/seqid=0}


I'm not getting that on other versions. And I'm not sure what it means...

Any idea?

Thanks,

JMS


[jira] [Created] (HBASE-20045) When running compaction, cache recent blocks.

2018-02-21 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-20045:
---

 Summary: When running compaction, cache recent blocks.
 Key: HBASE-20045
 URL: https://issues.apache.org/jira/browse/HBASE-20045
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-beta-1
Reporter: Jean-Marc Spaggiari


HBase already allows caching blocks on flush. This is very useful for use cases 
where most queries are against recent data. However, as soon as there is a 
compaction, those blocks are evicted. It would be interesting to have a table 
level parameter to say "When compacting, cache blocks less than 24 hours old". 
That way, when running a compaction, all blocks where some data is less than 24h 
old will be automatically cached. 

 

This would be very useful for table designs where there is a timestamp in the key 
but a long history (like a year of sensor data).
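
As a rough sketch of the proposed rule (not an existing HBase option): the function names, the `max_age_ms` parameter and the sample block list below are all hypothetical, and each block is assumed to know the timestamp of its newest cell.

```python
DAY_MS = 24 * 60 * 60 * 1000


def should_cache_on_compaction(block_max_ts_ms, now_ms, max_age_ms=DAY_MS):
    """True when the newest cell in the block is younger than max_age_ms."""
    return now_ms - block_max_ts_ms < max_age_ms


def blocks_to_cache(blocks, now_ms, max_age_ms=DAY_MS):
    """blocks: iterable of (block_id, newest_cell_timestamp_ms) pairs."""
    return [block_id for block_id, ts in blocks
            if should_cache_on_compaction(ts, now_ms, max_age_ms)]


now = 1_519_200_000_000  # arbitrary "current time" in milliseconds
written = [("b1", now - 2 * DAY_MS),   # two days old: not cached
           ("b2", now - DAY_MS // 2),  # twelve hours old: cached
           ("b3", now - 1000)]         # one second old: cached
recent = blocks_to_cache(written, now)
```

Only the blocks holding some data newer than the threshold would be put in the cache while the compaction writes them, so the year of history stays out of it.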





Re: HBaseCon Plans?

2018-02-08 Thread Jean-Marc Spaggiari
So who's jumping in for NY or SF? ;)

2018-02-08 4:46 GMT-05:00 Bijieshan :

> Huawei can continue to hold HBaseCon Asia 2018 :-)
>
> Best Regards,
> Jieshan.
> -Original Message-
> From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
> Sent: 2018年2月8日 0:37
> To: HBase Dev List 
> Subject: Re: HBaseCon Plans?
>
> On Fri, Feb 2, 2018 at 9:13 PM, Mike Drob  wrote:
>
> > Hi folks, has there been any consideration put forth toward the next
> > HBaseCon? The last one was very productive for me personally, but I
> > hadn't heard anything about the schedule for 2018 so figured I could ask
> on list.
> >
> > Mike
> >
>
> Is been kinda quiet this year in terms of hbasecon2018. We, the community,
> have been running the last bunch hosted by a generous, main sponsor (Huawei
> in Shenzhen and Google on east and west coast). If there was the interest,
> we could go beat the bushes to turn up a venue and a date. Wouldn't have to
> be a grand affair.
>
> Thanks,
> St.Ack
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-11 Thread Jean-Marc Spaggiari
Opened HBASE-19767 <https://issues.apache.org/jira/browse/HBASE-19767>
and HBASE-19768.
Regarding the issue creating the log writer, it fails even if the DN is
already declared dead on the NN side...

2018-01-10 21:37 GMT-05:00 Apekshit Sharma <a...@cloudera.com>:

> On Wed, Jan 10, 2018 at 11:25 AM, Zach York <zyork.contribut...@gmail.com>
> wrote:
>
> > What is the expectation for flaky tests? I was going to post some test
> > failures, but saw that they were included in the excludes for flaky
> tests.
> >
> > I understand we might be okay with having flaky tests for this beta-1
> (and
> > obviously for dev), but I would assume that we want consistent test
> results
> > for the official 2.0.0 release.
> >
>
> Yeah, that's the goal, but sadly not many hands on deck are working on
> that, so doesn't seem in reach.
>
>
> > Do we have JIRAs created for all the flaky tests so that we can start
> > fixing them before the beta-2/official RCs get put up?
> >
>
> Whenever i start working on one, i search for it in jira first in case
> someone's already working on it, if not I create a new one. (treating jira
> as a lock to avoid redundant work).
> Creating just the jiras doesn't really help until someone takes them and so
> most just remain open. But chicken and egg problem maybe? Might be good to
> create jira for a few simple ones to see if peeps starting contributing on
> this front?
>
>
> > I'd be happy to help try to track down the root causes of the flakiness
> and
> > try to fix these problematic tests.
> >
> Any help here would be great!
> Here's a personal thank you :
> http://calmcoolcollective.net/wp-content/uploads/2016/08/chocolatechip.jpg
> :)
>
>
> >
> > Thanks,
> > Zach
> >
> > On Wed, Jan 10, 2018 at 9:37 AM, Stack <st...@duboce.net> wrote:
> >
> > > Put up a JIRA and dump this stuff in JMS. Sounds like we need a bit
> more
> > > test coverage at least. Thanks sir.
> > > St.Ack
> > >
> > > On Wed, Jan 10, 2018 at 2:52 AM, Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org> wrote:
> > >
> > > > The DN was dead since December 31st... I really hope the DN figured
> > that
> > > > :-/
> > > >
> > > > I will retry with making sure that the NN is aware the local DN is
> > dead,
> > > > and see. I let you know.
> > > >
> > > > Thanks,
> > > >
> > > > JMS
> > > >
> > > > 2018-01-10 5:50 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> > > >
> > > > > The problem maybe that the DN is dead, but NN does not know and
> keep
> > > > > telling RS that you should try to connect to it. And for the new
> > > > > AsyncFSWAL, we need to connect to all the 3 DNs successfully
> before
> > > > > writing actual data to it, so the RS sucks...
> > > > >
> > > > > This maybe a problem.
> > > > >
> > > > > 2018-01-10 18:40 GMT+08:00 Jean-Marc Spaggiari <
> > > jean-m...@spaggiari.org
> > > > >:
> > > > >
> > > > > > You're correct. It was dead. I thought HBase will be able to
> > survive
> > > > > that.
> > > > > > Same the DN dies after the RS has started, RS will fail closing
> > > nicely
> > > > :(
> > > > > >
> > > > > > 2018-01-10 5:38 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> > > > > >
> > > > > > > Connection refuse? Have you checked the status of the datanode
> on
> > > > > node8?
> > > > > > >
> > > > > > > 2018-01-10 18:31 GMT+08:00 Jean-Marc Spaggiari <
> > > > > jean-m...@spaggiari.org
> > > > > > >:
> > > > > > >
> > > > > > > > I know, this one sunk, but still running it on my cluster, so
> > > here
> > > > > is a
> > > > > > > new
> > > > > > > > issue I just got
> > > > > > > >
> > > > > > > > Any idea what this can be? I see this only a one of my
> nodes...
> > > > > > > >
> > > > > > > > 2018-01-10 05:22:55,786 WARN  [regionserver/node8.com/192.
> > > > > > 168.23.2:16020
> > > > > > > ]
> > > > > > > > wal.AsyncFSWAL: create wal log writer hdf

[jira] [Created] (HBASE-19768) RegionServer startup failing when DN is dead

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19768:
---

 Summary: RegionServer startup failing when DN is dead
 Key: HBASE-19768
 URL: https://issues.apache.org/jira/browse/HBASE-19768
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari


When starting HBase, if the datanode hosted on the same host is dead but not 
yet detected as dead by the namenode, HBase will fail to start:

{code}
515691223393/node8.distparser.com%2C16020%2C1515691223393.1515691238778 failed, retry = 7
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connexion refusée
... 1 more
{code}

and it also gets stuck when stopping:
{code}
hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh
stopping hbase^C
hbase@node2:~/hbase-2.0.0-beta-1$ bin/stop-hbase.sh
stopping hbase..
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/home/hbase/hbase-2.0.0-beta-1/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
{code}

Most interesting of all, it seems to fail the same way even if the DN is 
declared dead on the HDFS side:

{code}
515692041367/node8.distparser.com%2C16020%2C1515692041367.1515692057716 failed, retry = 4
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown Source)
Caused by: org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException: syscall:getsockopt(..) failed: Connexion refusée
... 1 more
{code}
{code}
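
The behaviour discussed on the dev list (the writer needs every datanode of the pipeline to accept a connection, and the RS aborts after 10 retries) can be modelled with the toy sketch below. It is an assumption-laden illustration, not AsyncFSWAL code: `create_wal_writer`, `choose_pipeline`, `can_connect` and the `dn-a`/`dn-b`/`dn-c` hosts are hypothetical; only the dead address 192.168.23.2:50010 and the retry limit come from the logs.

```python
def create_wal_writer(choose_pipeline, can_connect, max_retries=10):
    """Return the pipeline once every datanode in it accepts a connection."""
    for _ in range(max_retries):
        pipeline = choose_pipeline()  # datanodes proposed for this attempt
        if all(can_connect(dn) for dn in pipeline):
            return pipeline
    raise IOError("Failed to create wal log writer after retrying %d time(s)"
                  % max_retries)


dead = {"192.168.23.2:50010"}  # the datanode refusing connections in the log


def can_connect(dn):
    return dn not in dead


def stale_choice():
    # The dead local datanode keeps being proposed on every attempt.
    return ["192.168.23.2:50010", "dn-a:50010", "dn-b:50010"]


def healthy_choice():
    # Only live datanodes are proposed.
    return ["dn-a:50010", "dn-b:50010", "dn-c:50010"]
```

In this model `create_wal_writer(healthy_choice, can_connect)` succeeds on the first attempt, while `create_wal_writer(stale_choice, can_connect)` exhausts its retries and raises, which matches the abort seen at startup. Why the dead DN would still be proposed once HDFS has declared it dead is exactly the open question of this issue.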





[jira] [Created] (HBASE-19767) Master web UI shows negative values for Remaining KVs

2018-01-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19767:
---

 Summary: Master web UI shows negative values for Remaining KVs
 Key: HBASE-19767
 URL: https://issues.apache.org/jira/browse/HBASE-19767
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


In the Master Web UI, under the compaction tab, the Remaining KVs sometimes 
shows negative values.





Re: [VOTE] The second hbase-2.0.0-beta-1 Release Candidate is available for download

2018-01-10 Thread Jean-Marc Spaggiari
"Remaining KVs" field in master WebUI shows negative values... Is that
tracked anywhere?

2018-01-10 5:12 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Oh, you're right! I missed it! Ok. Thanks. Will wait for the next one.
>
> JM
>
> 2018-01-10 5:00 GMT-05:00 Yu Li <car...@gmail.com>:
>
>> I believe this one is already sunk according to stack's reply:
>>
>> -- Forwarded message --
>> From: Stack <st...@duboce.net>
>> Date: 10 January 2018 at 02:09
>> Subject: Re: [VOTE] The second hbase-2.0.0-beta-1 Release Candidate is
>> available for download
>> To: HBase Dev List <dev@hbase.apache.org>
>>
>>
>> Thanks Andrew.
>>
>> Let me work on this.
>>
>> RC is sunk. Will put up a new one after I address the above failure.
>>
>> Best Regards,
>> Yu
>>
>> On 10 January 2018 at 17:54, Jean-Marc Spaggiari <jean-m...@spaggiari.org
>> >
>> wrote:
>>
>> > Are we going to sink this one because of Andrew's -1?
>> >
>> > If so I will wait for the next one to test it...
>> >
>> > Le 10 janv. 2018 03 h 24, "Balazs Meszaros" <
>> balazs.mesza...@cloudera.com>
>> > a écrit :
>> >
>> > > +1
>> > >
>> > > - signatures, checksums OK,
>> > > - unit tests passes (8u112),
>> > > - shell works,
>> > > - load test tool also successed.
>> > >
>> > > Best regards,
>> > > Balazs
>> > >
>> > > On Tue, Jan 9, 2018 at 10:38 PM, Andrew Purtell <apurt...@apache.org>
>> > > wrote:
>> > >
>> > > > Linux here.
>> > > >
>> > > >
>> > > > On Tue, Jan 9, 2018 at 1:13 PM, Stack <st...@duboce.net> wrote:
>> > > >
>> > > > > Andrew:
>> > > > >
>> > > > > I don't see this:
>> > > > >
>> > > > > [INFO] ---
>> > > > > [INFO]  T E S T S
>> > > > > [INFO] ---
>> > > > > [INFO] Running
>> > > > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutPool
>> > > > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time
>> > elapsed:
>> > > > > 4.409 s - in org.apache.hadoop.hbase.regionserver.
>> > > > > TestMemstoreLABWithoutPool
>> > > > > [INFO]
>> > > > > [INFO] Results:
>> > > > > [INFO]
>> > > > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
>> > > > >
>> > > > > Linux and Mac.
>> > > > >
>> > > > > What you reckon sir?
>> > > > >
>> > > > > St.Ack
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Tue, Jan 9, 2018 at 10:03 AM, Andrew Purtell <
>> apurt...@apache.org
>> > >
>> > > > > wrote:
>> > > > >
>> > > > > > -1
>> > > > > >
>> > > > > > Checked sums and signatures: ok
>> > > > > > RAT check passes: ok (8u144)
>> > > > > > Built from source: ok (8u144)
>> > > > > > Unit tests pass: failed (8u144), TestMemstoreLABWithoutPool
>> always
>> > > > fails
>> > > > > >
>> > > > > >
>> > > > > > TestMemstoreLABWithoutPool.org.apache.hadoop.hbase.regionser
>> ver.
>> > > > > > TestMemstoreLABWithoutPool
>> > > > > > can fail with OOME if not run in isolation.
>> > > > > >
>> > > > > > If run in isolation, I get
>> > > > > >
>> > > > > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutP
>> ool.
>> > > > > > testLABChunkQueueWithMultipleMSLABs(org.apache.hadoop.hbase.
>> > > > > regionserver.
>> > > > > > TestMemstoreLABWithoutPool)
>> > > > > > [ERROR]   Run 1:
>> > > > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLA
>> Bs:143
>> > > All
>> > > > > the
>> > > > > > chunks

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
The DN has been dead since December 31st... I really hope the NN figured that out :-/

I will retry after making sure that the NN is aware the local DN is dead,
and see. I'll let you know.

Thanks,

JMS

2018-01-10 5:50 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:

> The problem maybe that the DN is dead, but NN does not know and keep
> telling RS that you should try to connect to it. And for the new
> AsyncFSWAL, we need to connect to all the 3 DNs successfully  before
> writing actual data to it, so the RS sucks...
>
> This maybe a problem.
>
> 2018-01-10 18:40 GMT+08:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
> > You're correct. It was dead. I thought HBase will be able to survive
> that.
> > Same the DN dies after the RS has started, RS will fail closing nicely :(
> >
> > 2018-01-10 5:38 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> >
> > > Connection refuse? Have you checked the status of the datanode on
> node8?
> > >
> > > 2018-01-10 18:31 GMT+08:00 Jean-Marc Spaggiari <
> jean-m...@spaggiari.org
> > >:
> > >
> > > > I know, this one sunk, but still running it on my cluster, so here
> is a
> > > new
> > > > issue I just got
> > > >
> > > > Any idea what this can be? I see this only a one of my nodes...
> > > >
> > > > 2018-01-10 05:22:55,786 WARN  [regionserver/node8.com/192.
> > 168.23.2:16020
> > > ]
> > > > wal.AsyncFSWAL: create wal log writer hdfs://
> > > > node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/
> > > node8.com%2C16020%
> > > > 2C1515579724994.1515579743134
> > > > failed, retry = 6
> > > > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$
> > > > AnnotatedConnectException:
> > > > syscall:getsockopt(..) failed: Connexion refusée: /
> 192.168.23.2:50010
> > > > at
> > > > org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.
> > > > finishConnect(..)(Unknown
> > > > Source)
> > > > Caused by:
> > > > org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$
> > > > NativeConnectException:
> > > > syscall:getsockopt(..) failed: Connexion refusée
> > > > ... 1 more
> > > >
> > > >
> > > > From the same node, if I ls while the RS is starting, I can see the
> > > related
> > > > directoy:
> > > >
> > > >
> > > > hbase@node8:~/hbase-2.0.0-beta-1/logs$
> /home/hadoop/hadoop-2.7.5/bin/
> > > hdfs
> > > > dfs -ls /hbase/WALs/
> > > > Found 35 items
> > > > ...
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node1.com,16020,1515579724884
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node3.com,16020,1515579738916
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node4.com,16020,1515579717193
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node5.com,16020,1515579724586
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node6.com,16020,1515579724999
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node7.com,16020,1515579725681
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:23
> > /hbase/WALs/
> > > > node8.com,16020,1515579724994
> > > >
> > > >
> > > >
> > > > and after the RS tries many times and fails the directory is gone:
> > > > hbase@node8:~/hbase-2.0.0-beta-1/logs$
> /home/hadoop/hadoop-2.7.5/bin/
> > > hdfs
> > > > dfs -ls /hbase/WALs/
> > > > Found 34 items
> > > > ...
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node1.com,16020,1515579724884
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node3.com,16020,1515579738916
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node4.com,16020,1515579717193
> > > > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22
> > /hbase/WALs/
> > > > node5.com,16020,1515579724586
> > > >

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
You're correct. It was dead. I thought HBase would be able to survive that.
Similarly, if the DN dies after the RS has started, the RS fails to close cleanly :(

2018-01-10 5:38 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:

> Connection refuse? Have you checked the status of the datanode on node8?
>
> 2018-01-10 18:31 GMT+08:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
> > I know, this one sunk, but still running it on my cluster, so here is a
> new
> > issue I just got
> >
> > Any idea what this can be? I see this only a one of my nodes...
> >
> > 2018-01-10 05:22:55,786 WARN  [regionserver/node8.com/192.168.23.2:16020
> ]
> > wal.AsyncFSWAL: create wal log writer hdfs://
> > node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/
> node8.com%2C16020%
> > 2C1515579724994.1515579743134
> > failed, retry = 6
> > org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$
> > AnnotatedConnectException:
> > syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
> > at
> > org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.
> > finishConnect(..)(Unknown
> > Source)
> > Caused by:
> > org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$
> > NativeConnectException:
> > syscall:getsockopt(..) failed: Connexion refusée
> > ... 1 more
> >
> >
> > From the same node, if I ls while the RS is starting, I can see the
> related
> > directoy:
> >
> >
> > hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/
> hdfs
> > dfs -ls /hbase/WALs/
> > Found 35 items
> > ...
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node1.com,16020,1515579724884
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node3.com,16020,1515579738916
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node4.com,16020,1515579717193
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node5.com,16020,1515579724586
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node6.com,16020,1515579724999
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node7.com,16020,1515579725681
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:23 /hbase/WALs/
> > node8.com,16020,1515579724994
> >
> >
> >
> > and after the RS tries many times and fails the directory is gone:
> > hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/
> hdfs
> > dfs -ls /hbase/WALs/
> > Found 34 items
> > ...
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node1.com,16020,1515579724884
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node3.com,16020,1515579738916
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node4.com,16020,1515579717193
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node5.com,16020,1515579724586
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node6.com,16020,1515579724999
> > drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> > node7.com,16020,1515579725681
> >
> >
> >
> >
> > 2018-01-10 05:23:46,177 ERROR [regionserver/node8.com/192.168.23.2:16020
> ]
> > regionserver.HRegionServer: * ABORTING region server
> > node8.com,16020,1515579724994:
> > Unhandled: Failed to create wal log writer hdfs://
> > node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/
> node8.com%2C16020%
> > 2C1515579724994.1515579743134
> > after retrying 10 time(s) *
> > java.io.IOException: Failed to create wal log writer hdfs://
> > node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/
> node8.com%2C16020%
> > 2C1515579724994.1515579743134
> > after retrying 10 time(s)
> > at
> > org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.
> createWriterInstance(
> > AsyncFSWAL.java:663)
> > at
> > org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.
> createWriterInstance(
> > AsyncFSWAL.java:130)
> > at
> > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(
> > AbstractFSWAL.java:766)
> > at
> > org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(
> > AbstractFSWAL.java:504)
> > at
> > org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<
> > init>(AsyncFSWAL.java:264)
> > at
> > org.

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
Oh, interesting! If the local DN is dead, HBase cannot start... I would
have expected it to just use HDFS and any other node... That's why my
HBase was not able to start. Likewise, if the DN dies, HBase will not be able
to stop. Should we not be able to survive one DN failure?

JM
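
A quick way to confirm the "connection refused" symptom from the log is to probe the DataNode transfer port directly. The following is only a minimal sketch: the helper name `port_is_open` is made up for illustration, and the address 192.168.23.2:50010 is simply the one that appears in the stack trace.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling `port_is_open("192.168.23.2", 50010)` from node8 should return False while the local DN is down. It only tells you whether something is listening, not whether the DN is healthy.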

2018-01-10 5:31 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> I know, this one sunk, but still running it on my cluster, so here is a
> new issue I just got
>
> Any idea what this can be? I see this only on one of my nodes...
>
> 2018-01-10 05:22:55,786 WARN  [regionserver/node8.com/192.168.23.2:16020]
> wal.AsyncFSWAL: create wal log writer hdfs://node2.com:8020/hbase/
> WALs/node8.com,16020,1515579724994/node8.com%2C16020%2C1515579724994.
> 1515579743134 failed, retry = 6
> org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
> syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
> at 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
> Source)
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
> syscall:getsockopt(..) failed: Connexion refusée
> ... 1 more
>
>
> From the same node, if I ls while the RS is starting, I can see the
> related directory:
>
>
> hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/hdfs
> dfs -ls /hbase/WALs/
> Found 35 items
> ...
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node1.com,16020,1515579724884
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node3.com,16020,1515579738916
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node4.com,16020,1515579717193
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node5.com,16020,1515579724586
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node6.com,16020,1515579724999
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node7.com,16020,1515579725681
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:23 /hbase/WALs/
> node8.com,16020,1515579724994
>
>
>
> and after the RS tries many times and fails the directory is gone:
> hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/hdfs
> dfs -ls /hbase/WALs/
> Found 34 items
> ...
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node1.com,16020,1515579724884
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node3.com,16020,1515579738916
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node4.com,16020,1515579717193
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node5.com,16020,1515579724586
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node6.com,16020,1515579724999
> drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:22 /hbase/WALs/
> node7.com,16020,1515579725681
>
>
>
>
> 2018-01-10 05:23:46,177 ERROR [regionserver/node8.com/192.168.23.2:16020]
> regionserver.HRegionServer: * ABORTING region server 
> node8.com,16020,1515579724994:
> Unhandled: Failed to create wal log writer hdfs://node2.com:8020/hbase/
> WALs/node8.com,16020,1515579724994/node8.com%2C16020%2C1515579724994.
> 1515579743134 after retrying 10 time(s) *
> java.io.IOException: Failed to create wal log writer hdfs://
> node2.com:8020/hbase/WALs/node8.com,16020,1515579724994/node8.com%
> 2C16020%2C1515579724994.1515579743134 after retrying 10 time(s)
> at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.
> createWriterInstance(AsyncFSWAL.java:663)
> at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.
> createWriterInstance(AsyncFSWAL.java:130)
> at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(
> AbstractFSWAL.java:766)
> at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.rollWriter(
> AbstractFSWAL.java:504)
> at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.<
> init>(AsyncFSWAL.java:264)
> at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(
> AsyncFSWALProvider.java:69)
> at org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(
> AsyncFSWALProvider.java:44)
> at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(
> AbstractFSWALProvider.java:139)
> at org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(
> AbstractFSWALProvider.java:55)
> at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:244)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.
> getWAL(HRegionServer.java:2123)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.
> buildServerLoad(

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-10 Thread Jean-Marc Spaggiari
:70)
at
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at
org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:3016)

Which is very surprising, because I can clearly see the directory being
created.

Another attempt here, where I look one step deeper and can even see the
generated file:
2018-01-10 05:27:58,116 WARN  [regionserver/node8.com/192.168.23.2:16020]
wal.AsyncFSWAL: create wal log writer
hdfs://node2.com:8020/hbase/WALs/node8.com,16020,1515580031417/node8.com%2C16020%2C1515580031417.1515580037373
failed, retry = 7
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException:
syscall:getsockopt(..) failed: Connexion refusée: /192.168.23.2:50010
at
org.apache.hbase.thirdparty.io.netty.channel.unix.Socket.finishConnect(..)(Unknown
Source)
Caused by:
org.apache.hbase.thirdparty.io.netty.channel.unix.Errors$NativeConnectException:
syscall:getsockopt(..) failed: Connexion refusée
... 1 more
2018-01-10 05:28:08,210 INFO  [regionserver/node8.com/192.168.23.2:16020]
util.FSHDFSUtils: Recover lease on dfs file
/hbase/WALs/node8.com,16020,1515580031417/node8.com%2C16020%2C1515580031417.1515580037373
2018-01-10 05:28:08,228 INFO  [regionserver/node8.com/192.168.23.2:16020]
util.FSHDFSUtils: Failed to recover lease, attempt=0 on
file=/hbase/WALs/node8.com,16020,1515580031417/node8.com%2C16020%2C1515580031417.1515580037373
after 17ms

hbase@node8:~/hbase-2.0.0-beta-1/logs$ /home/hadoop/hadoop-2.7.5/bin/hdfs
dfs -ls -R /hbase/WALs/ | grep node8
drwxr-xr-x   - hbase supergroup  0 2018-01-10 05:28
/hbase/WALs/node8.com,16020,1515580031417
-rw-r--r--   3 hbase supergroup  0 2018-01-10 05:28
/hbase/WALs/node8.com,16020,1515580031417/node8.com%2C16020%2C1515580031417.1515580037373


But it still says it fails. Any clue? All other nodes are working fine.
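
For anyone reading the paths above: as far as I understand, the WAL directory is named after the server name (host, port, start code) and the file name is the URL-encoded server name followed by a creation timestamp. The sketch below just pulls those pieces apart; `parse_wal_name` and `ms_to_utc` are illustrative helpers, not HBase APIs, and treating both numbers as epoch milliseconds is an assumption on my part.

```python
from datetime import datetime, timezone
from urllib.parse import unquote


def parse_wal_name(file_name: str) -> dict:
    """Split '<url-encoded server name>.<timestamp>' into its parts."""
    # The last dot separates the encoded server name from the timestamp.
    encoded_server, _, created_ms = file_name.rpartition(".")
    # %2C decodes to a comma: host,port,startcode
    host, port, startcode = unquote(encoded_server).split(",")
    return {
        "host": host,
        "port": int(port),
        "startcode": int(startcode),
        "created_ms": int(created_ms),
    }


def ms_to_utc(ms: int) -> datetime:
    """Interpret a value as epoch milliseconds (an assumption) in UTC."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
```

With the file name from the listing, the start code 1515580031417 decodes to 10:27:11 UTC on 2018-01-10, i.e. 05:27:11 at GMT-05:00, which lines up with the log timestamps. That would also explain why the directory name changes on every restart attempt.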

2018-01-09 16:25 GMT-05:00 Stack <st...@duboce.net>:

> On Tue, Jan 9, 2018 at 10:07 AM, Andrew Purtell <apurt...@apache.org>
> wrote:
>
> > I just vetoed the RC because TestMemstoreLABWithoutPool always fails for
> > me. It was the same with the last RC too. My Java is Oracle Java 8u144
> > running on x64 Linux (Ubuntu xenial). Let me know if you need me to
> provide
> > the test output.
> >
> >
> Ok. I can't make it fail. I'm going to disable it and file an issue where
> we can work on figuring what is different here.
>
> Thanks A,
>
> St.Ack
>
>
>
> >
> > On Tue, Jan 9, 2018 at 9:31 AM, Stack <st...@duboce.net> wrote:
> >
> > > I put up a new RC JMS. It still has flakies (though Duo fixed
> > > TestFromClientSide...). Was thinking that we could release beta-1
> though
> > it
> > > has flakies. We'll keep working on cutting these down as we approach
> GA.
> > > St.Ack
> > >
> > > On Sun, Jan 7, 2018 at 10:02 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > On Sun, Jan 7, 2018 at 3:14 AM, Jean-Marc Spaggiari <
> > > > jean-m...@spaggiari.org> wrote:
> > > >
> > > >> Ok, thanks Stack. I will keep it running all day long until I get a
> > > >> successful one. Is that useful that I report all the failed? Or
> just a
> > > >> wast
> > > >> of time? Here is the last failed:
> > > >>
> > > >> [INFO] Results:
> > > >> [INFO]
> > > >> [ERROR] Failures:
> > > >> [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > >> expected: but was:
> > > >> [ERROR] Errors:
> > > >> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > >> TableNotFound Region ...
> > > >> [INFO]
> > > >> [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > > >> [INFO]
> > > >>
> > > >>
> > > >>
> > > > Thanks for bringing up flakies. If we look at the nightlies' run, we
> > can
> > > > get the current list. Probably no harm if all tests pass once in a
> > while
> > > > (smile).
> > > >
> > > > Looking at your findings, TestFromClientSide.
> > > testCheckAndDeleteWithCompareOp
> > > > looks to be new to beta-1. Its a cranky one. I'm looking at it. Might
> > > punt
> > > > to beta-2 if can't figure it by tomorrow. HBASE-19731.
> > > >
> > > > TestDLSAsyncFSWAL is a flakey that unfortunately passes locally.
> > > >
> > > > Let me see what others we have...
> > > >
> > > > S
> > > >
>

Re: [VOTE] The second hbase-2.0.0-beta-1 Release Candidate is available for download

2018-01-10 Thread Jean-Marc Spaggiari
Oh, you're right! I missed it! Ok. Thanks. Will wait for the next one.

JM

2018-01-10 5:00 GMT-05:00 Yu Li <car...@gmail.com>:

> I believe this one is already sunk according to stack's reply:
>
> -- Forwarded message --
> From: Stack <st...@duboce.net>
> Date: 10 January 2018 at 02:09
> Subject: Re: [VOTE] The second hbase-2.0.0-beta-1 Release Candidate is
> available for download
> To: HBase Dev List <dev@hbase.apache.org>
>
>
> Thanks Andrew.
>
> Let me work on this.
>
> RC is sunk. Will put up a new one after I address the above failure.
>
> Best Regards,
> Yu
>
> On 10 January 2018 at 17:54, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
> wrote:
>
> > Are we going to sink this one because of Andrew's -1?
> >
> > If so I will wait for the next one to test it...
> >
> > On Jan 10, 2018 at 03:24, "Balazs Meszaros" <balazs.mesza...@cloudera.com>
> > wrote:
> >
> > > +1
> > >
> > > - signatures, checksums OK,
> > > - unit tests passes (8u112),
> > > - shell works,
> > > - load test tool also successed.
> > >
> > > Best regards,
> > > Balazs
> > >
> > > On Tue, Jan 9, 2018 at 10:38 PM, Andrew Purtell <apurt...@apache.org>
> > > wrote:
> > >
> > > > Linux here.
> > > >
> > > >
> > > > On Tue, Jan 9, 2018 at 1:13 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > Andrew:
> > > > >
> > > > > I don't see this:
> > > > >
> > > > > [INFO] ---
> > > > > [INFO]  T E S T S
> > > > > [INFO] ---
> > > > > [INFO] Running
> > > > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutPool
> > > > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time
> > elapsed:
> > > > > 4.409 s - in org.apache.hadoop.hbase.regionserver.
> > > > > TestMemstoreLABWithoutPool
> > > > > [INFO]
> > > > > [INFO] Results:
> > > > > [INFO]
> > > > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
> > > > >
> > > > > Linux and Mac.
> > > > >
> > > > > What you reckon sir?
> > > > >
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jan 9, 2018 at 10:03 AM, Andrew Purtell <
> apurt...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > > > -1
> > > > > >
> > > > > > Checked sums and signatures: ok
> > > > > > RAT check passes: ok (8u144)
> > > > > > Built from source: ok (8u144)
> > > > > > Unit tests pass: failed (8u144), TestMemstoreLABWithoutPool
> always
> > > > fails
> > > > > >
> > > > > >
> > > > > > TestMemstoreLABWithoutPool.org.apache.hadoop.hbase.regionserver.
> > > > > > TestMemstoreLABWithoutPool
> > > > > > can fail with OOME if not run in isolation.
> > > > > >
> > > > > > If run in isolation, I get
> > > > > >
> > > > > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutPool.
> > > > > > testLABChunkQueueWithMultipleMSLABs(org.apache.hadoop.hbase.
> > > > > regionserver.
> > > > > > TestMemstoreLABWithoutPool)
> > > > > > [ERROR]   Run 1:
> > > > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> SLABs:143
> > > All
> > > > > the
> > > > > > chunks must have been cleared
> > > > > > [ERROR]   Run 2:
> > > > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> SLABs:143
> > > All
> > > > > the
> > > > > > chunks must have been cleared
> > > > > > [ERROR]   Run 3:
> > > > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> SLABs:143
> > > All
> > > > > the
> > > > > > chunks must have been cleared
> > > > > > [ERROR]   Run 4:
> > > > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleM
> SLABs:143
> > &

Re: [VOTE] The second hbase-2.0.0-beta-1 Release Candidate is available for download

2018-01-10 Thread Jean-Marc Spaggiari
Are we going to sink this one because of Andrew's -1?

If so I will wait for the next one to test it...

On Jan 10, 2018 at 03:24, "Balazs Meszaros"
wrote:

> +1
>
> - signatures, checksums OK,
> - unit tests passes (8u112),
> - shell works,
> - load test tool also successed.
>
> Best regards,
> Balazs
>
> On Tue, Jan 9, 2018 at 10:38 PM, Andrew Purtell 
> wrote:
>
> > Linux here.
> >
> >
> > On Tue, Jan 9, 2018 at 1:13 PM, Stack  wrote:
> >
> > > Andrew:
> > >
> > > I don't see this:
> > >
> > > [INFO] ---
> > > [INFO]  T E S T S
> > > [INFO] ---
> > > [INFO] Running
> > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutPool
> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed:
> > > 4.409 s - in org.apache.hadoop.hbase.regionserver.
> > > TestMemstoreLABWithoutPool
> > > [INFO]
> > > [INFO] Results:
> > > [INFO]
> > > [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
> > >
> > > Linux and Mac.
> > >
> > > What you reckon sir?
> > >
> > > St.Ack
> > >
> > >
> > >
> > >
> > > On Tue, Jan 9, 2018 at 10:03 AM, Andrew Purtell 
> > > wrote:
> > >
> > > > -1
> > > >
> > > > Checked sums and signatures: ok
> > > > RAT check passes: ok (8u144)
> > > > Built from source: ok (8u144)
> > > > Unit tests pass: failed (8u144), TestMemstoreLABWithoutPool always
> > fails
> > > >
> > > >
> > > > TestMemstoreLABWithoutPool.org.apache.hadoop.hbase.regionserver.
> > > > TestMemstoreLABWithoutPool
> > > > can fail with OOME if not run in isolation.
> > > >
> > > > If run in isolation, I get
> > > >
> > > > org.apache.hadoop.hbase.regionserver.TestMemstoreLABWithoutPool.
> > > > testLABChunkQueueWithMultipleMSLABs(org.apache.hadoop.hbase.
> > > regionserver.
> > > > TestMemstoreLABWithoutPool)
> > > > [ERROR]   Run 1:
> > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143
> All
> > > the
> > > > chunks must have been cleared
> > > > [ERROR]   Run 2:
> > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143
> All
> > > the
> > > > chunks must have been cleared
> > > > [ERROR]   Run 3:
> > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143
> All
> > > the
> > > > chunks must have been cleared
> > > > [ERROR]   Run 4:
> > > > TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs:143
> All
> > > the
> > > > chunks must have been cleared
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Jan 8, 2018 at 7:43 PM, Stack  wrote:
> > > >
> > > > > The second release candidate for HBase 2.0.0-beta-1 is up at:
> > > > >
> > > > >   https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-
> > beta-1-RC1/
> > > > >
> > > > > Maven artifacts are available from a staging directory here:
> > > > >
> > > > >   https://repository.apache.org/content/repositories/
> > > orgapachehbase-1190
> > > > >
> > > > > All was signed with my key at 8ACC93D2 [1]
> > > > >
> > > > > I tagged the RC as 2.0.0-beta-1-RC1.5 (It took a few attempts) at
> > > > > hash 4c31374a90a487e19a6ef04d7b7adba43dd92ecf
> > > > >
> > > > > This second RC has fix nice bug fixes over RC0 including fix for
> > > failing
> > > > > and flakey tests.
> > > > >
> > > > > hbase-2.0.0-beta-1 is our first beta release. It includes all that
> > was
> > > in
> > > > > previous alphas (new assignment manager, offheap read/write path,
> > > > in-memory
> > > > > compactions, etc.). The APIs and feature-set are sealed.
> > > > >
> > > > > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0.
> It
> > > is
> > > > > meant for devs and downstreamers to test drive and flag us if we
> > messed
> > > > up
> > > > > on anything ahead of our rolling GAs. We are particular interested
> in
> > > > > hearing from Coprocessor developers.
> > > > >
> > > > > The list of features addressed in 2.0.0 so far can be found here
> [3].
> > > > There
> > > > > are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > found
> > > > > here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if
> mistakes).
> > > > >
> > > > > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do
> > one
> > > > more
> > > > > beta before we put up our first 2.0.0 Release Candidate by the end
> of
> > > > > January, 2.0.0-beta-2. Its focus will be making it so users can do
> a
> > > > > rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes
> > found
> > > > > running beta-1). Here is the list of what we have targeted so far
> for
> > > > > beta-2 [5]. Check it out.
> > > > >
> > > > > One known issue is that the User API has not been properly filtered
> > so
> > > it
> > > > > shows more than just InterfaceAudience Public content (HBASE-19663,
> > to
> > > be
> > > > > fixed by beta-2).
> > > > >
> > > > > Please take this beta for a spin. Please vote on whether it ok 

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
Excellent! Thanks again! Starting again with the tests...

JMS
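
Since the exclude list in the quoted command is long and easy to break when wrapped by a mail client, here is a rough sketch of how one might assemble it from a plain list of test names. `build_mvn_command` is a hypothetical helper; the flags themselves (`-PrunAllTests`, `-Dtest.exclude.pattern`, `-Dsurefire.secondPartForkCount`, `-fn`) are taken from the command quoted below, and `-fn` is Maven's short form of `--fail-never`.

```python
def build_mvn_command(excluded_tests, fork_count=1):
    """Build the argument list for a test run that skips the given test classes."""
    # Each entry becomes a '**/<name>.java' glob; globs are joined with commas.
    pattern = ",".join(f"**/{name}.java" for name in excluded_tests)
    return [
        "mvn",
        "-PrunAllTests",
        f"-Dtest.exclude.pattern={pattern}",
        f"-Dsurefire.secondPartForkCount={fork_count}",
        "clean",
        "test",
        "-fn",  # --fail-never: keep going after a test failure
    ]
```

For example, `build_mvn_command(["master.TestDLSAsyncFSWAL", "TestZooKeeper"])` gives a list that can be handed to `subprocess.run`. Note that the quoted command ends with the wider glob `**/TestFromClientSide**`, which this helper would not produce as written.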

2018-01-07 8:04 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:

> The last '-fn' option in the mvn command does that magic for you.
>
> 2018-01-07 19:03 GMT+08:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
> > So that's the way! Super. Thanks 张铎. Last, is there a way to keep going
> > with the remaining tests even if we get a failure on a test?
> >
> > JMS
> >
> > 2018-01-07 5:56 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:
> >
> > > You can try to copy the command line from the pre commit job where we
> > will
> > > bypass the flakey tests...
> > >
> > > This is the command I use to run UTs
> > >
> > > mvn -PrunAllTests
> > > -Dtest.exclude.pattern=**/master.assignment.
> > TestMergeTableRegionsProcedure
> > > .java,**/master.balancer.TestRegionsOnMasterOptions.
> > java,**/regionserver.
> > > TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/
> > > regionserver.TestRegionMergeTransactionOnCluster.java,**/master.
> > > TestAssignmentManagerMetrics.java,**/snapshot.
> > TestExportSnapshot.java,**/
> > > master.TestDLSAsyncFSWAL.java,**/master.balancer.
> > > TestStochasticLoadBalancer2.java,**/master.assignment.
> > > TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded
> > > .java,**/master.balancer.TestFavoredStochasticLoadBalan
> > cer.java,**/master.
> > > TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.
> > > TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.
> > > java,**/mob.compactions.TestMobCompactor.java,**/regionserver.
> > > TestRegionServerReadRequestMetrics.java,**/client.
> > > TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.
> > > java,**/replication.TestReplicationSmallTests.
> java,**/master.assignment.
> > > TestSplitTableRegionProcedure.java,**/replication.
> > > TestReplicationKillSlaveRS.java,**/quotas.
> TestSnapshotQuotaObserverChore
> > > .java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.
> > > java,**/TestZooKeeper.java,**/master.TestRestartCluster.
> > > java,**/client.locking.TestEntityLocks.java,**/client.
> > > TestMobSnapshotCloneIndependence.java,**/regionserver.
> > > TestMemstoreLABWithoutPool.java,**/client.
> TestMetaWithReplicas.java,**/
> > > regionserver.wal.TestAsyncLogRolling.java,**/snapshot.
> > > TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.
> > > TestMetaShutdownHandler.java,**/client.TestSizeFailures.
> > > java,**/regionserver.TestFSErrorsExposed.java,**/
> > > master.TestSplitLogManager.java,**/master.cleaner.
> > > TestHFileCleaner.java,**/TestFromClientSide**
> > > -Dsurefire.secondPartForkCount=1 clean test -fn
> > >
> > > The TestFromClientSide is not reported by the flakey tests detector but
> > > same with you, I found that it fails all the time, so also exclude it.
> > >
> > > Hope this would help.
> > >
> > > 2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari <
> jean-m...@spaggiari.org
> > >:
> > >
> > > > Ok, thanks Stack. I will keep it running all day long until I get a
> > > > successful one. Is that useful that I report all the failed? Or just
> a
> > > wast
> > > > of time? Here is the last failed:
> > > >
> > > > [INFO] Results:
> > > > [INFO]
> > > > [ERROR] Failures:
> > > > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > > > expected: but was:
> > > > [ERROR] Errors:
> > > > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > > > TableNotFound Region ...
> > > > [INFO]
> > > > [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > > > [INFO]
> > > >
> > > >
> > > > JMS
> > > >
> > > > 2018-01-07 1:55 GMT-05:00 Apekshit Sharma <a...@cloudera.com>:
> > > >
> > > > > bq. Don't you think we have enough branches already mighty Appy?
> > > > > Yeah we do...sigh.
> > > > >
> > > > >
> > > > > idk about that. But don't we need a *patch* branch branch-2.0 (just
> > > like
> > > > > branch-1.4) where we "make backwards-compatible bug fixes" and a
> > > *minor*
> > > > > branch branch-2 where we "add functionality 

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
So that's the way! Super. Thanks 张铎. Last, is there a way to keep going
with the remaining tests even if we get a failure on a test?

JMS

2018-01-07 5:56 GMT-05:00 张铎(Duo Zhang) <palomino...@gmail.com>:

> You can try to copy the command line from the pre commit job where we will
> bypass the flakey tests...
>
> This is the command I use to run UTs
>
> mvn -PrunAllTests
> -Dtest.exclude.pattern=**/master.assignment.TestMergeTableRegionsProcedure
> .java,**/master.balancer.TestRegionsOnMasterOptions.java,**/regionserver.
> TestDefaultCompactSelection.java,**/client.TestMultiParallel.java,**/
> regionserver.TestRegionMergeTransactionOnCluster.java,**/master.
> TestAssignmentManagerMetrics.java,**/snapshot.TestExportSnapshot.java,**/
> master.TestDLSAsyncFSWAL.java,**/master.balancer.
> TestStochasticLoadBalancer2.java,**/master.assignment.
> TestAssignmentManager.java,**/client.TestAsyncTableGetMultiThreaded
> .java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/master.
> TestDLSFSHLog.java,**/trace.TestHTraceHooks.java,**/client.
> TestMultiRespectsLimits.java,**/client.TestBlockEvictionFromClient.
> java,**/mob.compactions.TestMobCompactor.java,**/regionserver.
> TestRegionServerReadRequestMetrics.java,**/client.
> TestTableSnapshotScanner.java,**/quotas.TestQuotaStatusRPCs.
> java,**/replication.TestReplicationSmallTests.java,**/master.assignment.
> TestSplitTableRegionProcedure.java,**/replication.
> TestReplicationKillSlaveRS.java,**/quotas.TestSnapshotQuotaObserverChore
> .java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.
> java,**/TestZooKeeper.java,**/master.TestRestartCluster.
> java,**/client.locking.TestEntityLocks.java,**/client.
> TestMobSnapshotCloneIndependence.java,**/regionserver.
> TestMemstoreLABWithoutPool.java,**/client.TestMetaWithReplicas.java,**/
> regionserver.wal.TestAsyncLogRolling.java,**/snapshot.
> TestSecureExportSnapshot.java,**/TestIOFencing.java,**/master.
> TestMetaShutdownHandler.java,**/client.TestSizeFailures.
> java,**/regionserver.TestFSErrorsExposed.java,**/
> master.TestSplitLogManager.java,**/master.cleaner.
> TestHFileCleaner.java,**/TestFromClientSide**
> -Dsurefire.secondPartForkCount=1 clean test -fn
>
> The TestFromClientSide is not reported by the flakey tests detector but
> same with you, I found that it fails all the time, so also exclude it.
>
> Hope this would help.
>
> 2018-01-07 17:14 GMT+08:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
> > Ok, thanks Stack. I will keep it running all day long until I get a
> > successful one. Is that useful that I report all the failed? Or just a
> wast
> > of time? Here is the last failed:
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > expected: but was:
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound Region ...
> > [INFO]
> > [ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
> > [INFO]
> >
> >
> > JMS
> >
> > 2018-01-07 1:55 GMT-05:00 Apekshit Sharma <a...@cloudera.com>:
> >
> > > bq. Don't you think we have enough branches already mighty Appy?
> > > Yeah we do...sigh.
> > >
> > >
> > > idk about that. But don't we need a *patch* branch branch-2.0 (just
> like
> > > branch-1.4) where we "make backwards-compatible bug fixes" and a
> *minor*
> > > branch branch-2 where we "add functionality in a backwards-compatible
> > > manner".
> > > Quotes are from http://hbase.apache.org/book.
> > html#hbase.versioning.post10.
> > > I stumbled on this issue when thinking about backporting
> > > https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
> > >
> > > -- Appy
> > >
> > >
> > > On Sat, Jan 6, 2018 at 4:11 PM, stack <saint@gmail.com> wrote:
> > >
> > > > It is not you.  There are a bunch of flies we need to fix. This
> latter
> > is
> > > > for sure flakey.  Let me take a look. Thanks, JMS.
> > > >
> > > > S
> > > >
> > > > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <
> jean-m...@spaggiari.org
> > >
> > > > wrote:
> > > >
> > > > I might not doing the right magic to get that run If someone is
> > able
> > > to
> > > > get all the tests pass, can you please share the command you run?
> > > >
> > > > Thanks,
> > > >
> > > > JMS
> > > &

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-07 Thread Jean-Marc Spaggiari
Ok, thanks Stack. I will keep it running all day long until I get a
successful one. Is it useful for me to report all the failures, or just a waste
of time? Here is the latest failure:

[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
expected: but was:
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound Region ...
[INFO]
[ERROR] Tests run: 3585, Failures: 1, Errors: 1, Skipped: 44
[INFO]


JMS
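
If the run is going to loop all day, the summary line can be scraped automatically instead of being pasted by hand. A minimal sketch, assuming the summary keeps the "Tests run: ..., Failures: ..., Errors: ..., Skipped: ..." shape shown above; `parse_summary` is an illustrative name only.

```python
import re

# Matches the counters in a surefire summary line such as the one above.
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def parse_summary(line: str):
    """Return the four counters as a dict, or None if the line has no summary."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    run, failures, errors, skipped = map(int, match.groups())
    return {"run": run, "failures": failures, "errors": errors, "skipped": skipped}
```

A run counts as clean when both `failures` and `errors` are zero; per-class lines that carry a trailing "Time elapsed" field are matched as well.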

2018-01-07 1:55 GMT-05:00 Apekshit Sharma <a...@cloudera.com>:

> bq. Don't you think we have enough branches already mighty Appy?
> Yeah we do...sigh.
>
>
> idk about that. But don't we need a *patch* branch branch-2.0 (just like
> branch-1.4) where we "make backwards-compatible bug fixes" and a *minor*
> branch branch-2 where we "add functionality in a backwards-compatible
> manner".
> Quotes are from http://hbase.apache.org/book.html#hbase.versioning.post10.
> I stumbled on this issue when thinking about backporting
> https://issues.apache.org/jira/browse/HBASE-17436 for 2.1.
>
> -- Appy
>
>
> On Sat, Jan 6, 2018 at 4:11 PM, stack <saint@gmail.com> wrote:
>
> > It is not you.  There are a bunch of flies we need to fix. This latter is
> > for sure flakey.  Let me take a look. Thanks, JMS.
> >
> > S
> >
> > On Jan 6, 2018 5:57 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> > wrote:
> >
> > I might not doing the right magic to get that run If someone is able
> to
> > get all the tests pass, can you please share the command you run?
> >
> > Thanks,
> >
> > JMS
> >
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
> > expected: but was:
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assig
> > nment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> > .testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.Tes
> > tMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]
> > org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure
> .
> > testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.ass
> > ignment.TestMergeTableRegionsProcedure)
> > [ERROR]   Run 1:
> > TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
> > expected: but was:
> > [ERROR]   Run 2:
> > TestMergeTableRegionsProcedure.tearDown:128->
> > resetProcExecutorTestingKillFl
> > ag:138
> > expected executor to be running
> > [INFO]
> > [ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting
> > timed
> > out after [30 000] msec
> > [ERROR]
> >  TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
> > expected null, but was: NotServingRegionException:
> > testWritesWhileScanning,,1515277468063.468265483817cb6da632026ba5b306f6.
> > is
> > closing>
> > [ERROR] Errors:
> > [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> > TableNotFound testThr...
> > [ERROR]
> > org.apache.hadoop.hbase.master.balancer.TestRegionsOnMaster

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
I might not be doing the right magic to get that to run. If someone is able to
get all the tests to pass, can you please share the command you run?

Thanks,

JMS


[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestFromClientSide.testCheckAndDeleteWithCompareOp:4982
expected: but was:
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeRegionsConcurrently(org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testMergeTwoRegions(org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testRecoveryAndDoubleExecution(org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.setup:111->resetProcExecutorTestingKillFlag:138
expected executor to be running
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]
org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution(org.apache.hadoop.hbase.master.assignment.TestMergeTableRegionsProcedure)
[ERROR]   Run 1:
TestMergeTableRegionsProcedure.testRollbackAndDoubleExecution:272
expected: but was:
[ERROR]   Run 2:
TestMergeTableRegionsProcedure.tearDown:128->resetProcExecutorTestingKillFlag:138
expected executor to be running
[INFO]
[ERROR]   TestSnapshotQuotaObserverChore.testSnapshotSize:276 Waiting timed
out after [30 000] msec
[ERROR]
 TestHRegionWithInMemoryFlush>TestHRegion.testWritesWhileScanning:3813
expected null, but was:
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound testThr...
[ERROR]
org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions.testRegionsOnAllServers(org.apache.hadoop.hbase.master.balancer.TestRegionsOnMasterOptions)
[ERROR]   Run 1:
TestRegionsOnMasterOptions.testRegionsOnAllServers:94->checkBalance:207->Object.wait:-2
» TestTimedOut
[ERROR]   Run 2: TestRegionsOnMasterOptions.testRegionsOnAllServers »
Appears to be stuck in t...
[INFO]
[INFO]
[ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44
[INFO]
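
To compare totals between runs like the ones in this thread, a small script can pull the counters out of the Surefire summary line. This is a minimal sketch and not part of the HBase build; `parse_summary` and `failed` are hypothetical helper names, and the regex is inferred only from the summary lines quoted in this thread.

```python
import re

# Matches Surefire's aggregate line, for example:
# "[ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44"
SUMMARY = re.compile(
    r"Tests run: (\d+), Failures: (\d+), Errors: (\d+), Skipped: (\d+)"
)


def parse_summary(line):
    """Return the four counters as a dict, or None if the line is not a summary."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    run, failures, errors, skipped = (int(g) for g in match.groups())
    return {"run": run, "failures": failures, "errors": errors, "skipped": skipped}


def failed(summary):
    """A run counts as failed if it reports any failure or error."""
    return summary["failures"] + summary["errors"] > 0


# Two summary lines taken from this thread.
first = parse_summary("[ERROR] Tests run: 3604, Failures: 7, Errors: 2, Skipped: 44")
second = parse_summary("[ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44")
```

Because the pattern is searched rather than anchored, it would also match the per-class lines that carry a trailing "Time elapsed" field.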


2018-01-06 15:52 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Deleted the class to get all the tests running. Was running on the RC1
> from the tar.
>
> I now get these failing.
>
> [ERROR] Failures:
> [ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
> Balancer did not run
> [ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:284
> hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-
> 5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/
> f1bdc6441b090dbacb391c74eaf0d1d0
> [ERROR] Errors:
> [ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
> TableNotFound Region ...
> [INFO]
> [ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44
>
>
> I have not been able to get all the tests to pass locally for a while :(
>
> JM
>
> 2018-01-06 15:05 GMT-05:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Looks like you didn't include HBASE-19666 which would be in the next RC.
>>
>> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
>> jean-m...@spaggiari.org> wrote:
>>
>> > Trying with a different command line (mvn test -P runAllTests
>> > -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram4g)
>> > I get all of these failing.  How are you able to get everything to pass???
>> >
>> > [INFO] Results:
>> > [INFO]
>> > [ERROR] Failures:
>> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
>> > expected:<[[4, 2, 1]]> but was:<[[]]>
>> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
>> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
>> > [INFO]
>> > 

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
Deleted the class to get all the tests running. Was running on the RC1 from
the tar.

I now get these failing.

[ERROR] Failures:
[ERROR]   TestFavoredStochasticLoadBalancer.test2FavoredNodesDead:352
Balancer did not run
[ERROR]   TestRegionMergeTransactionOnCluster.testCleanMergeReference:284
hdfs://localhost:45311/user/jmspaggi/test-data/7c269e83-5982-449e-8cf8-6bab4c7c/data/default/testCleanMergeReference/f1bdc6441b090dbacb391c74eaf0d1d0
[ERROR] Errors:
[ERROR]   TestDLSAsyncFSWAL>AbstractTestDLS.testThreeRSAbort:401 »
TableNotFound Region ...
[INFO]
[ERROR] Tests run: 3604, Failures: 2, Errors: 1, Skipped: 44


I have not been able to get all the tests to pass locally for a while :(

JM

2018-01-06 15:05 GMT-05:00 Ted Yu <yuzhih...@gmail.com>:

> Looks like you didn't include HBASE-19666 which would be in the next RC.
>
> On Sat, Jan 6, 2018 at 10:52 AM, Jean-Marc Spaggiari <
> jean-m...@spaggiari.org> wrote:
>
> > Trying with a different command line (mvn test -P runAllTests
> > -Dsurefire.secondPartThreadCount=12 -Dtest.build.data.basedirectory=/ram4g)
> > I get all of these failing.  How are you able to get everything to pass???
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> >
> > Second run:
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> >
> > Then again:
> >
> > [INFO] Results:
> > [INFO]
> > [ERROR] Failures:
> > [ERROR]   TestDefaultCompactSelection.testCompactionRatio:74->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[4, 2, 1]]> but was:<[[]]>
> > [ERROR]   TestDefaultCompactSelection.testStuckStoreCompaction:145->TestCompactionPolicy.compactEquals:182->TestCompactionPolicy.compactEquals:201
> > expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > [INFO]
> > [ERROR] Tests run: 1235, Failures: 2, Errors: 0, Skipped: 4
> > [INFO]
> > [INFO] 
> > 
> > [INFO] Reactor Summary:
> >
> >
> > Sounds like it's always the exact same result. Is there a way to exclude
> > this TestCompactionPolicy test from the run?
> >
> > Here are more details from the last failure:
> > 
> > ---
> > Test set: org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection
> >
> > ---
> > Tests run: 4, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 1.323 s
> > <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection
> > testStuckStoreCompaction(org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection)
> > Time elapsed: 1.047 s  <<< FAILURE!
> > org.junit.ComparisonFailure: expected:<[[]30, 30, 30]> but was:<[[99, 30, ]30, 30, 30]>
> > at org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection.testStuckStoreCompaction(TestDefaultCompactSelection.java:145)
> >
> > testCompactionRatio(org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection)
> > Time elapsed: 0.096 s  <<< FAILURE!
> > org.junit.ComparisonFailure: expected:<[[4, 2, 1]]> but was:<[[]]>
> > at org.apache.hadoop.hbase.regionserver.TestDefaultCompactSelection.testCompactionRatio(TestDefaultCompactSelection.java:74

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-06 Thread Jean-Marc Spaggiari
How are you guys able to get the tests running?

For me it keeps failing on TestReversedScannerCallable.

I tried many times, always fails in the same place. I'm running on a 4GB
tmpfs. Details are below. Am I doing something wrong?

JM



./dev-support/hbasetests.sh runAllTests



[INFO] Running org.apache.hadoop.hbase.client.TestOperation
[INFO]
[INFO] Results:
[INFO]
[ERROR] Errors:
[ERROR]   TestReversedScannerCallable.unnecessary Mockito stubbings »
UnnecessaryStubbing
[INFO]
[ERROR] Tests run: 245, Failures: 0, Errors: 1, Skipped: 8
[INFO]
[INFO]

[INFO] Reactor Summary:
[INFO]
[INFO] Apache HBase ... SUCCESS [ 1.409 s]
[INFO] Apache HBase - Checkstyle .. SUCCESS [ 1.295 s]
[INFO] Apache HBase - Build Support ... SUCCESS [ 0.038 s]
[INFO] Apache HBase - Error Prone Rules ... SUCCESS [ 1.069 s]
[INFO] Apache HBase - Annotations . SUCCESS [ 1.450 s]
[INFO] Apache HBase - Build Configuration . SUCCESS [ 0.073 s]
[INFO] Apache HBase - Shaded Protocol . SUCCESS [ 14.292 s]
[INFO] Apache HBase - Common .. SUCCESS [01:51 min]
[INFO] Apache HBase - Metrics API . SUCCESS [ 2.878 s]
[INFO] Apache HBase - Hadoop Compatibility  SUCCESS [ 12.216 s]
[INFO] Apache HBase - Metrics Implementation .. SUCCESS [ 7.206 s]
[INFO] Apache HBase - Hadoop Two Compatibility  SUCCESS [ 12.440 s]
[INFO] Apache HBase - Protocol  SUCCESS [ 0.074 s]
[INFO] Apache HBase - Client .. FAILURE [02:10 min]
[INFO] Apache HBase - Zookeeper ... SKIPPED
[INFO] Apache HBase - Replication . SKIPPED





---
Test set: org.apache.hadoop.hbase.client.TestReversedScannerCallable
---
Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.515 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestReversedScannerCallable
unnecessary Mockito stubbings(org.apache.hadoop.hbase.client.TestReversedScannerCallable)  Time elapsed: 0.014 s  <<< ERROR!
org.mockito.exceptions.misusing.UnnecessaryStubbingException:

Unnecessary stubbings detected in test class: TestReversedScannerCallable
Clean & maintainable test code requires zero unnecessary code.
Following stubbings are unnecessary (click to navigate to relevant line of
code):
  1. -> at org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:66)
  2. -> at org.apache.hadoop.hbase.client.TestReversedScannerCallable.setUp(TestReversedScannerCallable.java:68)
Please remove unnecessary stubbings. More info: javadoc for
UnnecessaryStubbingException class.
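
For readers unfamiliar with this error: Mockito's strict stubbing reports stubbings that were configured (here in `setUp`, lines 66 and 68) but never exercised by the test that ran. The toy model below illustrates that bookkeeping only. It is a sketch in Python, not Mockito's API; `StrictStubs` and the stubbed names `getTableName` and `getRetries` are hypothetical.

```python
class StrictStubs:
    """Toy model: records stubbed calls and which of them were actually used."""

    def __init__(self):
        self._stubs = {}    # stub name -> canned return value
        self._used = set()  # stub names that were called at least once

    def when(self, name, value):
        """Register a canned return value for a call name."""
        self._stubs[name] = value

    def call(self, name):
        """Invoke a stubbed call and remember that it was used."""
        self._used.add(name)
        return self._stubs[name]

    def unnecessary(self):
        """Return the stubs that were set up but never called."""
        return sorted(set(self._stubs) - self._used)


stubs = StrictStubs()
stubs.when("getTableName", "t1")  # used by the test below
stubs.when("getRetries", 3)       # set up but never called
table = stubs.call("getTableName")
unused = stubs.unnecessary()
```

In this model, a stub shared through a common setup step is flagged as soon as one test does not call it, which matches the shape of the report above.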


2018-01-06 0:44 GMT-05:00 stack :

> On Jan 5, 2018 4:44 PM, "Apekshit Sharma"  wrote:
>
> bq. Care needs to be exercised backporting. Bug fixes only please. If in
> doubt, ping me, the RM, please. Thanks.
> In that case, shouldn't we branch out branch-2.0? We can then do normal
> backports to branch-2 and only bug fixes to branch-2.0.
>
>
>
> Don't you think we have enough branches already mighty Appy?
>
> No new features on branch-2? New features are in master/3.0.0 only?
>
> S
>
>
>
>
>
>
> On Fri, Jan 5, 2018 at 9:48 AM, Andrew Purtell 
> wrote:
>
> > TestMemstoreLABWithoutPool is a flake, not a consistent fail.
> >
> >
> > On Fri, Jan 5, 2018 at 7:18 AM, Stack  wrote:
> >
> > > On Thu, Jan 4, 2018 at 2:24 PM, Andrew Purtell 
> > > wrote:
> > >
> > > > This one is probably my fault:
> > > >
> > > > TestDefaultCompactSelection
> > > >
> > > > HBASE-19406
> > > >
> > > >
> > > Balazs fixed it above, HBASE-19666
> > >
> > >
> > >
> > > > It can easily be reverted. The failure of interest
> > > > is TestMemstoreLABWithoutPool.testLABChunkQueueWithMultipleMSLABs.
> > > >
> > > >
> > > This seems fine. Passes in nightly
> > > https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2/171/testReport/org.apache.hadoop.hbase.regionserver/TestMemstoreLABWithoutPool/
> > > and locally against the tag. It fails consistently for you Andrew?
> > >
> > >
> > > > > Should all unit tests pass on a beta? I think so, at least if the
> > > > failures
> > > > > are 100% repeatable.
> > > > >
> > > >
> > >
> > > This is fair. Let me squash this RC and roll another.
> > >
> > > Will put it up in a few hours.
> > >
> > > Thanks,
> > > S
> > >
> > >
> > >
> > > > > -0
> > > > >
> > > > > Checked sums and signatures: ok
> > > > > RAT check: ok
> > > > > Built from source: ok (8u144)
> > > > > Ran unit tests: some failures (8u144)

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2018-01-04 Thread Jean-Marc Spaggiari
If I re-run from the original cluster, now that I have snappy enabled, it
works. But if it helps I can easily remove snappy libs, transfer from
source, re-run and capture all the logs. It's an easy step. Just confirm
and I will do it.

Apart from that, everything else seems to run correctly. I ran some
RowCounters, merged regions, compactions, splits, alters, etc. Have not
found anything else. Still +1 ;)

JMS

2018-01-03 22:26 GMT-05:00 Stack :

> +1 from me.
> S
>
> On Fri, Dec 29, 2017 at 12:15 PM, Stack  wrote:
>
> > The first release candidate for HBase 2.0.0-beta-1 is up at:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.0-beta-1-RC0/
> >
> > Maven artifacts are available from a staging directory here:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1188
> >
> > All was signed with my key at 8ACC93D2 [1]
> >
> > I tagged the RC as 2.0.0-beta-1-RC0
> > (0907563eb72697b394b8b960fe54887d6ff304fd)
> >
> > hbase-2.0.0-beta-1 is our first beta release. It includes all that was in
> > previous alphas (new assignment manager, offheap read/write path, in-memory
> > compactions, etc.). The APIs and feature-set are sealed.
> >
> > hbase-2.0.0-beta-1 is a not-for-production preview of hbase-2.0.0. It is
> > meant for devs and downstreamers to test drive and flag us if we messed up
> > on anything ahead of our rolling GAs. We are particularly interested in
> > hearing from Coprocessor developers.
> >
> > The list of features addressed in 2.0.0 so far can be found here [3].
> > There are thousands. The list of ~2k+ fixes in 2.0.0 exclusively can be
> > found here [4] (My JIRA JQL foo is a bit dodgy -- forgive me if mistakes).
> >
> > I've updated our overview doc. on the state of 2.0.0 [6]. We'll do one
> > more beta before we put up our first 2.0.0 Release Candidate by the end of
> > January, 2.0.0-beta-2. Its focus will be making it so users can do a
> > rolling upgrade on to hbase-2.x from hbase-1.x (and any bug fixes found
> > running beta-1). Here is the list of what we have targeted so far for
> > beta-2 [5]. Check it out.
> >
> > One known issue is that the User API has not been properly filtered so it
> > shows more than just InterfaceAudience Public content (HBASE-19663, to be
> > fixed by beta-2).
> >
> > Please take this beta for a spin. Please vote on whether it is ok to put out
> > this RC as our first beta (Note CHANGES has not yet been updated). Let the
> > VOTE be open for 72 hours (Monday)
> >
> > Thanks,
> > Your 2.0.0 Release Manager
> >
> > 1. http://pgp.mit.edu/pks/lookup?op=get&search=0x9816C7FC8ACC93D2
> > 3. https://goo.gl/scYjJr
> > 4. https://goo.gl/dFFT8b
> > 5. https://issues.apache.org/jira/projects/HBASE/versions/12340862
> > 6. https://docs.google.com/document/d/1WCsVlnHjJeKUcl7wHwqb4z9iEu_ktczrlKHK8N4SZzs/
> >
>


Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Sorry to spam the list :(

Another interesting thing.

Now most of my tables are online. For a few I'm getting this:
Caused by: java.lang.IllegalArgumentException: Invalid HFile version: major=2, minor=1: expected at least major=2 and minor=3
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.checkFileVersion(HFileReaderImpl.java:332)
at org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.<init>(HFileReaderImpl.java:199)
at org.apache.hadoop.hbase.io.hfile.HFile.openReader(HFile.java:538)
... 13 more

What is interesting is that I have not done anything on the source cluster for
weeks/months. So all tables are major compacted the same way. I will
major compact them all under the HFile v3 format and retry.
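
The exception message reads like a minimum-version guard. As a minimal sketch of that kind of check (an assumption based only on the message text, not HBase's actual `checkFileVersion` logic; the function name and defaults below are hypothetical):

```python
def check_file_version(major, minor, min_major=2, min_minor=3):
    """Raise ValueError if (major, minor) is older than the required minimum."""
    # Tuples compare element by element, so (2, 1) < (2, 3) and (3, 0) > (2, 3).
    if (major, minor) < (min_major, min_minor):
        raise ValueError(
            f"Invalid HFile version: major={major}, minor={minor}: "
            f"expected at least major={min_major} and minor={min_minor}"
        )


# A file written as major=2, minor=1 is rejected; a v3 file is accepted.
try:
    check_file_version(2, 1)
    rejected = False
except ValueError as err:
    rejected = True
    message = str(err)

check_file_version(3, 0)
```

Under this reading, rewriting the files in a newer format (for example through a major compaction) would be what gets them past the guard.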

2017-12-31 13:33 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Ok. With a brand new DistCp from the source cluster, regions are getting
> assigned correctly. So it sounds like if they get stuck initially for any
> reason, then even if the reason is fixed they cannot get assigned
> again. Will keep playing.
>
> I kept the previous /hbase just in case we need something from it.
>
> Thanks,
>
> JMS
>
> 2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:
>
>> Nothing bad that I can see. Here is a region server log:
>> https://pastebin.com/0r76Y6ap
>>
>> Disabling the table makes the regions leave the transition mode. I'm
>> trying to disable all tables one by one (because it gets stuck after each
>> disable) and will see if re-enabling them helps...
>>
>> On the master side, I now have errors all over:
>> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
>> assignment.RegionTransitionProcedure: Retryable error trying to
>> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
>> UnassignProcedure table=work_proposed, 
>> region=d0a58b76ad9376b12b3e763660049d3d,
>> server=node3.com,16020,1514693337210; rit=OPENING,
>> location=node3.com,16020,1514693337210
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
>> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
>> current state=OPENING
>> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
>> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
>> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
>> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
>> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
>> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>>
>> Non-stop showing on the logs. Probably because I disabled the table.
>> Restarting HBase to see if it clears that a bit...
>>
>> After restart there isn't any
>> org.apache.hadoop.hbase.exceptions.UnexpectedStateException on the logs.
>> Only INFO level. And nothing bad. But still, regions are stuck in
>> transition even for the disabled tables.
>>
>> Master logs are here. I removed some sections because it always says the
>> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>>
>> JMS
>>
>> 2017-12-31 9:58 GMT-05:00 stack <saint@gmail.com>:
>>
>>> There is nothing further up in the master log from regionservers or on
>>> regionservers side on open?
>>>
>>> Thanks,
>>> S
>>>
>>> On Dec 31, 2017 8:37 AM, "stack" <saint@gmail.com> wrote:
>>>
>>> > Good questions.  If you disable snappy does it work?  If you start over
>>> > fresh does it work?  It should be picking up native libs.  Make an
>>> issue
>>> > please jms.  Thanks for giving it a go.
>>> >
>>> > S
>>> >
>>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>>> jean-m...@spaggiari.org>
>>> > wrote:
>>> >
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Ok. With a brand new DistCp from the source cluster, regions are getting
assigned correctly. So it sounds like if they get stuck initially for any
reason, then even if the reason is fixed they cannot get assigned
again. Will keep playing.

I kept the previous /hbase just in case we need something from it.

Thanks,

JMS

2017-12-31 10:23 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Nothing bad that I can see. Here is a region server log:
> https://pastebin.com/0r76Y6ap
>
> Disabling the table makes the regions leave the transition mode. I'm
> trying to disable all tables one by one (because it gets stuck after each
> disable) and will see if re-enabling them helps...
>
> On the master side, I now have errors all over:
> 2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
> assignment.RegionTransitionProcedure: Retryable error trying to
> transition: pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
> UnassignProcedure table=work_proposed, 
> region=d0a58b76ad9376b12b3e763660049d3d,
> server=node3.com,16020,1514693337210; rit=OPENING,
> location=node3.com,16020,1514693337210
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
> [SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
> current state=OPENING
> at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
> at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
> at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
> at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
> at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
> at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
>
> Non-stop showing on the logs. Probably because I disabled the table.
> Restarting HBase to see if it clears that a bit...
>
> After restart there isn't any
> org.apache.hadoop.hbase.exceptions.UnexpectedStateException on the logs.
> Only INFO level. And nothing bad. But still, regions are stuck in
> transition even for the disabled tables.
>
> Master logs are here. I removed some sections because it always says the
> same thing, for each and every single region: https://pastebin.com/K6SQ7DXP
>
> JMS
>
> 2017-12-31 9:58 GMT-05:00 stack <saint@gmail.com>:
>
>> There is nothing further up in the master log from regionservers or on
>> regionservers side on open?
>>
>> Thanks,
>> S
>>
>> On Dec 31, 2017 8:37 AM, "stack" <saint@gmail.com> wrote:
>>
>> > Good questions.  If you disable snappy does it work?  If you start over
>> > fresh does it work?  It should be picking up native libs.  Make an issue
>> > please jms.  Thanks for giving it a go.
>> >
>> > S
>> >
>> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <
>> jean-m...@spaggiari.org>
>> > wrote:
>> >
>> >> Hi Stack,
>> >>
>> >> I just gave it a try... Wiped out all HDFS content and code, all
>> >> HBase content and code, and all ZK. Rebuilt a brand new cluster with 7
>> >> physical worker nodes. I'm able to get HBase to start; however, I'm not
>> >> able to get my regions online.
>> >>
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node8.16020,151469206, table=pageMini,
>> >> region=a778eb67898dfd378e426f2e7700faea
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPENING,
>> >> location=node6.16020,1514693336563, table=work_proposed,
>> >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
>> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
>> >> assignment.AssignmentManager: TODO Handle stuck in transition:
>> >> rit=OPEN

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
Nothing bad that I can see. Here is a region server log:
https://pastebin.com/0r76Y6ap

Disabling the table makes the regions leave the transition mode. I'm trying
to disable all tables one by one (because it gets stuck after each disable)
and will see if re-enabling them helps...

On the master side, I now have errors all over:
2017-12-31 10:06:26,511 WARN  [ProcExecWrkr-89]
assignment.RegionTransitionProcedure: Retryable error trying to transition:
pid=511, ppid=398, state=RUNNABLE:REGION_TRANSITION_DISPATCH;
UnassignProcedure table=work_proposed,
region=d0a58b76ad9376b12b3e763660049d3d, server=node3.com,16020,1514693337210;
rit=OPENING, location=node3.com,16020,1514693337210
org.apache.hadoop.hbase.exceptions.UnexpectedStateException: Expected
[SPLITTING, SPLIT, MERGING, OPEN, CLOSING] so could move to CLOSING but
current state=OPENING
at org.apache.hadoop.hbase.master.assignment.RegionStates$RegionStateNode.transitionState(RegionStates.java:155)
at org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1530)
at org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:179)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:309)
at org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:85)
at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:845)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1456)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1225)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1735)
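
The exception above is an expected-state guard: the unassign wants to move the region to CLOSING, but the region is still OPENING, which is not in the accepted list. A minimal sketch of such a guard follows, in Python for brevity. It is an illustration built only from the log message, not the HBase implementation; `transition_state`, `UnexpectedStateError` and `EXPECTED_BEFORE_CLOSING` are hypothetical names.

```python
class UnexpectedStateError(Exception):
    """Raised when a transition is requested from a state that does not allow it."""


# States from which a move to CLOSING is accepted, copied from the log message.
EXPECTED_BEFORE_CLOSING = ("SPLITTING", "SPLIT", "MERGING", "OPEN", "CLOSING")


def transition_state(current, target, expected):
    """Return the new state, or raise if `current` is not one of `expected`."""
    if current not in expected:
        raise UnexpectedStateError(
            f"Expected [{', '.join(expected)}] so could move to {target} "
            f"but current state={current}"
        )
    return target


# A region that is OPEN can be closed; one still OPENING is refused.
closed = transition_state("OPEN", "CLOSING", EXPECTED_BEFORE_CLOSING)
try:
    transition_state("OPENING", "CLOSING", EXPECTED_BEFORE_CLOSING)
    refused = False
except UnexpectedStateError as err:
    refused = True
    reason = str(err)
```

With a guard like this, a region that never leaves OPENING would fail every retry of the unassign the same way, which is consistent with the message repeating non-stop in the master log.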

Non-stop showing on the logs. Probably because I disabled the table.
Restarting HBase to see if it clears that a bit...

After restart there isn't any
org.apache.hadoop.hbase.exceptions.UnexpectedStateException on the logs.
Only INFO level. And nothing bad. But still, regions are stuck in
transition even for the disabled tables.

Master logs are here. I removed some sections because it always says the same
thing, for each and every single region: https://pastebin.com/K6SQ7DXP

JMS

2017-12-31 9:58 GMT-05:00 stack <saint@gmail.com>:

> There is nothing further up in the master log from regionservers or on
> regionservers side on open?
>
> Thanks,
> S
>
> On Dec 31, 2017 8:37 AM, "stack" <saint@gmail.com> wrote:
>
> > Good questions.  If you disable snappy does it work?  If you start over
> > fresh does it work?  It should be picking up native libs.  Make an issue
> > please jms.  Thanks for giving it a go.
> >
> > S
> >
> > On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> > wrote:
> >
> >> Hi Stack,
> >>
> >> I just gave it a try... Wiped out all HDFS content and code, all
> >> HBase content and code, and all ZK. Rebuilt a brand new cluster with 7
> >> physical worker nodes. I'm able to get HBase to start; however, I'm not
> >> able to get my regions online.
> >>
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node8.16020,151469206, table=pageMini,
> >> region=a778eb67898dfd378e426f2e7700faea
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node6.16020,1514693336563, table=work_proposed,
> >> region=4a1d86197ace3f4c8b1c8de28dbe1d34
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node1.16020,1514693336898, table=page_crc,
> >> region=86b3912a09a5676b6851636ed22c2abc
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node7.16020,1514693337406, table=pageAvro,
> >> region=391784c43c87bdea6df05f96accad0ff
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENING,
> >> location=node8.16020,151469206, table=page,
> >> region=5850d782a3beea18872769bf8fd70fc7
> >> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> >> assignment.AssignmentManager: TODO Handle stuck in transition:
> >> rit=OPENIN

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-31 Thread Jean-Marc Spaggiari
.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)


2017-12-31 9:37 GMT-05:00 stack <saint@gmail.com>:

> Good questions.  If you disable snappy does it work?  If you start over
> fresh does it work?  It should be picking up native libs.  Make an issue
> please jms.  Thanks for giving it a go.
>
> S
>
> On Dec 30, 2017 11:49 PM, "Jean-Marc Spaggiari" <jean-m...@spaggiari.org>
> wrote:
>
> > Hi Stack,
> >
> > I just gave it a try... Wiped out all HDFS content and code, all
> > HBase content and code, and all ZK. Rebuilt a brand new cluster with 7
> > physical worker nodes. I'm able to get HBase to start; however, I'm not
> > able to get my regions online.
> >
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node8.16020,151469206, table=pageMini,
> > region=a778eb67898dfd378e426f2e7700faea
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node6.16020,1514693336563, table=work_proposed,
> > region=4a1d86197ace3f4c8b1c8de28dbe1d34
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node1.16020,1514693336898, table=page_crc,
> > region=86b3912a09a5676b6851636ed22c2abc
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node7.16020,1514693337406, table=pageAvro,
> > region=391784c43c87bdea6df05f96accad0ff
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node8.16020,151469206, table=page,
> > region=5850d782a3beea18872769bf8fd70fc7
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node5.16020,1514693330961, table=work_proposed,
> > region=1d892c9b54b66f802b82c2f9fe847f1f
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node5.16020,1514693330961, table=pageAvro,
> > region=e9de2c68cc01883e959d7953a4251687
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node3.16020,1514693337210, table=page,
> > region=e2e5fc1c262273893f10e92f24817d1b
> > 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node3.16020,1514693337210, table=page,
> > region=89c443c09f10bd1584b1bb86a637e1a8
> > 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node5.16020,1514693330961, table=page,
> > region=8ca93e9285233ca7b31992f194056bc1
> > 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node4.16020,1514693339685, table=work_proposed,
> > region=9afcf06c4d0d21d7e04b0223edcfc40a
> > 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> > assignment.AssignmentManager: TODO Handle stuck in transition:
> > rit=OPENING,
> > location=node6.16020,1514693336563, table=page,
>

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-30 Thread Jean-Marc Spaggiari
I forgot to say that I DistCp'd the entire /hbase folder from another
HBase 1.3 cluster ;) That's why there is data here.

2017-12-31 0:48 GMT-05:00 Jean-Marc Spaggiari <jean-m...@spaggiari.org>:

> Hi Stack,
>
> I just gave it a try... Wiped out all HDFS content and code, all
> HBase content and code, and all ZK. Rebuilt a brand new cluster with 7
> physical worker nodes. I'm able to get HBase to start; however, I'm not
> able to get my regions online.
>
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node8.16020,151469206, table=pageMini, region=
> a778eb67898dfd378e426f2e7700faea
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node6.16020,1514693336563, table=work_proposed, region=
> 4a1d86197ace3f4c8b1c8de28dbe1d34
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node1.16020,1514693336898, table=page_crc, region=
> 86b3912a09a5676b6851636ed22c2abc
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node7.16020,1514693337406, table=pageAvro, region=
> 391784c43c87bdea6df05f96accad0ff
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node8.16020,151469206, table=page, region=
> 5850d782a3beea18872769bf8fd70fc7
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node5.16020,1514693330961, table=work_proposed, region=
> 1d892c9b54b66f802b82c2f9fe847f1f
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node5.16020,1514693330961, table=pageAvro, region=
> e9de2c68cc01883e959d7953a4251687
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node3.16020,1514693337210, table=page, region=
> e2e5fc1c262273893f10e92f24817d1b
> 2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node3.16020,1514693337210, table=page, region=
> 89c443c09f10bd1584b1bb86a637e1a8
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node5.16020,1514693330961, table=page, region=
> 8ca93e9285233ca7b31992f194056bc1
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node4.16020,1514693339685, table=work_proposed, region=
> 9afcf06c4d0d21d7e04b0223edcfc40a
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node6.16020,1514693336563, table=page, region=
> 3457b3237c576eecd550eccee3f584cd
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node1.16020,1514693336898, table=page, region=
> dd5fb1dbd41945a9ccbc110b8d4a51b5
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node7.16020,1514693337406, table=work_proposed, region=
> 480bb37af54d9fa57c727da9e8a33578
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node8.16020,151469206, table=page_crc, region=
> 56b18d470a569c5474ea084f0d995726
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node6.16020,1514693336563, table=page_duplicate, region=
> e744a9af161de965c70c7d1a08b07660
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node1.16020,1514693336898, table=page_proposed, region=
> 1c75e53308acac6313db4be63c2b48fe
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node8.16020,151469206, table=work_proposed, region=
> 45a25ba85f6341a177db7b15554259f9
> 2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
> assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
> location=node3.16020,1514693337210, table=work_proposed, region=
> d0a58b76ad9376b12

Re: [VOTE] The first hbase-2.0.0-beta-1 Release Candidate is available

2017-12-30 Thread Jean-Marc Spaggiari
Hi Stack,

I just gave it a try... Wiped out all HDFS content and code, all
HBase content and code, and all ZK. Rebuilt a brand new cluster with 7
physical worker nodes. I'm able to get HBase to start; however, I'm not
able to get my regions online.

2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=pageMini,
region=a778eb67898dfd378e426f2e7700faea
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=work_proposed,
region=4a1d86197ace3f4c8b1c8de28dbe1d34
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_crc,
region=86b3912a09a5676b6851636ed22c2abc
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node7.16020,1514693337406, table=pageAvro,
region=391784c43c87bdea6df05f96accad0ff
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=page,
region=5850d782a3beea18872769bf8fd70fc7
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=work_proposed,
region=1d892c9b54b66f802b82c2f9fe847f1f
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=pageAvro,
region=e9de2c68cc01883e959d7953a4251687
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=e2e5fc1c262273893f10e92f24817d1b
2017-12-31 00:42:03,187 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=89c443c09f10bd1584b1bb86a637e1a8
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.16020,1514693330961, table=page,
region=8ca93e9285233ca7b31992f194056bc1
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node4.16020,1514693339685, table=work_proposed,
region=9afcf06c4d0d21d7e04b0223edcfc40a
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=page,
region=3457b3237c576eecd550eccee3f584cd
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page,
region=dd5fb1dbd41945a9ccbc110b8d4a51b5
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node7.16020,1514693337406, table=work_proposed,
region=480bb37af54d9fa57c727da9e8a33578
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=page_crc,
region=56b18d470a569c5474ea084f0d995726
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node6.16020,1514693336563, table=page_duplicate,
region=e744a9af161de965c70c7d1a08b07660
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_proposed,
region=1c75e53308acac6313db4be63c2b48fe
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node8.16020,151469206, table=work_proposed,
region=45a25ba85f6341a177db7b15554259f9
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=work_proposed,
region=d0a58b76ad9376b12b3e763660049d3d
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node3.16020,1514693337210, table=page,
region=599a4b7b21b1d93fa232ebbbef37a31b
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node1.16020,1514693336898, table=page_proposed,
region=55c07269cc907b8e8875c2a1c4ec27d5
2017-12-31 00:42:03,188 WARN  [ProcExecTimeout]
assignment.AssignmentManager: TODO Handle stuck in transition: rit=OPENING,
location=node5.,16020,1514693330961, table=page_crc,

Documentation update: 1.4 and hadoop compatibility?

2017-12-30 Thread Jean-Marc Spaggiari
Hi,

I have not seen a JIRA about adding 1.4 to this chart:
https://hbase.apache.org/book.html#hadoop Did I miss it? If not, I will
open one...

Thanks,

JMS


[jira] [Created] (HBASE-19582) Tags on append doesn't behave like expected

2017-12-21 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-19582:
---

 Summary: Tags on append doesn't behave like expected
 Key: HBASE-19582
 URL: https://issues.apache.org/jira/browse/HBASE-19582
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 2.0.0-alpha-4
Reporter: Jean-Marc Spaggiari


When appending a tag to an HBase cell, the tag does not seem to really be
appended; the two cells live their own lives. In the example below, I put a
cell, append the TTL, and we can see between the 2 scans that only the
TTL-appended cell expires. I was expecting those 2 cells to become one and
expire together.

[code]
hbase(main):082:0> put 't1', 'r1', 'f1:c1', 'value'
0 row(s) in 0.1350 seconds

hbase(main):083:0> append 't1', 'r1', 'f1:c1', '', { TTL => 5000 }
0 row(s) in 0.0080 seconds

hbase(main):084:0> scan 't1'
ROW   COLUMN+CELL
 r1   column=f1:c1, timestamp=1513879615014, value=value
1 row(s) in 0.0730 seconds

hbase(main):085:0> scan 't1'
ROW   COLUMN+CELL
 r1   column=f1:c1, timestamp=1513879599375, value=value
1 row(s) in 0.0500 seconds
[code]
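
To make the reported behavior easier to follow, here is a minimal toy model in Python. It is not HBase code and does not claim to reflect how HBase is implemented. It only assumes that the put and the append end up as two separate cell versions, that only the appended version carries the 5000 ms TTL, and that a scan returns the newest version that has not expired. The names Cell and visible_cell are hypothetical, and the two scan times are invented for illustration; only the timestamps come from the shell output above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for one stored cell version."""
    value: str
    timestamp_ms: int
    ttl_ms: Optional[int] = None  # None means the cell never expires

    def is_live(self, now_ms: int) -> bool:
        # A cell with a TTL is live until timestamp + TTL has passed.
        return self.ttl_ms is None or now_ms < self.timestamp_ms + self.ttl_ms


def visible_cell(versions: List[Cell], now_ms: int) -> Optional[Cell]:
    """Return the newest version that has not expired, or None."""
    live = [c for c in versions if c.is_live(now_ms)]
    return max(live, key=lambda c: c.timestamp_ms, default=None)


# Timestamps taken from the two scans in the report.
PUT_TS = 1513879599375
APPEND_TS = 1513879615014

versions = [
    Cell("value", PUT_TS),                  # put 't1', 'r1', 'f1:c1', 'value'
    Cell("value", APPEND_TS, ttl_ms=5000),  # append of '' with TTL => 5000
]

# Assumed scan times: one second and six seconds after the append.
first_scan = visible_cell(versions, APPEND_TS + 1000)
second_scan = visible_cell(versions, APPEND_TS + 6000)

print(first_scan.timestamp_ms)   # the appended cell is still live
print(second_scan.timestamp_ms)  # the appended cell expired, the put shows again
```

Under these assumptions the first scan shows the appended version and the second falls back to the original put, which matches the two timestamps in the report. The expectation in the report is different: a single merged cell that expires as a whole.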






Re: Request to Join Slack Channel

2017-12-21 Thread Jean-Marc Spaggiari
Me too please ;)

Thanks,

JMS

2017-12-20 20:54 GMT-05:00 Philippe Laflamme:

> Hi,
>
> I'd like to join the slack discussion and the guide mentions writing to
> this address to obtain an invite. May I obtain an invite please?
>
> Thanks,
> Philippe
>


Re: [DISCUSSION] Default configurations in hbase-2.0.0 hbase-default.xml

2017-12-19 Thread Jean-Marc Spaggiari
Can we get all tables Snappy-compressed by default? I think we cannot
because of the license, right? Just asking, in case there is an option for
that... Also +1 on balancing by table...

2017-12-18 17:34 GMT-05:00 Stack:

> (I thought I'd already posted a DISCUSSION on defaults for 2.0.0 but can't
> find it...)
>
> Dear All:
>
> I'm trying to get some eyeballs/thoughts on changes you'd like seen in
> hbase defaults for hbase-2.0.0. We have an ISSUE and some good discussion
> already up at HBASE-19148.
>
> A good case is being made for enabling balancing by table as default.
>
> Guanghao Zhang has already put in place more sensible retry/timeout
> numbers.
>
> Anything else we should change? Shout here or up on the issue.
>
> Thanks,
> S
>

