[jira] [Resolved] (HBASE-20522) Maven artifacts for HBase 2 are not available at maven central

2018-05-03 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HBASE-20522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía resolved HBASE-20522.
--
Resolution: Fixed

The deps are available; I confirm it is fixed now. Thanks [~stack]

> Maven artifacts for HBase 2 are not available at maven central
> --
>
> Key: HBASE-20522
> URL: https://issues.apache.org/jira/browse/HBASE-20522
> Project: HBase
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.0.0
>Reporter: Ismaël Mejía
>Priority: Major
>






[jira] [Created] (HBASE-20530) Composition of backup directory incorrectly contains namespace when restoring

2018-05-03 Thread Ted Yu (JIRA)
Ted Yu created HBASE-20530:
--

 Summary: Composition of backup directory incorrectly contains 
namespace when restoring
 Key: HBASE-20530
 URL: https://issues.apache.org/jira/browse/HBASE-20530
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Here is a partial listing of the output from an incremental backup:
{code}
5306 2018-05-04 02:38 
hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/table_almphxih4u/cf1/5648501da7194783947bbf07b172f07e
{code}
When restoring, here is what HBackupFileSystem.getTableBackupDir returns:
{code}
fileBackupDir=hdfs://mycluster/user/hbase/backup_loc/backup_1525401467793/default/table_almphxih4u
{code}
You can see that the namespace gets in the way, making it impossible to find the 
proper hfile.
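For illustration, here is a minimal sketch of the mismatch between the two path compositions. This is not the actual HBase code; the helper method names are hypothetical simplifications:
{code}
public class BackupPathSketch {
  // What the incremental backup writer produces (no namespace component):
  static String backupSidePath(String root, String backupId, String table, String family) {
    return root + "/" + backupId + "/" + table + "/" + family;
  }

  // What HBackupFileSystem.getTableBackupDir appears to compose on restore
  // (namespace inserted between backup id and table):
  static String restoreSidePath(String root, String backupId, String ns, String table) {
    return root + "/" + backupId + "/" + ns + "/" + table;
  }

  public static void main(String[] args) {
    String root = "hdfs://mycluster/user/hbase/backup_loc";
    String id = "backup_1525401467793";
    // Matches the listing above: .../backup_1525401467793/table_almphxih4u/cf1
    System.out.println(backupSidePath(root, id, "table_almphxih4u", "cf1"));
    // Restore looks here instead: .../backup_1525401467793/default/table_almphxih4u
    System.out.println(restoreSidePath(root, id, "default", "table_almphxih4u"));
  }
}
{code}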





[jira] [Created] (HBASE-20529) Make sure that there are no remote wals when transit cluster from DA to A

2018-05-03 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-20529:
--

 Summary: Make sure that there are no remote wals when transit 
cluster from DA to A
 Key: HBASE-20529
 URL: https://issues.apache.org/jira/browse/HBASE-20529
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Reporter: Guanghao Zhang


Consider we have two clusters in A and S state, and then we transition A to DA. 
Later we want to transition DA back to A; since the remote cluster is in S, we 
should be able to do it. But there may still be some remote wals on HDFS for the 
cluster in S state, so we need to wait until the remote wals are removed before 
transiting the cluster in DA state to A. We need to add a check for this.
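A hypothetical sketch of such a check (the class name and the assumed directory layout of <remoteWALDir>/<peerId> are illustrative only, not the actual replication code):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteWalCheckSketch {
  // Refuse the DA -> A transition while remote wals are still present.
  static void checkNoRemoteWals(FileSystem remoteFs, Path remoteWALDir, String peerId)
      throws IOException {
    Path peerDir = new Path(remoteWALDir, peerId);
    if (remoteFs.exists(peerDir) && remoteFs.listStatus(peerDir).length > 0) {
      throw new IOException("Remote wals still exist under " + peerDir
          + "; wait for them to be removed before transiting DA to A");
    }
  }
}
{code}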





[jira] [Resolved] (HBASE-20510) Add a downloads page to hbase.apache.org to tie mirrored artifacts to their hash and signature

2018-05-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-20510.
---
   Resolution: Fixed
 Assignee: stack
Fix Version/s: 3.0.0

Resolving for now. Can open a new issue to prettify.

> Add a downloads page to hbase.apache.org to tie mirrored artifacts to their 
> hash and signature
> --
>
> Key: HBASE-20510
> URL: https://issues.apache.org/jira/browse/HBASE-20510
> Project: HBase
>  Issue Type: Task
>  Components: website
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: 
> 0001-HBASE-20510-Add-a-downloads-page-to-hbase.apache.org.patch, Screen Shot 
> 2018-04-30 at 4.50.45 PM.png
>
>
> I tried to push to announce for 2.0.0 but it got blocked because mirrors 
> don't have signatures and hashes (I only just noticed. It's probably been this 
> way for ages). Signatures and hashes are hosted on www.apache.org only. 
> So, there needs to be a connection made between hashes and signatures hosted 
> on apache.org and the associated artifacts gotten from mirrors. Projects 
> usually do this on a 'download' page. Add one with 2.0.0 as first entry.
> [~misty] FYI





Re: Changes to website re: hbasecon

2018-05-03 Thread Stack
Looks great. Thanks boys.
S

On Thu, May 3, 2018 at 3:11 PM, Josh Elser  wrote:

> ClayB was telling me last week that he had folks having a hard time
> getting to the "current" HBaseCon 2018 page on our site via Google
> (apparently they kept sending people to the previous site
> hbase.apache.org/www.hbasecon.com)
>
> I just pushed a couple of changes:
>
> 1. JS redirect of h.a.o/www.hbasecon.com to h.a.o/hbasecon-2018
> 2. A tabular archive page at h.a.o/hbasecon-archives.html that Clay wrote
> for us (super nice!)
> 3. Updated hbasecon-2018 with a pointer to this archive page
>
> Please holler if this offends you greatly :)
>


Changes to website re: hbasecon

2018-05-03 Thread Josh Elser
ClayB was telling me last week that he had folks having a hard time 
getting to the "current" HBaseCon 2018 page on our site via Google 
(apparently they kept sending people to the previous site 
hbase.apache.org/www.hbasecon.com)


I just pushed a couple of changes:

1. JS redirect of h.a.o/www.hbasecon.com to h.a.o/hbasecon-2018
2. A tabular archive page at h.a.o/hbasecon-archives.html that Clay 
wrote for us (super nice!)

3. Updated hbasecon-2018 with a pointer to this archive page

Please holler if this offends you greatly :)


[jira] [Resolved] (HBASE-20498) [AMv2] Stuck in UnexpectedStateException

2018-05-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-20498.
---
Resolution: Invalid

Resolving. Logs are gone and my statement of '... we go assign regions' is 
missing vital detail... Will keep an eye out for this going forward but 
killing this JIRA.

> [AMv2] Stuck in UnexpectedStateException
> 
>
> Key: HBASE-20498
> URL: https://issues.apache.org/jira/browse/HBASE-20498
> Project: HBase
>  Issue Type: Sub-task
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.0.1
>
> Attachments: HBASE-20498.branch-2.001.patch
>
>
> Here is how we get into a stuck scenario in hbase-2.0.0RC2.
>  * Assign a region. It is moved to OPENING state then RPCs the RS.
>  * RS opens region. Tells Master.
>  * Master tries to complete the assign by updating hbase:meta.
>  * hbase:meta is hosed because I'd deployed a bad patch that blocked 
> hbase:meta updates
>  * Master is stuck retrying RPCs to RS hosting hbase:meta; we want to update 
> our new OPEN state in hbase:meta.
>  * I kill Master because I want to fix the broken patch.
>  * On restart, a script sets table to be DISABLED.
>  * As part of startup, we go to assign regions.
>  * We skip assigning regions because the table is DISABLED; i.e. we skip the  
> replay of the unfinished assign.
>  * The region is now a free-agent; no lock held, so, the queued unassign that 
> is part of the disable table can run
>  * It fails because region is in OPENING state, an UnexpectedStateException 
> is thrown.
> We loop, complaining about the above.
> Resolution requires finishing previous assign first, then we can disable.
> Let me try and write a test to manufacture this state.





[jira] [Resolved] (HBASE-20507) Do not need to call recoverLease on the broken file when we fail to create a wal writer

2018-05-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-20507.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Re-resolving.

> Do not need to call recoverLease on the broken file when we fail to create a 
> wal writer
> ---
>
> Key: HBASE-20507
> URL: https://issues.apache.org/jira/browse/HBASE-20507
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.1.0, 2.0.1
>
> Attachments: 20507.addendum.patch, HBASE-20507.patch
>
>
> I tried locally with a UT: if we overwrite a file which is currently being 
> written, the old file will be completed and then deleted. If you then call close 
> on the previous file, a no-lease exception will be thrown, which means that 
> the file has already been completed.
> So we do not need to close a file if it will be overwritten immediately, 
> since recoverLease may take a very long time...
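As a companion, here is a minimal sketch of the overwrite behavior described above, using the plain HDFS FileSystem API. This is an illustration of the observed semantics, not the HBase WAL code:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OverwriteInsteadOfRecoverLeaseSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path wal = new Path("/tmp/wal-sketch");

    // First writer fails mid-stream and leaves the file open.
    FSDataOutputStream broken = fs.create(wal, true);

    // Creating again with overwrite=true completes and replaces the old file,
    // so there is no need to call recoverLease on it first.
    FSDataOutputStream fresh = fs.create(wal, true);
    fresh.close();

    // Closing the first stream now fails with a no-lease error, confirming
    // the old file was already completed; the exception can be ignored.
    try {
      broken.close();
    } catch (Exception expected) {
      // expected: the lease is already gone
    }
  }
}
{code}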





[jira] [Reopened] (HBASE-20507) Do not need to call recoverLease on the broken file when we fail to create a wal writer

2018-05-03 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-20507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-20507:
---

Reopening to add an addendum (nightlies are broken since this went in).

> Do not need to call recoverLease on the broken file when we fail to create a 
> wal writer
> ---
>
> Key: HBASE-20507
> URL: https://issues.apache.org/jira/browse/HBASE-20507
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 2.1.0, 2.0.1
>
> Attachments: 20507.addendum.patch, HBASE-20507.patch
>
>
> I tried locally with a UT: if we overwrite a file which is currently being 
> written, the old file will be completed and then deleted. If you then call close 
> on the previous file, a no-lease exception will be thrown, which means that 
> the file has already been completed.
> So we do not need to close a file if it will be overwritten immediately, 
> since recoverLease may take a very long time...





Re: Fwd:

2018-05-03 Thread Xi Yang
Wow, this explanation is really detailed. That helps me a lot! I totally
understand the read process now.
Thanks a million.

Thanks,
Alex

2018-05-02 22:33 GMT-07:00 ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com>:

> Regarding the read flow this is what happens
>
> 1)  Create a region level scanner
> 2) The region level scanner can comprise more than one store scanner
> (each store scanner works on one column family).
> 3) Every store scanner will comprise a memstore scanner and a set of hfile
> scanners (based on the number of store files).
> 4) The scan tries to read data in lexicographical order.
>  For example, for simplicity suppose you have row1 to row5 and there is only one
> column family 'f1' and one column 'c1'. Assume row1 was already written and
> it is flushed to a store file. Row2 to row5 are in the memstore.
> When the scanner starts it will form a heap with all these memstore scanners
> and store file (hfile) scanners. Internally, since row1 is smaller
> lexicographically, the row1 from the store file is retrieved first. This row1
> for the first time will be read from HDFS (and not from block cache). The remaining
> rows are fetched from memstore scanners. There is no block cache concept at
> the memstore level; the memstore is just a simple key-value map.
>
> When the same scan is issued the next time, we go through the above steps,
> but to fetch row1 the store file scanner that has row1 fetches the block
> that has row1 from the block cache (instead of from HDFS) and returns the
> value from the block cache, and the remaining rows are again fetched from
> the memstore scanners over the underlying memstore.
>
> Hope this helps.
>
> Regards
> Ram
>
> On Thu, May 3, 2018 at 9:17 AM, Xi Yang  wrote:
>
> > Hi Tim,
> >
> > Thanks for confirming the question. That question confused me for a long
> > time. Really appreciate it.
> >
> >
> > About another question, I still don't know whether Model A is correct or
> > Model B is correct. Still confused.
> >
> >
> > Thanks,
> > Alex
> >
> > 2018-05-02 13:53 GMT-07:00 Tim Robertson :
> >
> > > Thanks Alex,
> > >
> > > Yes, looking at that code I believe you are correct - the memStore
> > scanner
> > > is appended after the block scanners.
> > > The block scanners may or may not see hits in the block cache when they
> > > read. If they don't get a hit, they'll open the block from the
> underlying
> > > HFile(s).
> > >
> > >
> > >
> > > On Wed, May 2, 2018 at 10:41 PM, Xi Yang 
> wrote:
> > >
> > > > Hi Tim,
> > > >
> > > > Thank you for detailed explanation. Yes, that really helps me! I
> really
> > > > appreciate it!
> > > >
> > > >
> > > > But I'm still confused about the sequence:
> > > >
> > > > I've read these codes in *HStore.getScanners* :
> > > >
> > > >
> > > > // TODO this used to get the store files in descending order,
> > > > // but now we get them in ascending order, which I think is
> > > > // actually more correct, since memstore get put at the end.
> > > > List<StoreFileScanner> sfScanners =
> > > >     StoreFileScanner.getScannersForStoreFiles(storeFilesToScan,
> > > >       cacheBlocks, usePread, isCompaction, false, matcher, readPt);
> > > > List<KeyValueScanner> scanners = new ArrayList<>(sfScanners.size() + 1);
> > > > scanners.addAll(sfScanners);
> > > > // Then the memstore scanners
> > > > scanners.addAll(memStoreScanners);
> > > >
> > > >
> > > > Does it mean this step:
> > > >
> > > >
> > > > 2) It looks in the memstore to see if there are any writes still in
> > > > memory ready to flush down to the HFiles that need to be merged with the data
> > > > read in 1)
> > > >
> > > > comes after the following step?
> > > >
> > > > c) the data is read from the opened block
> > > >
> > > >
> > > >
> > > >
> > > > Here is an explanation of the images I drew before, so that we don't need
> > > > the images:
> > > >
> > > > When a read request come in
> > > > Model A
> > > >
> > > >1. get Scanners (including StoreScanner and MemStoreScanner).
> > > >MemStoreScanner is the last one
> > > >2. Begin with the first StoreScanner
> > > >3. Try to get the block from BlockCache of the StoreScanner
> > > >4. Try to get the block from HFile of the StoreScanner
> > > >5. Go to the next StoreScanner
> > > >6. Loop #2 - #5 until all StoreScanner been used
> > > >7. Try to get the block from memStore
> > > >
> > > >
> > > > Model B
> > > >
> > > >1. Try to get the block from BlockCache, if failed then go to #2
> > > >2. get Scanners (including StoreScanner and MemStoreScanner).
> > > >MemStoreScanner is the last one
> > > >3. Begin with the first StoreScanner
> > > >4. Try to get the block from HFile of the StoreScanner
> > > >5. Go to the next StoreScanner
> > > >6. Loop #4 - #5 until all StoreScanner been used
> > > >7. Try to get the block from memStore
> > > >
> > > >
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > >
> > > > 2018-05-02 1:04 GMT-07:00 Tim Robertson :
> > > >
> > > > > Hi Alex,
> > > > >
> >
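To make the scanner ordering discussed in this thread concrete, here is a toy sketch in plain Java (not HBase's actual KeyValueHeap) of how store file scanners, plus a memstore scanner appended last, merge into one lexicographically ordered stream:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ScannerHeapSketch {
  public static void main(String[] args) {
    // One "scanner" per store file, plus the memstore scanner appended last,
    // mirroring the HStore.getScanners snippet quoted above.
    List<Iterator<String>> scanners = new ArrayList<>();
    scanners.add(List.of("row1").iterator());                         // store file
    scanners.add(List.of("row2", "row3", "row4", "row5").iterator()); // memstore

    // Heap keyed on each scanner's current row; the smallest row wins.
    PriorityQueue<PeekingScanner> heap =
        new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
    for (Iterator<String> s : scanners) {
      if (s.hasNext()) {
        heap.add(new PeekingScanner(s));
      }
    }
    while (!heap.isEmpty()) {
      PeekingScanner top = heap.poll();
      System.out.println(top.next()); // prints row1..row5 in order
      if (top.hasNext()) {
        heap.add(top); // re-insert so its next row competes again
      }
    }
  }

  // Wraps an iterator so the heap can compare the next not-yet-returned row.
  static final class PeekingScanner {
    private final Iterator<String> it;
    private String current;

    PeekingScanner(Iterator<String> it) {
      this.it = it;
      this.current = it.next();
    }

    String peek() { return current; }

    String next() {
      String r = current;
      current = it.hasNext() ? it.next() : null;
      return r;
    }

    boolean hasNext() { return current != null; }
  }
}
{code}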

[jira] [Created] (HBASE-20528) Revise collections copying from iteration to built-in function

2018-05-03 Thread Hua-Yi Ho (JIRA)
Hua-Yi Ho created HBASE-20528:
-

 Summary: Revise collections copying from iteration to built-in 
function
 Key: HBASE-20528
 URL: https://issues.apache.org/jira/browse/HBASE-20528
 Project: HBase
  Issue Type: Improvement
Reporter: Hua-Yi Ho


Some collection code in the files
StochasticLoadBalancer.java, AbstractHBaseTool.java, HFileInputFormat.java, 
Result.java, and WalPlayer.java uses iteration to copy all the data in a 
collection. These iterations can be replaced by Collections.addAll and 
Arrays.copyOf.
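A hedged before/after illustration of the proposed change (the example methods below are generic, not the exact code in the files listed):
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CopyIdiomsSketch {
  // Before: copying an array into a list element by element.
  static List<String> copyByIteration(String[] src) {
    List<String> dst = new ArrayList<>(src.length);
    for (String s : src) {
      dst.add(s);
    }
    return dst;
  }

  // After: the built-in equivalent replaces the loop.
  static List<String> copyWithAddAll(String[] src) {
    List<String> dst = new ArrayList<>(src.length);
    Collections.addAll(dst, src);
    return dst;
  }

  // After: copying/truncating an array without a manual loop.
  static String[] copyWithCopyOf(String[] src, int newLength) {
    return Arrays.copyOf(src, newLength);
  }
}
{code}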





[jira] [Created] (HBASE-20527) Remove unused code in MetaTableAccessor.java

2018-05-03 Thread Mingdao Yang (JIRA)
Mingdao Yang created HBASE-20527:


 Summary: Remove unused code in MetaTableAccessor.java
 Key: HBASE-20527
 URL: https://issues.apache.org/jira/browse/HBASE-20527
 Project: HBase
  Issue Type: Improvement
Affects Versions: 3.0.0
Reporter: Mingdao Yang


META_REGION_PREFIX isn't used. I'll clean it up.





[DISCUSS] Effective HBase in the Cloud

2018-05-03 Thread Josh Elser

Hi,

I'm pleased to finally be able to share this design document with you 
all. It's the result of internal review from half a dozen or so folks from 
within our community (Enis, Devaraj, Artem, and Clay easily come to 
mind) after multiple months of review and iteration.


Abstract:


Infrastructure as a service (IaaS) via public cloud infrastructure 
offerings (Cloud IaaS) has grown dramatically in popularity through 
services like Amazon EC2, Google Compute Engine, and Microsoft Azure 
Compute. Across Apache HBase users, the majority of new system 
architectures include some form of Cloud IaaS as a means to increase the 
capabilities and/or decrease the cost of operation of their system. 
However, deploying HBase on these platforms comes with difficulties as 
HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee 
the durability of data written to HBase. This document outlines a 
proposal to remove HBase’s dependency on HDFS by replacing the current 
Write-Ahead-Log (WAL) implementation using Apache Ratis (incubating). It 
covers why the HDFS dependency is a problem on Cloud IaaS, how Ratis can 
be used to replace HDFS-based WALs, and a high-level development plan to 
effectively implement the replacement of this extremely critical HBase 
internal component without becoming tied to a single Cloud IaaS offering.



The document is available on Google Docs[1] and there is also PDF 
available [2] of the current version. I'm happy to assist those who do 
not want to use the copy on a Google service (e.g. transcribe 
mailing-list chatter onto the Doc).


Thanks to some of the same folks who helped with this document, I also 
have a fairly in-depth analysis of what we think the required work will 
entail. For the HBase specific changes, I'd like to avoid the pitfall we 
commonly face and work towards frequent merges into master that do not 
destabilize the build (keep things "Green") to avoid stalling our 
forward momentum after 2.0. If people are curious/interested, I'm happy 
to delve some more into how I think we can implement this.


- Josh

[1] 
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#

[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf


Re: Looking forward to releasing 1.2.7

2018-05-03 Thread Sean Busbey
Hi!

Thanks for bringing this up.

I'm still the RM for 1.2 and I've been anxious to get the release
cadence going again for almost 3 months now. Unfortunately, the
nightly tests are still failing:

https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-1.2/

For previous releases in this line, I relied on the nightly tests on
ASF infra passing as a health check for the code branch.

I've been tied up in HBase 2.0 and $dayjob work, so haven't managed to
chase down the remaining failures. Maybe it's time I find a different
way to assure myself.

Andrew P, I think for the 1.4 release line you run the unit tests on
some non-ASF system. I am 100% certain I could get a VM somewhere to
do the same for 1.2 RCs. Do you know off-hand what size I need to
grab as far as cpu / mem is concerned?

On Thu, May 3, 2018 at 7:31 AM, Kang Minwoo  wrote:
> Hello!
>
> I am looking forward to releasing HBase 1.2.7.
> Do you know when 1.2.7 will be released?
>
> Best regards,
> Minwoo Kang


Looking forward to releasing 1.2.7

2018-05-03 Thread Kang Minwoo
Hello!

I am looking forward to releasing HBase 1.2.7.
Do you know when 1.2.7 will be released?

Best regards,
Minwoo Kang


[jira] [Created] (HBASE-20526) multithreads bulkload performance

2018-05-03 Thread Key Hutu (JIRA)
Key Hutu created HBASE-20526:


 Summary: multithreads bulkload performance
 Key: HBASE-20526
 URL: https://issues.apache.org/jira/browse/HBASE-20526
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0
Reporter: Key Hutu
Assignee: Key Hutu


When doing a bulkload, the interaction with zookeeper to get the region key 
ranges may cost a lot of time.

In a multithreaded environment, the duration may be 5 minutes or more.

From the executor log, contents like 'Reading reply sessionid:0x262fb37f4a07080, 
packet:: clientPath:null server ...' appear many times.

It would likely help to provide a new method for bulkload that caches the key 
ranges outside (see the sketch below).
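A hypothetical sketch of that idea, fetching the key ranges once through RegionLocator and sharing them across bulkload threads (the class and field names are illustrative, not a proposed API):
{code}
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Pair;

public class CachedKeyRangesSketch {
  // Cached once; reused by every bulkload thread instead of re-querying.
  private volatile Pair<byte[][], byte[][]> startEndKeys;

  synchronized Pair<byte[][], byte[][]> getStartEndKeys(Connection conn, TableName table)
      throws IOException {
    if (startEndKeys == null) {
      try (RegionLocator locator = conn.getRegionLocator(table)) {
        startEndKeys = locator.getStartEndKeys(); // single lookup
      }
    }
    return startEndKeys;
  }
}
{code}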

 





[jira] [Created] (HBASE-20525) Refactoring the code of read path

2018-05-03 Thread Duo Zhang (JIRA)
Duo Zhang created HBASE-20525:
-

 Summary: Refactoring the code of read path
 Key: HBASE-20525
 URL: https://issues.apache.org/jira/browse/HBASE-20525
 Project: HBase
  Issue Type: Umbrella
  Components: scan
Reporter: Duo Zhang
 Fix For: 3.0.0


The known problems of the current implementation:

1. 'Seek or skip' should be decided at StoreFileScanner level, not StoreScanner.
2. Now that we support creating multiple StoreFileReader instances for a single 
HFile, we do not need to load the file info and other meta info every time we 
create a new StoreFileReader instance.
3. 'Pread or stream' should be decided at StoreFileScanner level, not 
StoreScanner.
4. Make sure that we can return at any point during a scan; at the moment, when 
filterRowKey applies we cannot stop until we reach the next row, no matter how 
many cells we need to skip...
5. We do byte comparisons everywhere we need to know whether there is a row 
change, a family change, a qualifier change, etc. This is a performance killer 
(see the sketch after this list).

And the most important thing is that the code is way too complicated now and 
has become out of control...

This should be done before our 3.0.0 release.
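As a small illustration of point 5, here is a sketch against the public Cell/CellUtil API (not a proposed design): every one of these boundary checks is a raw bytes comparison executed per cell pair on the read path.
{code}
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;

public class BoundaryCheckSketch {
  // Each call compares raw bytes; doing this for every consecutive cell
  // pair during a scan is the performance cost noted above.
  static boolean rowChanged(Cell prev, Cell curr) {
    return !CellUtil.matchingRows(prev, curr);
  }

  static boolean familyChanged(Cell prev, Cell curr) {
    return !CellUtil.matchingFamily(prev, curr);
  }

  static boolean qualifierChanged(Cell prev, Cell curr) {
    return !CellUtil.matchingQualifier(prev, curr);
  }
}
{code}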





[jira] [Created] (HBASE-20524) Need to clear metrics when ReplicationSourceManager refresh replication sources

2018-05-03 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-20524:
--

 Summary: Need to clear metrics when ReplicationSourceManager 
refresh replication sources
 Key: HBASE-20524
 URL: https://issues.apache.org/jira/browse/HBASE-20524
 Project: HBase
  Issue Type: Bug
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang


When ReplicationSourceManager refreshes replication sources, it will close the 
old source first, then start up a new source. The new source will use new 
metrics, but we forgot to clear the metrics for the old sources.
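A hypothetical sketch of the fix; the interfaces and names below are illustrative, not the actual ReplicationSourceManager API. The point is simply to clear the old source's metrics before the new source registers its own:
{code}
public class RefreshSourcesSketch {
  interface ReplicationSource {
    void terminate(String reason);
    SourceMetrics metrics();
  }

  interface SourceMetrics {
    void clear(); // unregister this source's gauges and counters
  }

  void refreshSource(ReplicationSource oldSource, Runnable startNewSource) {
    oldSource.terminate("Peer config changed");
    oldSource.metrics().clear(); // the missing step this issue adds
    startNewSource.run();
  }
}
{code}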


