Re: Rough goal timelines for 1.1 and 2.0
+1 on Nick being the RM for release 1.1

On Mon, Mar 16, 2015 at 11:50 AM, Enis Söztutar enis@gmail.com wrote: I would love to see 1.1 in or before May. We already have good stuff in branch-1, enough to justify a minor release. Some of the features are still in the pipeline waiting to be finished (MOB, procV2, etc). Personally, I think we should get HBASE-12972, ProcV2, RPC quotas (and other multi-tenancy improvements not yet backported) in and call it 1.1. I would +1 either Nick or Andrew; both should be excellent RMs. Enis

On Mon, Mar 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: FWIW, the Region proposal (HBASE-12972) is ready for review. The companion issue for SplitTransaction and RegionMergeTransaction (HBASE-12975) needs more discussion but could be ready to go in a one-month timeframe.

On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines -- a release timed with HBaseCon would be ideal in this regard. My obligations to the event are minimal, so I'm willing to volunteer as RM for 1.1. Do we think we can make some of these decisions in time for spinning RCs in mid-April? That's just about a month away. -n

On Sat, Mar 14, 2015 at 10:37 AM, Elliott Clark ecl...@apache.org wrote: I'm most looking forward to rpc quotas and the buffer improvements that stack has put in. So for me, getting 1.1 out by May 1 would be cool. That would allow us to talk about what was just released at HBaseCon, and maybe even have 1.1.0 in production at places.

On Fri, Mar 13, 2015 at 11:44 AM, Sean Busbey bus...@cloudera.com wrote: The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, that seems prudent. But I haven't been tracking the changes lately.
I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean

On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question. Can we discuss them independently? Or is there a reason not to?

On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13259) mmap() based BucketCache IOEngine
Zee Chen created HBASE-13259: Summary: mmap() based BucketCache IOEngine Key: HBASE-13259 URL: https://issues.apache.org/jira/browse/HBASE-13259 Project: HBase Issue Type: New Feature Components: BlockCache Affects Versions: 0.98.10 Reporter: Zee Chen

Of the existing BucketCache IOEngines, FileIOEngine uses pread() to copy data from kernel space to user space. This is a good choice when the total working set size is much bigger than the available RAM and the latency is dominated by IO access. However, when the entire working set is small enough to fit in RAM, using mmap() (and subsequent memcpy()) to move data from kernel space to user space is faster. I have run some short keyvalue get tests and the results indicate a reduction of 2%-7% of kernel CPU on my system, depending on the load. On the gets, the latency histograms from mmap() are identical to those from pread(), but peak throughput is close to 40% higher. This patch modifies ByteBufferArray to allow it to specify a backing file. Example for using this feature: set hbase.bucketcache.ioengine to mmap:/dev/shm/bucketcache.0 in hbase-site.xml.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
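To make the pread-vs-mmap contrast above concrete, here is a minimal, self-contained Java sketch (not HBase code; the class and method names are invented for illustration). It reads the same range of a file once through a positional FileChannel read, the pread-style path FileIOEngine uses, and once through a MappedByteBuffer, the mmap-style path proposed here, and checks that both return the same bytes:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

public class MmapVsPread {

    // pread-style path: every read is a positional syscall that copies
    // bytes from kernel space into a freshly allocated user-space buffer.
    public static byte[] readWithPread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            if (ch.read(buf, offset + buf.position()) < 0) break; // EOF
        }
        return buf.array();
    }

    // mmap-style path: the file is mapped once up front; a read is then
    // just a memcpy out of the mapping (page cache), no per-read syscall.
    public static byte[] readWithMmap(MappedByteBuffer map, int offset, int len) {
        byte[] out = new byte[len];
        ByteBuffer view = map.duplicate(); // private cursor, shared backing pages
        view.position(offset);
        view.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("bucketcache", ".0");
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Files.write(p, data);
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, data.length);
            byte[] a = readWithPread(ch, 100, 64);
            byte[] b = readWithMmap(map, 100, 64);
            if (!Arrays.equals(a, b)) throw new AssertionError("read paths disagree");
            System.out.println("ok: both read paths returned identical bytes");
        } finally {
            Files.delete(p);
        }
    }
}
```

The functional result is identical either way, which matches the observation in the report that only throughput and kernel CPU differ, not the latency histograms.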
Re: mmap() based BucketCache IOEngine
Yes please! On Mon, Mar 16, 2015 at 12:43 PM, Zee Chen zeo...@gmail.com wrote: Of the existing BucketCache IOEngines, the FileIOEngine uses pread() to copy data from kernel space to user space. This is a good choice when the total working set size is much bigger than the available RAM and the latency is dominated by IO access. However, when the entire working set is small enough to fit in the RAM, using mmap() (and subsequent memcpy()) to move data from kernel space to user space is faster. I have run some short keyval gets tests and the results indicate a reduction of 2%-7% of kernel CPU on my system, depending on the load. On the gets, the latency histograms from mmap() are identical to those from pread(), but peak throughput is close to 40% higher. Already tested the patch at Flurry. Anyone interested in reviewing the patch? -Zee
Re: Question on EnableTableHandler code
Now (1) is under control (HBASE-13254). Let us talk about (2). Looks like we are doing a best effort to online all regions of a table during the 'enable table' operation. My argument is that we should be consistent across all conditions. Currently, we fail if bulk assignment failed for some reason; but if we don't even do assignment, we declare success. It is not consistent; if we are doing best effort, then we should always succeed on the 'making regions online' operation (with warning messages - by the way, a warning message is good for debugging, but the client could not see it). Here is the current logic:

  done = false;
  if (we can find servers to do bulk assignment) {
    if (bulk assignment is complete) {
      done = true;
    } else { // either bulk assignment failed or was interrupted
      done = false;
    }
  } else { // bulk plan could not be found
    done = true;
  }

On Mon, Mar 16, 2015 at 11:48 AM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. On (2): I think it is much better to guarantee that the table was enabled (i.e. all internal structures reflect that fact and the balancer knows about the new table). But the result of that could be checked asynchronously from Admin. Does it make sense?

On Mon, Mar 16, 2015 at 6:10 PM, Stephen Jiang syuanjiang...@gmail.com wrote: Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail the enabling table; however, if the bulk assign does not start, we would enable the table with offline regions - really inconsistent - we should either fail in both scenarios or succeed in both with offline regions (the best effort approach). Thanks Stephen

On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you.

On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right. For case (1) we really don't need the state cleanup there. That is a bug.
Should throw TableNotFoundException. As for (2), in the case of no online region servers being available we could leave the table enabled, but no regions would be assigned. Actually that raises a good question of what enable table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or could that be checked in Admin on the client side. So for now it seems that the Enable handler does what is best, leaving the table enabled but unassigned, to be later assigned by the Balancer.

On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller has no knowledge that the table does not exist or was already deleted; it would continue to the next step. Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects TableNotFoundException. Should we always throw the exception if the table does not exist? I want to make sure that I don't break the recovery logic by modifying this.

  public EnableTableHandler prepare() {
    ...
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    ...
  }

(2).
In EnableTableHandler#handleEnableTable() - if the bulk assign plan could not be found, it would leave regions offline and declare enable table succeeded - I think this is a bug and we should retry or fail - but I want to make sure whether there is some merit behind this logic.

  private void handleEnableTable() {
    Map<ServerName, List<HRegionInfo>> bulkPlan =
        this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
    if (bulkPlan != null) {
      ...
    } else {
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
      done = true;
    }
    if (done) {
      // Flip the table to enabled.
      this.assignmentManager.getTableStateManager().setTableState(
          this.tableName, TableState.State.ENABLED);
      LOG.info("Table '" + this.tableName + "' was successfully enabled. Status: done=" + done);
    }
    ...
  }

thanks Stephen -- Andrey.
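A hypothetical sketch of the "consistent best-effort" behavior argued for in this thread (purely illustrative; these names are invented and are not HBase APIs): the table is flipped to enabled the same way in every case, and the caller gets back the regions left offline instead of an inconsistent success/failure split:

```java
import java.util.Collections;
import java.util.List;

public class EnableTableSketch {

    // Returns the regions left offline. The outcome no longer depends on
    // whether a bulk plan was found: the table would be set to ENABLED
    // unconditionally, and assignment failures are surfaced, not swallowed.
    public static List<String> enableBestEffort(List<String> regionsToAssign,
                                                boolean planFound,
                                                boolean bulkAssignSucceeded) {
        // setTableState(ENABLED) would happen here in all branches.
        if (planFound && bulkAssignSucceeded) {
            return Collections.emptyList(); // every region came online
        }
        // No plan, or a failed/interrupted bulk assign: identical best-effort
        // result. The offline regions are returned (or logged) so a client
        // can check assignment asynchronously, as suggested in the thread.
        return regionsToAssign;
    }
}
```

The point of the sketch is only the control flow: both failure modes collapse into the same observable outcome, which is the consistency property Stephen asks for.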
[jira] [Created] (HBASE-13257) Show coverage report on jenkins
zhangduo created HBASE-13257: Summary: Show coverage report on jenkins Key: HBASE-13257 URL: https://issues.apache.org/jira/browse/HBASE-13257 Project: HBase Issue Type: Task Reporter: zhangduo Priority: Minor

Thinking of showing the jacoco coverage report on https://builds.apache.org. An advantage of showing it on jenkins is that the jenkins jacoco plugin can handle cross-module coverage. We cannot do it locally since https://github.com/jacoco/jacoco/pull/97 is still pending.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13260) Bootstrap Tables for fun and profit
Enis Soztutar created HBASE-13260: - Summary: Bootstrap Tables for fun and profit Key: HBASE-13260 URL: https://issues.apache.org/jira/browse/HBASE-13260 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 2.0.0, 1.1.0

Over at the ProcV2 discussions (HBASE-12439) and elsewhere I was mentioning an idea where we may want to use regular old regions to store/persist some data needed for the HBase master to operate. We regularly use system tables for storing system data. acl, meta, namespace, quota are some examples. We also store the table state in meta now. Some data is persisted in zk only (replication peers and replication state, etc). We are moving away from zk as a permanent storage. As any self-respecting database does, we should store almost all of our data in HBase itself. However, we have an availability dependency between different kinds of data. For example all system tables need meta to be assigned first. All master operations need the ns table to be assigned, etc. For at least two types of data, (1) procedure v2 states, (2) RS groups in HBASE-6721, we cannot depend on meta being assigned since assignment itself will depend on accessing this data. The solution in (1) is to implement a custom WAL format, and custom recover lease and WAL recovery. The solution in (2) is to have the table to store this data, but also cache it in zk for bootstrapping initial assignments. For solving both of the above (and possible future use cases if any), I propose we add a bootstrap table concept, which is:
- A set of predefined tables hosted in a separate dir in HDFS.
- A table is only 1 region, not splittable.
- Not assigned through regular assignment.
- Hosted only on 1 server (typically the master).
- Has a dedicated WAL.
- A service does WAL recovery + fencing for these tables.
This has the benefit of using a region to keep the data, but frees us from re-implementing caching, and we can use the same WAL / Memstore / Recovery mechanisms that are battle-tested.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13258) Promote TestHRegion to LargeTests
zhangduo created HBASE-13258: Summary: Promote TestHRegion to LargeTests Key: HBASE-13258 URL: https://issues.apache.org/jira/browse/HBASE-13258 Project: HBase Issue Type: Sub-task Components: test Reporter: zhangduo Assignee: zhangduo

It always times out when I try to get a coverage report locally. The problem is testWritesWhileGetting; it runs extremely slowly when the jacoco agent is enabled (not a bug, there is progress). Since it has a VerySlowRegionServerTests annotation on it, I think it is OK to promote it to LargeTests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Jira role cleanup
bq. Beyond that I agree that we should limit this to a known set of people (the contributors).

+1

bq. Maybe discuss this briefly at the next PMC meeting

+1 too.

On Sun, Mar 15, 2015 at 11:12 PM, lars hofhansl la...@apache.org wrote: Hmm... This is interesting. I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not, when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). Minimally somebody needs to be able to assign a jira to the person providing the patch; if those are committers only, that's tedious but OK - we've been doing that anyway. Ideally the person could assign an _open_ issue to him/herself, log work against an issue, and change the due date. Those seem like abilities we could grant to everybody as long as they are limited to open issues. Beyond that I agree that we should limit this to a known set of people (the contributors). Maybe discuss this briefly at the next PMC meeting; we're due to have one anyway. I'm willing to host one at Salesforce. -- Lars

From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Sunday, March 15, 2015 9:46 PM Subject: Re: Jira role cleanup

I can make it so that issues can be assigned to non-contributors. Even if we don't do that, I believe jira permissions are all about constraining current actions, and are not enforced on existing tickets. However, the contributor role in jira has several other abilities associated with it. Right now, in the order they appear in jira:
* edit an issue's due date
* move issues (between project workflows or projects the user has create permission on)
* assign issues to other people
* resolve and reopen issues, assign a fix version (but not close them)
* manage watchers on an issue
* log work against an issue
Any of these could also be changed to exclude contributors or to allow wider jira users.
If assignable users can assign to themselves when they don't have the assign users permission, then the only one I think we use is resolve and reopen issues. And I don't think I'd want that open to all jira users. Do we want to have to handle marking issues resolved for folks? It makes sense to me, since I usually do that once I push the commit.

On Sun, Mar 15, 2015 at 11:07 PM, lars hofhansl la...@apache.org wrote: Not sure what jira does about an assignee when (s)he is removed from the contributors list (I know you have to add a person to the contributors list in order to assign a jira to them). Other than the committers, we probably have at least one jira assigned to each contributor (otherwise why add him/her as a contributor). Can we change the jira rules in our space to allow assigning jiras to users even when they're not listed as contributors? We do not have a formal contributor status (why not?), so this list is only needed because of jira. -- Lars

From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Friday, March 13, 2015 9:09 AM Subject: Re: Jira role cleanup

On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues.

I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did [1]. How about I make a saved jira query for people who have had jiras assigned to them, add a link to that query for our "here are the contributors" section, and then trim from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean
[jira] [Created] (HBASE-13250) chown of ExportSnapshot does not cover all path and files
He Liangliang created HBASE-13250: - Summary: chown of ExportSnapshot does not cover all path and files Key: HBASE-13250 URL: https://issues.apache.org/jira/browse/HBASE-13250 Project: HBase Issue Type: Bug Reporter: He Liangliang Assignee: He Liangliang Priority: Critical

The chuser/chgroup function only covers the leaf hfile. The ownership of hfile parent paths and snapshot reference files is not changed as expected.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-13249) Concurrency issue in SnapshotFileCache
He Liangliang created HBASE-13249: - Summary: Concurrency issue in SnapshotFileCache Key: HBASE-13249 URL: https://issues.apache.org/jira/browse/HBASE-13249 Project: HBase Issue Type: Bug Reporter: He Liangliang Assignee: He Liangliang

In refreshCache, if step 3 fails for some reason, a successive call may return success directly but the cache is already corrupt (it got cleared in the previous failed call):

{quote}
// 1. update the modified time
this.lastModifiedTime = lastTimestamp;
// 2. clear the cache
this.cache.clear();
Map<String, SnapshotDirectoryInfo> known = new HashMap<String, SnapshotDirectoryInfo>();
// 3. check each of the snapshot directories
{quote}

This can cause files to get deleted unexpectedly.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
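A hedged sketch of one way to avoid this corruption (illustrative only; this is not the actual SnapshotFileCache code, and the field and method names are invented): rebuild into a private map first, and publish the new cache and the new timestamp only after the scan has fully succeeded, so a failed refresh leaves the previous state intact:

```java
import java.util.HashMap;
import java.util.Map;

public class RefreshCacheSketch {
    private Map<String, Long> cache = new HashMap<>();
    private long lastModifiedTime;

    // failScan simulates step 3 (scanning the snapshot directories) throwing.
    public void refreshCache(long lastTimestamp, Map<String, Long> snapshotDirs, boolean failScan) {
        // 1. Rebuild into a private map first. If this step throws, the
        //    published cache and timestamp are untouched, so the next call
        //    sees a stale timestamp and retries instead of trusting a
        //    half-cleared cache.
        Map<String, Long> known = new HashMap<>();
        for (Map.Entry<String, Long> e : snapshotDirs.entrySet()) {
            if (failScan) throw new RuntimeException("snapshot dir scan failed");
            known.put(e.getKey(), e.getValue());
        }
        // 2. Publish the fresh map with a single reference swap ...
        this.cache = known;
        // 3. ... and only then record the new modification time.
        this.lastModifiedTime = lastTimestamp;
    }

    public Map<String, Long> snapshot() { return cache; }
    public long timestamp() { return lastModifiedTime; }
}
```

The essential reordering is that the timestamp update moves from first to last, inverting the step 1 / step 3 order quoted in the report.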
Re: Status of Huawei's 2' Indexing?
Sigh. Here we go again…

1) Complexity?
2) Speed when looking at the indexes in a more general case.
3) Resources required to do the search become excessive...

... Again, your indexes will be orthogonal to the base table. If you can’t understand that… then you need to sit back, drink a few cocktails. (Bourbon, single malts, craft beers, vodka, whatever…) AND THINK ABOUT THE PROBLEM. I think that’s the biggest issue. You’re not thinking about the problem enough before you take hand to keyboard and bang out crappy code. To make it simple… so what you’re saying is that you want to have two indexes… one orthogonal to the base table so you can use it for table joins or when you want faster performance, and the second when you want an advanced filter on a region. (God only knows why you would want that…) Seriously? Apply KISS and then get back to me.

On Mar 16, 2015, at 10:38 AM, Vladimir Rodionov vladrodio...@gmail.com wrote: There is nothing wrong with co-locating index and data on the same RS. This will greatly improve single table search. Joins are evil anyway. Leave them to the RDBMS Zoo. -Vlad

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region, which had two problems. 1) Complexity, in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce, who was going to push this approach) fell apart when you asked the simple question of how do you handle joins? That’s their OOPS moment.
Once you start to understand that, then, allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query either against a single table, or you’re doing a join. You then get the indexes, and assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job.) That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept, that the index would be orthogonal to the base table, that got them moving in the right direction.

To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glom onto in the client back to the server side of things. Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors, which again suffers from a lack of thought. This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But the implementation done in HBase is a tad lacking in thought. So in terms of indexing… Longer term picture, there have to be some fixes on the server side of things to allow one to associate an index (allowing for different types) with a base table, yet the implementation of using the index would end up becoming a client. And by client, it would be an external query engine processor that could/should sit on the cluster. But hey! What do I know?
I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-)

On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW.

On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j
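The single-pass intersection of N ordered index result sets described in this thread can be sketched as follows (an illustrative, standalone Java method, not code from any of the patches discussed; row keys are modeled as plain sorted string lists):

```java
import java.util.ArrayList;
import java.util.List;

public class SortedIntersect {

    // Intersects N sorted lists of base-table row keys in one pass,
    // keeping one cursor per list and never backing up.
    public static List<String> intersect(List<List<String>> sortedLists) {
        List<String> out = new ArrayList<>();
        if (sortedLists.isEmpty()) return out;
        int n = sortedLists.size();
        int[] pos = new int[n]; // one cursor per result set
        outer:
        while (true) {
            // Candidate row key = current element of the first list.
            if (pos[0] >= sortedLists.get(0).size()) break;
            String candidate = sortedLists.get(0).get(pos[0]);
            for (int i = 1; i < n; i++) {
                List<String> l = sortedLists.get(i);
                // Advance this cursor past everything smaller than the candidate.
                while (pos[i] < l.size() && l.get(pos[i]).compareTo(candidate) < 0) pos[i]++;
                if (pos[i] >= l.size()) break outer;        // a list is exhausted: done
                if (!l.get(pos[i]).equals(candidate)) {     // candidate missing here: skip it
                    pos[0]++;
                    continue outer;
                }
            }
            out.add(candidate); // present in every result set
            pos[0]++;
        }
        return out;
    }
}
```

Because each cursor only moves forward, the whole intersection is O(total elements) regardless of how many index result sets participate, which is the "single pass" property claimed in the message.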
Re: Status of Huawei's 2' Indexing?
I don't understand the repeated mention of Salesforce in that invective. As a point of fact, the work of adding local mutable indexes to Phoenix was done by a contributor from Huawei, who has since moved over to Hortonworks, if I'm not mistaken - but not like affiliation matters, it really doesn't. As for the rest, well, I've had to give up on your like and respect, but I picked up the pieces of my life a while back after we had that falling out over coprocessors.

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. ...

On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW.

On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project.
I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Status of Huawei's 2' Indexing?
On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. ... I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-)

Michael: Quit the insults and ad hominem. Stick to the tech. St.Ack

On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW.

On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
Re: Status of Huawei's 2' Indexing?
You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress…

The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region, which had two problems. 1) Complexity, in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce, who was going to push this approach) fell apart when you asked the simple question of how do you handle joins? That’s their OOPS moment.

Once you start to understand that, then, allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query either against a single table, or you’re doing a join. You then get the indexes, and assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job.) That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept, that the index would be orthogonal to the base table, that got them moving in the right direction.

To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glom onto in the client back to the server side of things. Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors, which again suffers from a lack of thought. This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But the implementation done in HBase is a tad lacking in thought.

So in terms of indexing… Longer term picture, there have to be some fixes on the server side of things to allow one to associate an index (allowing for different types) with a base table, yet the implementation of using the index would end up becoming a client. And by client, it would be an external query engine processor that could/should sit on the cluster. But hey! What do I know? I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-)

On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW.

On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed?
-j The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. Use at your own risk. Michael Segel michael_segel (AT) hotmail.com
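The single-pass intersection of ordered index result sets described above can be sketched as follows. This is an illustrative standalone example, not HBase or Phoenix code; the class name, the use of String keys, and the assumption that each list is ascending and duplicate-free are all mine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedIntersect {
    /**
     * Intersect N ascending, duplicate-free lists of row keys in one pass.
     * Each cursor only moves forward, so the cost is linear in the total
     * number of elements, as the email suggests for AND-ed index results.
     */
    static List<String> intersect(List<List<String>> lists) {
        List<String> out = new ArrayList<>();
        if (lists.isEmpty()) return out;
        int n = lists.size();
        int[] pos = new int[n];                            // one cursor per list
        outer:
        while (pos[0] < lists.get(0).size()) {
            String candidate = lists.get(0).get(pos[0]);   // head of the first list
            for (int i = 1; i < n; i++) {
                List<String> l = lists.get(i);
                // Advance this list's cursor until it reaches or passes the candidate.
                while (pos[i] < l.size() && l.get(pos[i]).compareTo(candidate) < 0) pos[i]++;
                if (pos[i] >= l.size()) break outer;       // one list exhausted: done
                if (!l.get(pos[i]).equals(candidate)) {    // candidate missing here
                    pos[0]++;                              // skip it and try the next key
                    continue outer;
                }
            }
            out.add(candidate);                            // present in every list
            pos[0]++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> idxA = Arrays.asList("k1", "k3", "k5", "k9");
        List<String> idxB = Arrays.asList("k2", "k3", "k5");
        System.out.println(intersect(Arrays.asList(idxA, idxB))); // [k3, k5]
    }
}
```

The resulting list of base-table row keys would then be handed back to the client or fed into a map/reduce job, per the discussion.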
Re: Status of Huawei's 2' Indexing?
Michael, I don’t understand the invective. I’m sure you have something to contribute, but when you bring this tone the only thing I hear is the snide comments. -j

P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins

On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. [...]
Re: Status of Huawei's 2' Indexing?
There is nothing wrong with co-locating index and data on the same RS. This will greatly improve single-table search. Joins are evil anyway; leave them to the RDBMS zoo. -Vlad

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. [...]
Re: Status of Huawei's 2' Indexing?
Andrew, because 2+ years ago Phoenix wasn’t an Apache project. At the time, Huawei was releasing their research on it and Salesforce was implementing it. I mention the company names because those were the parties involved in the work as well as the discussion. Those companies are also mentioned in a lot of the earlier documentation. What pretty much ended those conversations is when I asked “How do you handle table joins?”. And again, since Phoenix was a Salesforce.com project at the time, his response was that Phoenix doesn’t do table joins. (Which they supposedly do now…)

I would have gone further and mentioned Informix’s XPS distributed relational database; however, the last time I talked about some of the lessons learned from the RDBMS advances done back in the 90’s you seemed to have an issue with it. Of course, there we were talking about coprocessors, and I compared them to the extensibility done in RDBMSs and what worked and what didn’t. The irony is that Mike Olson, who was part of Illustra, is now at Cloudera. (And Informix eventually got it right.)

It’s very disappointing that this issue has been raised again. Once you talk about table joins, the index is orthogonal to the base table and the argument becomes moot. Add to this using a different type of index, or allowing multiple indexes on the base table, and you now have the issue of column families all over again, but in spades. Again, this makes Huawei’s idea unworkable. It would even be pointless to try and hold a discussion on what should happen client side and what should happen server side to support indexes. My suggestion is that when you think you have an answer, stop, go get a few drinks, and spend more time thinking about your answer. Later

On Mar 16, 2015, at 11:41 AM, Andrew Purtell apurt...@apache.org wrote: I don't understand the repeated mention of Salesforce in that invective.
As a point of fact, the work of adding local mutable indexes to Phoenix was done by a contributor from Huawei, who has since moved over to Hortonworks, if I'm not mistaken. But it's not like affiliation matters; it really doesn't. As for the rest, well, I've had to give up on your like and respect, but I picked up the pieces of my life a while back after we had that falling out over coprocessors.

On Mon, Mar 16, 2015 at 8:14 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. [...]
Re: Jira role cleanup
I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not, when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious).

I agree we should avoid allowing contributors to change JIRA metadata, if this is possible to restrict. Our commit log conventions aren't universally followed, due to human error, so commits are not all tagged with issue identifiers, or the correct identifier.

On Sun, Mar 15, 2015 at 11:12 PM, lars hofhansl la...@apache.org wrote: Hmm... This is interesting. I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not, when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). Minimally, somebody needs to be able to assign a jira to the person providing the patch; if those are committers only, that's tedious but OK - we've been doing that anyway. Ideally the person could assign an _open_ issue to him/herself, log work against an issue, and change the due date. Those seem like abilities we could grant to everybody as long as they are limited to open issues. Beyond that, I agree that we should limit this to a known set of people (the contributors). Maybe discuss this briefly at the next PMC meeting; we're due to have one anyway. I'm willing to host one at Salesforce. -- Lars

From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Sunday, March 15, 2015 9:46 PM Subject: Re: Jira role cleanup

I can make it so that issues can be assigned to non-contributors. Even if we don't do that, I believe jira permissions are all about constraining current actions, and are not enforced on existing tickets. However, the contributor role in jira has several other abilities associated with it.
Right now, in the order they appear in jira:

* edit an issue's due date
* move issues (between project workflows or projects the user has create on)
* assign issues to other people
* resolve and reopen issues, assign a fix version (but not close them)
* manage watchers on an issue
* log work against an issue

Any of these could also be changed to exclude contributors or to allow wider jira users. If assignable users can assign to themselves when they don't have the assign-users permission, then the only one I think we use is resolve and reopen issues. And I don't think I'd want that open to all jira users. Do we want to have to handle marking issues resolved for folks? It makes sense to me, since I usually do that once I push the commit.

On Sun, Mar 15, 2015 at 11:07 PM, lars hofhansl la...@apache.org wrote: Not sure what jira does about an assignee when (s)he is removed from the contributors list (I know you have to add a person to the contributors list in order to assign a jira to them). Other than the committers, we probably have at least one jira assigned to each contributor (otherwise why add him/her as a contributor). Can we change the jira rules in our space to allow assigning jiras to users even when they're not listed as contributors? We do not have a formal contributor status (why not?), so this list is only needed because of jira. -- Lars

From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Friday, March 13, 2015 9:09 AM Subject: Re: Jira role cleanup

On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues.

I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1].
How about I make a saved jira query for people who have had jiras assigned to them, add a link to that query for our "here are the contributors" section, and then trim from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Rough goal timelines for 1.1 and 2.0
And now we have *two* volunteers for RM for 1.1 (Nick and myself). Let's take that as interest in getting it done, and do it. As far as I'm concerned, it's all yours, Nick; have at it!

On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines [...]

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
Okay, it sounds like there's decent consensus. How much of this cleanup can I take care of before the PMC meets? Everyone fine if I do the earlier pruning we talked about and look into the anyone-is-assignable bit?

On Mon, Mar 16, 2015 at 12:38 PM, Andrew Purtell apurt...@apache.org wrote: I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not, when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). I agree we should avoid allowing contributors to change JIRA metadata, if this is possible to restrict. Our commit log conventions aren't universally followed, due to human error, so commits are not all tagged with issue identifiers, or the correct identifier.

I'm conflicted on this bit. Incrementally giving people more responsibility is our best path to good decisions wrt new committers. I think it makes sense to not give every new contributor the ability to set fix versions. On the other hand, I'm sure there will be folks that I trust to accurately set that metadata before they become committers. And what about folks who contribute by helping to clean up said jira metadata? When y'all discuss this, could someone please advocate a middle ground where we no longer default to all contributors getting metadata edit rights, but we maintain a jira role that can be granted at the discretion of existing jira admins (or committers, or PMC, or whatever y'all are comfortable with)? -- Sean
Re: Question on EnableTableHandler code
Stephen, would you like to create a jira for case (1)? Thank you.

On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right. [...]

-- Andrey.
Re: Rough goal timelines for 1.1 and 2.0
FWIW, the Region proposal (HBASE-12972) is ready for review. The companion issue for SplitTransaction and RegionMergeTransaction (HBASE-12975) needs more discussion but could be ready to go in a <= one month timeframe.

On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines [...]

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Question on EnableTableHandler code
Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail enabling the table; however, if the bulk assign never starts, we would enable the table with offline regions. Really inconsistent; we should either fail in both scenarios or succeed with offline regions in both (a best-effort approach). Thanks, Stephen

On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you. [...]

-- Andrey.
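The consistent-failure option Stephen proposes could look something like the sketch below. This is hypothetical, not actual HBase code: the stub balancer, the simplified String types, and the choice of IOException are all assumptions made for illustration; only the shape (fail the enable instead of flipping the table to ENABLED when no plan exists) comes from the thread.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EnableTableSketch {
    // Stub for the balancer's retainAssignment: null means no plan could be found,
    // e.g. because there are no online region servers.
    static Map<String, List<String>> retainAssignment(List<String> regions, List<String> servers) {
        if (servers.isEmpty()) return null;
        Map<String, List<String>> plan = new HashMap<>();
        plan.put(servers.get(0), regions); // trivial plan: everything on one server
        return plan;
    }

    /**
     * Fail the enable when no assignment plan exists, rather than declaring
     * success and leaving every region offline (the inconsistency discussed above).
     */
    static boolean handleEnableTable(List<String> regions, List<String> servers) throws IOException {
        Map<String, List<String>> bulkPlan = retainAssignment(regions, servers);
        if (bulkPlan == null) {
            throw new IOException("Unable to find suitable servers; failing table enable");
        }
        // ... assign regions per bulkPlan, then flip the table state to ENABLED ...
        return true;
    }
}
```

The alternative the thread also mentions, uniformly succeeding with offline regions as a best-effort approach, would instead log and return true in the null-plan branch.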
Re: Rough goal timelines for 1.1 and 2.0
I think we can learn a lesson or two from the vendor marketing machines -- a release timed with HBaseCon would be ideal in this regard. My obligations to the event are minimal, so I'm willing to volunteer as RM for 1.1. Do we think we can make some of these decisions in time for spinning RCs in mid-April? That's just about a month away. -n

On Sat, Mar 14, 2015 at 10:37 AM, Elliott Clark ecl...@apache.org wrote: I'm most looking forward to rpc quotas and the buffer improvements that stack has put in. So for me, getting a 1.1 in by May 1 would be cool. That would allow us to talk about what was just released at HBaseCon, and maybe even have 1.1.0 in production at places.

On Fri, Mar 13, 2015 at 11:44 AM, Sean Busbey bus...@cloudera.com wrote: The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, that seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean

On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question. We can discuss them independently? Or is there a reason not to?

On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean

-- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Question on EnableTableHandler code
Thanks Stephen. Looks like you are right. For (1) case we really don't need there state cleanup. That is a bug. Should throw TableNotFoundException. As for (2) in case of no online region servers available we could leave table enabled, but no regions would be assigned. Actually that rises good question what enable table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or that could be done in Admin on client side. So for now it seems that Enable handler do what is best, and leave table enabled but unassigned to be later assigned by Balancer. On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table is not existed, it marked the table as deleted and not throw exception. The result is the table lock is released and the caller has no knowledge that the table not exist or already deleted, it would continue the next step. Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at recovery code, it expects TableNotFoundException Should we always throw exception - if the table not exist? I want to make sure that I don't break recovery logic by modifying. public EnableTableHandler prepare() { ... // Check if table exists if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) { // retainAssignment is true only during recovery. In normal case it is false if (!this.skipTableStateCheck) { throw new TableNotFoundException(tableName); } this.assignmentManager.getTableStateManager().setDeletedTable( tableName); } ... } (2). 
In EnableTableHandler#handleEnableTable() - if the bulk assign plan cannot be found, it leaves the regions offline and declares the enable table succeeded - I think this is a bug and we should retry or fail - but I want to make sure whether there is some merit behind this logic.

  private void handleEnableTable() {
    Map<ServerName, List<HRegionInfo>> bulkPlan =
        this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
    if (bulkPlan != null) {
      ...
    } else {
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
      done = true;
    }
    if (done) {
      // Flip the table to enabled.
      this.assignmentManager.getTableStateManager().setTableState(
          this.tableName, TableState.State.ENABLED);
      LOG.info("Table '" + this.tableName + "' was successfully enabled. Status: done=" + done);
    }
    ...
  }

thanks Stephen -- Andrey.
Re: Jira role cleanup
bq. Our commit log conventions aren't universally followed, due to human error Going forward, I think we can alleviate this issue with a git hook and a regexp. On Mon, Mar 16, 2015 at 10:38 AM, Andrew Purtell apurt...@apache.org wrote: I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). I agree we should avoid allowing contributors to change JIRA metadata if this is possible to restrict. Our commit log conventions aren't universally followed, due to human error, so they are not all tagged with issue identifiers, or the correct identifier. On Sun, Mar 15, 2015 at 11:12 PM, lars hofhansl la...@apache.org wrote: Hmm... This is interesting. I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). Minimally somebody needs to be able to assign a jira to the person providing the patch; if those are committers only, that's tedious but OK - we've been doing that anyway. Ideally the person could assign an _open_ issue to him/herself, log work against an issue, and change the due date. Those seem like abilities we could grant to everybody as long as they are limited to open issues. Beyond that I agree that we should limit this to a known set of people (the contributors). Maybe discuss this briefly at the next PMC meeting, we're due to have one anyway. I'm willing to host one at Salesforce. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Sunday, March 15, 2015 9:46 PM Subject: Re: Jira role cleanup I can make it so that issues can be assigned to non-contributors.
Even if we don't do that, I believe jira permissions are all about constraining current actions, and are not enforced on existing tickets. However, the contributor role in jira has several other abilities associated with it. Right now, in the order they appear in jira: * edit an issue's due date * move issues (between project workflows or projects the user has create permission on) * assign issues to other people * resolve and reopen issues, assign a fix version (but not close them) * manage watchers on an issue * log work against an issue Any of these could also be changed to remove contributors or to allow wider jira users. If assignable users can assign to themselves when they don't have the assign users permission, then the only one I think we use is resolve and reopen issues. And I don't think I'd want that open to all jira users. Do we want to have to handle marking issues resolved for folks? It makes sense to me, since I usually do that once I push the commit. On Sun, Mar 15, 2015 at 11:07 PM, lars hofhansl la...@apache.org wrote: Not sure what jira does about an assignee when (s)he is removed from the contributors list (I know you have to add a person to the contributors list in order to assign a jira to them). Other than the committers, we probably have at least one jira assigned to a contributor (otherwise why add him/her as contributor). Can we change the jira rules in our space to allow assigning jiras to users even when they're not listed as contributors? We do not have a formal contributor status (why not?), so this list is only needed because of jira. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Friday, March 13, 2015 9:09 AM Subject: Re: Jira role cleanup On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors.
But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jiras assigned to them, add a link to that query for our "here are the contributors" section, and then trim from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
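Nick's "git hook and a regexp" suggestion earlier in this thread could look roughly like the commit-msg hook below. This is only a sketch: the regexp, and the assumption that every first line must lead with an HBASE-NNNN issue id (with a carve-out for reverts), are illustrative, not the project's actual agreed convention.

```shell
#!/bin/sh
# Sketch of a .git/hooks/commit-msg hook that checks the first line of the
# commit message for an HBASE-NNNN issue id. In a real hook, a failed check
# would print an error and `exit 1` so git aborts the commit.
check_msg() {
  # git passes the path of the commit message file as $1
  head -n 1 "$1" | grep -qE '^(HBASE-[0-9]+|Revert)'
}

# Quick demonstration against two sample messages:
ok=$(mktemp);  printf 'HBASE-13254 Throw TableNotFoundException in prepare()\n' > "$ok"
bad=$(mktemp); printf 'fixed some stuff\n' > "$bad"
check_msg "$ok"  && echo "accepted"
check_msg "$bad" || echo "rejected"
rm -f "$ok" "$bad"
```

Dropped into .git/hooks/commit-msg and made executable, a non-zero exit from check_msg would refuse the commit, which addresses the "human error" case Andrew mentions.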
Re: Status of Huawei's 2' Indexing?
That's patently untrue and pure paranoia. The comment about having a civilized discussion had nothing to do with you Michael. Joseph said: HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) and so I responded as you saw. I was not thinking of you, I swear I never think of you unless you write in and call me names. Please let these nice people get back to the topic at hand. On 3/16/15, 12:18 PM, Michael Segel michael_se...@hotmail.com wrote: Joseph, The issue with Andrew goes back a few years. His comment about having a civilized discussion was a personal dig at me. On Mar 16, 2015, at 10:38 AM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Michael, I don’t understand the invective. I’m sure you have something to contribute but when you bring on this tone the only thing I hear are the snide comments. -j P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce who was going to push this approach) fell apart when you asked the simple question on how do you handle joins? That’s their OOPS moment.
Once you start to understand that, and allow the index to be orthogonal to the base table, things start to come together. In short, you have a query either against a single table, or you’re doing a join. You then get the indexes and, assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job.) That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction. To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glom onto in the client back to the server side of things. Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors which again, suffers from the lack of thought. This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But the implementation done in HBase is a tad lacking in thought. So in terms of indexing… Longer term picture, there have to be some fixes on the server side of things to allow one to associate an index (allowing for different types) with a base table, yet the implementation of using the index would end up becoming a client. And by client, it would be an external query engine processor that could/should sit on the cluster. But hey! What do I know?
I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-) On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. (smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW. On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I
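For what it's worth, the single-pass intersection of N sorted index result sets described above is straightforward to sketch in plain Java. Class, method, and row-key names here are illustrative, not HBase code.

```java
import java.util.*;

// Single-pass intersection of N sorted, duplicate-free lists of
// base-table row keys (one list per index result set).
class SortedIntersect {

    static List<String> intersect(List<List<String>> lists) {
        List<String> out = new ArrayList<>();
        if (lists.isEmpty() || lists.get(0).isEmpty()) return out;
        int[] pos = new int[lists.size()];   // one cursor per list

        outer:
        while (pos[0] < lists.get(0).size()) {
            String cand = lists.get(0).get(pos[0]);  // candidate row key
            boolean inAll = true;
            for (int i = 1; i < lists.size(); i++) {
                List<String> l = lists.get(i);
                // Advance this cursor until its head is >= the candidate.
                while (pos[i] < l.size() && l.get(pos[i]).compareTo(cand) < 0) pos[i]++;
                if (pos[i] == l.size()) break outer;   // a list ran out: done
                if (!l.get(pos[i]).equals(cand)) {     // mismatch: raise the bar
                    inAll = false;
                    cand = l.get(pos[i]);
                }
            }
            if (inAll) {
                out.add(cand);
                pos[0]++;
            } else {
                // Skip list 0 forward to the largest head seen this round.
                while (pos[0] < lists.get(0).size()
                        && lists.get(0).get(pos[0]).compareTo(cand) < 0) pos[0]++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList(
            Arrays.asList("r1", "r3", "r5", "r7"),
            Arrays.asList("r2", "r3", "r5"),
            Arrays.asList("r3", "r4", "r5", "r9"))));  // prints [r3, r5]
    }
}
```

Each cursor only ever moves forward, so the whole intersection costs O(total elements) comparisons regardless of how many index result sets are involved.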
Re: Status of Huawei's 2' Indexing?
Dude... Relax... Let's keep it cordial, please. To the topic: Any CS 101 student can implement an eventually consistent index on top of HBase. The part that is always missed is: How do you keep it consistent? There you have essentially two choices: (1) every update to an indexed table becomes a distributed transaction, or (2) you keep region server local indexes. There is nothing wrong with #2. It's good for not-so-selective indexes. There is also nothing wrong with #1. This one is good for highly selective indexes (PK, etc). Indexes and joins do not have to be conflated. And maybe your use case is fine with eventually consistent indexes. In that case just write your stuff into two tables and be done with it. -- Lars From: Michael Segel michael_se...@hotmail.com To: dev@hbase.apache.org Sent: Monday, March 16, 2015 8:14 AM Subject: Re: Status of Huawei's 2' Indexing? You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce who was going to push this approach) fell apart when you asked the simple question on how do you handle joins? That’s their OOPS moment. Once you start to understand that, then allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query either against a single table, or if you’re doing a join. You then get the indexes and assuming that you’re only using the AND predicate, its a simple intersection of the index result sets.
(Since the result sets are ordered, its relatively trivial to walk through and find the intersections of N Lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job. That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction. To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glum on in the client, back to the server side of things. Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors which again, suffers from the lack of thought. This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But that the implementation done in HBase is a tad lacking in thought. So in terms of indexing… Longer term picture, there has to be some fixes on the server side of things to allow one to associate an index (allowing for different types) to a base table, yet the implementation of using the index would end up becoming a client. And by client, it would be an external query engine processor that could/should sit on the cluster. But hey! What do I know? I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-) On Mar 13, 2015, at 4:14 PM, Andrew Purtell apurt...@apache.org wrote: When I made that remark I was thinking of a recent discussion we had at a joint Phoenix and HBase developer meetup. The difference of opinion was certainly civilized. 
(smile) I'm not aware of any specific written discussion, it may or may not exist. I'm pretty sure a revival of HBASE-9203 would attract some controversy, but let me be clearer this time than I was before that this is just my opinion, FWIW. On Thu, Mar 12, 2015 at 3:58 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: I saw that it was added to their project. I’m really not keen on bringing in all the RDBMS apparatus on top of hbase, so I decided to follow other avenues first (like trying to patch 0.98, for better or worse.) That Phoenix article seems like a good breakdown of the various indexing architectures. HBASE-9203 (the ticket that deals with 2’ indexes) is pretty civilized (as are most of them, it seems) so I didn’t know there were these differences of opinion. Did I miss the mailing list thread where the architectural differences were discussed? -j The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely
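A minimal sketch of Lars's "just write your stuff into two tables" suggestion, using in-memory sorted maps to stand in for the data table and the index table. This is illustrative Java only, not the HBase client API; against a real cluster the same shape would be two Table.put() calls and a prefix Scan, and the key layout (including the ":" separator) is an assumption.

```java
import java.util.*;

// Eventually consistent secondary index via dual writes, with TreeMaps
// standing in for two HBase tables.
class DualWriteIndex {
    // data table:  rowkey (e.g. lastname:firstname) -> patient id value
    final NavigableMap<String, String> dataTable = new TreeMap<>();
    // index table: patientId + ":" + rowkey -> (empty); sorted order keeps
    // all rows for one id contiguous, reachable by a prefix scan
    final NavigableMap<String, String> indexTable = new TreeMap<>();

    void put(String rowkey, String patientId) {
        dataTable.put(rowkey, patientId);
        // Second, unsynchronized write: between the two puts the index is
        // stale -- that lag is exactly the "eventually consistent" trade-off.
        indexTable.put(patientId + ":" + rowkey, "");
    }

    // Look up base-table row keys by patient id via a prefix scan.
    List<String> rowkeysById(String patientId) {
        String prefix = patientId + ":";
        List<String> out = new ArrayList<>();
        for (String k : indexTable.tailMap(prefix, true).keySet()) {
            if (!k.startsWith(prefix)) break;   // past the prefix range
            out.add(k.substring(prefix.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        DualWriteIndex ix = new DualWriteIndex();
        ix.put("doe:john", "MRN42");  // hypothetical rowkey and id
        System.out.println(ix.rowkeysById("MRN42"));  // prints [doe:john]
    }
}
```

This is the eventually consistent flavor: a crash between the two writes leaves a data row the index cannot find (or vice versa), which is why the thread distinguishes it from the distributed-transaction and region-local approaches.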
Question on EnableTableHandler code
I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller has no knowledge that the table does not exist or was already deleted, so it would continue to the next step. Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects TableNotFoundException. Should we always throw the exception if the table does not exist? I want to make sure that I don't break the recovery logic by modifying this.

  public EnableTableHandler prepare() {
    ...
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    ...
  }

(2). In EnableTableHandler#handleEnableTable() - if the bulk assign plan cannot be found, it leaves the regions offline and declares the enable table succeeded - I think this is a bug and we should retry or fail - but I want to make sure whether there is some merit behind this logic.

  private void handleEnableTable() {
    Map<ServerName, List<HRegionInfo>> bulkPlan =
        this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
    if (bulkPlan != null) {
      ...
    } else {
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
      done = true;
    }
    if (done) {
      // Flip the table to enabled.
      this.assignmentManager.getTableStateManager().setTableState(
          this.tableName, TableState.State.ENABLED);
      LOG.info("Table '" + this.tableName + "' was successfully enabled. Status: done=" + done);
    }
    ...
  }

thanks Stephen
Re: Status of Huawei's 2' Indexing?
Alright, let’s see if I can get this discussion back on track. I have a sensibly defined table for patient data; its rowkey is simply lastname:firstname, since it’s convenient for the bulk of my lookups. Unfortunately I also need to efficiently find patients using an ID string, whose literal value is buried in a value field. I’m sure this situation is not foreign to the people on this list. It’s been suggested that I implement 2’ indexes myself — fine. All the research I’ve done seems to end with that suggestion, with the exception of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems to incite some discussion here). I’m happy to put this together but I’d rather go with something that has been vetted and has a larger developer community than one (i.e., ME). Besides, I have a full enough plate at the moment that I’d rather not have to do this, too. Are there constructive suggestions regarding how I can proceed with HBase? Right now even a well-vetted local index would be a godsend. Thanks. -j p.s., I’ll refer you to this post for a slightly more detailed rundown of how I plan to do things: http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467 On 3/16/15, 12:18 PM, Michael Segel michael_se...@hotmail.com wrote: Joseph, The issue with Andrew goes back a few years. His comment about having a civilized discussion was a personal dig at me. On Mar 16, 2015, at 10:38 AM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Michael, I don’t understand the invective. I’m sure you have something to contribute but when bring on this tone the only thing I hear are the snide comments. 
-j P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce who was going to push this approach) fell apart when you asked the simple question on how do you handle joins? That’s their OOPS moment. Once you start to understand that, then allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query either against a single table, or if you’re doing a join. You then get the indexes and assuming that you’re only using the AND predicate, its a simple intersection of the index result sets. (Since the result sets are ordered, its relatively trivial to walk through and find the intersections of N Lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job. That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction.
To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glum on in the client, back to the server side of things. Unfortunately the last great attempt at fixing things on the server side was the bastardization of coprocessors which again, suffers from the lack of thought. This isn’t to say that allowing users to extend the server side functionality is wrong. (Because it isn’t.) But that the implementation done in HBase is a tad lacking in thought. So in terms of indexing… Longer term picture, there has to be some fixes on the server side of things to allow one to associate an index (allowing for different types) to a base table, yet the implementation of using the index would end up becoming a client. And by client, it would be an external query engine processor that could/should sit on the cluster. But hey! What do I know? I gave up trying to have an intelligent/civilized conversation with Andrew because he just couldn’t grasp the basics. ;-) On Mar 13, 2015, at 4:14 PM, Andrew Purtell
Re: Status of Huawei's 2' Indexing?
Hi Joseph, I think that you kicked off this discussion because implementing an indexing mechanism for hbase in general is much more complicated than your specific problem. The people on this list want to bear every possible (or at least a lot of) applications in mind. A too-simple mechanism wouldn't fit the needs of most of the users (and thus would be useless); a more complicated model is harder to maintain, you would have to find more coders, etc. Thus with your application question you seem to have walked right into a very general discussion. Furthermore this is a user question, as you do not want to change the code of hbase, do you ;). I'll try an answer on the general user list in a couple of minutes, so more people can discuss and we can get traffic out of this list, okay? Best wishes Wilm Am 16.03.2015 um 18:46 schrieb Rose, Joseph: Alright, let’s see if I can get this discussion back on track. I have a sensibly defined table for patient data; its rowkey is simply lastname:firstname, since it’s convenient for the bulk of my lookups. Unfortunately I also need to efficiently find patients using an ID string, whose literal value is buried in a value field. I’m sure this situation is not foreign to the people on this list. It’s been suggested that I implement 2’ indexes myself — fine. All the research I’ve done seems to end with that suggestion, with the exception of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems to incite some discussion here). I’m happy to put this together but I’d rather go with something that has been vetted and has a larger developer community than one (i.e., ME). Besides, I have a full enough plate at the moment that I’d rather not have to do this, too. Are there constructive suggestions regarding how I can proceed with HBase? Right now even a well-vetted local index would be a godsend. Thanks.
-j p.s., I’ll refer you to this post for a slightly more detailed rundown of how I plan to do things: http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467 On 3/16/15, 12:18 PM, Michael Segel michael_se...@hotmail.com wrote: Joseph, The issue with Andrew goes back a few years. His comment about having a civilized discussion was a personal dig at me. On Mar 16, 2015, at 10:38 AM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Michael, I don’t understand the invective. I’m sure you have something to contribute but when you bring on this tone the only thing I hear are the snide comments. -j P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce who was going to push this approach) fell apart when you asked the simple question on how do you handle joins? That’s their OOPS moment. Once you start to understand that, then allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query either against a single table, or if you’re doing a join.
You then get the indexes and assuming that you’re only using the AND predicate, its a simple intersection of the index result sets. (Since the result sets are ordered, its relatively trivial to walk through and find the intersections of N Lists in a single pass.) Now you have your result set of base table row keys and you can work with that data. (Either returning the records to the client, or as input to a map/reduce job. That’s the 30K view. There’s more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the base table that got them moving in the right direction. To Joseph’s point, indexing isn’t necessarily an RDBMS feature. However, it seems that some of the Committers are suffering from rectal induced hypoxia. HBASE-12853 was created not just to help solve the issue of ‘hot spotting’ but also to get the Committers to focus on bringing the solutions that they glum on in the client, back to the server side of things. Unfortunately the last great attempt at fixing
[jira] [Created] (HBASE-13256) HBaseConfiguration#checkDefaultsVersion(Configuration) has spelling error
Josh Elser created HBASE-13256: -- Summary: HBaseConfiguration#checkDefaultsVersion(Configuration) has spelling error Key: HBASE-13256 URL: https://issues.apache.org/jira/browse/HBASE-13256 Project: HBase Issue Type: Improvement Components: Client Reporter: Josh Elser Assignee: Josh Elser Priority: Trivial Fix For: 2.0.0, 1.1.0 Happened to stumble onto the exception thrown by {{HBaseConfiguration#checkDefaultsVersion}}. This improves the spelling/grammar. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Question on EnableTableHandler code
Stephen, you are right, that is my code and that thing was overlooked :) I think we need to completely remove the state cleanup code. Actually, the way it is done now, the table state manager cannot return a table which later turns out to be nonexistent. Basically that means that if we get a nonexistent table in the EnableTable handler we can fail even in recovery, because it means that something really bad happened and it seems we can ask to run hbck. So in short, we just need to throw TableNotFoundException and remove the skipTableStateCheck flag from EnableTableHandler. On Mon, Mar 16, 2015 at 7:49 PM, Stephen Jiang syuanjiang...@gmail.com wrote: thanks, Rajeshbabu, HBASE-10215 is not the last change. HBASE-7767 (hello, Andrey [?]) removed the exception throw after setting the table state; what we really want is as follows (if Andrey agrees with the change, I will create a JIRA and send out the patch today):

  // Check if table exists
  if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
    // retainAssignment is true only during recovery. In normal case it is false
    if (this.skipTableStateCheck) {
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    throw new TableNotFoundException(tableName);
  }

On Mon, Mar 16, 2015 at 12:09 PM, Rajeshbabu Chintaguntla chrajeshbab...@gmail.com wrote: Hi Stephen and Andrey, The first step was added to remove stale znodes if table creation fails after znode creation. See HBASE-10215 https://issues.apache.org/jira/browse/HBASE-10215. Not sure whether we still need it or not. Thanks, Rajeshbabu. On Tue, Mar 17, 2015 at 12:18 AM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. On (2): I think that it is much better to guarantee that the table was enabled (i.e. all internal structures reflect that fact and the balancer knows about the new table). But the result of that could be checked asynchronously from Admin. Does it make sense?
On Mon, Mar 16, 2015 at 6:10 PM, Stephen Jiang syuanjiang...@gmail.com wrote: Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail enabling the table; however, if the bulk assign does not start, we would enable the table with offline regions - really inconsistent - we should either fail in both scenarios or succeed in both with offline regions (best effort approach). Thanks Stephen On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you. On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right. For case (1) we really don't need that state cleanup there. That is a bug; it should throw TableNotFoundException. As for (2), in the case where no online region servers are available we could leave the table enabled, but no regions would be assigned. Actually that raises a good question about what enabling a table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or could that be checked in Admin on the client side? So for now it seems that the enable handler does what is best: it leaves the table enabled but unassigned, to be assigned later by the Balancer. On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller has no knowledge that the table does not exist or was already deleted, so it would continue to the next step. Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects TableNotFoundException. Should we always throw the exception if the table does not exist?
I want to make sure that I don't break the recovery logic by modifying this.

  public EnableTableHandler prepare() {
    ...
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    ...
  }

(2). In EnableTableHandler#handleEnableTable() - if the bulk assign plan cannot be found, it leaves the regions offline and declares the enable table succeeded - I
[jira] [Created] (HBASE-13254) EnableTableHandler#prepare would not throw TableNotFoundException during recovery
Stephen Yuan Jiang created HBASE-13254: -- Summary: EnableTableHandler#prepare would not throw TableNotFoundException during recovery Key: HBASE-13254 URL: https://issues.apache.org/jira/browse/HBASE-13254 Project: HBase Issue Type: Bug Reporter: Stephen Yuan Jiang Assignee: Stephen Yuan Jiang Priority: Minor

During recovery, when EnableTableHandler#prepare() is called and the table does not exist, it marks the table as deleted and does NOT throw TableNotFoundException. The result is that the table lock is released and the caller, with no knowledge that the table does not exist (or was already deleted), continues to the next step.

{code}
public EnableTableHandler prepare()
    throws TableNotFoundException, TableNotDisabledException, IOException {
  ...
  try {
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
  ...
}
{code}

However, looking at the recovery code that calls EnableTableHandler#prepare, AssignmentManager#recoverTableInEnablingState() expects a TableNotFoundException so that it can skip the table:

{code}
private void recoverTableInEnablingState() throws KeeperException, IOException {
  Set<TableName> enablingTables = tableStateManager.
      getTablesInStates(TableState.State.ENABLING);
  if (enablingTables.size() != 0) {
    for (TableName tableName : enablingTables) {
      // Recover by calling EnableTableHandler
      LOG.info("The table " + tableName + " is in ENABLING state."
          + " Hence recovering by moving the table to ENABLED state.");
      // enableTable in sync way during master startup,
      // no need to invoke coprocessor
      EnableTableHandler eth = new EnableTableHandler(this.server, tableName,
          this, tableLockManager, true);
      try {
        eth.prepare();
      } catch (TableNotFoundException e) {
        LOG.warn("Table " + tableName + " not found in hbase:meta to recover.");
        continue;
      }
      eth.process();
    }
  }
}
{code}

The proposed fix is to always throw TableNotFoundException in EnableTableHandler#prepare if the table does not exist. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
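The proposed fix can be illustrated with a minimal, self-contained sketch. This is a simplified model only, not the real HBase classes: the table and state sets are stand-ins for MetaTableAccessor and TableStateManager. The point is that prepare() throws unconditionally when the table is absent, so the recovery caller's catch block is actually reachable.

```java
import java.util.Set;

public class PrepareSketch {
    // Stand-in for the real checked exception; unchecked here to keep the sketch small.
    public static class TableNotFoundException extends RuntimeException {
        public TableNotFoundException(String table) { super(table); }
    }

    // Proposed fix, modeled: during recovery (skipTableStateCheck == true) still
    // clean up the deleted-table state, but ALWAYS throw when the table does not
    // exist, so recoverTableInEnablingState() can catch and skip the table.
    public static void prepare(Set<String> existingTables, String tableName,
                               boolean skipTableStateCheck, Set<String> deletedMarkers) {
        if (!existingTables.contains(tableName)) {
            if (skipTableStateCheck) {
                // recovery path: record cleanup before signalling the caller
                deletedMarkers.add(tableName);
            }
            throw new TableNotFoundException(tableName);
        }
    }
}
```

With this shape, the normal path and the recovery path both surface the missing table to the caller instead of silently releasing the table lock.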
[jira] [Created] (HBASE-13255) Bad grammar in RegionServer status page
Josh Elser created HBASE-13255: -- Summary: Bad grammar in RegionServer status page Key: HBASE-13255 URL: https://issues.apache.org/jira/browse/HBASE-13255 Project: HBase Issue Type: Improvement Components: monitoring Reporter: Josh Elser Assignee: Josh Elser Priority: Trivial Fix For: 2.0.0, 1.1.0 Noticed on the rs-status page, the blurb under the Regions section could use some grammatical improvements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Jira role cleanup
Hmm... This is interesting. I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). Minimally, somebody needs to be able to assign a jira to the person providing the patch; if those are committers only that's tedious but OK - we've been doing that anyway. Ideally the person could assign an _open_ issue to him/herself, log work against an issue, and change the due date. Those seem like abilities we could grant to everybody as long as they are limited to open issues. Beyond that I agree that we should limit this to a known set of people (the contributors). Maybe discuss this briefly at the next PMC meeting; we're due to have one anyway. I'm willing to host one at Salesforce. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Sunday, March 15, 2015 9:46 PM Subject: Re: Jira role cleanup I can make it so that issues can be assigned to non-contributors. Even if we don't do that, I believe jira permissions are all about constraining current actions, and are not enforced on existing tickets. However, the contributor role in jira has several other abilities associated with it. Right now, in the order they appear in jira:
* edit an issue's due date
* move issues (between project workflows or projects the user has create on)
* assign issues to other people
* resolve and reopen issues, assign a fix version (but not close them)
* manage watchers on an issue
* log work against an issue
Any of these could also be changed to remove contributors or allow wider jira users. If assignable users can assign to themselves when they don't have the assign users permission, then the only one I think we use is resolve and reopen issues. And I don't think I'd want that open to all jira users.
Do we want to have to handle marking issues resolved for folks? It makes sense to me, since I usually do that once I push the commit. On Sun, Mar 15, 2015 at 11:07 PM, lars hofhansl la...@apache.org wrote: Not sure what jira does about an assignee when (s)he is removed from the contributors list (I know you have to add a person to the contributors list in order to assign a jira to them). Other than the committers, we probably have at least one jira assigned to each contributor (otherwise why add him/her as a contributor). Can we change the jira rules in our space to allow assigning jiras to users even when they're not listed as contributors? We do not have a formal contributor status (why not?), so this list is only needed because of jira. -- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Friday, March 13, 2015 9:09 AM Subject: Re: Jira role cleanup On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did[1]. How about I make a saved jira query for people who have had jiras assigned to them, add a link to that query for our "here are the contributors" section, and then trim from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Sean
[jira] [Resolved] (HBASE-13217) Flush procedure fails in trunk due to ZK issue
[ https://issues.apache.org/jira/browse/HBASE-13217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan resolved HBASE-13217. Resolution: Invalid I am closing this issue for now. It still occurs for me in our internal branch, last updated on March 12th; maybe there is something internal that is causing this. When I raised this issue I also verified across versions. Maybe that was the reason the flush was failing even when the RS was a pure trunk-based version. Thanks to Jerry He for helping in verifying the issue.

Flush procedure fails in trunk due to ZK issue -- Key: HBASE-13217 URL: https://issues.apache.org/jira/browse/HBASE-13217 Project: HBase Issue Type: Bug Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan

Whenever I try to flush explicitly in the trunk code, the flush procedure fails due to a ZK issue:

{code}
ERROR: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via stobdtserver3,16040,1426172670959:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
  at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
  at org.apache.hadoop.hbase.procedure.Procedure.isCompleted(Procedure.java:368)
  at org.apache.hadoop.hbase.procedure.flush.MasterFlushTableProcedureManager.isProcedureDone(MasterFlushTableProcedureManager.java:196)
  at org.apache.hadoop.hbase.master.MasterRpcServices.isProcedureDone(MasterRpcServices.java:905)
  at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:47019)
  at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2073)
  at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
  at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
  at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
  at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.IOException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/flush-table-proc/acquired/TestTable/stobdtserver3,16040,1426172670959
  at org.apache.hadoop.hbase.procedure.Subprocedure.cancel(Subprocedure.java:273)
  at org.apache.hadoop.hbase.procedure.ProcedureMember.controllerConnectionFailure(ProcedureMember.java:225)
  at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.sendMemberAcquired(ZKProcedureMemberRpcs.java:254)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:166)
  at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
  at java.util.concurrent.FutureTask.run(FutureTask.java:262)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  ... 1 more
{code}

Once this occurs, even on restart of the RS, the RS becomes unusable. I have verified that the ZK data remains intact and there is no problem with it. A somewhat older version of trunk (~3 months) does not have this problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Rough goal timelines for 1.1 and 2.0
I would love to see 1.1 in or before May. We already have good stuff in branch-1, enough to justify a minor release. Some of the features are still in the pipeline waiting to be finished (MOB, procV2, etc). Personally, I think we should get HBASE-12972, and ProcV2, RPC quotas (and other multi-tenancy improvements not yet backported) and call it 1.1. I would +1 either Nick or Andrew, both should be excellent RMs. Enis On Mon, Mar 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: FWIW, the Region proposal (HBASE-12972) is ready for review. The companion issue for SplitTransaction and RegionMergeTransaction (HBASE-12975) needs more discussion but could be ready to go in a one month timeframe. On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines -- a release timed with HBaseCon would be ideal in this regard. My obligations to the event are minimal, so I'm willing to volunteer as RM for 1.1. Do we think we can make some of these decisions in time for spinning RC's in mid-April? That's just about a month away. -n On Sat, Mar 14, 2015 at 10:37 AM, Elliott Clark ecl...@apache.org wrote: I'm most looking forward to rpc quotas and the buffer improvements that stack has put in. So for me getting a 1.1 in May 1 would be cool. That would allow us to talk about what was just released at HBaseCon, and maybe even have 1.1.0 in production at places. On Fri, Mar 13, 2015 at 11:44 AM, Sean Busbey bus...@cloudera.com wrote: The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. 
Probably it won't be contentious to just decide as changes are ready? -- Sean On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
[jira] [Created] (HBASE-13252) Get rid of managed connections and connection caching
Mikhail Antonov created HBASE-13252: --- Summary: Get rid of managed connections and connection caching Key: HBASE-13252 URL: https://issues.apache.org/jira/browse/HBASE-13252 Project: HBase Issue Type: Sub-task Components: API Affects Versions: 2.0.0 Reporter: Mikhail Antonov Assignee: Mikhail Antonov

(Need to):
- Remove CONNECTION_INSTANCES from ConnectionManager
- Remove the 'managed' property from ClusterConnection, HCI and the places where it's used now

AFAIS this property isn't visible to the client (ClusterConnection is a private interface), so technically this would not even be a backward-incompatible change, and no release note is needed? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Status of Huawei's 2' Indexing?
Joseph, First, I would strongly recommend against using HBase… but since you insist. Let’s start with your row key. 1) REMEMBER HIPAA 2) How are you going to access the base table? So if, for example, you’re never going to do a “Get me Mary Smith’s record” but more a “Show me all of the patients who had a positive TB test and cluster them by zip code…”, you may want to consider using a UUID, since you’re always going to go after the data via an index. If you want to use a patient’s name, e.g. “last|first”, you will want to take the hash of it. Now let’s talk about indexing. First, what’s the use case for the database? Do you want real time access to specific records? Then you would want to consider using Lucene; however, that would be a bit more heavy lifting. The simplest index is an inverted table index. There are two ways to implement it. One is to make the index row key the attribute value, where each column contains a row key from the base table, using the row key’s value as the column qualifier as well so that you get your results in sort order. The other way is to create a single row per record, where the index rowkey is “attribute|RowKey” and the only column is the RowKey itself. This is skinny table vs. fat table, and of course you could do something in the middle that limits the number of columns to N per row, so that your result set is a set of rows. That’s pretty much it. You build your index either via an M/R job or, as you insert a row, you insert into the index at the same time. On Mar 16, 2015, at 12:46 PM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Alright, let’s see if I can get this discussion back on track. I have a sensibly defined table for patient data; its rowkey is simply lastname:firstname, since it’s convenient for the bulk of my lookups. Unfortunately I also need to efficiently find patients using an ID string, whose literal value is buried in a value field. I’m sure this situation is not foreign to the people on this list. 
It’s been suggested that I implement 2’ indexes myself — fine. All the research I’ve done seems to end with that suggestion, with the exception of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems to incite some discussion here). I’m happy to put this together but I’d rather go with something that has been vetted and has a larger developer community than one (i.e., ME). Besides, I have a full enough plate at the moment that I’d rather not have to do this, too. Are there constructive suggestions regarding how I can proceed with HBase? Right now even a well-vetted local index would be a godsend. Thanks. -j p.s., I’ll refer you to this post for a slightly more detailed rundown of how I plan to do things: http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467 On 3/16/15, 12:18 PM, Michael Segel michael_se...@hotmail.com wrote: Joseph, The issue with Andrew goes back a few years. His comment about having a civilized discussion was a personal dig at me. On Mar 16, 2015, at 10:38 AM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Michael, I don’t understand the invective. I’m sure you have something to contribute, but when you bring this tone the only thing I hear is the snide comments. -j P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 
1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce, who were going to push this approach) fell apart when you asked the simple question of how do you handle joins? That’s their OOPS moment. Once you start to understand that, then, allowing the index to be orthogonal to the base table, things start to come together. In short, you have a query against either a single table or a join. You then get the indexes and, assuming that you’re only using the AND predicate, it’s a simple intersection of the index result sets. (Since the result sets are ordered, it’s relatively trivial to walk through and find the intersections of N lists in a single pass.) Now you have your
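The single-pass intersection of ordered index result sets mentioned above can be sketched as follows. This is an illustrative implementation only, assuming each per-index result list of row keys is ascending and duplicate-free:

```java
import java.util.ArrayList;
import java.util.List;

public class SortedIntersect {
    // Intersect N ascending, duplicate-free lists of row keys in one pass.
    // One cursor per list; every cursor only moves forward, so the total work
    // is linear in the combined size of the inputs.
    public static List<String> intersect(List<List<String>> lists) {
        List<String> out = new ArrayList<>();
        if (lists.isEmpty()) return out;
        int n = lists.size();
        int[] pos = new int[n];
        outer:
        while (pos[0] < lists.get(0).size()) {
            // Take the next candidate row key from the first list.
            String cand = lists.get(0).get(pos[0]);
            for (int i = 1; i < n; i++) {
                List<String> li = lists.get(i);
                // Advance list i past everything smaller than the candidate.
                while (pos[i] < li.size() && li.get(pos[i]).compareTo(cand) < 0) pos[i]++;
                if (pos[i] >= li.size()) break outer;  // some list exhausted: done
                if (!li.get(pos[i]).equals(cand)) {    // candidate missing here
                    pos[0]++;                          // try the next candidate
                    continue outer;
                }
            }
            out.add(cand);                             // candidate present everywhere
            pos[0]++;
        }
        return out;
    }
}
```

The same walk generalizes to raw byte[] row keys with a byte comparator; strings are used here only to keep the sketch short.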
Re: Question on EnableTableHandler code
Hi Stephen and Andrey, The first step was added to remove stale znodes if table creation fails after znode creation. See HBASE-10215 https://issues.apache.org/jira/browse/HBASE-10215. Not sure whether we still need it or not. Thanks, Rajeshbabu. On Tue, Mar 17, 2015 at 12:18 AM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. On (2): I think it is much better to guarantee that the table was enabled (i.e. all internal structures reflect that fact and the balancer knows about the new table). But the result of that could be checked asynchronously from Admin. Does it make sense? On Mon, Mar 16, 2015 at 6:10 PM, Stephen Jiang syuanjiang...@gmail.com wrote: Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail enabling the table; however, if the bulk assign never starts, we would enable the table with offline regions - really inconsistent - we should either fail in both scenarios or succeed with offline regions in both (best effort approach). Thanks Stephen On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you. On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right. For case (1) we really don't need the state cleanup there. That is a bug. It should throw TableNotFoundException. As for (2), in the case of no online region servers being available we could leave the table enabled, but no regions would be assigned. Actually that raises a good question of what enable table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or could that be checked in Admin on the client side. So for now it seems that the Enable handler does what is best, and leaves the table enabled but unassigned, to be later assigned by the Balancer. On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). 
In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller, with no knowledge that the table does not exist or was already deleted, would continue to the next step. Currently this can happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects a TableNotFoundException. Should we always throw the exception if the table does not exist? I want to make sure that I don't break recovery logic by modifying this.

public EnableTableHandler prepare() {
  ...
  // Check if table exists
  if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
    // retainAssignment is true only during recovery. In normal case it is false
    if (!this.skipTableStateCheck) {
      throw new TableNotFoundException(tableName);
    }
    this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
  }
  ...
}

(2). In EnableTableHandler#handleEnableTable() - if the bulk assign plan could not be found, it would leave regions offline and declare the table enable succeeded - I think this is a bug and we should retry or fail - but I want to make sure there is no merit behind this logic:

private void handleEnableTable() {
  Map<ServerName, List<HRegionInfo>> bulkPlan =
      this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
  if (bulkPlan != null) {
    ...
  } else {
    LOG.info("Balancer was unable to find suitable servers for table " + tableName
        + ", leaving unassigned");
    done = true;
  }
  if (done) {
    // Flip the table to enabled.
    this.assignmentManager.getTableStateManager().setTableState(
        this.tableName, TableState.State.ENABLED);
    LOG.info("Table '" + this.tableName + "' was successfully enabled. Status: done=" + done);
  }
  ...
}

thanks Stephen -- Andrey. -- Andrey. -- Andrey.
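A tiny model of the behavior change Stephen proposes for (2) can make the inconsistency concrete. The names here are hypothetical stand-ins, not the real handler API: when the balancer returns no assignment plan, the enable reports failure (so the caller can retry or surface an error) instead of flipping the table to ENABLED with all regions offline.

```java
import java.util.List;
import java.util.Map;

public class EnableSketch {
    public enum State { ENABLING, ENABLED, FAILED }

    // Proposed behavior, modeled: a null or empty bulk assignment plan is
    // treated as a failure of the enable operation, never as success.
    public static State handleEnableTable(Map<String, List<String>> bulkPlan) {
        if (bulkPlan == null || bulkPlan.isEmpty()) {
            // Today's code logs "leaving unassigned" and still marks the table
            // ENABLED; the consistent alternative is to fail (or retry) here.
            return State.FAILED;
        }
        // ... assign regions according to the plan ...
        return State.ENABLED;
    }
}
```

Either outcome (always fail, or always best-effort succeed) would be consistent; the sketch shows the fail-fast variant.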
Re: Status of Huawei's 2' Indexing?
Thanks, Wilm. I’ll look for the thread there. Obviously I didn’t realize there was so much back story: I was asking about this specific implementation because it seems to be fairly well thought out and to have good commentary in the Jira ticket (HBASE-9203). At the time I thought it was mostly a dev concern. I think we’ve moved on, as you pointed out. I'd be happy to contribute to hbase if I have something to offer. I’m just starting with this, so let’s see where it takes us. For those of you joining us late, you can find the continuation here: http://mail-archives.apache.org/mod_mbox/hbase-user/201503.mbox/%3C550722DA.3040009%40gmail.com%3E -j On 3/16/15, 2:09 PM, Wilm Schumacher wilm.schumac...@gmail.com wrote: Hi Joseph, I think that you kicked off this discussion because implementing an indexing mechanism for hbase in general is much more complicated than your specific problem. The people on this list want to bear every possible application (or at least A LOT of them) in mind. A too-simple mechanism wouldn't fit the needs of most of the users (and thus would be useless); a more complicated model is harder to maintain, you would have to find more coders, etc. Thus with your application question you seem to have walked right into a very general discussion. Furthermore this is a user question, as you do not want to change the code of hbase, do you ;). I'll try an answer on the general user list in a couple of minutes, so more people can discuss and we can get traffic out of this list, okay? Best wishes Wilm Am 16.03.2015 um 18:46 schrieb Rose, Joseph: Alright, let’s see if I can get this discussion back on track. I have a sensibly defined table for patient data; its rowkey is simply lastname:firstname, since it’s convenient for the bulk of my lookups. Unfortunately I also need to efficiently find patients using an ID string, whose literal value is buried in a value field. I’m sure this situation is not foreign to the people on this list. 
It’s been suggested that I implement 2’ indexes myself — fine. All the research I’ve done seems to end with that suggestion, with the exception of Phoenix (I don’t want the RDBMS layer) and Huawei’s stuff (which seems to incite some discussion here). I’m happy to put this together but I’d rather go with something that has been vetted and has a larger developer community than one (i.e., ME). Besides, I have a full enough plate at the moment that I’d rather not have to do this, too. Are there constructive suggestions regarding how I can proceed with HBase? Right now even a well-vetted local index would be a godsend. Thanks. -j p.s., I’ll refer you to this post for a slightly more detailed rundown of how I plan to do things: http://article.gmane.org/gmane.comp.java.hadoop.hbase.user/46467 On 3/16/15, 12:18 PM, Michael Segel michael_se...@hotmail.com wrote: Joseph, The issue with Andrew goes back a few years. His comment about having a civilized discussion was a personal dig at me. On Mar 16, 2015, at 10:38 AM, Rose, Joseph joseph.r...@childrens.harvard.edu wrote: Michael, I don’t understand the invective. I’m sure you have something to contribute, but when you bring this tone the only thing I hear is the snide comments. -j P.s., I’ll refer you to this: https://hbase.apache.org/book.html#_joins On 3/16/15, 11:15 AM, Michael Segel michael_se...@hotmail.com wrote: You’ll have to excuse Andy. He’s a bit slow. HBASE-13044 should have been done 2 years ago. 
And it was trivial. Just got done last month…. But I digress… The long story short… HBASE-9203 was brain dead from inception. Huawei’s idea was to index on the region which had two problems. 1) Complexity in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain. Their defense (including Salesforce, who were going to push this approach) fell apart when you asked the simple question of how do you handle joins? That’s their OOPS moment. Once you start to understand that, then, allowing the index to be orthogonal to the base table, things start to come together. In short, you have a query against either a single table or a join. You then get the indexes and, assuming that you’re only using the AND predicate, it’s a simple
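Earlier in this thread Michael describes two inverted-table index layouts: a "fat" row per attribute value (base-table row keys as column qualifiers) and a "skinny" row per record keyed "attribute|RowKey". A minimal sketch of the two row-key constructions, with hypothetical helper names (this is an illustration of the layout, not an HBase API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class IndexRowSketch {
    // "Fat" layout: the index row key is the attribute value, and the
    // base-table row key is stored as the column qualifier, so scanning the
    // index row returns matching base rows already in sorted order.
    public static Map.Entry<String, String> fatIndexCell(String attrValue, String baseRowKey) {
        return new SimpleEntry<>(attrValue, baseRowKey); // (index row key, qualifier)
    }

    // "Skinny" layout: one index row per base row, keyed "attribute|baseRowKey",
    // with the base row key as the single column value. A prefix scan on the
    // attribute value yields the matching base rows.
    public static String skinnyIndexRowKey(String attrValue, String baseRowKey) {
        return attrValue + "|" + baseRowKey;
    }
}
```

In practice the keys would be byte arrays with an unambiguous separator (or fixed-width attribute encoding), since a plain '|' can collide with attribute values containing that byte.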
Re: Jira role cleanup
On Mon, Mar 16, 2015 at 1:28 PM, Andrew Purtell apurt...@apache.org wrote: On Mon, Mar 16, 2015 at 11:02 AM, Nick Dimiduk ndimi...@gmail.com wrote: bq. Our commit log conventions aren't universally followed, due to human error Going forward, I think we can alleviate this issue with a git hook and a regexp. That's a good idea. FYI, ASF Infra doesn't allow custom server side git hooks[1], and from casually checking the ASF git hooks[2] I couldn't find anything that does message enforcement. It looks like Apache Cloudstack has some client side hooks they recommend for committers that do commit message enforcement[3]. We could use something similar. [1]: http://www.apache.org/dev/writable-git (last bullet) [2]: http://s.apache.org/qxT [3]: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Git#Git-CommitMessages -- Sean
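The regexp enforcement suggested above could be as simple as a first-line check, which a client-side commit-msg hook would run. The exact convention pattern below (an issue key like "HBASE-12345" followed by a space and a summary) is an assumption for illustration, not HBase's documented rule:

```java
import java.util.regex.Pattern;

public class CommitMsgCheck {
    // Hypothetical convention: the first line of the commit message must start
    // with an HBASE issue key, a space, and a non-empty summary.
    private static final Pattern FIRST_LINE = Pattern.compile("^HBASE-\\d+ \\S.*");

    // Returns true when the message's first line matches the convention.
    public static boolean check(String message) {
        String first = message.split("\n", 2)[0];
        return FIRST_LINE.matcher(first).matches();
    }
}
```

A commit-msg hook script would read `.git/COMMIT_EDITMSG`, run a check like this, and exit non-zero to reject the commit when it fails.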
Re: Rough goal timelines for 1.1 and 2.0
Thanks for raising this topic Mr Busbey. A 1.1 before hbasecon would be sweet. As has been said already, 1.1 has a bunch of good stuff in it already -- e.g. flush by column family -- so worthwhile pushing it out soon. +1 on Nick for RM because it is good to spread the RM'ing load. St.Ack On Mon, Mar 16, 2015 at 11:50 AM, Enis Söztutar enis@gmail.com wrote: I would love to see 1.1 in or before May. We already have good stuff in branch-1, enough to justify a minor release. Some of the features are still in the pipeline waiting to be finished (MOB, procV2, etc). Personally, I think we should get HBASE-12972, and ProcV2, RPC quotas (and other multi-tenancy improvements not yet backported) and call it 1.1. I would +1 either Nick or Andrew, both should be excellent RMs. Enis On Mon, Mar 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: FWIW, the Region proposal (HBASE-12972) is ready for review. The companion issue for SplitTransaction and RegionMergeTransaction (HBASE-12975) needs more discussion but could be ready to go in a one month timeframe. On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines -- a release timed with HBaseCon would be ideal in this regard. My obligations to the event are minimal, so I'm willing to volunteer as RM for 1.1. Do we think we can make some of these decisions in time for spinning RC's in mid-April? That's just about a month away. -n On Sat, Mar 14, 2015 at 10:37 AM, Elliott Clark ecl...@apache.org wrote: I'm most looking forward to rpc quotas and the buffer improvements that stack has put in. So for me getting a 1.1 in May 1 would be cool. That would allow us to talk about what was just released at HBaseCon, and maybe even have 1.1.0 in production at places. 
On Fri, Mar 13, 2015 at 11:44 AM, Sean Busbey bus...@cloudera.com wrote: The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
Re: Jira role cleanup
On Mon, Mar 16, 2015 at 11:02 AM, Nick Dimiduk ndimi...@gmail.com wrote: bq. Our commit log conventions aren't universally followed, due to human error Going forward, I think we can alleviate this issue with a git hook and a regexp. That's a good idea. On Mon, Mar 16, 2015 at 10:38 AM, Andrew Purtell apurt...@apache.org wrote: I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). I agree we should avoid allowing contributors to change JIRA metadata if this is possible to restrict. Our commit log conventions aren't universally followed, due to human error, so they are not all tagged with issue identifiers, or the correct identifier. On Sun, Mar 15, 2015 at 11:12 PM, lars hofhansl la...@apache.org wrote: Hmm... This is interesting. I think Jira management should be left to the committers. One can pretty much mess up a release, and make it hard to account for what's in and what's not when jiras are changed around (the ultimate truth can be reconstructed from the git commit records, but that's tedious). Minimally, somebody needs to be able to assign a jira to the person providing the patch; if those are committers only that's tedious but OK - we've been doing that anyway. Ideally the person could assign an _open_ issue to him/herself, log work against an issue, and change the due date. Those seem like abilities we could grant to everybody as long as they are limited to open issues. Beyond that I agree that we should limit this to a known set of people (the contributors). Maybe discuss this briefly at the next PMC meeting; we're due to have one anyway. I'm willing to host one at Salesforce. 
-- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org; lars hofhansl la...@apache.org Sent: Sunday, March 15, 2015 9:46 PM Subject: Re: Jira role cleanup I can make it so that issues can be assigned to non-contributors. Even if we don't do that, I believe jira permissions are all about constraining current actions, and are not enforced on existing tickets. However, the contributor role in jira has several other abilities associated with it. Right now, in the order they appear in jira:
* edit an issue's due date
* move issues (between project workflows or projects the user has create on)
* assign issues to other people
* resolve and reopen issues, assign a fix version (but not close them)
* manage watchers on an issue
* log work against an issue
Any of these could also be changed to remove contributors or allow wider jira users. If assignable users can assign to themselves when they don't have the assign users permission, then the only one I think we use is resolve and reopen issues. And I don't think I'd want that open to all jira users. Do we want to have to handle marking issues resolved for folks? It makes sense to me, since I usually do that once I push the commit. On Sun, Mar 15, 2015 at 11:07 PM, lars hofhansl la...@apache.org wrote: Not sure what jira does about an assignee when (s)he is removed from the contributors list (I know you have to add a person to the contributors list in order to assign a jira to them). Other than the committers, we probably have at least one jira assigned to each contributor (otherwise why add him/her as a contributor). Can we change the jira rules in our space to allow assigning jiras to users even when they're not listed as contributors? We do not have a formal contributor status (why not?), so this list is only needed because of jira. 
-- Lars From: Sean Busbey bus...@cloudera.com To: dev dev@hbase.apache.org Sent: Friday, March 13, 2015 9:09 AM Subject: Re: Jira role cleanup On Fri, Mar 13, 2015 at 11:01 AM, Andrew Purtell apurt...@apache.org wrote: +1 I think it would be fine to trim the contributor list too. We can always add people back on demand in order to (re)assign issues. I wasn't sure how we generate the list of contributors. But then I noticed that we don't link to jira for it like I thought we did [1]. How about I make a saved jira query for people who have had jiras assigned to them, add a link to that query for our "here are the contributors" section, and then trim from the role anyone who hasn't been assigned an issue in the last year? [1]: http://hbase.apache.org/team-list.html -- Sean -- Sean -- Best regards, - Andy Problems worthy of attack prove their
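The "git hook and a regexp" idea from the thread above could be sketched as follows. This is only an illustration, not anything from the thread: the class name, the exact subject-line convention, and the wiring into an actual commit-msg hook are all assumptions.

```java
import java.util.regex.Pattern;

public class CommitMsgCheck {
    // Assumed convention: the subject line starts with an issue key like
    // "HBASE-12345" followed by descriptive text. Adjust to the real convention.
    private static final Pattern SUBJECT = Pattern.compile("^HBASE-\\d+\\s+\\S.*");

    // Returns true when the commit subject line carries a correct issue tag.
    public static boolean isValid(String subjectLine) {
        return SUBJECT.matcher(subjectLine).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("HBASE-13251 Correct CLASSPATH section")); // true
        System.out.println(isValid("fix typo"));                              // false
    }
}
```

A commit-msg hook would read the message file, run a check like this against the first line, and exit non-zero on failure so the commit is rejected before it ever lands untagged.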
[jira] [Created] (HBASE-13251) Correct 'HBase, MapReduce, and the CLASSPATH' section in HBase Ref Guide
Jerry He created HBASE-13251: Summary: Correct 'HBase, MapReduce, and the CLASSPATH' section in HBase Ref Guide Key: HBASE-13251 URL: https://issues.apache.org/jira/browse/HBASE-13251 Project: HBase Issue Type: Improvement Components: documentation Reporter: Jerry He As [~busbey] pointed out in HBASE-13149, we have a section 'HBase, MapReduce, and the CLASSPATH' in the HBase Ref Guide. http://hbase.apache.org/book.html#hbase.mapreduce.classpath There is duplication, as well as errors and misinformation, in the section. It needs cleanup and polish. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Question on EnableTableHandler code
Thanks Stephen. On (2): I think it is much better to guarantee that the table was enabled (i.e. all internal structures reflect that fact and the balancer knows about the new table). But the result of that could be checked asynchronously from Admin. Does it make sense? On Mon, Mar 16, 2015 at 6:10 PM, Stephen Jiang syuanjiang...@gmail.com wrote: Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail enabling the table; however, if the bulk assign never starts, we would enable the table with offline regions - really inconsistent - we should either all fail in those scenarios or all succeed with offline regions (best-effort approach). Thanks Stephen On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you. On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right. For case (1) we really don't need that state cleanup there. That is a bug. It should throw TableNotFoundException. As for (2), in case no online region servers are available we could leave the table enabled, but no regions would be assigned. Actually that raises a good question of what enable table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or could that be done in Admin on the client side? So for now it seems that the enable handler does what is best, and leaves the table enabled but unassigned, to be assigned later by the Balancer. On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller has no knowledge that the table does not exist or was already deleted; it would continue to the next step.
Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects TableNotFoundException. Should we always throw the exception if the table does not exist? I want to make sure that I don't break the recovery logic by modifying it.

  public EnableTableHandler prepare() {
    ...
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    ...
  }

(2). In EnableTableHandler#handleEnableTable() - if the bulk assign plan could not be found, it would leave regions offline and declare enable table succeeded - I think this is a bug and we should retry or fail - but I want to make sure whether there is some merit behind this logic.

  private void handleEnableTable() {
    Map<ServerName, List<HRegionInfo>> bulkPlan =
        this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
    if (bulkPlan != null) {
      ...
    } else {
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
      done = true;
    }
    if (done) {
      // Flip the table to enabled.
      this.assignmentManager.getTableStateManager().setTableState(
          this.tableName, TableState.State.ENABLED);
      LOG.info("Table '" + this.tableName + "' was successfully enabled. Status: done=" + done);
    }
    ...
  }

thanks Stephen -- Andrey. -- Andrey. -- Andrey.
[jira] [Created] (HBASE-13253) LoadIncrementalHFiles unify hfiles discovery
Matteo Bertozzi created HBASE-13253: --- Summary: LoadIncrementalHFiles unify hfiles discovery Key: HBASE-13253 URL: https://issues.apache.org/jira/browse/HBASE-13253 Project: HBase Issue Type: Bug Components: Client, mapreduce Reporter: Matteo Bertozzi Assignee: Matteo Bertozzi Attachments: HBASE-13253-v0.patch We have two copy-pasted code paths in createTable() and discoverLoadQueue(). They do the same exact loop on the fs with the same validation logic. We should unify them, to avoid having them get out of sync. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: Rough goal timelines for 1.1 and 2.0
Agreed! Since Nick has volunteered to RM 1.1, please let me withdraw my earlier volunteering for that task, unless Nick declines. On Mon, Mar 16, 2015 at 12:15 PM, Stack st...@duboce.net wrote: Thanks for raising this topic, Mr Busbey. A 1.1 before hbasecon would be sweet. As has been said already, 1.1 has a bunch of good stuff in it already -- e.g. flush by column family -- so it is worthwhile pushing it out soon. +1 on Nick for RM because it is good to spread the RM'ing load. St.Ack On Mon, Mar 16, 2015 at 11:50 AM, Enis Söztutar enis@gmail.com wrote: I would love to see 1.1 in or before May. We already have good stuff in branch-1, enough to justify a minor release. Some of the features are still in the pipeline waiting to be finished (MOB, procV2, etc). Personally, I think we should get in HBASE-12972, ProcV2, RPC quotas (and other multi-tenancy improvements not yet backported) and call it 1.1. I would +1 either Nick or Andrew; both should be excellent RMs. Enis On Mon, Mar 16, 2015 at 11:05 AM, Andrew Purtell apurt...@apache.org wrote: FWIW, the Region proposal (HBASE-12972) is ready for review. The companion issue for SplitTransaction and RegionMergeTransaction (HBASE-12975) needs more discussion but could be ready to go in a one-month timeframe. On Mon, Mar 16, 2015 at 10:30 AM, Nick Dimiduk ndimi...@gmail.com wrote: I think we can learn a lesson or two from the vendor marketing machines -- a release timed with HBaseCon would be ideal in this regard. My obligations to the event are minimal, so I'm willing to volunteer as RM for 1.1. Do we think we can make some of these decisions in time for spinning RC's in mid-April? That's just about a month away. -n On Sat, Mar 14, 2015 at 10:37 AM, Elliott Clark ecl...@apache.org wrote: I'm most looking forward to rpc quotas and the buffer improvements that stack has put in. So for me getting 1.1 in by May 1 would be cool.
That would allow us to talk about what was just released at HBaseCon, and maybe even have 1.1.0 in production at places. On Fri, Mar 13, 2015 at 11:44 AM, Sean Busbey bus...@cloudera.com wrote: The only reason I can think of to make decisions now would be if we want to ensure we have consensus for the changes for Phoenix and enough time to implement them. Given that AFAIK it's those changes that'll drive having a 1.1 release, seems prudent. But I haven't been tracking the changes lately. I think we're all in agreement that something needs to be done, and that HBase 1.1 and Phoenix 5 are the places to do it. Probably it won't be contentious to just decide as changes are ready? -- Sean On Mar 13, 2015 1:28 PM, Andrew Purtell apurt...@apache.org wrote: That was my question.. We can discuss them independently? Or is there a reason not to? On Fri, Mar 13, 2015 at 11:10 AM, Sean Busbey bus...@cloudera.com wrote: On Fri, Mar 13, 2015 at 12:31 PM, Andrew Purtell apurt...@apache.org wrote: Do we need to couple decisions for 1.1 and 2.0 in the same discussion? Like what? Interface changes for Phoenix maybe? -- Sean -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) -- Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
mmap() based BucketCache IOEngine
Of the existing BucketCache IOEngines, the FileIOEngine uses pread() to copy data from kernel space to user space. This is a good choice when the total working set size is much bigger than the available RAM and the latency is dominated by IO access. However, when the entire working set is small enough to fit in RAM, using mmap() (and subsequent memcpy()) to move data from kernel space to user space is faster. I have run some short key-value get tests and the results indicate a reduction of 2%-7% in kernel CPU on my system, depending on the load. On the gets, the latency histograms from mmap() are identical to those from pread(), but peak throughput is close to 40% higher. I have already tested the patch at Flurry. Anyone interested in reviewing the patch? -Zee
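The difference between the two read paths being compared can be sketched in plain NIO. This is not the proposed patch, just a minimal illustration under assumptions: the helper names and the temp-file contents are made up, and a real IOEngine would manage buffers and offsets from its bucket allocator.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MmapReadSketch {
    // pread-style positional read: one syscall plus a kernel->user copy per call.
    static byte[] preadStyle(FileChannel ch, long off, int len) throws Exception {
        ByteBuffer dst = ByteBuffer.allocate(len);
        ch.read(dst, off); // FileChannel.read(ByteBuffer, position) == pread
        return dst.array();
    }

    // mmap-style: map the file once up front; each read is then a memcpy-like
    // get from the mapping, with no per-read syscall.
    static byte[] mmapStyle(FileChannel ch, int off, int len) throws Exception {
        MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        byte[] out = new byte[len];
        map.position(off);
        map.get(out);
        return out;
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("bucket", ".data");
        Files.write(p, "hello-bucket-cache".getBytes("UTF-8"));
        try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r");
             FileChannel ch = raf.getChannel()) {
            System.out.println(new String(preadStyle(ch, 6, 6), "UTF-8")); // bucket
            System.out.println(new String(mmapStyle(ch, 6, 6), "UTF-8"));  // bucket
        } finally {
            Files.delete(p);
        }
    }
}
```

Both paths return identical bytes; the win claimed above is purely in kernel CPU and throughput when the working set fits in the page cache.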
Re: Question on EnableTableHandler code
Thanks, Rajeshbabu. HBASE-10215 is not the last change; HBASE-7767 (hello, Andrey [?]) removed the exception-throwing code after setting up the table state. What we really want is as follows (if Andrey agrees with the change, I will create a JIRA and send out the patch today):

  // Check if table exists
  if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
    // retainAssignment is true only during recovery. In normal case it is false
    if (this.skipTableStateCheck) {
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    throw new TableNotFoundException(tableName);
  }

On Mon, Mar 16, 2015 at 12:09 PM, Rajeshbabu Chintaguntla chrajeshbab...@gmail.com wrote: Hi Stephen and Andrey, The first step was added to remove stale znodes if table creation fails after znode creation. See HBASE-10215 https://issues.apache.org/jira/browse/HBASE-10215. Not sure whether we still need it or not. Thanks, Rajeshbabu. On Tue, Mar 17, 2015 at 12:18 AM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. On (2): I think it is much better to guarantee that the table was enabled (i.e. all internal structures reflect that fact and the balancer knows about the new table). But the result of that could be checked asynchronously from Admin. Does it make sense? On Mon, Mar 16, 2015 at 6:10 PM, Stephen Jiang syuanjiang...@gmail.com wrote: Andrey, I will take care of (1). And (2) :-) if you guys agree. Because it is not consistent: if the bulk assign failed, we would fail enabling the table; however, if the bulk assign never starts, we would enable the table with offline regions - really inconsistent - we should either all fail in those scenarios or all succeed with offline regions (best-effort approach). Thanks Stephen On Mon, Mar 16, 2015 at 11:01 AM, Andrey Stepachev oct...@gmail.com wrote: Stephen, would you like to create a jira for case (1)? Thank you. On Mon, Mar 16, 2015 at 5:58 PM, Andrey Stepachev oct...@gmail.com wrote: Thanks Stephen. Looks like you are right.
For case (1) we really don't need that state cleanup there. That is a bug. It should throw TableNotFoundException. As for (2), in case no online region servers are available we could leave the table enabled, but no regions would be assigned. Actually that raises a good question of what enable table means, i.e. do we really need to guarantee that on table enable absolutely all regions are online, or could that be done in Admin on the client side? So for now it seems that the enable handler does what is best, and leaves the table enabled but unassigned, to be assigned later by the Balancer. On Mon, Mar 16, 2015 at 5:34 PM, Stephen Jiang syuanjiang...@gmail.com wrote: I want to make sure that the following logic in EnableTableHandler is correct: (1). In EnableTableHandler#prepare - if the table does not exist, it marks the table as deleted and does not throw an exception. The result is that the table lock is released and the caller has no knowledge that the table does not exist or was already deleted; it would continue to the next step. Currently, this would happen during recovery (the caller is AssignmentManager#recoverTableInEnablingState()) - however, looking at the recovery code, it expects TableNotFoundException. Should we always throw the exception if the table does not exist? I want to make sure that I don't break the recovery logic by modifying it.

  public EnableTableHandler prepare() {
    ...
    // Check if table exists
    if (!MetaTableAccessor.tableExists(this.server.getConnection(), tableName)) {
      // retainAssignment is true only during recovery. In normal case it is false
      if (!this.skipTableStateCheck) {
        throw new TableNotFoundException(tableName);
      }
      this.assignmentManager.getTableStateManager().setDeletedTable(tableName);
    }
    ...
  }

(2).
In EnableTableHandler#handleEnableTable() - if the bulk assign plan could not be found, it would leave regions offline and declare enable table succeeded - I think this is a bug and we should retry or fail - but I want to make sure whether there is some merit behind this logic.

  private void handleEnableTable() {
    Map<ServerName, List<HRegionInfo>> bulkPlan =
        this.assignmentManager.getBalancer().retainAssignment(regionsToAssign, onlineServers);
    if (bulkPlan != null) {
      ...
    } else {
      LOG.info("Balancer was unable to find suitable servers for table " + tableName
          + ", leaving unassigned");
      done = true;
    }
    if (done) {
      // Flip the table to
Re: Status of Huawei's 2' Indexing?
Hi Rose, Sorry for the late reply. bq. Is there work on this that I don't see? You can try this [1] for checking something with the 0.98.3 version (sorry, not that much the latest). We thought of making it independent from HBase. Trying to do that whenever I find time (only a few kernel changes left in bulkload to prepare and load data together to the data table and all indexes in a single job). bq. Did I miss the mailing list thread where the architectural differences were discussed? You can find the discussion that happened at the time here [2]. By the time I started working on this in HBase, a lot of things had been done in Phoenix indexing which I didn't know about, like 1) failover handling 2) data type support 3) maintaining standard index metadata separately in catalog tables 4) expression-based filters in Phoenix, and many more... which are missing in hindex. So we thought of integrating the same solution into Phoenix first, and were able to do it with minimal changes. To avoid the complexities with colocation, I raised an improvement action in Phoenix; hope it simplifies many things [3]. [1] https://github.com/Huawei-Hadoop/hindex/tree/hbase-0.98 [2] http://search-hadoop.com/m/L1qeI1U99nd1subj=Re+Design+review+Secondary+index+support+through+coprocess [3] https://issues.apache.org/jira/browse/PHOENIX-1734 Thanks, Rajeshbabu. On Tue, Mar 17, 2015 at 12:52 AM, Michael Segel michael_se...@hotmail.com wrote: You miss the point. Your index is going to be orthogonal to your base table. Again, how do you handle joins? In terms of indexing... you have two ways of building your index. 1) In a separate M/R job. 2) As each row is inserted, the coprocessor inserts the data into the secondary indexes. More to your point... Yes, there is a delta between when you write your row to the base table and when you write your row to your inverted index table. The short answer is that time is relative and it doesn't matter. Again, you're going to have to think about that issue for a while before it sinks in.
You're not dealing with an RTOS problem... so it's not real time but subjective real time. In terms of writing to two tables... what do you think your relational database is doing? ;-) I suggest you think more about the problem; the more you think about it, you'll understand that there are tradeoffs, and when you walk through the problem you'll come to the conclusion that you want your index table(s) to be orthogonal to the base table. On Mar 16, 2015, at 12:54 PM, lars hofhansl la...@apache.org wrote: Dude... Relax... Let's keep it cordial, please. To the topic: Any CS 101 student can implement an eventually consistent index on top of HBase. The part that is always missed is: how do you keep it consistent? There you have essentially two choices: (1) every update to an indexed table becomes a distributed transaction, or (2) you keep region server local indexes. There is nothing wrong with #2. It's good for not-so-selective indexes. There is also nothing wrong with #1. This one is good for highly selective indexes (PK, etc.) Indexes and joins do not have to be conflated. And maybe your use case is fine with eventually consistent indexes. In that case just write your stuff into two tables and be done with it. -- Lars From: Michael Segel michael_se...@hotmail.com To: dev@hbase.apache.org Sent: Monday, March 16, 2015 8:14 AM Subject: Re: Status of Huawei's 2' Indexing? You'll have to excuse Andy. He's a bit slow. HBASE-13044 should have been done 2 years ago. And it was trivial. It just got done last month... But I digress... Long story short... HBASE-9203 was brain dead from inception. Huawei's idea was to index on the region, which had two problems: 1) Complexity, in that they wanted to keep the index on the same region server 2) Joins become impossible. Well, actually not impossible, but incredibly slow when compared to the alternative. You really should go back to the email chain.
Their defense (including Salesforce, who was going to push this approach) fell apart when you asked the simple question of how do you handle joins? That's their OOPS moment. Once you start to understand that, allowing the index to be orthogonal to the base table, things started to come together. In short, you have a query against either a single table or a join. You then get the indexes and, assuming that you're only using the AND predicate, it's a simple intersection of the index result sets. (Since the result sets are ordered, it's relatively trivial to walk through and find the intersections of N lists in a single pass.) Now you have your result set of base table row keys and you can work with that data (either returning the records to the client, or as input to a map/reduce job). That's the 30K view. There's more to it, but once Salesforce got the basic idea, they ran with it. It was really that simple concept that the index would be orthogonal to the
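The single-pass intersection of N ordered index result sets mentioned above could be sketched like this. The `intersect` helper and the row-key values are hypothetical (the thread only describes the idea); real index scans would stream results rather than materialize whole lists, and the lists are assumed ascending and duplicate-free.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedIntersect {
    // Single-pass intersection of N ascending, duplicate-free lists of row keys.
    public static List<String> intersect(List<List<String>> lists) {
        List<String> result = new ArrayList<>();
        if (lists.isEmpty()) return result;
        int[] pos = new int[lists.size()]; // one cursor per list
        outer:
        while (true) {
            // Candidate key: the maximum of the elements under the cursors.
            String max = null;
            for (int i = 0; i < lists.size(); i++) {
                if (pos[i] >= lists.get(i).size()) break outer; // a list is exhausted
                String cur = lists.get(i).get(pos[i]);
                if (max == null || cur.compareTo(max) > 0) max = cur;
            }
            // Advance every cursor up to the candidate; check for a common key.
            boolean all = true;
            for (int i = 0; i < lists.size(); i++) {
                List<String> l = lists.get(i);
                while (pos[i] < l.size() && l.get(pos[i]).compareTo(max) < 0) pos[i]++;
                if (pos[i] >= l.size()) break outer;
                if (!l.get(pos[i]).equals(max)) all = false;
            }
            if (all) { // every list contains the candidate: it is in the intersection
                result.add(max);
                for (int i = 0; i < lists.size(); i++) pos[i]++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> idxA = Arrays.asList("r1", "r3", "r5", "r9");
        List<String> idxB = Arrays.asList("r2", "r3", "r9");
        System.out.println(intersect(Arrays.asList(idxA, idxB))); // [r3, r9]
    }
}
```

Each cursor only ever moves forward, so the whole intersection costs one pass over the combined result sets, which is what makes ordered index scans attractive for AND predicates.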