[jira] [Commented] (HBASE-10306) Backport HBASE-6820 to 0.94, MiniZookeeperCluster should ensure that ZKDatabase is closed upon shutdown()

2014-01-11 Thread chendihao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868719#comment-13868719
 ] 

chendihao commented on HBASE-10306:
---

Thanks for the review. [~lhofhansl] [~enis]

 Backport HBASE-6820 to 0.94, MiniZookeeperCluster should ensure that 
 ZKDatabase is closed upon shutdown()
 -

 Key: HBASE-10306
 URL: https://issues.apache.org/jira/browse/HBASE-10306
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.3
Reporter: chendihao
Assignee: chendihao
Priority: Minor
 Fix For: 0.94.16

 Attachments: HBASE-10306-0.94-v1.patch


 Backport HBASE-6820: [WINDOWS] MiniZookeeperCluster should ensure that 
 ZKDatabase is closed upon shutdown()



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9203) Secondary index support through coprocessors

2014-01-11 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868757#comment-13868757
 ] 

Liang Xie commented on HBASE-9203:
--

to me, one main drawback of this design is the required padding mechanism. For 
the index name section inside the index table's row key, maybe we can provide 
a doc to educate the end user, but the other indexed column value(s) depend 
entirely on the real user scenario. E.g. most indexed column values are 
probably very short, say "a", with a few long values, say "abcde...z"; then 
even for the short "a" value, under the current design we still need to pad it 
to something like "a000..0", am I correct?   I don't have a better improvement 
idea though...
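
A minimal sketch of the padding scheme being questioned (hypothetical names, 
not code from the attached design): every indexed value is right-padded with 
0x00 to a fixed width so all index-table row keys share one layout, which is 
exactly why a one-byte value still costs the full width:

{code}
import java.util.Arrays;

public class IndexKeyPadding {
  // assumed fixed width for the indexed column value (illustrative only)
  static final int VALUE_WIDTH = 16;

  static byte[] padValue(byte[] value) {
    if (value.length > VALUE_WIDTH) {
      throw new IllegalArgumentException("value exceeds fixed width");
    }
    return Arrays.copyOf(value, VALUE_WIDTH); // copyOf zero-fills the tail
  }

  public static void main(String[] args) {
    // "a" still occupies VALUE_WIDTH bytes in the index row key: 'a' + 15 x 0x00
    System.out.println(padValue("a".getBytes()).length); // prints 16
  }
}
{code}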

 Secondary index support through coprocessors
 

 Key: HBASE-9203
 URL: https://issues.apache.org/jira/browse/HBASE-9203
 Project: HBase
  Issue Type: New Feature
Reporter: rajeshbabu
Assignee: rajeshbabu
 Attachments: SecondaryIndex Design.pdf, SecondaryIndex 
 Design_Updated.pdf, SecondaryIndex Design_Updated_2.pdf


 We have been working on implementing secondary index in HBase and open 
 sourced it on the hbase 0.94.8 version.
 The project is available on github:
 https://github.com/Huawei-Hadoop/hindex
 This Jira is to support secondary index on trunk (0.98).
 The following features will be supported:
 -  multiple indexes on table,
 -  multi column index,
 -  index based on part of a column value,
 -  equals and range condition scans using index, and
 -  bulk loading data to indexed table (Indexing done with bulk load)
 Most of the kernel changes needed for secondary index are available in 
 trunk; very minimal additional changes are needed for it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9203) Secondary index support through coprocessors

2014-01-11 Thread Liang Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868759#comment-13868759
 ] 

Liang Xie commented on HBASE-9203:
--

another problem: scan performance (e.g. when there's a filter on the indexed 
column). How do we decide or evaluate when to query the user table directly, 
and when to query the index table first and then do the (multi-)get into the 
user table?
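
One way to frame that question: the index path pays an extra point Get per 
match, so there is a selectivity break-even point. A hedged, illustrative 
heuristic (all names and the threshold here are hypothetical, not part of the 
design):

{code}
public class IndexPathChooser {
  // Hypothetical heuristic: take the index path only when the filter is
  // selective enough that index-scan + per-match Gets beat a full scan.
  static boolean useIndexPath(long estimatedMatches, long userTableRows) {
    double selectivity = (double) estimatedMatches / userTableRows;
    return selectivity < 0.01; // illustrative threshold; needs measurement
  }

  public static void main(String[] args) {
    System.out.println(useIndexPath(5_000, 10_000_000));     // true: use index
    System.out.println(useIndexPath(4_000_000, 10_000_000)); // false: full scan
  }
}
{code}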

 Secondary index support through coprocessors
 

 Key: HBASE-9203
 URL: https://issues.apache.org/jira/browse/HBASE-9203
 Project: HBase
  Issue Type: New Feature
Reporter: rajeshbabu
Assignee: rajeshbabu
 Attachments: SecondaryIndex Design.pdf, SecondaryIndex 
 Design_Updated.pdf, SecondaryIndex Design_Updated_2.pdf


 We have been working on implementing secondary index in HBase and open 
 sourced it on the hbase 0.94.8 version.
 The project is available on github:
 https://github.com/Huawei-Hadoop/hindex
 This Jira is to support secondary index on trunk (0.98).
 The following features will be supported:
 -  multiple indexes on table,
 -  multi column index,
 -  index based on part of a column value,
 -  equals and range condition scans using index, and
 -  bulk loading data to indexed table (Indexing done with bulk load)
 Most of the kernel changes needed for secondary index are available in 
 trunk; very minimal additional changes are needed for it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868787#comment-13868787
 ] 

Feng Honghua commented on HBASE-10296:
--

bq. but that ZK path is used to find the hbase master even if it moves round a 
cluster - what would happen there?
Typically we adopt master-based paxos in practice, so naturally the master 
process hosting the master paxos replica is the active master. The active 
master is elected by the paxos protocol, not by zk, and each standby master 
knows who the current active master is. When the active master moves around 
(for instance when the active master dies or its lease times out), a client or 
app that attempts to talk to the old active master will fail in one of two 
ways: it fails to connect if the active master died, or it fails by learning 
that the contacted master is no longer active, along with who the current 
active master is. In the former case the client/app retries a randomly chosen 
other alive master instance, and that master will accept the request if it is 
the new active master, or else tell it who the current active master is. In 
the latter case it can now talk to the active master directly... And, just as 
with accessing a zk quorum, the client/app should know the master ensemble 
addresses to access an HBase cluster. (assuming you're asking about finding 
the active master, correct me if I'm wrong)
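
A rough sketch of that client-side discovery loop under the proposed design 
(all types here are hypothetical, not an existing HBase API):

{code}
import java.util.List;
import java.util.function.Function;

interface MasterStub {              // hypothetical RPC stub to one master
  boolean isActive();               // throws if the process is unreachable
  String activeMasterHint();        // a standby's view of the current leader
}

class MasterLocator {
  static String findActive(List<String> ensemble,
                           Function<String, MasterStub> connect) {
    for (String addr : ensemble) {
      try {
        MasterStub stub = connect.apply(addr);
        if (stub.isActive()) return addr;        // talked to the leader
        String hint = stub.activeMasterHint();   // standby redirects us
        if (hint != null) return hint;
      } catch (RuntimeException dead) {
        // this master process is down: try the next address in the ensemble
      }
    }
    throw new IllegalStateException("no active master reachable");
  }
}
{code}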

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868792#comment-13868792
 ] 

Feng Honghua commented on HBASE-10296:
--

bq. One aspect of ZK that is worth remembering is that it lets other apps keep 
an eye on what is going on
Yes, this is a good question. ZK's watch/notification pattern can be viewed as 
a communication mechanism: each ZK node represents a piece of data; app A 
updates the ZK node when it updates the data, and app B, which has a watch on 
the node, receives a notification when the data is updated.
If we use paxos to replace ZK, the data represented by each ZK node is instead 
hosted within each master process's memory as a data structure, updated via 
the paxos-replicated state machine triggered by client/regionserver requests. 
The watch/notification center thus moves from ZK to the master, and we can 
still use the node-watch-list mechanism that ZK uses for the implementation.
The above 'keep an eye on what is going on' (or watch/notify) then changes in 
two ways:
1. master - zk - regionserver communication is replaced by direct 
master-regionserver communication
2. client - zk - regionserver communication is replaced by 
client-master-regionserver communication (the master plays the role ZK 
originally did)
A note: we can now provide more flexible options by exposing sync/async 
notification and one-time/permanent watches; ZK provides only one-time async 
watches.
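
A hedged sketch of such an in-master node-watch-list registry, supporting both 
one-time and permanent watches (purely illustrative; these names and classes 
are not existing HBase code):

{code}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class WatchRegistry {
  enum Mode { ONE_TIME, PERMANENT }

  private static final class Watch {
    final Mode mode;
    final Consumer<byte[]> listener;
    Watch(Mode mode, Consumer<byte[]> listener) {
      this.mode = mode;
      this.listener = listener;
    }
  }

  private final Map<String, List<Watch>> watches = new ConcurrentHashMap<>();

  void watch(String path, Mode mode, Consumer<byte[]> listener) {
    watches.computeIfAbsent(path, p -> new CopyOnWriteArrayList<>())
           .add(new Watch(mode, listener));
  }

  // called by the replicated state machine after a state entry is updated
  void fire(String path, byte[] newData) {
    List<Watch> registered = watches.get(path);
    if (registered == null) return;
    for (Watch w : registered) {
      w.listener.accept(newData);       // sync notify here; could be async
      if (w.mode == Mode.ONE_TIME) {
        registered.remove(w);           // ZK-style one-shot watch
      }
    }
  }
}
{code}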

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868800#comment-13868800
 ] 

Feng Honghua commented on HBASE-10296:
--

bq. The google chubby paper goes into some detail about why they implemented a 
Paxos Service and not a paxos library.
I believe google does have a paxos library, which is used in megastore and 
spanner, right? And this fact is mentioned in the google paper *Paxos Made 
Live* :-)
Implementing paxos as a standalone/shared service or as a library each has 
its own benefits and drawbacks.
A service: a simple API that is simple for apps to use, and it can be shared 
by multiple apps; but abuse by one app can negatively affect other apps using 
the same paxos service (we encountered such cases several times before :-()
A library: more difficult for an app to use, but it gives better isolation 
(an app won't be affected by possible abuse from other apps), and it offers 
more primitives and more flexibility.

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868802#comment-13868802
 ] 

Feng Honghua commented on HBASE-10296:
--

[~lhofhansl] / [~apurtell] / [~ste...@apache.org] / [~e...@apache.org] : 
1. paxos / raft / a zab library extracted from ZK are all good candidates :-)
2. I agree that implementing a *correct* consensus protocol for production 
use is extremely hard; that's why I tagged this jira's type as Brainstorming. 
My intention is to raise it so we can discuss what a better / more reasonable 
architecture would look like.
3. If we finally all agree on a better architecture/design after 
analysis/discussion/proof, we can approach it in a conservative and 
incremental way; maybe eventually, someday, we make it :-)

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868803#comment-13868803
 ] 

Feng Honghua commented on HBASE-10296:
--

Thanks, everyone, for the questions/direction/material/history notes, really 
appreciated :-)

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10295) Refactor the replication implementation to eliminate permanent zk node

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868811#comment-13868811
 ] 

Feng Honghua commented on HBASE-10295:
--

[~stack] Yes, sounds feasible, thanks :-)

 Refactor the replication implementation to eliminate permanent zk node
 ---

 Key: HBASE-10295
 URL: https://issues.apache.org/jira/browse/HBASE-10295
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Feng Honghua
Assignee: Feng Honghua
 Fix For: 0.99.0


 Though this is a broader and bigger change, its original motivation derives 
 from [HBASE-8751|https://issues.apache.org/jira/browse/HBASE-8751]: the newly 
 introduced per-peer tableCFs attribute should be treated the same way as the 
 peer-state, which is a permanent sub-node under the peer node; but using 
 permanent zk nodes is deemed an incorrect practice, so let's refactor to 
 eliminate the permanent zk node. HBASE-8751 can then align its newly 
 introduced per-peer tableCFs attribute with this *correct* implementation 
 theme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9203) Secondary index support through coprocessors

2014-01-11 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868812#comment-13868812
 ] 

Anoop Sam John commented on HBASE-9203:
---

I think instead of the padding approach, we can change this to having a 
separator byte (0 byte)... That should work out.
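
A minimal sketch of that separator-byte alternative (a hypothetical helper, 
not code from the design; note it is only safe if indexed values can never 
themselves contain the 0 byte):

{code}
public class IndexKeySeparator {
  // index row key = value + 0x00 separator + user row key; keys still sort
  // by value prefix, with no fixed-width padding needed
  static byte[] indexRowKey(byte[] value, byte[] userRow) {
    byte[] key = new byte[value.length + 1 + userRow.length];
    System.arraycopy(value, 0, key, 0, value.length);
    key[value.length] = 0x00; // separator terminates the variable-length value
    System.arraycopy(userRow, 0, key, value.length + 1, userRow.length);
    return key;
  }

  public static void main(String[] args) {
    byte[] key = indexRowKey("a".getBytes(), "row0001".getBytes());
    System.out.println(key.length); // 1 + 1 + 7 = 9 bytes, no padding
  }
}
{code}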

 Secondary index support through coprocessors
 

 Key: HBASE-9203
 URL: https://issues.apache.org/jira/browse/HBASE-9203
 Project: HBase
  Issue Type: New Feature
Reporter: rajeshbabu
Assignee: rajeshbabu
 Attachments: SecondaryIndex Design.pdf, SecondaryIndex 
 Design_Updated.pdf, SecondaryIndex Design_Updated_2.pdf


 We have been working on implementing secondary index in HBase and open 
 sourced it on the hbase 0.94.8 version.
 The project is available on github:
 https://github.com/Huawei-Hadoop/hindex
 This Jira is to support secondary index on trunk (0.98).
 The following features will be supported:
 -  multiple indexes on table,
 -  multi column index,
 -  index based on part of a column value,
 -  equals and range condition scans using index, and
 -  bulk loading data to indexed table (Indexing done with bulk load)
 Most of the kernel changes needed for secondary index are available in 
 trunk; very minimal additional changes are needed for it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10310) ZNodeCleaner session expired for /hbase/master

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868819#comment-13868819
 ] 

Hudson commented on HBASE-10310:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #50 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/50/])
HBASE-10310. ZNodeCleaner session expired for /hbase/master (Samir Ahmic) 
(apurtell: rev 1557273)
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ZNodeClearer.java


 ZNodeCleaner session expired for /hbase/master
 --

 Key: HBASE-10310
 URL: https://issues.apache.org/jira/browse/HBASE-10310
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.1.1
 Environment: x86_64 GNU/Linux
Reporter: Samir Ahmic
Assignee: Samir Ahmic
 Fix For: 0.98.0, 0.96.2, 0.99.0

 Attachments: HBASE-10310.patch


 I was testing the hbase master clear command while working on [HBASE-7386]; 
 here is the command and exception:
 {code}
 $ export HBASE_ZNODE_FILE=/tmp/hbase-hadoop-master.znode; ./hbase master clear
 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=zk1:2181 sessionTimeout=9 watcher=clean znode for master, 
 quorum=zk1:2181, baseZNode=/hbase
 14/01/10 14:05:44 INFO zookeeper.RecoverableZooKeeper: Process 
 identifier=clean znode for master connecting to ZooKeeper ensemble=zk1:2181
 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Opening socket connection to 
 server zk1/172.17.33.5:2181. Will not attempt to authenticate using SASL 
 (Unable to locate a login configuration)
 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Socket connection established to 
 zk11/172.17.33.5:2181, initiating session
 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: Session establishment complete 
 on server zk1/172.17.33.5:2181, sessionid = 0x1427a96bfea4a8a, negotiated 
 timeout = 4
 14/01/10 14:05:44 INFO zookeeper.ZooKeeper: Session: 0x1427a96bfea4a8a closed
 14/01/10 14:05:44 INFO zookeeper.ClientCnxn: EventThread shut down
 14/01/10 14:05:44 WARN zookeeper.RecoverableZooKeeper: Possibly transient 
 ZooKeeper, quorum=zk1:2181, 
 exception=org.apache.zookeeper.KeeperException$SessionExpiredException: 
 KeeperErrorCode = Session expired for /hbase/master
 14/01/10 14:05:44 INFO util.RetryCounter: Sleeping 1000ms before retry #0...
 14/01/10 14:05:45 WARN zookeeper.RecoverableZooKeeper: Possibly transient 
 ZooKeeper, quorum=zk1:2181, 
 exception=org.apache.zookeeper.KeeperException$SessionExpiredException: 
 KeeperErrorCode = Session expired for /hbase/master
 14/01/10 14:05:45 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper getData 
 failed after 1 attempts
 14/01/10 14:05:45 WARN zookeeper.ZKUtil: clean znode for 
 master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Unable to get 
 data of znode /hbase/master
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /hbase/master
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
   at 
 org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
   at org.apache.hadoop.hbase.ZNodeClearer.clear(ZNodeClearer.java:160)
   at 
 org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:138)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
   at 
 org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
   at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2779)
 14/01/10 14:05:45 ERROR zookeeper.ZooKeeperWatcher: clean znode for 
 master-0x1427a96bfea4a8a, quorum=zk1:2181, baseZNode=/hbase Received 
 unexpected KeeperException, re-throwing exception
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /hbase/master
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
   at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:337)
   at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataNoWatch(ZKUtil.java:777)
   at 
 org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.deleteIfEquals(MasterAddressTracker.java:170)
   at 

[jira] [Commented] (HBASE-10318) generate-hadoopX-poms.sh expects the version to have one extra '-'

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868821#comment-13868821
 ] 

Hudson commented on HBASE-10318:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #50 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/50/])
HBASE-10318. generate-hadoopX-poms.sh expects the version to have one extra '-' 
(Raja Aluri) (apurtell: rev 1557301)
* /hbase/trunk/dev-support/generate-hadoopX-poms.sh


 generate-hadoopX-poms.sh expects the version to have one extra '-'
 --

 Key: HBASE-10318
 URL: https://issues.apache.org/jira/browse/HBASE-10318
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.98.0
Reporter: Raja Aluri
Assignee: Raja Aluri
 Fix For: 0.98.0, 0.99.0

 Attachments: 
 0001-HBASE-10318-generate-hadoopX-poms.sh-expects-the-ver.patch


 This change is in 0.96 branch, but missing in 0.98.
 Including the commit that made this 
 [change|https://github.com/apache/hbase/commit/09442ca]



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10265) Upgrade to commons-logging 1.1.3

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868820#comment-13868820
 ] 

Hudson commented on HBASE-10265:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #50 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/50/])
HBASE-10265 Upgrade to commons-logging 1.1.3 (liangxie: rev 1557299)
* /hbase/trunk/pom.xml


 Upgrade to commons-logging 1.1.3
 

 Key: HBASE-10265
 URL: https://issues.apache.org/jira/browse/HBASE-10265
 Project: HBase
  Issue Type: Improvement
  Components: build
Affects Versions: 0.99.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.99.0

 Attachments: HBASE-10265.txt


 Per HADOOP-10147 and HDFS-5678, though we didn't observe any deadlock due to 
 commons-logging in HBase, to me it's still worth bumping the version.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9203) Secondary index support through coprocessors

2014-01-11 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868831#comment-13868831
 ] 

ramkrishna.s.vasudevan commented on HBASE-9203:
---

bq. about scan (e.g. there's a filter on the indexed column) performance. how 
to decide or evaluate when we do the query into user table directly and when 
we do the query into index table first then do the (multi-)get into user table?
You mean the dynamic decision whether to use the index or not?  Should that 
be a user decision?

 Secondary index support through coprocessors
 

 Key: HBASE-9203
 URL: https://issues.apache.org/jira/browse/HBASE-9203
 Project: HBase
  Issue Type: New Feature
Reporter: rajeshbabu
Assignee: rajeshbabu
 Attachments: SecondaryIndex Design.pdf, SecondaryIndex 
 Design_Updated.pdf, SecondaryIndex Design_Updated_2.pdf


 We have been working on implementing secondary index in HBase and open 
 sourced it on the hbase 0.94.8 version.
 The project is available on github:
 https://github.com/Huawei-Hadoop/hindex
 This Jira is to support secondary index on trunk (0.98).
 The following features will be supported:
 -  multiple indexes on table,
 -  multi column index,
 -  index based on part of a column value,
 -  equals and range condition scans using index, and
 -  bulk loading data to indexed table (Indexing done with bulk load)
 Most of the kernel changes needed for secondary index are available in 
 trunk; very minimal additional changes are needed for it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868852#comment-13868852
 ] 

Andrew Purtell commented on HBASE-10292:


I am going to commit the addendum momentarily: after 30 executions of the 
complete unit test suite on the test box where this test was failing, there 
are now zero test failures.

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10277) refactor AsyncProcess

2014-01-11 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868855#comment-13868855
 ] 

Andrew Purtell commented on HBASE-10277:


bq. The (ugly) behavior for HTable where e.g. next put will give you errors 
from previous put was lovingly preserved.

To my mind this is a serious problem because the application is getting 
incorrect feedback on what operation actually failed.

 refactor AsyncProcess
 -

 Key: HBASE-10277
 URL: https://issues.apache.org/jira/browse/HBASE-10277
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-10277.patch


 AsyncProcess currently has two patterns of usage, one from HTable flush w/o 
 callback and with reuse, and one from HCM/HTable batch call, with callback 
 and w/o reuse. In the former case (but not the latter), it also does some 
 throttling of actions on initial submit call, limiting the number of 
 outstanding actions per server.
 The latter case is relatively straightforward. The former appears to be error 
 prone due to reuse - if, as javadoc claims should be safe, multiple submit 
 calls are performed without waiting for the async part of the previous call 
 to finish, fields like hasError become ambiguous and can be used for the 
 wrong call; callback for success/failure is called based on original index 
 of an action in submitted list, but with only one callback supplied to AP in 
 ctor it's not clear to which submit call the index belongs, if several are 
 outstanding.
 I was going to add support for HBASE-10070 to AP, and found that it might be 
 difficult to do cleanly.
 It would be nice to normalize AP usage patterns; in particular, separate the 
 global part (load tracking) from per-submit-call part.
 Per-submit part can more conveniently track stuff like initialActions, 
 mapping of indexes and retry information, that is currently passed around the 
 method calls.
 I am not sure yet, but maybe sending of the original index to server in 
 ClientProtos.MultiAction can also be avoided.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Resolved] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-10292.


Resolution: Fixed

 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868885#comment-13868885
 ] 

Hudson commented on HBASE-10292:


FAILURE: Integrated in HBase-0.98 #72 (See 
[https://builds.apache.org/job/HBase-0.98/72/])
Amend HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails 
occasionally (apurtell: rev 1557438)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868890#comment-13868890
 ] 

Lars Hofhansl commented on HBASE-10296:
---

I always thought that having processes participate in the

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Comment Edited] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868890#comment-13868890
 ] 

Lars Hofhansl edited comment on HBASE-10296 at 1/11/14 9:59 PM:


I always thought that having processes participate in the coordination 
directly (as group members) rather than using an external group membership 
service would be better, so I was very disappointed, when I first looked at 
ZK, that ZAB was buried too deeply within the rest of ZK.

ZK on the other hand is simple (because somebody else solved the hard problems 
for us). So I can see this go both ways.

On some level that ties into the discussion as to why we have master and 
regionserver roles. Couldn't all servers serve both roles as needed?



was (Author: lhofhansl):
I always thought that having processes participate in the

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor 
 liveness and store almost all of its state, such as region states, table 
 info, replication info and so on. ZK also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior 
 changes). But zk as a communication channel is fragile due to its one-time 
 watch and asynchronous notification mechanism, which together can lead to 
 missed events (hence missed messages); for example, the master must rely on 
 the state transition logic's idempotence to maintain the region assignment 
 state machine's correctness. Actually, almost all of the trickiest 
 inconsistency issues can trace their root cause back to the fragility of zk 
 as a communication channel.
 Replacing zk with paxos running within the master processes has the 
 following benefits:
 1. better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the 
 newly elected active master can immediately play its role without failover 
 work such as rebuilding its in-memory state by consulting the meta table 
 and zk.
 2. better state consistency: the master's in-memory state is the only truth 
 about the system, which can eliminate inconsistency from the very beginning. 
 And though the state is held by all masters, paxos guarantees the copies 
 are identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; the master and regionservers talk directly 
 to each other via requests and responses... none of them needs to go 
 through third-party storage like zk, which can introduce more uncertainty, 
 worse latency and more complexity.
 4. zk would only be used for liveness monitoring to determine whether a 
 regionserver is dead, and later on we can eliminate zk entirely once we 
 build heartbeats between the master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves 
 deep thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868893#comment-13868893
 ] 

Hudson commented on HBASE-10292:


FAILURE: Integrated in HBase-TRUNK #4808 (See 
[https://builds.apache.org/job/HBase-TRUNK/4808/])
Amend HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails 
occasionally (apurtell: rev 1557436)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868899#comment-13868899
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #67 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/67/])
Amend HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails 
occasionally (apurtell: rev 1557438)
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/branches/0.98/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10320) Avoid ArrayList.iterator() in tight loops

2014-01-11 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-10320:
-

 Summary: Avoid ArrayList.iterator() in tight loops
 Key: HBASE-10320
 URL: https://issues.apache.org/jira/browse/HBASE-10320
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl


I noticed in a profiler (sampler) run that ScanQueryMatcher.setRow(...) showed 
up at all.
It turns out that the expensive part is iterating over the columns in 
ExplicitColumnTracker.reset(). I did some microbenchmarks and found that
{code}
private ArrayList<X> l;
...
for (int i = 0; i < l.size(); i++) {
   X x = l.get(i);
   ...
}
{code}
is twice as fast as:
{code}
private ArrayList<X> l;
...
for (X x : l) {
   ...
}
{code}

The indexed version asymptotically approaches the iterator version as the list 
grows, but even at 1m entries it is still faster.
In my tight-loop scans this provides a 5% overall performance improvement when 
the ExplicitColumnTracker is used.
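
For reproduction, a minimal self-contained harness along these lines (not the 
attached patch; a JMH benchmark would be more rigorous, and results vary by 
JVM):

{code}
import java.util.ArrayList;

public class LoopBench {
  public static void main(String[] args) {
    ArrayList<Integer> l = new ArrayList<>();
    for (int i = 0; i < 10; i++) l.add(i); // small list, as in setRow()'s case
    long sum = 0;
    long t0 = System.nanoTime();
    for (int iter = 0; iter < 10_000_000; iter++) {
      for (int i = 0; i < l.size(); i++) sum += l.get(i); // indexed access
    }
    long t1 = System.nanoTime();
    for (int iter = 0; iter < 10_000_000; iter++) {
      for (int x : l) sum += x; // for-each allocates an Iterator per pass
    }
    long t2 = System.nanoTime();
    System.out.printf("indexed %d ms, iterator %d ms (sum=%d)%n",
        (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, sum);
  }
}
{code}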




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10320) Avoid ArrayList.iterator() in tight loops

2014-01-11 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-10320:
--

Attachment: 10320-0.94.txt

Simple patch for 0.94.
The
{code}
-  private final List<ColumnCount> columns;
+  private final ArrayList<ColumnCount> columns;
{code}

is not needed, but it makes it explicit that this list is accessed by random 
access.

 Avoid ArrayList.iterator() in tight loops
 -

 Key: HBASE-10320
 URL: https://issues.apache.org/jira/browse/HBASE-10320
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl
 Attachments: 10320-0.94.txt


 I noticed in a profiler (sampler) run that ScanQueryMatcher.setRow(...) 
 showed up at all.
 It turns out that the expensive part is iterating over the columns in 
 ExplicitColumnTracker.reset(). I did some microbenchmarks and found that
 {code}
 private ArrayList<X> l;
 ...
 for (int i = 0; i < l.size(); i++) {
    X x = l.get(i);
    ...
 }
 {code}
 is twice as fast as:
 {code}
 private ArrayList<X> l;
 ...
 for (X x : l) {
    ...
 }
 {code}
 The indexed version asymptotically approaches the iterator version as the 
 list grows, but even at 1m entries it is still faster.
 In my tight-loop scans this provides a 5% overall performance improvement 
 when the ExplicitColumnTracker is used.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10292) TestRegionServerCoprocessorExceptionWithAbort fails occasionally

2014-01-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868918#comment-13868918
 ] 

Hudson commented on HBASE-10292:


SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-1.1 #51 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-1.1/51/])
Amend HBASE-10292. TestRegionServerCoprocessorExceptionWithAbort fails 
occasionally (apurtell: rev 1557436)
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithAbort.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestRegionServerCoprocessorExceptionWithRemove.java


 TestRegionServerCoprocessorExceptionWithAbort fails occasionally
 

 Key: HBASE-10292
 URL: https://issues.apache.org/jira/browse/HBASE-10292
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.98.0, 0.99.0

 Attachments: 10292-addendum-1.patch, 10292.patch, 10292.patch


 TestRegionServerCoprocessorExceptionWithAbort has occasionally failed for a 
 very long time now. Fix or disable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-11 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868941#comment-13868941
 ] 

Ted Yu commented on HBASE-6581:
---

Looks like patch v6 was generated incorrectly: hadoop-three-compat.xml should 
be a new file.
{code}
--- a/hbase-assembly/src/main/assembly/hadoop-three-compat.xml
+++ /dev/null
@@ -1,46 +0,0 @@
-<?xml version="1.0"?>
{code}
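
For contrast, a correctly generated patch would show the file as an addition, 
with the headers the other way round, roughly:

{code}
--- /dev/null
+++ b/hbase-assembly/src/main/assembly/hadoop-three-compat.xml
@@ -0,0 +1,46 @@
+<?xml version="1.0"?>
{code}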

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581.diff, HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 changes in the hadoop maven module naming (and also the usage of 
 3.0-SNAPSHOT instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that would move most of hadoop dependencies in their 
 respective profiles and will define the correct hadoop deps in the 3.0 
 profile.
 Please tell me if that's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-11 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868947#comment-13868947
 ] 

Eric Charles commented on HBASE-10296:
--

I think the issue is that zk is just a half solution. It is a coordination 
util, but the job is still to be done. For now the coordination logic is mainly 
done in the hbase code (a bit everywhere I think; there is no 
'coordination' package, no separation of concerns).
To evolve, there are 2 directions:
1. Embed the coordination in a protocol where it is built-in 
(zab, paxos or whatever).
2. Move the coordination out of the hbase code to an external layer. Zk is not 
enough; would Helix (which relies on Zk) be a good fit?


 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness and 
 store almost all of its states, such as region states, table info, 
 replication info and so on. And zk also serves as a channel for 
 master-regionserver communication (such as in region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But zk as a communication channel is fragile due to its one-time watch and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example the master must rely on the state 
 transition logic's idempotence to maintain the region assignment state 
 machine's correctness. Actually almost all of the trickiest inconsistency 
 issues can trace their root cause back to the fragility of zk as a 
 communication channel.
 Replacing zk with paxos running within the master processes has the following benefits:
 1. better master failover performance: all masters, the active and the 
 standby ones alike, have the same latest states in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the newly 
 elected active master can immediately play its role without such failover 
 work as rebuilding its in-memory states by consulting the meta table and zk.
 2. better state consistency: the master's in-memory states are the only truth 
 about the system, which eliminates inconsistency from the very beginning. 
 And though the states are held by all masters, paxos guarantees they are 
 identical at any time.
 3. a more direct and simple communication pattern: clients change state by 
 sending requests to the master; master and regionservers talk directly to each 
 other by sending requests and responses... none of this needs to go through a 
 third-party store like zk, which can introduce more uncertainty, worse 
 latency and more complexity.
 4. zk is then used only for liveness monitoring, to determine whether a 
 regionserver is dead, and later on we can eliminate zk totally once we build 
 heartbeats between master and regionservers.
 I know this might look like a very crazy re-architecting, but it deserves deep 
 thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-11 Thread Eric Charles (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Charles updated HBASE-6581:


Attachment: HBASE-6581-7.patch

[~ted_yu] Thx for the review and sorry, I messed up my git command.

Attached another v7.
I did the following successfully:
svn co https://svn.apache.org/repos/asf/hbase/trunk
patch -p0 < HBASE-6581-7.patch
mvn clean install -DskipTests -Dhadoop.profile=3.0

The generated hbase dist works on a hadoop (trunk) cluster.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, 
 HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 changes in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that moves most of the hadoop dependencies into their 
 respective profiles and defines the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-11 Thread Eric Charles (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868950#comment-13868950
 ] 

Eric Charles commented on HBASE-6581:
-

[~ted_yu] Oops, master does not start today. I will check and fix. You can 
already review the patch and see if it compiles with -Dhadoop.profile=3.0.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch, HBASE-6581-7.patch, HBASE-6581.diff, 
 HBASE-6581.diff


 Building trunk with hadoop.profile=3.0 gives exceptions (see [1]) due to 
 changes in the hadoop maven module naming (and also the usage of 3.0-SNAPSHOT 
 instead of 3.0.0-SNAPSHOT in hbase-common).
 I can provide a patch that moves most of the hadoop dependencies into their 
 respective profiles and defines the correct hadoop deps in the 3.0 
 profile.
 Please tell me if it's ok to go this way.
 Thx, Eric
 [1]
 $ mvn clean install -Dhadoop.profile=3.0
 [INFO] Scanning for projects...
 [ERROR] The build could not read 3 projects - [Help 1]
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-server:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-server/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 655, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 659, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 663, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-common:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-common/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 170, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 174, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 178, column 21
 [ERROR]   
 [ERROR]   The project org.apache.hbase:hbase-it:0.95-SNAPSHOT 
 (/d/hbase.svn/hbase-it/pom.xml) has 3 errors
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-common:jar is missing. @ line 220, column 18
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-annotations:jar is missing. @ line 224, column 21
 [ERROR] 'dependencies.dependency.version' for 
 org.apache.hadoop:hadoop-minicluster:jar is missing. @ line 228, column 21
 [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10320) Avoid ArrayList.iterator() in tight loops

2014-01-11 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868954#comment-13868954
 ] 

Anoop Sam John commented on HBASE-10320:


Interesting finding, Lars. +1 on the patch if it gives an improvement.

 Avoid ArrayList.iterator() in tight loops
 -

 Key: HBASE-10320
 URL: https://issues.apache.org/jira/browse/HBASE-10320
 Project: HBase
  Issue Type: Bug
  Components: Performance
Reporter: Lars Hofhansl
 Attachments: 10320-0.94.txt


 I noticed that in a profiler (sampler) run ScanQueryMatcher.setRow(...) 
 showed up at all.
 It turns out that the expensive part is iterating over the columns in 
 ExplicitColumnTracker.reset(). I did some microbenchmarks and found that
 {code}
 private ArrayList<X> l;
 ...
 for (int i = 0; i < l.size(); i++) {
    X x = l.get(i);
    ...
 }
 {code}
 is twice as fast as:
 {code}
 private ArrayList<X> l;
 ...
 for (X x : l) {
    ...
 }
 {code}
 The two loop shapes asymptotically converge, but even at 1m entries the 
 indexed version is still faster.
 In my tight-loop scans this provides a 5% overall performance improvement 
 when the ExplicitColumnTracker is used.
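 For illustration, a minimal standalone harness in the spirit of this 
 microbenchmark (my own sketch, not the harness used above; list size, element 
 type and warm-up count are arbitrary):
 {code}
 import java.util.ArrayList;
 import java.util.List;

 public class ArrayListLoopBench {
   public static void main(String[] args) {
     List<Integer> l = new ArrayList<>();
     for (int i = 0; i < 100000; i++) {
       l.add(i);
     }
     long sink = 0;
     // Warm up both loop shapes so the JIT compiles them before we time anything.
     for (int round = 0; round < 20; round++) {
       sink += indexed(l) + iterated(l);
     }
     long t0 = System.nanoTime();
     sink += indexed(l);
     long t1 = System.nanoTime();
     sink += iterated(l);
     long t2 = System.nanoTime();
     System.out.println("indexed : " + (t1 - t0) + " ns");
     System.out.println("iterator: " + (t2 - t1) + " ns");
     System.out.println("(sink, ignore: " + sink + ")");
   }

   // Indexed access: no Iterator object, no per-step modCount check.
   static long indexed(List<Integer> l) {
     long sum = 0;
     for (int i = 0; i < l.size(); i++) {
       sum += l.get(i);
     }
     return sum;
   }

   // Enhanced for: allocates an Iterator and calls hasNext()/next() per element.
   static long iterated(List<Integer> l) {
     long sum = 0;
     for (int x : l) {
       sum += x;
     }
     return sum;
   }
 }
 {code}
 On a typical JVM the indexed loop avoids the Iterator allocation and the 
 per-step hasNext()/next() and comodification checks, which is presumably where 
 the gap comes from.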



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-11 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-10321:
--

 Summary: CellCodec has broken the 96 client to 98 server 
compatibility
 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.98.0, 0.99.0


The write/read tags added in CellCodec have broken the 96 client to 98 server 
compatibility (and 98 client to 96 server).
When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
the server expects a tags part, at least a 0 tag length. Reading this tag length 
will consume some bytes from the next cell!

I suggest we remove the tags part from CellCodec. This codec is not used by 
default and I don't think someone will switch to CellCodec from the default 
KVCodec now... Still, I feel we can solve it.

This makes tags unsupported via CellCodec. Tag support can be added to 
CellCodec once we have connection negotiation in place (?)
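To make the framing problem concrete, here is a deliberately simplified sketch 
(hypothetical format and method names, not the real CellCodec wire format): the 
96-style writer emits no tags length, while the 98-style reader expects one, so 
the reader drifts into the next cell's bytes:
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TagsLengthMismatch {
  // 96-style writer: length-prefixed value, no tags-length field at the end.
  static void writeCellV96(DataOutputStream out, byte[] value) throws IOException {
    out.writeInt(value.length);
    out.write(value);
  }

  // 98-style reader: expects a tags length (possibly 0) after the value.
  static byte[] readCellV98(DataInputStream in) throws IOException {
    byte[] value = new byte[in.readInt()];
    in.readFully(value);
    int tagsLen = in.readShort(); // 96 never wrote this field...
    in.skipBytes(tagsLen);        // ...so these bytes belong to the next cell.
    return value;
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    writeCellV96(out, new byte[] {1, 2, 3}); // first cell
    writeCellV96(out, new byte[] {4, 5, 6}); // second cell
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
    System.out.println(readCellV98(in).length); // 3; looks fine, but the stream
    readCellV98(in); // is now 2 bytes off: this readInt() returns a garbage
                     // length and the subsequent read fails (EOFException here).
  }
}
{code}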




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-11 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10321:
---

Attachment: HBASE-10321.patch

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The write/read tags added in CellCodec have broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
 the server expects a tags part, at least a 0 tag length. Reading this tag length 
 will consume some bytes from the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used 
 by default and I don't think someone will switch to CellCodec from the 
 default KVCodec now... Still, I feel we can solve it.
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-11 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10321:
---

Status: Patch Available  (was: Open)

 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The write/read tags added in CellCodec have broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
 the server expects a tags part, at least a 0 tag length. Reading this tag length 
 will consume some bytes from the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used 
 by default and I don't think someone will switch to CellCodec from the 
 default KVCodec now...
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Updated] (HBASE-10321) CellCodec has broken the 96 client to 98 server compatibility

2014-01-11 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-10321:
---

Description: 
The write/read tags added in CellCodec have broken the 96 client to 98 server 
compatibility (and 98 client to 96 server).
When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
the server expects a tags part, at least a 0 tag length. Reading this tag length 
will consume some bytes from the next cell!

I suggest we remove the tags part from CellCodec. This codec is not used by 
default and I don't think someone will switch to CellCodec from the default 
KVCodec now...

This makes tags unsupported via CellCodec. Tag support can be added to 
CellCodec once we have connection negotiation in place (?)


  was:
The write/read tags added in CellCodec have broken the 96 client to 98 server 
compatibility (and 98 client to 96 server).
When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
the server expects a tags part, at least a 0 tag length. Reading this tag length 
will consume some bytes from the next cell!

I suggest we remove the tags part from CellCodec. This codec is not used by 
default and I don't think someone will switch to CellCodec from the default 
KVCodec now... Still, I feel we can solve it.

This makes tags unsupported via CellCodec. Tag support can be added to 
CellCodec once we have connection negotiation in place (?)



 CellCodec has broken the 96 client to 98 server compatibility
 -

 Key: HBASE-10321
 URL: https://issues.apache.org/jira/browse/HBASE-10321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Critical
 Fix For: 0.98.0, 0.99.0

 Attachments: HBASE-10321.patch


 The write/read tags added in CellCodec have broken the 96 client to 98 server 
 compatibility (and 98 client to 96 server).
 When a 96 client's CellCodec writes a cell, it won't write the tags part at all. But 
 the server expects a tags part, at least a 0 tag length. Reading this tag length 
 will consume some bytes from the next cell!
 I suggest we remove the tags part from CellCodec. This codec is not used 
 by default and I don't think someone will switch to CellCodec from the 
 default KVCodec now...
 This makes tags unsupported via CellCodec. Tag support can be added to 
 CellCodec once we have connection negotiation in place (?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10322) Strip tags from KV while sending back to client on reads

2014-01-11 Thread Anoop Sam John (JIRA)
Anoop Sam John created HBASE-10322:
--

 Summary: Strip tags from KV while sending back to client on reads
 Key: HBASE-10322
 URL: https://issues.apache.org/jira/browse/HBASE-10322
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
Reporter: Anoop Sam John
Assignee: Anoop Sam John
Priority: Blocker
 Fix For: 0.98.0, 0.99.0


Right now we have some inconsistency wrt sending back tags on reads. We do this 
in scans when using the Java client (codec-based cell block encoding), but during 
a Get operation, or when a pure PB-based Scan comes in, we are not sending back 
the tags. So we have to do one of the following:
1. Send back tags in the missing cases as well. But sending back visibility 
expressions / cell ACLs is not correct.
2. Don't send back tags in any case. This will be a problem when a tool like 
ExportTool uses a scan to export the table data: we would miss exporting the 
cell visibility/ACL.
3. Send back tags based on some condition, decided per scan. The simplest way is 
to pass some kind of attribute in the Scan which says whether to send back tags 
or not, but trusting something the scan itself specifies might not be correct 
IMO. The alternative is checking the user who is doing the scan, and sending 
back tags only when an HBase superuser is doing it. For a case like the Export 
Tool's, the execution should then happen as a superuser (see the sketch below).

So IMO we should go with #3.
Patch coming soon.
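A minimal sketch of what the #3 check could look like (hypothetical class and 
names, assuming the superuser list is available server-side; the actual patch 
may differ):
{code}
import java.util.Set;

public class TagReturnPolicy {
  private final Set<String> superUsers;

  public TagReturnPolicy(Set<String> superUsers) {
    this.superUsers = superUsers;
  }

  /** Decided once per Scan/Get on the server side. */
  public boolean includeTags(String requestUser) {
    // A Scan attribute would let any client ask for tags, so the decision is
    // based on who is asking rather than on what the request claims.
    return superUsers.contains(requestUser);
  }
}
{code}
With a policy like this, a tool such as ExportTool keeps cell visibility/ACL 
tags in its output simply by running as a superuser.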




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10227) When a region is opened, its mvcc isn't correctly recovered when there are split hlogs to replay

2014-01-11 Thread Feng Honghua (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868961#comment-13868961
 ] 

Feng Honghua commented on HBASE-10227:
--

ping [~gustavoanatoly] : any update? :-)

 When a region is opened, its mvcc isn't correctly recovered when there are 
 split hlogs to replay
 

 Key: HBASE-10227
 URL: https://issues.apache.org/jira/browse/HBASE-10227
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Reporter: Feng Honghua
Assignee: Gustavo Anatoly

 When opening a region, all stores are examined to get the max MemstoreTS, which 
 is used as the initial mvcc for the region, and then the split hlogs are 
 replayed. In fact the edits in the split hlogs have kvs with greater mvcc than 
 any MemstoreTS in any store file, but replaying them doesn't advance the 
 mvcc accordingly at all. From an overall perspective this mvcc recovery is 
 'logically' incorrect/incomplete.
 The reason this doesn't currently cause a problem is that no active scanners 
 exist and no new scanners can be created before the region opening completes, 
 so the mvcc of all kvs in the hfiles resulting from hlog replay can safely be 
 set to zero. They are just treated as kvs put 'earlier' than the ones in 
 hfiles with mvcc greater than zero (say 'earlier' since they have mvcc less 
 than the ones with non-zero mvcc, even though they were in fact put 'later'), 
 and this has no incorrect impact only because during region opening there are 
 no active scanners existing or being created.
 This bug is only 'logical' for the time being, but if later on we need mvcc to 
 survive across the region's whole lifecycle (across regionservers) and never be 
 set to zero, this bug needs to be fixed first (see the sketch below).
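 A simplified sketch of the recovery logic being described (hypothetical names, 
 not the actual HRegion code):
 {code}
 public class MvccRecoverySketch {
   // Current behavior: the region's initial read point is seeded only from the
   // max MemstoreTS found across the store files.
   static long initialReadPoint(long[] maxMemstoreTsPerStore) {
     long max = 0;
     for (long ts : maxMemstoreTsPerStore) {
       max = Math.max(max, ts);
     }
     return max;
   }

   // What the report says is missing: edits replayed from split hlogs can carry
   // a higher mvcc, so the read point should be advanced past them as well.
   static long replayAndAdvance(long readPoint, long[] replayedEditMvcc) {
     for (long mvcc : replayedEditMvcc) {
       readPoint = Math.max(readPoint, mvcc);
     }
     return readPoint;
   }
 }
 {code}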



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-6581) Build with hadoop.profile=3.0

2014-01-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13868963#comment-13868963
 ] 

Hadoop QA commented on HBASE-6581:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12622528/HBASE-6581-7.patch
  against trunk revision .
  ATTACHMENT ID: 12622528

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 
1.0 profile.

{color:green}+1 hadoop1.1{color}.  The patch compiles against the hadoop 
1.1 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.1" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.1
 http://maven.apache.org/xsd/assembly-1.1.1.xsd">
+  <!--This 'all' id is not appended to the produced bundle because we do this: 
http://maven.apache.org/plugins/maven-assembly-plugin/faq.html#required-classifiers
 -->
+<!--This plugin's configuration is used to store Eclipse m2e settings 
only. It has no influence on the Maven build itself.-->
+<project xmlns="http://maven.apache.org/POM/4.0.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<!--This plugin's configuration is used to store Eclipse m2e settings 
only. It has no influence on the Maven build itself.-->
+<!--This plugin's configuration is used to store Eclipse m2e settings 
only. It has no influence on the Maven build itself.-->
+<!--This plugin's configuration is used to store Eclipse m2e settings 
only. It has no influence on the Maven build itself.-->

{color:red}-1 site{color}.  The patch appears to cause mvn site goal to 
fail.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/8389//console

This message is automatically generated.

 Build with hadoop.profile=3.0
 -

 Key: HBASE-6581
 URL: https://issues.apache.org/jira/browse/HBASE-6581
 Project: HBase
  Issue Type: Bug
Reporter: Eric Charles
Assignee: Eric Charles
 Fix For: 0.98.1

 Attachments: HBASE-6581-1.patch, HBASE-6581-2.patch, 
 HBASE-6581-20130821.patch, HBASE-6581-3.patch, HBASE-6581-4.patch, 
 HBASE-6581-5.patch, HBASE-6581-6.patch,