[jira] [Commented] (HBASE-3892) Table can't disable

2011-06-08 Thread gaojinchao (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045889#comment-13045889
 ] 

gaojinchao commented on HBASE-3892:
---

No, it needs review and merge. 

 Table can't disable
 ---

 Key: HBASE-3892
 URL: https://issues.apache.org/jira/browse/HBASE-3892
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: gaojinchao
 Fix For: 0.90.4

 Attachments: AssignmentManager_90v2.patch, 
 AssignmentManager_90v3.patch, logs.rar


 In TimeoutMonitor:
 if the node exists and the node state is RS_ZK_REGION_CLOSED,
 we should send a ZK message again when closing the region times out.
 In this case, some messages may have been lost.
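The rule proposed above can be sketched as a small, self-contained Java model. Everything in it is hypothetical: the class, the ZkNodeState enum and the idea of returning the region names to re-trigger are illustrative assumptions, not the real AssignmentManager.TimeoutMonitor API. It only shows the check itself: a close that has timed out while the ZK node already says RS_ZK_REGION_CLOSED gets its CLOSED handling fired again instead of waiting forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the proposed TimeoutMonitor rule;
// it is not HBase's real AssignmentManager.TimeoutMonitor.
final class CloseTimeoutSketch {
  // Only the ZK node state the proposal cares about is modeled.
  enum ZkNodeState { NO_NODE, OTHER, RS_ZK_REGION_CLOSED }

  static final class ClosingRegion {
    final ZkNodeState zkState;
    final long closeSentAtMs; // when the master sent CLOSE

    ClosingRegion(ZkNodeState zkState, long closeSentAtMs) {
      this.zkState = zkState;
      this.closeSentAtMs = closeSentAtMs;
    }
  }

  /** Returns the encoded names of regions whose CLOSED event should be re-fired. */
  static List<String> regionsToRetrigger(Map<String, ClosingRegion> closing,
                                         long nowMs, long timeoutMs) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, ClosingRegion> e : closing.entrySet()) {
      ClosingRegion r = e.getValue();
      boolean timedOut = nowMs - r.closeSentAtMs > timeoutMs;
      // The RS already wrote RS_ZK_REGION_CLOSED but the master never acted
      // on it (the message was lost), so the CLOSED state is handled again.
      if (timedOut && r.zkState == ZkNodeState.RS_ZK_REGION_CLOSED) {
        result.add(e.getKey());
      }
    }
    return result;
  }
}
```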
 I see. It seems like a bug. This is my analysis.
 // Table disable: the master sent CLOSE to the region server; region state set to PENDING_CLOSE
 2011-05-08 17:44:25,745 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, 
 usedHeap=4097, maxHeap=8175) for region 
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 // The master received the split message and cleared the region state (PENDING_CLOSE)
 2011-05-08 17:46:45,303 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 
 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, 
 load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 

[jira] [Created] (HBASE-3961) Add Delete.setWriteToWAL functionality

2011-06-08 Thread Bruno Dumon (JIRA)
Add Delete.setWriteToWAL functionality
--

 Key: HBASE-3961
 URL: https://issues.apache.org/jira/browse/HBASE-3961
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Bruno Dumon


For puts, writing to the WAL can be disabled, but for deletes this functionality is 
missing. The regionserver internally already passes around a writeToWAL flag, 
but it is not possible to set it from the client.

The attached patch introduces this.

This changes the serialization format of Delete, so I bumped up its version.

I manually verified that the WAL does not grow when writeToWAL is set 
to false.
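A minimal sketch of what "changes the serialization format, so bumped up the version" means in practice, assuming a Writable-style write/readFields pair. DeleteSketch, its two-field layout and the version numbers 1 and 2 are hypothetical and do not reproduce the real Delete wire format; the point is that a reader can still accept old-version bytes by defaulting writeToWAL to true, which keeps the old behaviour of logging every delete.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified "Delete": only a row key plus the new flag.
// The real Delete format has more fields; these version numbers are made up.
final class DeleteSketch {
  static final byte VERSION_WITHOUT_WAL_FLAG = 1;
  static final byte VERSION_WITH_WAL_FLAG = 2;

  byte[] row = new byte[0];
  boolean writeToWAL = true; // default: deletes are logged, as before

  void write(DataOutputStream out) throws IOException {
    out.writeByte(VERSION_WITH_WAL_FLAG);
    out.writeInt(row.length);
    out.write(row);
    out.writeBoolean(writeToWAL); // the new field, hence the version bump
  }

  void readFields(DataInputStream in) throws IOException {
    int version = in.readByte();
    if (version > VERSION_WITH_WAL_FLAG) {
      throw new IOException("unsupported Delete version " + version);
    }
    row = new byte[in.readInt()];
    in.readFully(row);
    // Old writers never sent the flag, so fall back to logging the delete.
    writeToWAL = version >= VERSION_WITH_WAL_FLAG ? in.readBoolean() : true;
  }

  static byte[] toBytes(DeleteSketch d) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      d.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  static DeleteSketch fromBytes(byte[] data) {
    DeleteSketch d = new DeleteSketch();
    try {
      d.readFields(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return d;
  }
}
```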

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-3961) Add Delete.setWriteToWAL functionality

2011-06-08 Thread Bruno Dumon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Dumon updated HBASE-3961:
---

Attachment: delete-writetowal-patch.txt

patch against trunk r1133369

 Add Delete.setWriteToWAL functionality
 --

 Key: HBASE-3961
 URL: https://issues.apache.org/jira/browse/HBASE-3961
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Bruno Dumon
 Attachments: delete-writetowal-patch.txt




[jira] [Created] (HBASE-3962) HConnectionManager.getConnection(HBaseConfiguration) returns new connection in default HTable constructor

2011-06-08 Thread Philippe (JIRA)
HConnectionManager.getConnection(HBaseConfiguration) returns new connection in 
default HTable constructor
-

 Key: HBASE-3962
 URL: https://issues.apache.org/jira/browse/HBASE-3962
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1
Reporter: Philippe


The HBase connection instances are currently indexed by Configuration, which since 
HBASE-1976 has no equivalence other than object identity.
So every time a new Configuration is passed to the method, a new connection is 
created.
If we create many HTable instances with the same Configuration, there is no 
problem:

Configuration config = HBaseConfiguration.create();
HTable table1 = new HTable(config, "table1"); // init connection
HTable table2 = new HTable(config, "table2"); // re-use connection
HTable table3 = new HTable(config, "table3"); // re-use connection


However, if we call the constructor that takes no Configuration, or call 
HBaseConfiguration.create() again, we pass a new Configuration instance to the 
constructor each time. This causes many connections to be created:
HTable table1 = new HTable("table1"); // init connection
HTable table2 = new HTable("table2"); // init new connection
HTable table3 = new HTable("table3"); // init new connection

I know connections should be pooled, but sometimes we have to create a new 
connection without having access to a previously instantiated Configuration 
object.
Since ZooKeeper has a maximum number of client connections (the default was 30, 
now it is 10), after creating 30 instances of HTable we can no longer access the 
database.

In addition, the HBASE_INSTANCES map does not close the connection when it 
removes the eldest entry. So if we have a larger maxConnection value than the 
hard-coded MAX_CACHED_HBASE_INSTANCES variable, evicted connections remain open 
and are never closed. MAX_CACHED_HBASE_INSTANCES should instead be set from the 
hbase.zookeeper.property.maxClientCnxns parameter (value + 1).
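The missing close-on-eviction step can be sketched with java.util.LinkedHashMap in access order, overriding removeEldestEntry. This is a hypothetical stand-in, not HConnectionManager's code: FakeConnection replaces HConnection, keys are plain Objects, and maxCached is passed in (per the suggestion above it would be maxClientCnxns + 1). With identity-keyed Configurations, every HBaseConfiguration.create() call is a fresh key, so eviction is exactly the path that would leak connections.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU connection cache that closes what it evicts.
final class ConnectionCacheSketch {
  /** Stand-in for an HConnection; it only records whether it was closed. */
  static final class FakeConnection {
    boolean closed;
    void close() { closed = true; }
  }

  private final Map<Object, FakeConnection> instances;

  ConnectionCacheSketch(final int maxCached) {
    // accessOrder = true makes the LinkedHashMap an LRU map;
    // removeEldestEntry is consulted after every insertion.
    instances = new LinkedHashMap<Object, FakeConnection>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<Object, FakeConnection> eldest) {
        if (size() > maxCached) {
          eldest.getValue().close(); // close before dropping it from the cache
          return true;
        }
        return false;
      }
    };
  }

  /** Returns the cached connection for key, creating one if needed. */
  synchronized FakeConnection get(Object key) {
    FakeConnection c = instances.get(key);
    if (c == null) {
      c = new FakeConnection();
      instances.put(key, c); // may evict (and close) the least recently used entry
    }
    return c;
  }

  synchronized int size() {
    return instances.size();
  }
}
```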





[jira] [Commented] (HBASE-3961) Add Delete.setWriteToWAL functionality

2011-06-08 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045990#comment-13045990
 ] 

Andrew Purtell commented on HBASE-3961:
---

+1
Going to commit when local tests pass.

Thanks for the patch Bruno!

 Add Delete.setWriteToWAL functionality
 --

 Key: HBASE-3961
 URL: https://issues.apache.org/jira/browse/HBASE-3961
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Bruno Dumon
 Attachments: delete-writetowal-patch.txt




[jira] [Assigned] (HBASE-3961) Add Delete.setWriteToWAL functionality

2011-06-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-3961:
-

Assignee: Bruno Dumon

 Add Delete.setWriteToWAL functionality
 --

 Key: HBASE-3961
 URL: https://issues.apache.org/jira/browse/HBASE-3961
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Bruno Dumon
Assignee: Bruno Dumon
 Attachments: delete-writetowal-patch.txt




[jira] [Resolved] (HBASE-3961) Add Delete.setWriteToWAL functionality

2011-06-08 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell resolved HBASE-3961.
---

   Resolution: Fixed
Fix Version/s: 0.92.0

Committed. Relevant local tests pass ok.

 Add Delete.setWriteToWAL functionality
 --

 Key: HBASE-3961
 URL: https://issues.apache.org/jira/browse/HBASE-3961
 Project: HBase
  Issue Type: Improvement
  Components: client, regionserver
Reporter: Bruno Dumon
Assignee: Bruno Dumon
 Fix For: 0.92.0

 Attachments: delete-writetowal-patch.txt




[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046016#comment-13046016
 ] 

Alex Baranau commented on HBASE-3529:
-

Another problem we faced: there seems to be an issue with the TestLuceneCoprocessor 
test lifecycle, or something else:
* testSearchRPC fails if we run mvn clean -Dtest=TestLuceneCoprocessor test, 
while the other two tests pass (it fails on the first assert: expected 20, but found 10)
* if I add @Ignore to the other two tests, i.e. the maven command runs only 
testSearchRPC, it passes

 Add search to HBase
 ---

 Key: HBASE-3529
 URL: https://issues.apache.org/jira/browse/HBASE-3529
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Jason Rutherglen
 Attachments: HBASE-3529.patch


 Using the Apache Lucene library we can add freetext search to HBase.  The 
 advantages of this are:
 * HBase is highly scalable and distributed
 * HBase is realtime
 * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
 * Lucene offers many types of queries not currently available in HBase (e.g., 
 AND, OR, NOT, phrase, etc.)
 * It's easier to build scalable realtime systems on top of an already 
 architecturally sound, scalable realtime data system, e.g., HBase.
 * Scaling realtime search will be as simple as scaling HBase.
 Phase 1 - Indexing:
 * Integrate Lucene into HBase such that an index mirrors a given region.  
 This means cascading add, update, and deletes between a Lucene index and an 
 HBase region (and vice versa).
 * Define meta-data to mark a region as indexed, and use a Solr schema to 
 allow the user to define the fields and analyzers.
 * Integrate with the HLog to ensure that index recovery can occur properly 
 (e.g., on region server failure)
 * Mirror region splits with indexes (use Lucene's IndexSplitter?)
 * When a region is written to HDFS, also write the corresponding Lucene index 
 to HDFS.
 * A row key will be the ID of a given Lucene document.  The Lucene docstore 
 will explicitly not be used because the document/row data is stored in HBase. 
  We will need to determine the best data structure for efficiently mapping a 
 docid to a row key.  It could be a docstore, field cache, column stride 
 fields, or some other mechanism.
 * Write unit tests for the above
 Phase 2 - Queries:
 * Enable distributed Lucene queries
 * Regions that have Lucene indexes are inherently available and may be 
 searched on, meaning there's no need for a separate search-related system in 
 ZooKeeper.
 * Integrate search with HBase's RPC mechanism
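One candidate for the docid to row key mapping, in the spirit of a field cache, is a dense array indexed by docid plus a reverse hash map. The sketch below is hypothetical: it uses String row keys instead of byte[], assumes docids are assigned densely in insertion order (as within a single Lucene segment), treats re-indexing a row as delete-plus-add with a new docid, and ignores segment merges, which renumber docids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical docid <-> row key mapping; not part of Lucene or HBase.
final class DocIdRowKeyMap {
  private final List<String> rowByDoc = new ArrayList<>();      // docid -> row key
  private final Map<String, Integer> docByRow = new HashMap<>(); // row key -> latest docid

  /** Indexes (or re-indexes) a row and returns its new docid. */
  int add(String rowKey) {
    int docId = rowByDoc.size(); // dense, insertion-ordered docids
    rowByDoc.add(rowKey);
    docByRow.put(rowKey, docId); // an update points the row at its newest doc
    return docId;
  }

  /** O(1) lookup used when turning search hits back into HBase rows. */
  String rowKey(int docId) {
    return rowByDoc.get(docId);
  }

  /** Latest docid for a row, or null if the row was never indexed. */
  Integer docId(String rowKey) {
    return docByRow.get(rowKey);
  }
}
```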



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046023#comment-13046023
 ] 

Jason Rutherglen commented on HBASE-3529:
-

Hi Alex, I have new code that I will commit to GitHub.  

 Add search to HBase
 ---

 Key: HBASE-3529
 URL: https://issues.apache.org/jira/browse/HBASE-3529
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Jason Rutherglen
 Attachments: HBASE-3529.patch




[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Alex Baranau (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046026#comment-13046026
 ] 

Alex Baranau commented on HBASE-3529:
-

Thank you! Berlin is waiting! (kidding, we are going to leave very soon)

 Add search to HBase
 ---

 Key: HBASE-3529
 URL: https://issues.apache.org/jira/browse/HBASE-3529
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Jason Rutherglen
 Attachments: HBASE-3529.patch




[jira] [Created] (HBASE-3963) Schedule all log-spliiting at startup all at once

2011-06-08 Thread Prakash Khemani (JIRA)
Schedule all log-spliiting at startup all at once
-

 Key: HBASE-3963
 URL: https://issues.apache.org/jira/browse/HBASE-3963
 Project: HBase
  Issue Type: Improvement
Reporter: Prakash Khemani
Assignee: Prakash Khemani


When distributed log splitting is enabled, it is better to call splitLog() 
for all region servers simultaneously. A large number of splitlog tasks will 
get scheduled, one for each log file. But since a splitlog worker (region server) 
executes only one task at a time, there shouldn't be a danger of DFS 
overload. Scheduling all the tasks at once ensures maximum parallelism.
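The scheduling argument can be illustrated with a fixed-size thread pool: every task is queued up front, yet concurrency never exceeds the number of workers, because each worker runs one task at a time. This is a hypothetical model, not HBase's SplitLogManager: pool threads stand in for region servers' split-log workers and Thread.sleep stands in for the actual split.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of "schedule all log-splitting tasks at once".
final class SplitAllAtOnceSketch {
  /** Returns {logsSplit, maxTasksRunningAtOnce}. */
  static int[] splitAll(List<List<String>> logsPerDeadServer, int workers) {
    ExecutorService pool = Executors.newFixedThreadPool(workers);
    AtomicInteger running = new AtomicInteger();
    AtomicInteger maxRunning = new AtomicInteger();
    ConcurrentLinkedQueue<String> splitLogs = new ConcurrentLinkedQueue<>();
    List<Future<?>> futures = new ArrayList<>();
    try {
      // One task per log file, for ALL dead servers, submitted up front
      // instead of finishing one server's logs before starting the next.
      for (List<String> logs : logsPerDeadServer) {
        for (String log : logs) {
          futures.add(pool.submit(() -> {
            maxRunning.accumulateAndGet(running.incrementAndGet(), Math::max);
            try {
              Thread.sleep(5); // stand-in for splitting this log file
            } catch (InterruptedException e) {
              Thread.currentThread().interrupt();
            } finally {
              running.decrementAndGet();
            }
            splitLogs.add(log);
          }));
        }
      }
      for (Future<?> f : futures) {
        f.get(); // wait for every task to finish
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    } finally {
      pool.shutdown();
    }
    return new int[] {splitLogs.size(), maxRunning.get()};
  }
}
```

The queue can be arbitrarily long; the pool size alone bounds the load placed on DFS at any moment.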



[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-06-08 Thread Prakash Khemani (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046044#comment-13046044
 ] 

Prakash Khemani commented on HBASE-1364:


Filed https://issues.apache.org/jira/browse/HBASE-3963. Will try to get this 
done.



 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash, 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In the Bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find only one reconstruction log. 
 We need to make them pick up a bunch of edit logs, and it should be fine for 
 those logs to live elsewhere in HDFS, in an output directory written by all split 
 participants, whether multithreaded or a mapreduce-like distributed process 
 (let's write our distributed sort first as an MR job so we learn what's involved; 
 the distributed sort should use MR framework pieces as much as possible). On 
 startup, regions go to this directory and pick up the files written by split 
 participants, deleting and clearing the directory once all have been read in. 
 Making it possible to take multiple logs as input can also make the split 
 process more robust than the current tenuous process, which loses all edits if 
 it doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. We need 
 to fix that. The split can sort the edits by column family so each store reads 
 only its own edits.
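Item 2 amounts to bucketing edits by region and then by column family while splitting, so that each store reads only its own list. A hypothetical sketch: Edit is a simplified stand-in for a real HLog entry (which carries more, e.g. table name and sequence ids), and TreeMap is used only to give a deterministic order; edits keep their original log order within each bucket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical log-split step: group edits by region, then by column family.
final class SplitByFamilySketch {
  static final class Edit {
    final String region;
    final String family;
    final String payload;

    Edit(String region, String family, String payload) {
      this.region = region;
      this.family = family;
      this.payload = payload;
    }
  }

  /** region -> column family -> edits, in original log order. */
  static Map<String, Map<String, List<Edit>>> split(List<Edit> log) {
    Map<String, Map<String, List<Edit>>> out = new TreeMap<>();
    for (Edit e : log) {
      out.computeIfAbsent(e.region, r -> new TreeMap<>())
         .computeIfAbsent(e.family, f -> new ArrayList<>())
         .add(e);
    }
    return out;
  }
}
```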



[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046048#comment-13046048
 ] 

stack commented on HBASE-1364:
--

@mingjian Since you are looking at the distributed code now, maybe you'd be up for 
having a go at HBASE-3963?  Or at least posting a patch that you've tried for 
Prakash and/or me to review?  Thanks.

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h



[jira] [Updated] (HBASE-3946) The splitted region can be online again while the standby hmaster becomes the active one

2011-06-08 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3946:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to branch and trunk.  Thanks for the patch Jieshan.

 The splitted region can be online again while the standby hmaster becomes the 
 active one
 

 Key: HBASE-3946
 URL: https://issues.apache.org/jira/browse/HBASE-3946
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.4

 Attachments: HBASE-3946-V2.patch, HBASE-3946.patch


 (The cluster has two HMasters, one active and one standby.)
 1. When the active HMaster shuts down, the standby one becomes the active 
 one and goes into the processFailover() method:
 if (regionCount == 0) {
   LOG.info("Master startup proceeding: cluster startup");
   this.assignmentManager.cleanoutUnassigned();
   this.assignmentManager.assignAllUserRegions();
 } else {
   LOG.info("Master startup proceeding: master failover");
   this.assignmentManager.processFailover();
 }
 2. After that, the user regions are rebuilt:
   Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers = 
       rebuildUserRegions();
 3. Here's how rebuildUserRegions works. Every region hosted on a server that is 
 no longer online, including the split parents, is added to the offlineRegions 
 of offlineServers:
 for (Result result : results) {
   Pair<HRegionInfo, HServerInfo> region =
       MetaReader.metaRowToRegionPairWithInfo(result);
   if (region == null) continue;
   HServerInfo regionLocation = region.getSecond();
   HRegionInfo regionInfo = region.getFirst();
   if (regionLocation == null) {
     // Region not being served, add to region map with no assignment
     // If this needs to be assigned out, it will also be in ZK as RIT
     this.regions.put(regionInfo, null);
   } else if (!serverManager.isServerOnline(
       regionLocation.getServerName())) {
     // Region is located on a server that isn't online
     List<Pair<HRegionInfo, Result>> offlineRegions =
         offlineServers.get(regionLocation);
     if (offlineRegions == null) {
       offlineRegions = new ArrayList<Pair<HRegionInfo, Result>>(1);
       offlineServers.put(regionLocation, offlineRegions);
     }
     offlineRegions.add(new Pair<HRegionInfo, Result>(regionInfo, result));
   } else {
     // Region is being served and on an active server
     regions.put(regionInfo, regionLocation);
     addToServers(regionLocation, regionInfo);
   }
 }
 4. It seems that all the offline regions will be added to RIT and brought online 
 again: ZKAssign creates a node for each offline region without considering 
 whether it has been split.
 AssignmentManager#processDeadServers
   private void processDeadServers(
       Map<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServers)
       throws IOException, KeeperException {
     for (Map.Entry<HServerInfo, List<Pair<HRegionInfo, Result>>> deadServer :
         deadServers.entrySet()) {
       List<Pair<HRegionInfo, Result>> regions = deadServer.getValue();
       for (Pair<HRegionInfo, Result> region : regions) {
         HRegionInfo regionInfo = region.getFirst();
         Result result = region.getSecond();
         // If region was in transition (was in zk) force it offline for reassign
         try {
           ZKAssign.createOrForceNodeOffline(watcher, regionInfo,
               master.getServerName());
         } catch (KeeperException.NoNodeException nne) {
           // This is fine
         }
         // Process with existing RS shutdown code
         ServerShutdownHandler.processDeadRegion(regionInfo, result, this,
             this.catalogTracker);
       }
     }
   }
 AssignmentManager#processFailover
   // Process list of dead servers
   processDeadServers(deadServers);
   // Check existing regions in transition
   List<String> nodes = ZKUtil.listChildrenAndWatchForNewChildren(watcher,
       watcher.assignmentZNode);
   if (nodes.isEmpty()) {
     LOG.info("No regions in transition in ZK to process on failover");
     return;
   }
   LOG.info("Failed-over master needs to process " + nodes.size() +
       " regions in transition");
   for (String encodedRegionName : nodes) {
     processRegionInTransition(encodedRegionName, null);
   }
 So I think we should check each region before adding it to RIT, and skip the 
 ones that have already been split.
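The proposed check could look like the following filter, applied before a dead server's regions are forced offline in ZK. This is a sketch under an assumption: that a split parent can be recognized by being marked both offline and split in .META. (HRegionInfo carries offline and split flags for this purpose). RegionInfo below is a hypothetical stand-in, not the real class, and the method is not the committed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: skip split parents before putting regions into RIT.
final class SkipSplitParentsSketch {
  /** Stand-in for HRegionInfo with only the two flags the check needs. */
  static final class RegionInfo {
    final String name;
    final boolean offline;
    final boolean split;

    RegionInfo(String name, boolean offline, boolean split) {
      this.name = name;
      this.offline = offline;
      this.split = split;
    }
  }

  /** Names of regions from dead servers that should be reassigned. */
  static List<String> regionsToReassign(List<RegionInfo> deadServerRegions) {
    List<String> toAssign = new ArrayList<>();
    for (RegionInfo r : deadServerRegions) {
      // A split parent is served through its daughters; bringing it back
      // online is exactly the bug described in this issue.
      if (r.offline && r.split) {
        continue;
      }
      toAssign.add(r.name);
    }
    return toAssign;
  }
}
```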



[jira] [Created] (HBASE-3964) Add maxResults per row per CF to Get and Scan.

2011-06-08 Thread Madhuwanti Vaidya (JIRA)
Add maxResults per row per CF to Get and Scan.
--

 Key: HBASE-3964
 URL: https://issues.apache.org/jira/browse/HBASE-3964
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Madhuwanti Vaidya
Assignee: Madhuwanti Vaidya
Priority: Minor






[jira] [Created] (HBASE-3965) Expose major and minor compaction queue status.

2011-06-08 Thread Lohit Vijayarenu (JIRA)
Expose major and minor compaction queue status.
---

 Key: HBASE-3965
 URL: https://issues.apache.org/jira/browse/HBASE-3965
 Project: HBase
  Issue Type: Improvement
  Components: master, metrics
Affects Versions: 0.90.2
Reporter: Lohit Vijayarenu
Priority: Minor
 Fix For: 0.92.0


It would be good to have metrics (or at least information) about the major and 
minor compaction queues exposed via the web UI, and, if possible, in metrics as 
well (e.g., the number of pending major/minor compactions).
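A minimal sketch of the numbers such a page or metric could report: the pending queue is scanned and its entries counted by type. CompactionQueueSketch, its Request fields and the priority ordering are hypothetical; HBase's actual compaction queue may be structured differently, and the metric names are left open.

```java
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical compaction queue exposing pending major/minor counts.
final class CompactionQueueSketch {
  static final class Request implements Comparable<Request> {
    final String region;
    final boolean major;
    final int priority; // lower value runs sooner (an assumption)

    Request(String region, boolean major, int priority) {
      this.region = region;
      this.major = major;
      this.priority = priority;
    }

    @Override
    public int compareTo(Request other) {
      return Integer.compare(priority, other.priority);
    }
  }

  private final PriorityBlockingQueue<Request> queue = new PriorityBlockingQueue<>();

  void request(Request r) {
    queue.add(r);
  }

  /** Next request to run, or null if nothing is pending. */
  Request next() {
    return queue.poll();
  }

  // The two values one could show in the web UI and publish as metrics.
  int pendingMajor() {
    return count(true);
  }

  int pendingMinor() {
    return count(false);
  }

  private int count(boolean major) {
    int n = 0;
    for (Request r : queue) { // weakly consistent iteration over pending requests
      if (r.major == major) {
        n++;
      }
    }
    return n;
  }
}
```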



[jira] [Commented] (HBASE-3962) HConnectionManager.getConnection(HBaseConfiguration) returns new connection in default HTable constructor

2011-06-08 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046138#comment-13046138
 ] 

Ted Yu commented on HBASE-3962:
---

In trunk, HConnectionManager.getConnection() constructs an HConnectionKey from 
the Configuration. That is, what makes two Configurations the same (for 
connection caching) has been redefined.
Also, the connection is closed in the finalizer.
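The idea of keying connections by configuration values rather than by object identity can be sketched as follows. This is a simplified illustration, not trunk's `HConnectionKey`: a `Map<String, String>` stands in for `Configuration`, and the list of properties that make up the key is an assumption chosen for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical connection key built from configuration values, so two
 * different Configuration objects with the same cluster settings share a
 * connection.
 */
final class ConnectionKey {
  // Assumed set of properties that identify the cluster a client talks to.
  static final List<String> KEY_PROPERTIES = Arrays.asList(
      "hbase.zookeeper.quorum",
      "hbase.zookeeper.property.clientPort",
      "zookeeper.znode.parent");

  private final Map<String, String> values = new HashMap<>();

  ConnectionKey(Map<String, String> conf) {
    for (String property : KEY_PROPERTIES) {
      values.put(property, conf.get(property));  // null when unset
    }
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof ConnectionKey
        && values.equals(((ConnectionKey) other).values);
  }

  @Override
  public int hashCode() {
    return values.hashCode();
  }
}
```

With such a key, `new ConnectionKey(confA).equals(new ConnectionKey(confB))` holds whenever the two maps agree on the listed properties, even though `confA != confB`; unrelated settings do not affect the key.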

 HConnectionManager.getConnection(HBaseConfiguration) returns new connection 
 in default HTable constructor
 -

 Key: HBASE-3962
 URL: https://issues.apache.org/jira/browse/HBASE-3962
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1
Reporter: Philippe

 The HBase instances are currently indexed by Configuration, which since 
 HBASE-1976 has no equivalence other than object identity.
 So every time a new Configuration is passed to the method, a new connection is 
 created.
 If we create many HTable instances with the same configuration, there is no 
 problem:
 Configuration config = HBaseConfiguration.create();
 HTable table1 = new HTable(config, "table1"); // init connection
 HTable table2 = new HTable(config, "table2"); // re-use connection
 HTable table3 = new HTable(config, "table3"); // re-use connection
 However, if we call the default constructor, or call 
 HBaseConfiguration.create() again, we pass a new Configuration instance to 
 the constructor each time. This causes many connections to be 
 created:
 HTable table1 = new HTable("table1"); // init connection
 HTable table2 = new HTable("table2"); // init new connection
 HTable table3 = new HTable("table3"); // init new connection
 I know connections should be pooled, but sometimes we have to create a new 
 connection without having access to a previously instantiated Configuration 
 object.
 Since ZooKeeper limits the number of client connections (the default was 30, now 
 it is 10), after creating 30 instances of HTable we can no longer access the 
 database.
 In addition, the HBASE_INSTANCES map does not close the connection 
 when it removes the eldest entry. So if we have a larger maxConnection value 
 than the hard-coded MAX_CACHED_HBASE_INSTANCES, evicted connections 
 remain open and are never closed. MAX_CACHED_HBASE_INSTANCES should instead be set 
 from the hbase.zookeeper.property.maxClientCnxns parameter (value + 1).
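The eviction leak described in the last paragraph could be avoided by closing a connection at the moment the cache drops it. A minimal sketch, assuming connections implement `java.io.Closeable`. `ClosingLruCache` is hypothetical, not HBase's `HBASE_INSTANCES` map, and the caller would choose `maxEntries` (for example from `hbase.zookeeper.property.maxClientCnxns`, as suggested in the issue).

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical bounded connection cache that closes whatever it evicts,
 * instead of dropping the reference and leaving the connection open.
 */
class ClosingLruCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;

  private final int maxEntries;

  ClosingLruCache(int maxEntries) {
    // accessOrder = true: the eldest entry is the least recently used one.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    if (size() <= maxEntries) {
      return false;
    }
    try {
      eldest.getValue().close();  // release the evicted connection's resources
    } catch (IOException e) {
      // Best effort: report the failure but still evict the entry.
      System.err.println("Failed to close evicted connection for "
          + eldest.getKey() + ": " + e);
    }
    return true;
  }
}
```

A real implementation would also need reference counting, so a connection still held by an `HTable` is not closed out from under it when it happens to be evicted.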



[jira] [Created] (HBASE-3966) troubleshooting.xml - added section for web UI for master regionserver

2011-06-08 Thread Doug Meil (JIRA)
troubleshooting.xml - added section for web UI for master & regionserver


 Key: HBASE-3966
 URL: https://issues.apache.org/jira/browse/HBASE-3966
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


Several folks on the dist-list didn't know about the HBase web interfaces.

Added a sub-section in Troubleshooting\tools for this (built-in tools).  Moved 
the existing tools into an external tools sub-section.

 



[jira] [Updated] (HBASE-3966) troubleshooting.xml - added section for web UI for master regionserver

2011-06-08 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3966:
-

Attachment: troubleshooting_HBASE_3966.xml.patch

 troubleshooting.xml - added section for web UI for master & regionserver
 

 Key: HBASE-3966
 URL: https://issues.apache.org/jira/browse/HBASE-3966
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: troubleshooting_HBASE_3966.xml.patch


 Several folks on the dist-list didn't know about the hbase web-interfaces.
 Added a sub-section in Troubleshooting\tools for this (builtin tools).  Moved 
 existing tools into external tools sub-section.
  



[jira] [Updated] (HBASE-3966) troubleshooting.xml - added section for web UI for master regionserver

2011-06-08 Thread Doug Meil (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-3966:
-

Status: Patch Available  (was: Open)



[jira] [Updated] (HBASE-2842) Support BloomFilter error rate on a per-family basis

2011-06-08 Thread Ming Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated HBASE-2842:
---

Priority: Minor  (was: Trivial)

 Support BloomFilter error rate on a per-family basis
 

 Key: HBASE-2842
 URL: https://issues.apache.org/jira/browse/HBASE-2842
 Project: HBase
  Issue Type: Improvement
  Components: filters, ipc, regionserver, rest, thrift
Reporter: Nicolas Spiegelberg
Assignee: Ming Ma
Priority: Minor

 The error rate for bloom filters is currently set by the 
 io.hfile.bloom.error.rate global variable.  Todd suggested at the last HUG 
 that it would be nice to have per-family config options instead.  Trace the 
 Bloom Type code to implement this.
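A per-family setting would most likely resolve in a fixed order: the family's own value first, then the global `io.hfile.bloom.error.rate`. The sketch below assumes that order; the class and method names are hypothetical, and a plain `Map` stands in for `Configuration`/`HColumnDescriptor`. It also shows why the setting matters per family: with the textbook sizing formula m/n = -ln(p)/(ln 2)^2, p = 0.01 needs about 9.6 bits per key while p = 0.001 needs about 14.4 (HBase's own bloom implementation may round these differently).

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve a bloom filter error rate per column family,
 * falling back to the global io.hfile.bloom.error.rate setting.
 */
class BloomErrorRates {
  static final String GLOBAL_KEY = "io.hfile.bloom.error.rate";

  private final Map<String, String> conf;  // stand-in for Configuration
  private final Map<String, Double> perFamily = new HashMap<>();

  BloomErrorRates(Map<String, String> conf) {
    this.conf = conf;
  }

  void setFamilyErrorRate(String family, double rate) {
    if (!(rate > 0.0 && rate < 1.0)) {
      throw new IllegalArgumentException("error rate must be in (0, 1): " + rate);
    }
    perFamily.put(family, rate);
  }

  /** Per-family value if set, else the global setting, else the given default. */
  double errorRateFor(String family, double defaultRate) {
    Double rate = perFamily.get(family);
    if (rate != null) {
      return rate;
    }
    String global = conf.get(GLOBAL_KEY);
    return global != null ? Double.parseDouble(global) : defaultRate;
  }

  /** Textbook optimal bloom filter size: bits per key = -ln(p) / (ln 2)^2. */
  static double bitsPerKey(double errorRate) {
    double ln2 = Math.log(2);
    return -Math.log(errorRate) / (ln2 * ln2);
  }
}
```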



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046258#comment-13046258
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

A few more comments/questions for Jason:

* I see PKIndexSplitter usage for splitting the index when a region splits.  I 
see you split the index, open 2 IndexWriters for 2 new Lucene indices, but then 
either you are not adding documents to them, or I'm not seeing it?

* Are there issues around distributed search?  It looks like it wasn't in your 
github branch.

* What will happen when a region changes its location/regionserver for whatever 
reason?  I see HDFS-2004 got -1ed and you said without that search will be 
slow.  Do you have an alternative plan?

* What is the reason for storing those 2 extra row fields? (the UID one and the 
other one... I think it's called rowStr or something like that)

* What about storing the index in HBase itself? (a la Solandra, I suppose)  
Would this be doable?  Would it make things simpler in the sense that any 
splitting or moving around, etc. may be handled by HBase and we wouldn't have 
to make sure the Lucene index always mirrors what's in a region and make sure 
it follows the region wherever it goes?  Lars' idea/question, and I hope I 
didn't misunderstand or misrepresent his ideas.


 Add search to HBase
 ---

 Key: HBASE-3529
 URL: https://issues.apache.org/jira/browse/HBASE-3529
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.0
Reporter: Jason Rutherglen
 Attachments: HBASE-3529.patch


 Using the Apache Lucene library, we can add free-text search to HBase.  The 
 advantages of this are:
 * HBase is highly scalable and distributed
 * HBase is realtime
 * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
 * Lucene offers many types of queries not currently available in HBase (eg, 
 AND, OR, NOT, phrase, etc)
 * It's easier to build scalable realtime systems on top of an already 
 architecturally sound, scalable realtime data system, e.g., HBase.
 * Scaling realtime search will be as simple as scaling HBase.
 Phase 1 - Indexing:
 * Integrate Lucene into HBase such that an index mirrors a given region.  
 This means cascading add, update, and deletes between a Lucene index and an 
 HBase region (and vice versa).
 * Define meta-data to mark a region as indexed, and use a Solr schema to 
 allow the user to define the fields and analyzers.
 * Integrate with the HLog to ensure that index recovery can occur properly 
 (eg, on region server failure)
 * Mirror region splits with indexes (use Lucene's IndexSplitter?)
 * When a region is written to HDFS, also write the corresponding Lucene index 
 to HDFS.
 * A row key will be the ID of a given Lucene document.  The Lucene docstore 
 will explicitly not be used because the document/row data is stored in HBase. 
  We will need to determine the best data structure for efficiently mapping 
 docid -> row key.  It could be a docstore, field cache, column stride 
 fields, or some other mechanism.
 * Write unit tests for the above
 Phase 2 - Queries:
 * Enable distributed Lucene queries
 * Regions that have Lucene indexes are inherently available and may be 
 searched on, meaning there's no need for a separate search related system in 
 Zookeeper.
 * Integrate search with HBase's RPC mechanism
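To make the docid -> row key question concrete, the field-cache option can be sketched as a dense array indexed by docid. Assumptions: docids are assigned consecutively as documents are added (true within a single Lucene segment, but they change when segments merge, so a real implementation would rebuild or remap the array), row keys are shown as `String` rather than `byte[]`, and `DocIdToRowKey` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical in-memory docid -> row key map in the spirit of a field cache:
 * docids within one index segment are dense ints, so a plain array works.
 */
class DocIdToRowKey {
  private String[] rowKeys = new String[16];
  private int maxDoc = 0;

  /** Records the row key for the next docid and returns that docid. */
  int add(String rowKey) {
    if (maxDoc == rowKeys.length) {
      rowKeys = Arrays.copyOf(rowKeys, rowKeys.length * 2);  // grow geometrically
    }
    rowKeys[maxDoc] = rowKey;
    return maxDoc++;
  }

  String rowKeyFor(int docId) {
    if (docId < 0 || docId >= maxDoc) {
      throw new IndexOutOfBoundsException("docId " + docId + ", maxDoc " + maxDoc);
    }
    return rowKeys[docId];
  }

  /** Translates search hits (docids) into HBase row keys to fetch. */
  List<String> rowKeysFor(int[] hits) {
    List<String> keys = new ArrayList<>(hits.length);
    for (int docId : hits) {
      keys.add(rowKeyFor(docId));
    }
    return keys;
  }
}
```

An array lookup keeps the per-hit cost constant, which matters when a query returns many hits that must each be turned into an HBase Get.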



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046267#comment-13046267
 ] 

Jason Rutherglen commented on HBASE-3529:
-

Otis, I think many of your questions have been addressed in this issue, though 
indeed the comment trail is long at this point.  

bq. Do you have an alternative plan?

https://issues.apache.org/jira/browse/HBASE-3529?focusedCommentId=13040465&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13040465

bq. Are there issues around distributed search? It looks like it wasn't in your 
github branch

https://issues.apache.org/jira/browse/HBASE-3529?focusedCommentId=13042913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13042913

bq. What about storing the index in HBase itself?

I think that's a great idea to test, though in a different Jira issue.

bq. PKIndexSplitter

That's LUCENE-2919.  Given it's not been committed I may need to bring it over 
into the HBase search source tree.



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046274#comment-13046274
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

Re 
https://issues.apache.org/jira/browse/HBASE-3529?focusedCommentId=13042913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13042913

Does that mean that, in order to implement distributed search, you'll immediately 
convert this to HBase+Solr instead of HBase+Lucene, so that you don't have to 
do Lucene-level distributed search?  If so, what about the NRT (near-real-time) 
behavior that will be lost until Solr gets NRT search?




[jira] [Created] (HBASE-3967) Add support to HFileOutputFormat based bulk imports to add Delete mutations

2011-06-08 Thread Kannan Muthukkaruppan (JIRA)
Add support to HFileOutputFormat based bulk imports to add Delete mutations
---

 Key: HBASE-3967
 URL: https://issues.apache.org/jira/browse/HBASE-3967
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan


During bulk imports, it would be useful to be able to apply delete mutations 
(either to delete data that already exists in HBase or data inserted earlier 
during the same run of the import).

For example, we have a use case where we are processing a log of data that 
may contain both inserts and deletes, and we want to upload it into 
HBase using the bulk import mechanism.
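HFiles must be written in sorted key order, so supporting deletes means emitting delete markers that sort correctly against the puts around them. Assuming they follow the normal `KeyValue` ordering (row, then column, then newest timestamp first, then type code descending so that deletes precede puts at the same timestamp), the ordering can be modeled as below. This is a simplified, hypothetical model with `String` rows and a flattened column, not HBase's own comparator.

```java
import java.util.Comparator;

/**
 * Simplified model of a cell written by a bulk import, used to show how
 * delete markers must be ordered next to puts in a sorted HFile.
 */
final class BulkCell {
  // Codes mirror HBase's KeyValue.Type values; higher codes sort first.
  enum Type {
    PUT(4), DELETE(8), DELETE_COLUMN(12);

    final int code;

    Type(int code) {
      this.code = code;
    }
  }

  final String row;
  final String column;  // family:qualifier flattened into one string
  final long timestamp;
  final Type type;

  BulkCell(String row, String column, long timestamp, Type type) {
    this.row = row;
    this.column = column;
    this.timestamp = timestamp;
    this.type = type;
  }

  /** Row and column ascending, newest timestamp first, delete markers before puts. */
  static final Comparator<BulkCell> ORDER =
      Comparator.comparing((BulkCell c) -> c.row)
          .thenComparing(c -> c.column)
          .thenComparing(Comparator.comparingLong((BulkCell c) -> c.timestamp).reversed())
          .thenComparing(Comparator.comparingInt((BulkCell c) -> c.type.code).reversed());

  @Override
  public String toString() {
    return row + "/" + column + "/" + timestamp + "/" + type;
  }
}
```

Sorting a mixed batch with `ORDER` before writing places a delete marker directly ahead of the put it masks at the same timestamp, which is the order readers rely on.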



[jira] [Updated] (HBASE-3967) Support deletes in HFileOutputFormat based bulk import mechanism

2011-06-08 Thread Kannan Muthukkaruppan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kannan Muthukkaruppan updated HBASE-3967:
-

Summary: Support deletes in HFileOutputFormat based bulk import mechanism  
(was: Add support to HFileOutputFormat based bulk imports to add Delete 
mutations)

 Support deletes in HFileOutputFormat based bulk import mechanism
 

 Key: HBASE-3967
 URL: https://issues.apache.org/jira/browse/HBASE-3967
 Project: HBase
  Issue Type: Improvement
Reporter: Kannan Muthukkaruppan

 During bulk imports, it'll be useful to be able to do delete mutations 
 (either to delete data that already exists in HBase or was inserted earlier 
 during this run of the import). 
 For example, we have a use case, where we are processing a log of data which 
 may have both inserts and deletes in the mix and we want to upload that into 
 HBase using the bulk import mechanism.



[jira] [Created] (HBASE-3968) HLog Pretty Printer

2011-06-08 Thread Nicolas Spiegelberg (JIRA)
HLog Pretty Printer
---

 Key: HBASE-3968
 URL: https://issues.apache.org/jira/browse/HBASE-3968
 Project: HBase
  Issue Type: New Feature
  Components: io, regionserver, util
Reporter: Nicolas Spiegelberg
Assignee: Nicolas Spiegelberg
Priority: Minor


We currently have a rudimentary way to print HLog data, but it is limited and 
prints only key information. We need to extend this functionality, 
similar to how we developed HFile's pretty printer. Ideas for functionality:

- filter by sequence_id
- filter by row / region
- option to print values in addition to key info
- option to print output in JSON format (so scripts can easily parse for 
analysis)
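The filtering and JSON ideas in the list above can be sketched independently of the HLog reader. Assumptions: `Entry` is a flattened, hypothetical stand-in for an HLog key plus edit (one row and value per entry), null means "no filter", and the JSON field names are illustrative rather than a defined output format.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the filtering and JSON options proposed above. */
class HLogPrinterSketch {
  /** Simplified log entry; a real HLog entry holds a key and a multi-cell edit. */
  static final class Entry {
    final long sequenceId;
    final String region;
    final String row;
    final String value;

    Entry(long sequenceId, String region, String row, String value) {
      this.sequenceId = sequenceId;
      this.region = region;
      this.row = row;
      this.value = value;
    }
  }

  /** Keeps entries matching every non-null filter. */
  static List<Entry> filter(List<Entry> entries, Long sequenceId, String region, String row) {
    List<Entry> out = new ArrayList<>();
    for (Entry e : entries) {
      if (sequenceId != null && e.sequenceId != sequenceId) continue;
      if (region != null && !region.equals(e.region)) continue;
      if (row != null && !row.equals(e.row)) continue;
      out.add(e);
    }
    return out;
  }

  /** One JSON object per entry; values are included only when requested. */
  static String toJson(Entry e, boolean printValues) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"sequence\":").append(e.sequenceId)
        .append(",\"region\":\"").append(escape(e.region)).append('"')
        .append(",\"row\":\"").append(escape(e.row)).append('"');
    if (printValues) {
      sb.append(",\"value\":\"").append(escape(e.value)).append('"');
    }
    return sb.append('}').toString();
  }

  /** Minimal JSON string escaping: quotes, backslashes, control characters. */
  static String escape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char ch : s.toCharArray()) {
      switch (ch) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        default:
          if (ch < 0x20) {
            sb.append(String.format("\\u%04x", (int) ch));
          } else {
            sb.append(ch);
          }
      }
    }
    return sb.toString();
  }
}
```

Emitting one JSON object per line keeps the output easy to consume with line-oriented scripts.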




[jira] [Commented] (HBASE-1364) [performance] Distributed splitting of regionserver commit logs

2011-06-08 Thread mingjian (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046299#comment-13046299
 ] 

mingjian commented on HBASE-1364:
-

@stack, Prakash: I will attach a patch in HBASE-3963.

 [performance] Distributed splitting of regionserver commit logs
 ---

 Key: HBASE-1364
 URL: https://issues.apache.org/jira/browse/HBASE-1364
 Project: HBase
  Issue Type: Improvement
  Components: coprocessors
Reporter: stack
Assignee: Prakash Khemani
Priority: Critical
 Fix For: 0.92.0

 Attachments: 1364-v5.txt, HBASE-1364.patch, 
 org.apache.hadoop.hbase.master.TestDistributedLogSplitting-output.txt

  Time Spent: 8h
  Remaining Estimate: 0h

 HBASE-1008 has some improvements to our log splitting on regionserver crash, 
 but it needs to run even faster.
 (Below is from HBASE-1008)
 In the Bigtable paper, the split is distributed. If we're going to have 1000 
 logs, we need to distribute or at least multithread the splitting.
 1. As is, regions starting up expect to find only one reconstruction log. 
 We need to make them pick up a bunch of edit logs, and it should be fine that 
 the logs are elsewhere in HDFS, in an output directory written by all split 
 participants, whether multithreaded or a mapreduce-like distributed process 
 (let's write our distributed sort first as an MR job so we learn what's involved; 
 the distributed sort should, as much as possible, use MR framework pieces). On 
 startup, regions go to this directory and pick up the files written by the split 
 participants, deleting and clearing the dir when all have been read in. Making 
 regions take multiple logs as input can also make the split process more 
 robust than the current tenuous process, which loses all edits if it 
 doesn't make it to the end without error.
 2. Each column family rereads the reconstruction log to find its edits. We need 
 to fix that. The split can sort the edits by column family so each store only 
 reads its own edits.
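The "at least multithread" option and item 2 can be illustrated together: split each log on a worker thread, merge the output per region, and sort each region's edits by column family so a store reads only its own run. This is a single-process sketch under assumptions (logs are in-memory lists rather than HDFS files; `Edit` and `split` are hypothetical names); it is not a distributed design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical, in-memory sketch of multithreaded log splitting. */
class ParallelLogSplitSketch {
  /** Simplified log edit: target region, column family, and sequence id. */
  static final class Edit {
    final String region;
    final String family;
    final long seqId;

    Edit(String region, String family, long seqId) {
      this.region = region;
      this.family = family;
      this.seqId = seqId;
    }

    @Override
    public String toString() {
      return region + "/" + family + "/" + seqId;
    }
  }

  /**
   * Splits each log on a worker thread, merges the per-region output, and
   * sorts each region's edits by family, then sequence id.
   */
  static Map<String, List<Edit>> split(List<List<Edit>> logs, int threads)
      throws InterruptedException, ExecutionException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Map<String, List<Edit>>>> parts = new ArrayList<>();
      for (List<Edit> log : logs) {
        parts.add(pool.submit(() -> {
          Map<String, List<Edit>> byRegion = new HashMap<>();
          for (Edit e : log) {
            byRegion.computeIfAbsent(e.region, k -> new ArrayList<>()).add(e);
          }
          return byRegion;
        }));
      }
      Map<String, List<Edit>> merged = new TreeMap<>();
      for (Future<Map<String, List<Edit>>> part : parts) {
        for (Map.Entry<String, List<Edit>> entry : part.get().entrySet()) {
          merged.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
              .addAll(entry.getValue());
        }
      }
      Comparator<Edit> byFamilyThenSeqId =
          Comparator.comparing((Edit e) -> e.family).thenComparingLong(e -> e.seqId);
      for (List<Edit> edits : merged.values()) {
        edits.sort(byFamilyThenSeqId);  // each store's edits become one contiguous run
      }
      return merged;
    } finally {
      pool.shutdown();
    }
  }
}
```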



[jira] [Commented] (HBASE-3723) Major compact should be done when there is only one storefile and some keyvalue is outdated.

2011-06-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046320#comment-13046320
 ] 

stack commented on HBASE-3723:
--

Committed to branch.

 Major compact should be done when there is only one storefile and some 
 keyvalue is outdated.
 

 Key: HBASE-3723
 URL: https://issues.apache.org/jira/browse/HBASE-3723
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.0, 0.90.1
Reporter: zhoushuaifeng
 Fix For: 0.90.2

 Attachments: hbase-3723.txt


 In the function Store.isMajorCompaction():
   if (filesToCompact.size() == 1) {
     // Single file
     StoreFile sf = filesToCompact.get(0);
     long oldest =
         (sf.getReader().timeRangeTracker == null) ?
             Long.MIN_VALUE :
             now - sf.getReader().timeRangeTracker.minimumTimestamp;
     if (sf.isMajorCompaction() &&
         (this.ttl == HConstants.FOREVER || oldest < this.ttl)) {
       if (LOG.isDebugEnabled()) {
         LOG.debug("Skipping major compaction of " + this.storeNameStr +
             " because one (major) compacted file only and oldestTime " +
             oldest + "ms is < ttl=" + this.ttl);
       }
     }
   } else {
 When there is only one store file in the store and some KeyValues have 
 outlived their TTL, the major compaction checker should put this region on the 
 compaction queue and run a major compaction to clean out the expired data. But 
 according to the code in 0.90.1, it does nothing.
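The behavior the report asks for can be sketched as a pure decision function that keeps the existing "skip" case and adds the missing "expired data" case. This is one plausible shape of the fix, stated as an assumption rather than a copy of `hbase-3723.txt`; the class, the method, and the `FOREVER` sentinel are hypothetical.

```java
/**
 * Hypothetical sketch of the single-store-file branch with TTL expiry
 * taken into account; not the committed patch.
 */
final class SingleFileMajorCompactionCheck {
  // Assumption: "no TTL" is represented as Long.MAX_VALUE in this sketch.
  static final long FOREVER = Long.MAX_VALUE;

  /**
   * @param alreadyMajorCompacted whether the only file came from a major compaction
   * @param oldestAgeMs now minus the file's minimum timestamp, or Long.MIN_VALUE if unknown
   * @param ttlMs the family's time-to-live in milliseconds
   * @return true if the store should be queued for a major compaction
   */
  static boolean shouldMajorCompact(boolean alreadyMajorCompacted, long oldestAgeMs, long ttlMs) {
    if (alreadyMajorCompacted && (ttlMs == FOREVER || oldestAgeMs < ttlMs)) {
      return false;  // nothing has expired: the case the 0.90.1 code already skips
    }
    // The branch missing in 0.90.1: data older than the TTL should be purged.
    return ttlMs != FOREVER && oldestAgeMs > ttlMs;
  }
}
```

An unknown age (`Long.MIN_VALUE`) never triggers a compaction here, matching how the original code treats a missing time-range tracker.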



[jira] [Updated] (HBASE-3892) Table can't disable

2011-06-08 Thread gaojinchao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

gaojinchao updated HBASE-3892:
--

Attachment: AssignmentManager_90v4.patch

 Table can't disable
 ---

 Key: HBASE-3892
 URL: https://issues.apache.org/jira/browse/HBASE-3892
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.3
Reporter: gaojinchao
 Fix For: 0.90.4

 Attachments: AssignmentManager_90v3.patch, 
 AssignmentManager_90v4.patch, logs.rar


 In TimeoutMonitor: 
 if the node exists and its state is RS_ZK_REGION_CLOSED,
 we should send a ZK message again when closing the region times out.
 In this case, some messages may be lost.
 I see. It seems like a bug. This is my analysis.
 // The table was disabled and the master sent a CLOSE message to the region server; the region state 
 was set to PENDING_CLOSE
 2011-05-08 17:44:25,745 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=C4C4.site,60020,1304820199467, load=(requests=0, regions=123, 
 usedHeap=4097, maxHeap=8175) for region 
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
 2011-05-08 17:44:45,530 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:45:45,542 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 // Received the split message and cleared the region state (PENDING_CLOSE)
 2011-05-08 17:46:45,303 WARN 
 org.apache.hadoop.hbase.master.AssignmentManager: Overwriting 
 4418fb197685a21f77e151e401cf8b66 on serverName=C4C4.site,60020,1304820199467, 
 load=(requests=0, regions=123, usedHeap=4097, maxHeap=8175)
 2011-05-08 17:46:45,538 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:47:45,548 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:48:45,545 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:49:46,108 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:50:46,105 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:51:46,117 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: 
 ufdr,2011050812#8613817306227#0516,1304845660567.8e9a3b05abe1c3a692999cf5e8dfd9dd.:
  Daughters; 
 ufdr,2011050812#8613817306227#0516,1304847764729.5e4bca85c33fa6605ffc9a5c2eb94e62.,
  
 ufdr,2011050812#8613817398167#4032,1304847764729.4418fb197685a21f77e151e401cf8b66.
  from C4C4.site,60020,1304820199467
 2011-05-08 17:52:46,112 INFO org.apache.hadoop.hbase.master.ServerManager: 
 Received REGION_SPLIT: