[jira] [Commented] (HBASE-4393) Implement a canary monitoring program

2011-09-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104626#comment-13104626
 ] 

Otis Gospodnetic commented on HBASE-4393:
-

Todd, where/how would the metrics be published?  JMX perhaps?
Please see my comment on HBASE-4147: 
https://issues.apache.org/jira/browse/HBASE-4147?focusedCommentId=13104623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13104623

> Implement a canary monitoring program
> -
>
> Key: HBASE-4393
> URL: https://issues.apache.org/jira/browse/HBASE-4393
> Project: HBase
>  Issue Type: New Feature
>  Components: monitoring
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>
> This JIRA is to implement a standalone program that can be used to do "canary 
> monitoring" of a running HBase cluster. This program would gather a list of 
> the regions in the cluster, then iterate over them doing lightweight 
> operations (eg short scans) to provide metrics about latency as well as alert 
> on availability issues.
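Below is a minimal sketch of the sweep described above. It is written in plain Java so it runs without a cluster. `RegionProbe`, `CanaryResult` and `Canary` are hypothetical names. A real canary would fetch the region list from the cluster and have `probe` run a short scan against each region through the HBase client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook: in a real canary this would run a short scan against one region.
interface RegionProbe {
    void probe(String region) throws Exception;
}

// Outcome of probing one region: latency, plus the failure message if it failed.
final class CanaryResult {
    final String region;
    final long latencyMillis;
    final String error; // null when the probe succeeded

    CanaryResult(String region, long latencyMillis, String error) {
        this.region = region;
        this.latencyMillis = latencyMillis;
        this.error = error;
    }

    boolean ok() {
        return error == null;
    }
}

final class Canary {
    // Visit every region once, timing each probe and recording failures
    // instead of aborting, so one bad region does not hide the others.
    static List<CanaryResult> sweep(List<String> regions, RegionProbe probe) {
        List<CanaryResult> results = new ArrayList<>();
        for (String region : regions) {
            long start = System.nanoTime();
            String error = null;
            try {
                probe.probe(region);
            } catch (Exception e) {
                error = e.toString();
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            results.add(new CanaryResult(region, elapsedMillis, error));
        }
        return results;
    }
}
```

The sweep records a failure and moves on rather than aborting, so one unavailable region cannot mask latency problems elsewhere. The per-region latencies could then be published through whatever metrics mechanism the cluster already uses.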

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4147) StoreFile query usage report

2011-09-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104623#comment-13104623
 ] 

Otis Gospodnetic commented on HBASE-4147:
-

+1 for Todd's tracing idea (is this already in a separate JIRA issue that I 
can't find?)
+1 for what Gary said about using existing mechanisms for publishing metrics, 
so that those of us who already have tools to gather and aggregate data from 
there can just keep using those tools instead of developing new things that 
scrape metrics from additional places.


> StoreFile query usage report
> 
>
> Key: HBASE-4147
> URL: https://issues.apache.org/jira/browse/HBASE-4147
> Project: HBase
>  Issue Type: Improvement
>Reporter: Doug Meil
>Priority: Minor
> Attachments: hbase_4147_storefilereport.pdf, 
> hbase_4147_storefilereport_2011_08_10.pdf
>
>
> Detailed information on what HBase is doing in terms of reads is hard to come 
> by.
> What would be useful is to have a periodic StoreFile query report.  
> Specifically, this could run on a configured interval (e.g., every 30 
> seconds, 60 seconds) and dump the output to the log files.
> This would have all StoreFiles accessed during the reporting period (and with 
> the Path we would also know region, CF, and table), # of times the StoreFile 
> was accessed, the size of the StoreFile, and the total time (ms) spent 
> processing that StoreFile.
> Even this level of summary would be useful to detect which tables & CFs are 
> being accessed the most, and including the StoreFile would provide insight 
> into relative "uncompaction" (i.e., lots of StoreFiles).
> I think the log-output, as opposed to UI, is an important facet with this.  
> I'm assuming that users will slice and dice this data on their own so I think 
> we should skip any kind of admin view for now (i.e., new JSPs, new APIs to 
> expose this data).  Just getting this to log-file would be a big improvement.
> Will this have a non-zero performance impact?  Yes.  Hopefully small, but yes 
> it will.  However, flying a plane without any instrumentation isn't fun.  :-) 
>  
>  
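One way to collect the per-StoreFile numbers described above is sketched below. It assumes a hook that is called once per StoreFile access with the file's path, size and elapsed time. `StoreFileAccessReport` and its methods are hypothetical, not existing HBase classes.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-period aggregator of StoreFile reads (not an HBase class).
final class StoreFileAccessReport {
    static final class Stats {
        final LongAdder accesses = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
        volatile long sizeBytes;
    }

    private final ConcurrentHashMap<String, Stats> current = new ConcurrentHashMap<>();

    // Called once per StoreFile access; the path also identifies table, region and CF.
    void record(String storeFilePath, long sizeBytes, long elapsedMillis) {
        Stats s = current.computeIfAbsent(storeFilePath, p -> new Stats());
        s.accesses.increment();
        s.totalMillis.add(elapsedMillis);
        s.sizeBytes = sizeBytes;
    }

    // Called once per reporting interval: removes this period's entries and
    // returns one log line per StoreFile, most total time first. An access that
    // races with the drain may be dropped, which is acceptable for a report.
    List<String> drainReport() {
        List<Map.Entry<String, Stats>> entries = new ArrayList<>();
        for (String path : new ArrayList<>(current.keySet())) {
            Stats s = current.remove(path);
            if (s != null) {
                entries.add(new AbstractMap.SimpleEntry<>(path, s));
            }
        }
        entries.sort(Comparator.comparingLong(
                (Map.Entry<String, Stats> e) -> e.getValue().totalMillis.sum()).reversed());
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Stats> e : entries) {
            Stats s = e.getValue();
            lines.add(String.format("storefile=%s accesses=%d sizeBytes=%d totalMs=%d",
                    e.getKey(), s.accesses.sum(), s.sizeBytes, s.totalMillis.sum()));
        }
        return lines;
    }
}
```

A `ScheduledExecutorService` could call `drainReport()` every 30 or 60 seconds and write each line to the regular log. That keeps the output in the log files, as the description asks, with no new UI.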





[jira] [Commented] (HBASE-4046) expose more statistics from HBase Master node

2011-09-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104627#comment-13104627
 ] 

Otis Gospodnetic commented on HBASE-4046:
-

Nileema - +1 for getting this in.  Long live metrics!


> expose more statistics from HBase Master node
> -
>
> Key: HBASE-4046
> URL: https://issues.apache.org/jira/browse/HBASE-4046
> Project: HBase
>  Issue Type: Improvement
>Reporter: nileema shingte
>Assignee: nileema shingte
>Priority: Minor
> Attachments: HBASE-4046.patch
>
>
> We can add the following statistics to the master:
> 1. number of logs split 
> 2. size of logs split
> 3. number of region servers online 
> 4. number of region servers opened
> 5. number of region servers expired 





[jira] [Updated] (HBASE-3507) requests count per HRegion and rebalance command

2011-09-14 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-3507:


Component/s: metrics

> requests count per HRegion and rebalance command
> 
>
> Key: HBASE-3507
> URL: https://issues.apache.org/jira/browse/HBASE-3507
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, performance, regionserver
>Reporter: Sebastian Bauer
>Assignee: Ted Yu
> Fix For: 0.92.0
>
> Attachments: hbase-3507-high-scale.txt, hbase-3507-split-merge.txt, 
> hbase-3507-splitTxn.txt, hbase-3507-v4.txt, hbase-requestsCount-2.patch, 
> hbase-requestsCount-v2.patch, hbase-requestsCount.patch
>
>
> Path-1: add another metric to HRegion that counts requests made to the region.
> Path-2: add another command to the hbase shell that grabs all regions, sorts them by 
> request count from Path-1, and moves them to servers in round-robin order.
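The two parts can be sketched as a pure planning function. Given per-region request counts (Path-1) and a list of servers, it sorts the regions busiest-first and deals them out round-robin (Path-2). `RequestRebalancer.plan` is a hypothetical helper. Actually moving the regions would go through the master, which is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RequestRebalancer {
    // Sort regions by request count (busiest first, ties by name) and deal them
    // out to servers in round-robin order, so hot regions land on different servers.
    static Map<String, List<String>> plan(Map<String, Long> requestsPerRegion, List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers");
        }
        List<Map.Entry<String, Long>> regions = new ArrayList<>(requestsPerRegion.entrySet());
        regions.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String server : servers) {
            assignment.put(server, new ArrayList<>());
        }
        for (int i = 0; i < regions.size(); i++) {
            String server = servers.get(i % servers.size());
            assignment.get(server).add(regions.get(i).getKey());
        }
        return assignment;
    }
}
```

Plain round-robin gives the first server the busiest region in every round. If that skew matters, dealing in alternating direction ("snake" order) is a common tweak.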





[jira] Commented: (HBASE-3455) Heap fragmentation in region server

2011-02-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992302#comment-12992302
 ] 

Otis Gospodnetic commented on HBASE-3455:
-

In plain English, what's the end effect of this change?  Elimination or 
minimization of stop-the-world GC pauses?  Or maybe (also) lower memory 
consumption? Or...?  Thanks.

> Heap fragmentation in region server
> ---
>
> Key: HBASE-3455
> URL: https://issues.apache.org/jira/browse/HBASE-3455
> Project: HBase
>  Issue Type: Brainstorming
>  Components: performance, regionserver
>Affects Versions: 0.90.1
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Critical
> Fix For: 0.90.1
>
> Attachments: HBasefragmentation.pdf, collapse-arrays.patch, 
> icv-frag.png, mslab-1.txt, mslab-2.txt, mslab-3.txt, mslab-4.txt, 
> mslab-5.txt, mslab-6.txt, parse-fls-statistics.py, with-kvallocs.png
>
>
> Stop-the-world GC pauses have long been a problem in HBase. "Concurrent mode 
> failures" can usually be tuned around by setting the initiating occupancy 
> fraction low, but eventually the heap becomes fragmented and a promotion 
> failure occurs.
> This JIRA is to do research/experiments about the heap fragmentation issue 
> and possible solutions.





[jira] Commented: (HBASE-2886) Add search box to site

2011-02-21 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997468#comment-12997468
 ] 

Otis Gospodnetic commented on HBASE-2886:
-

Oops, it looks like the search box got lost, probably during the Feb 10 release.
Stack, any way you can revive it?


> Add search box to site
> --
>
> Key: HBASE-2886
> URL: https://issues.apache.org/jira/browse/HBASE-2886
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Fix For: 0.90.0
>
> Attachments: add-search-box.patch
>
>
> Add search box to HBase site which directs users to http://search-hadoop.com
> This was discussed on mailing list: http://search-hadoop.com/m/pv9ndRH2tc





[jira] Commented: (HBASE-3424) Coprocessors: Add metrics for custom RPC methods called through HTable.coprocessorExec()

2011-02-24 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998982#comment-12998982
 ] 

Otis Gospodnetic commented on HBASE-3424:
-

I can't tell whether this issue represents something that still needs to be 
implemented.  It sounds like it's just pointing to HBASE-3405.  Or maybe this is a 
placeholder for future patches for metrics around Coprocessors?


> Coprocessors: Add metrics for custom RPC methods called through 
> HTable.coprocessorExec()
> 
>
> Key: HBASE-3424
> URL: https://issues.apache.org/jira/browse/HBASE-3424
> Project: HBase
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Gary Helmling
>Priority: Minor
>
> Currently HBaseRpcMetrics only reports on known RPC methods in core HBase.  
> With HBASE-3405 we added hooks to register metrics for new methods.  We can 
> make use of these for tabulating metrics on custom CoprocessorProtocol 
> extensions invoked through HTable.coprocessorProxy() and 
> HTable.coprocessorExec().
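Below is a sketch of the kind of per-method bookkeeping this implies. It assumes methods are identified at call time by protocol and method name. `DynamicRpcMetrics` is hypothetical and does not reproduce `HBaseRpcMetrics` or the hooks added in HBASE-3405.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical registry for RPC methods that are only known at runtime,
// such as methods of a custom coprocessor protocol.
final class DynamicRpcMetrics {
    static final class MethodStats {
        final LongAdder calls = new LongAdder();
        final LongAdder totalMillis = new LongAdder();
    }

    private final ConcurrentHashMap<String, MethodStats> byMethod = new ConcurrentHashMap<>();

    // Keying by protocol and method name keeps two protocols that share a
    // method name from being merged into one metric.
    static String key(String protocol, String method) {
        return protocol + "." + method;
    }

    // Register a method up front, e.g. when its protocol is loaded.
    void register(String protocol, String method) {
        byMethod.putIfAbsent(key(protocol, method), new MethodStats());
    }

    // Record one invocation; unregistered methods are added on first use.
    void record(String protocol, String method, long elapsedMillis) {
        MethodStats s = byMethod.computeIfAbsent(key(protocol, method), k -> new MethodStats());
        s.calls.increment();
        s.totalMillis.add(elapsedMillis);
    }

    long calls(String protocol, String method) {
        MethodStats s = byMethod.get(key(protocol, method));
        return s == null ? 0 : s.calls.sum();
    }

    double averageMillis(String protocol, String method) {
        MethodStats s = byMethod.get(key(protocol, method));
        long n = s == null ? 0 : s.calls.sum();
        return n == 0 ? 0.0 : (double) s.totalMillis.sum() / n;
    }
}
```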





[jira] Commented: (HBASE-1935) Scan in parallel

2011-02-25 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12999497#comment-12999497
 ] 

Otis Gospodnetic commented on HBASE-1935:
-

I'm wondering what sort of speed improvement one can expect from parallel 
scans.  I know there is no universal answer, but if anyone has used this, I'd 
love to get a feel for it.  Thanks.

> Scan in parallel
> 
>
> Key: HBASE-1935
> URL: https://issues.apache.org/jira/browse/HBASE-1935
> Project: HBase
>  Issue Type: New Feature
>  Components: coprocessors
>Reporter: stack
> Fix For: 0.92.0
>
> Attachments: pscanner-v2.patch, pscanner-v3.patch, pscanner-v4.patch, 
> pscanner.patch
>
>
> A scanner that scanned multiple regions in parallel, rather than in series, 
> would be more involved but could complete much faster, 
> particularly if results are sparse.
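Below is a minimal model of the idea. It assumes each region can be scanned independently, runs one task per region on a thread pool, and concatenates the per-region results in region order. Regions are plain sorted lists of row keys here, and `ParallelScan` is hypothetical. A real version would open one scanner per region.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

final class ParallelScan {
    // Each inner list stands in for one region's rows, already sorted, and the
    // regions are given in start-key order. One task scans each region.
    static List<String> scan(List<List<String>> regions, Predicate<String> filter, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> region : regions) {
                futures.add(pool.submit(() -> {
                    List<String> hits = new ArrayList<>();
                    for (String row : region) {
                        if (filter.test(row)) {
                            hits.add(row);
                        }
                    }
                    return hits;
                }));
            }
            // Concatenating in region order keeps the global row order.
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                out.addAll(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Regions cover disjoint, ordered key ranges, so no merge step is needed. The gain should be largest when each region does a lot of filtering work but returns few rows, which is the "sparse results" case above.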





[jira] Created: (HBASE-3597) ageOfLastAppliedOp should update after cluster replication failures

2011-03-03 Thread Otis Gospodnetic (JIRA)
ageOfLastAppliedOp should update after cluster replication failures
---

 Key: HBASE-3597
 URL: https://issues.apache.org/jira/browse/HBASE-3597
 Project: HBase
  Issue Type: Bug
  Components: replication
Affects Versions: 0.90.1
Reporter: Otis Gospodnetic
 Fix For: 0.90.2


The value of ageOfLastAppliedOp in JMX doesn't update after replication starts 
failing, and it should. See: http://search-hadoop.com/m/jFPgF1HfnLc
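Below is a sketch of one possible fix. It assumes the sink remembers the write time of the last edit it applied and derives the age whenever the metric is read, so the value keeps growing while nothing is being applied. `ReplicationAgeMetric` and its injectable clock are hypothetical and are not taken from HBase's replication code.

```java
import java.util.function.LongSupplier;

// Hypothetical sink-side metric: stores when the last applied edit was
// written at the source and computes the age at read time.
final class ReplicationAgeMetric {
    private final LongSupplier clockMillis;
    private volatile long lastAppliedWriteTimeMillis;

    ReplicationAgeMetric(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
        this.lastAppliedWriteTimeMillis = clockMillis.getAsLong();
    }

    // Called only when a batch is applied successfully.
    void onApplied(long editWriteTimeMillis) {
        lastAppliedWriteTimeMillis = editWriteTimeMillis;
    }

    // Computed on every read, so the age keeps growing while replication fails
    // instead of freezing at the value seen on the last successful apply.
    long ageOfLastAppliedOp() {
        return clockMillis.getAsLong() - lastAppliedWriteTimeMillis;
    }
}
```

One trade-off is that an idle source with nothing to ship also shows a growing age. Alerting on this value would therefore need to tell "idle" apart from "failing".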





[jira] Commented: (HBASE-3597) ageOfLastAppliedOp should update after cluster replication failures

2011-03-03 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13002273#comment-13002273
 ] 

Otis Gospodnetic commented on HBASE-3597:
-

BTW, this is filed under the "replication" component because there is no "cluster 
replication" component.  Maybe it should be added?






[jira] Updated: (HBASE-1476) scaling compaction with multiple threads

2011-03-17 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-1476:


Summary: scaling compaction with multiple threads  (was: scaling compaction 
with milti threads)

> scaling compaction with multiple threads
> 
>
> Key: HBASE-1476
> URL: https://issues.apache.org/jira/browse/HBASE-1476
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Billy Pearson
>Assignee: Jonathan Gray
>  Labels: moved_from_0_20_5
> Fix For: 0.92.0
>
>
> Was thinking we should build in support to be able to handle more than one 
> thread for compactions. This will allow us to keep up with compactions when we 
> get to the point where we store TBs of data per node and many regions.
> Maybe a configurable setting to set how many threads a region server can use 
> for compactions.
> With compression turned on, my compactions are limited by CPU speed; with multiple 
> cores it would be nice to be able to scale compactions to 2 or more 
> cores.
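The configurable thread count could look like the sketch below, where compactions go to a fixed-size pool whose size comes from a setting. `CompactionRunner` is hypothetical, and the setting is passed as a plain `int` rather than as a real HBase configuration key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical: runs a batch of compactions on a pool sized by a setting.
final class CompactionRunner {
    // `threads` stands in for a region server setting such as
    // "how many threads may be used for compactions".
    static int runAll(List<Runnable> compactions, int threads) {
        if (threads < 1) {
            throw new IllegalArgumentException("threads must be >= 1");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Runnable compaction : compactions) {
                pending.add(pool.submit(compaction));
            }
            int completed = 0;
            for (Future<?> f : pending) {
                f.get(); // rethrows a failed compaction as ExecutionException
                completed++;
            }
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for compactions", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("compaction failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With compression on, each compaction is CPU-bound, so a pool of N threads lets up to N compactions use separate cores at once. In a region server the pool would be long-lived rather than created per batch.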



[jira] Commented: (HBASE-2646) Compaction requests should be prioritized to prevent blocking

2011-03-17 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008021#comment-13008021
 ] 

Otis Gospodnetic commented on HBASE-2646:
-

Should Fix Version/s be set for this one, so it doesn't get missed?  Looks 
important and ready with a patch. :)

> Compaction requests should be prioritized to prevent blocking
> -
>
> Key: HBASE-2646
> URL: https://issues.apache.org/jira/browse/HBASE-2646
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.20.4
> Environment: ubuntu server 10; hbase 0.20.4; 4 machine cluster (each 
> machine is an 8 core xeon with 16 GB of ram and 6TB of storage); ~250 Million 
> rows;
>Reporter: Jeff Whiting
>Priority: Critical
>  Labels: compaction, split
> Attachments: 2646-fix-race-condition-r1004349.txt, 2646-v2.txt, 
> 2646-v3.txt, PriorityQueue-r996664.patch, prioritycompactionqueue-0.20.4.patch
>
>
> While testing the write capacity of a 4 machine hbase cluster we were getting 
> long and frequent client pauses as we attempted to load the data.  Looking 
> into the problem, we'd get a relatively large compaction queue, and when a 
> region hit the "hbase.hstore.blockingStoreFiles" limit it would block the 
> client and the compaction request would get put on the back of the queue, 
> waiting for many other less important compactions.  The client is basically 
> stuck at that point until a compaction is done.  Prioritizing the compaction 
> requests and letting the request that is blocking other actions go first 
> would help solve the problem.
> You can see the problem by looking at our log files:
> You'll first see an event such as "too many hlogs", which will put a lot of 
> requests on the compaction queue.
> {noformat}
> 2010-05-25 10:53:26,570 INFO org.apache.hadoop.hbase.regionserver.HLog: Too 
> many hlogs: logs=33, maxlogs=32; forcing flush of 22 regions(s): 
> responseCounts,RS_6eZzLtdwhGiTwHy,1274232223324, 
> responses,RS_0qhkL5rUmPCbx3K-1274213057242,1274513189592, 
> responses,RS_1ANYnTegjzVIsHW-1274217741921,1274511001873, 
> responses,RS_1HQ4UG5BdOlAyuE-1274216757425,1274726323747, 
> responses,RS_1Y7SbqSTsZrYe7a-1274328697838,1274478031930, 
> responses,RS_1ZH5TB5OdW4BVLm-1274216239894,1274538267659, 
> responses,RS_3BHc4KyoM3q72Yc-1274290546987,1274502062319, 
> responses,RS_3ra9BaBMAXFAvbK-1274214579958,1274381552543, 
> responses,RS_6SDrGNuyyLd3oR6-1274219941155,1274385453586, 
> responses,RS_8AGCEMWbI6mZuoQ-1274306857429,1274319602718, 
> responses,RS_8C8T9DN47uwTG1S-1274215381765,1274289112817, 
> responses,RS_8J5wmdmKmJXzK6g-1274299593861,1274494738952, 
> responses,RS_8e5Sz0HeFPAdb6c-1274288641459,1274495868557, 
> responses,RS_8rjcnmBXPKzI896-1274306981684,1274403047940, 
> responses,RS_9FS3VedcyrF0KX2-1274245971331,1274754745013, 
> responses,RS_9oZgPtxO31npv3C-1274214027769,1274396489756, 
> responses,RS_a3FdO2jhqWuy37C-1274209228660,1274399508186, 
> responses,RS_a3LJVxwTj29MHVa-12742
> {noformat}
> Then you see the "too many store files" warnings:
> {noformat}
> 2010-05-25 10:53:31,364 DEBUG 
> org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction requested 
> for region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862/783020138 
> because: regionserver/192.168.0.81:60020.cacheFlusher
> 2010-05-25 10:53:32,364 WARN 
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862 has too many 
> store files, putting it back at the end of the flush queue.
> {noformat}
> Which leads to this: 
> {noformat}
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 60 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,061 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 84 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> 2010-05-25 10:53:27,065 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
> Blocking updates for 'IPC Server handler 1 on 60020' on region 
> responses-index,--1274799047787--R_cBKrGxx0FdWjPso,1274804575862: memstore 
> size 128.0m is >= than blocking 128.0m size
> {noformat}
> Once the compaction / split is done, a flush is able to happen, which unblocks 
> the IPC and allows writes to continue.  Unfortunately this process can take 
> upwards of 15 minutes (the specific case shown here from our logs took about 
> 4 minutes).
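Below is a sketch of prioritized compaction requests. It assumes priority is computed as `blockingStoreFiles - storeFileCount`, so a store at or past the "hbase.hstore.blockingStoreFiles" limit sorts first. `CompactionRequest` is hypothetical. The requests would be held in a `java.util.concurrent.PriorityBlockingQueue` instead of a FIFO queue.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical request type: a lower priority value means more urgent.
final class CompactionRequest implements Comparable<CompactionRequest> {
    private static final AtomicLong SEQ = new AtomicLong();

    final String region;
    final int priority;
    private final long seq = SEQ.getAndIncrement(); // arrival order, for ties

    CompactionRequest(String region, int storeFiles, int blockingStoreFiles) {
        this.region = region;
        // Stores at or past the blocking limit get priority <= 0 and jump the queue.
        this.priority = blockingStoreFiles - storeFiles;
    }

    @Override
    public int compareTo(CompactionRequest other) {
        int c = Integer.compare(priority, other.priority);
        return c != 0 ? c : Long.compare(seq, other.seq);
    }
}
```

The sequence number keeps requests with equal priority in arrival order, so ordinary compactions are still served first-come, first-served.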


[jira] [Commented] (HBASE-3529) Add search to HBase

2011-04-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016055#comment-13016055
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

Jason, what is the current state of this work?  Does it work with the trunk?  
Is there a list of issues/problems that need to be fixed before this can be 
called "working"? Thanks!

> Add search to HBase
> ---
>
> Key: HBASE-3529
> URL: https://issues.apache.org/jira/browse/HBASE-3529
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.0
>Reporter: Jason Rutherglen
> Attachments: HBASE-3529.patch, 
> lucene-analyzers-common-4.0-SNAPSHOT.jar, lucene-core-4.0-SNAPSHOT.jar, 
> lucene-misc-4.0-SNAPSHOT.jar
>
>
> Using the Apache Lucene library we can add free-text search to HBase.  The 
> advantages of this are:
> * HBase is highly scalable and distributed
> * HBase is realtime
> * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
> * Lucene offers many types of queries not currently available in HBase (eg, 
> AND, OR, NOT, phrase, etc)
> * It's easier to build scalable realtime systems on top of an already 
> architecturally sound, scalable realtime data system, e.g., HBase.
> * Scaling realtime search will be as simple as scaling HBase.
> Phase 1 - Indexing:
> * Integrate Lucene into HBase such that an index mirrors a given region.  
> This means cascading adds, updates, and deletes between a Lucene index and an 
> HBase region (and vice versa).
> * Define meta-data to mark a region as indexed, and use a Solr schema to 
> allow the user to define the fields and analyzers.
> * Integrate with the HLog to ensure that index recovery can occur properly 
> (eg, on region server failure)
> * Mirror region splits with indexes (use Lucene's IndexSplitter?)
> * When a region is written to HDFS, also write the corresponding Lucene index 
> to HDFS.
> * A row key will be the ID of a given Lucene document.  The Lucene docstore 
> will explicitly not be used because the document/row data is stored in HBase. 
>  We will need to solve what the best data structure for efficiently mapping a 
> docid -> row key is.  It could be a docstore, field cache, column stride 
> fields, or some other mechanism.
> * Write unit tests for the above
> Phase 2 - Queries:
> * Enable distributed Lucene queries
> * Regions that have Lucene indexes are inherently available and may be 
> searched on, meaning there's no need for a separate search-related system in 
> ZooKeeper.
> * Integrate search with HBase's RPC mechanism
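The "mirror region splits with indexes" item comes down to one partitioning rule. When a region splits at a split key, every indexed row below the key goes to the first daughter's index, and every row at or above it goes to the second. The toy model below treats an index as a sorted map from row key to indexed text. `IndexSplit` is hypothetical and is not Lucene's IndexSplitter API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class IndexSplit {
    // Daughter 0 gets rows < splitKey and daughter 1 gets rows >= splitKey,
    // mirroring how the region's key range is divided at the split point.
    // Both daughters are copies, so the parent index is left untouched.
    static List<NavigableMap<String, String>> split(NavigableMap<String, String> index, String splitKey) {
        NavigableMap<String, String> lower = new TreeMap<>(index.headMap(splitKey, false));
        NavigableMap<String, String> upper = new TreeMap<>(index.tailMap(splitKey, true));
        return Arrays.asList(lower, upper);
    }
}
```

Copying the documents into the daughter indexes, rather than only opening two new writers, is what keeps every row searchable after the split. Each row lands in exactly one daughter.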



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-04-05 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016085#comment-13016085
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

Thanks Jason.  What's the Solr dependency about?  I thought your idea was to go 
with pure Lucene-level HBase + indexing integration, not Solr.  I do see you 
mention Solr's schema in the initial comments in this issue, but can't find any 
mentions of Solr in your patch.  Could you please clarify the approach?  Oh, 
and if the mailing list is a better medium, I can move my questions there.  Thanks.




[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046258#comment-13046258
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

A few more comments/questions for Jason:

* I see PKIndexSplitter usage for splitting the index when a region splits.  I 
see you split the index, open 2 IndexWriters for 2 new Lucene indices, but then 
either you are not adding documents to them, or I'm not seeing it?

* Are there issues around distributed search?  It looks like it wasn't in your 
github branch.

* What will happen when a region changes its location/regionserver for whatever 
reason?  I see HDFS-2004 got -1ed and you said without that search will be 
slow.  Do you have an alternative plan?

* What is the reason for storing those 2 extra row fields? (the UID one and the 
other one... I think it's called rowStr or something like that)

* What about storing the index in HBase itself? (a la Solandra, I suppose)  
Would this be doable?  Would it make things simpler in the sense that any 
splitting or moving around, etc. may be handled by HBase and we wouldn't have 
to make sure the Lucene index always mirrors what's in a region and make sure 
it follows the region wherever it goes?  Lars' idea/question, and I hope I 
didn't misunderstand or misrepresent his ideas.


> Add search to HBase
> ---
>
> Key: HBASE-3529
> URL: https://issues.apache.org/jira/browse/HBASE-3529
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.90.0
>Reporter: Jason Rutherglen
> Attachments: HBASE-3529.patch
>
>
> Using the Apache Lucene library we can add free-text search to HBase.  The 
> advantages of this are:
> * HBase is highly scalable and distributed
> * HBase is realtime
> * Lucene is a fast inverted index and will soon be realtime (see LUCENE-2312)
> * Lucene offers many types of queries not currently available in HBase (eg, 
> AND, OR, NOT, phrase, etc)
> * It's easier to build scalable realtime systems on top of an already 
> architecturally sound, scalable realtime data system, e.g., HBase.
> * Scaling realtime search will be as simple as scaling HBase.
> Phase 1 - Indexing:
> * Integrate Lucene into HBase such that an index mirrors a given region.  
> This means cascading add, update, and deletes between a Lucene index and an 
> HBase region (and vice versa).
> * Define meta-data to mark a region as indexed, and use a Solr schema to 
> allow the user to define the fields and analyzers.
> * Integrate with the HLog to ensure that index recovery can occur properly 
> (eg, on region server failure)
> * Mirror region splits with indexes (use Lucene's IndexSplitter?)
> * When a region is written to HDFS, also write the corresponding Lucene index 
> to HDFS.
> * A row key will be the ID of a given Lucene document.  The Lucene docstore 
> will explicitly not be used because the document/row data is stored in HBase. 
>  We will need to solve what the best data structure for efficiently mapping a 
> docid -> row key is.  It could be a docstore, field cache, column stride 
> fields, or some other mechanism.
> * Write unit tests for the above
> Phase 2 - Queries:
> * Enable distributed Lucene queries
> * Regions that have Lucene indexes are inherently available and may be 
> searched on, meaning there's no need for a separate search-related system in 
> ZooKeeper.
> * Integrate search with HBase's RPC mechanism



[jira] [Commented] (HBASE-3529) Add search to HBase

2011-06-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046274#comment-13046274
 ] 

Otis Gospodnetic commented on HBASE-3529:
-

Re 
https://issues.apache.org/jira/browse/HBASE-3529?focusedCommentId=13042913&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13042913

Does that mean that in order to implement distributed search you'll immediately 
convert this to HBase+Solr instead of HBase+Lucene, so that you don't have to 
do Lucene-level distributed search?  If so, what about NRTness that will be 
lost until Solr gets NRT search?
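Whichever layer ends up doing it, distributed search over per-region indexes needs a scatter-gather merge. Each region returns its own top hits, and the caller keeps the best k overall. The sketch assumes hits carry a row key and a score. `Hit` and `TopKMerge` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical search hit: the row key doubles as the document ID.
final class Hit {
    final String rowKey;
    final float score;

    Hit(String rowKey, float score) {
        this.rowKey = rowKey;
        this.score = score;
    }
}

final class TopKMerge {
    // Gather phase: merge per-region result lists into the global top k.
    static List<Hit> merge(List<List<Hit>> perRegion, int k) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be >= 0");
        }
        Comparator<Hit> byScore = Comparator.comparingDouble(h -> h.score);
        // Min-heap on score that holds the best k hits seen so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(byScore);
        for (List<Hit> hits : perRegion) {
            for (Hit h : hits) {
                heap.offer(h);
                if (heap.size() > k) {
                    heap.poll(); // drop the current weakest hit
                }
            }
        }
        List<Hit> out = new ArrayList<>(heap);
        out.sort(byScore.reversed()); // best first
        return out;
    }
}
```

Scores from separate indexes are computed with per-index term statistics, so they are only roughly comparable across regions. That is one of the problems a distributed Lucene or Solr layer has to address.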





[jira] [Commented] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation

2012-07-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409901#comment-13409901
 ] 

Otis Gospodnetic commented on HBASE-3484:
-

@JD - what would/should the ideal graph look like, roughly?


> Replace memstore's ConcurrentSkipListMap with our own implementation
> 
>
> Key: HBASE-3484
> URL: https://issues.apache.org/jira/browse/HBASE-3484
> Project: HBase
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 0.92.0
>Reporter: Todd Lipcon
>Priority: Critical
> Attachments: hierarchical-map.txt, memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements 
> to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much 
> more cheaply
> - implement a Set directly without having to do Map to 
> save one reference per entry
> It turns out CSLM is in the public domain from its development as part of JSR 
> 166, so we should be OK with licenses.
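For context on the `iterator.replace()` proposal, here is a minimal sketch of what an upsert costs on the stock JDK `ConcurrentSkipListSet`, which is itself backed by a `ConcurrentSkipListMap` whose values are all `Boolean.TRUE` (the extra reference per entry). Replacing an entry takes a separate `remove` and `add`, each walking the skip list from the head. `Cell` is a hypothetical stand-in for `KeyValue`, and the comparator deliberately ignores the value so that a new cell replaces the old one. Records require Java 16+.

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListSet;

// Sketch of an upsert on a stock JDK ConcurrentSkipListSet (requires Java 16+ for records).
final class SkipListUpsert {
    /** Hypothetical stand-in for an HBase KeyValue: coordinates plus a value. */
    record Cell(String row, String column, long value) {}

    // Ordering (and therefore set equality) uses coordinates only, ignoring the value.
    static final Comparator<Cell> BY_COORDINATES =
            Comparator.comparing(Cell::row).thenComparing(Cell::column);

    private final ConcurrentSkipListSet<Cell> cells = new ConcurrentSkipListSet<>(BY_COORDINATES);

    /** Replaces any cell with the same coordinates. Two traversals, and not atomic as a whole. */
    void upsert(Cell cell) {
        cells.remove(cell); // traversal 1: unlink the old cell, if present
        cells.add(cell);    // traversal 2: link in the new cell
    }

    /** Returns the cell at (row, column), or null if there is none. */
    Cell get(String row, String column) {
        Cell probe = new Cell(row, column, 0L);
        Cell found = cells.ceiling(probe);
        return (found != null && BY_COORDINATES.compare(found, probe) == 0) ? found : null;
    }

    int size() {
        return cells.size();
    }
}
```

Between the two calls a concurrent reader can briefly see no cell at all for those coordinates, which is one reason an in-place replace on the iterator is attractive.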

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6351) IO impact reduction for compaction

2012-07-11 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-6351:


Description: 
The following came from Otis via http://search-hadoop.com/m/MGVqgZJ4Mj2 :

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
developers, wrote a really nice post about new things in this version of 
Lucene.  The part that I think is interesting for HBase, and that HBase devs 
may want to look at (and borrow to use with compactions) is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO 
and CPU intensive operation which can easily interfere with ongoing searches. 
In 4.0.0 we now have two ways to reduce this impact:
* Rate-limit the IO caused by ongoing merging, by calling 
FSDirectory.setMaxMergeWriteMBPerSec. 


* Use the new NativeUnixDirectory which bypasses the OS's IO cache for 
all merge IO, by using direct IO. This ensures that a merge won't evict hot 
pages used by searches. (Note that there is also a native WindowsDirectory, but 
it does not yet use direct IO during merging... patches welcome!). 

Remember to also set swappiness to 0 on Linux if you want to maximize search 
responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput 
and Directory.createOutput) now take an IOContext describing what's being done 
(e.g., flush vs merge), so you can create a custom Directory that changes its 
behavior depending on the context. 

  was:
The following came from Otis:

Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
developers, wrote a really nice post about new things in this version of 
Lucene.  The part that I think is interesting for HBase, and that HBase devs 
may want to look at (and borrow to use with compactions) is this:

Reducing merge IO impact 

Merging (consolidating many small segments into a single big one) is a very IO 
and CPU intensive operation which can easily interfere with ongoing searches. 
In 4.0.0 we now have two ways to reduce this impact:
* Rate-limit the IO caused by ongoing merging, by calling 
FSDirectory.setMaxMergeWriteMBPerSec. 


* Use the new NativeUnixDirectory which bypasses the OS's IO cache for 
all merge IO, by using direct IO. This ensures that a merge won't evict hot 
pages used by searches. (Note that there is also a native WindowsDirectory, but 
it does not yet use direct IO during merging... patches welcome!). 

Remember to also set swappiness to 0 on Linux if you want to maximize search 
responsiveness. 

More generally, the APIs that open an input or output file (Directory.openInput 
and Directory.createOutput) now take an IOContext describing what's being done 
(e.g., flush vs merge), so you can create a custom Directory that changes its 
behavior depending on the context. 


> IO impact reduction for compaction
> --
>
> Key: HBASE-6351
> URL: https://issues.apache.org/jira/browse/HBASE-6351
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> The following came from Otis via http://search-hadoop.com/m/MGVqgZJ4Mj2 :
> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
> developers, wrote a really nice post about new things in this version of 
> Lucene.  The part that I think is interesting for HBase, and that HBase devs 
> may want to look at (and borrow to use with compactions) is this:
> Reducing merge IO impact 
> Merging (consolidating many small segments into a single big one) is a very 
> IO and CPU intensive operation which can easily interfere with ongoing 
> searches. In 4.0.0 we now have two ways to reduce this impact:
> * Rate-limit the IO caused by ongoing merging, by calling 
> FSDirectory.setMaxMergeWriteMBPerSec. 
> * Use the new NativeUnixDirectory which bypasses the OS's IO cache 
> for all merge IO, by using direct IO. This ensures that a merge won't evict 
> hot pages used by searches. (Note that there is also a native 
> WindowsDirectory, but it does not yet use direct IO during merging... patches 
> welcome!). 
> Remember to also set swappiness to 0 on Linux if you want to maximize search 
> responsiveness. 
> More generally, the APIs that open an input or output file 
> (Directory.openInput and Directory.createOutput) now take an IOContext 
> describing what's being done (e.g., flush vs merge), so you can create a 
> custom Directory that changes its behavior depending on the context. 
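The two ideas in the quoted post (rate-limiting merge writes, and letting an IO context change a Directory's behavior) combine naturally for compactions. The sketch below paces only writes tagged as merges and leaves flushes alone. `ContextThrottle` and its `IoContext` enum are hypothetical names, not Lucene's `IOContext` or `FSDirectory` API; the clock is injected so the pacing can be tested without sleeping.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: pace MERGE writes to a MB/s budget, never delay FLUSH writes.
final class ContextThrottle {
    enum IoContext { FLUSH, MERGE }

    private final double bytesPerNano;     // merge budget converted to bytes per nanosecond
    private final LongSupplier clockNanos; // injected clock, e.g. System::nanoTime
    private long nextFreeNanos;            // when the merge bytes charged so far are "paid off"

    ContextThrottle(double maxMergeMBPerSec, LongSupplier clockNanos) {
        this.bytesPerNano = maxMergeMBPerSec * 1024 * 1024 / 1e9;
        this.clockNanos = clockNanos;
        this.nextFreeNanos = clockNanos.getAsLong();
    }

    /** Nanoseconds the caller should wait before writing {@code bytes} in context {@code ctx}. */
    synchronized long pauseNanos(IoContext ctx, long bytes) {
        if (ctx != IoContext.MERGE) {
            return 0L; // flushes are latency-sensitive and bypass the throttle
        }
        long now = clockNanos.getAsLong();
        long start = Math.max(now, nextFreeNanos);             // wait until earlier merge writes are paid for
        nextFreeNanos = start + (long) (bytes / bytesPerNano); // then charge this write
        return start - now;
    }

    /** Blocking helper: sleeps for the computed pause, then returns so the write can proceed. */
    void beforeWrite(IoContext ctx, long bytes) throws InterruptedException {
        long pause = pauseNanos(ctx, bytes);
        if (pause > 0) {
            Thread.sleep(pause / 1_000_000L, (int) (pause % 1_000_000L));
        }
    }
}
```

A compaction writer would call `beforeWrite(IoContext.MERGE, n)` before each n-byte chunk, for example with `new ContextThrottle(20.0, System::nanoTime)` for an illustrative 20 MB/s cap. The pause reflects bytes already written, so a single large write is not delayed but the next one is.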





[jira] [Updated] (HBASE-6351) Stop compactions from polluting OS FS cache

2012-07-11 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-6351:


Summary: Stop compactions from polluting OS FS cache   (was: IO impact 
reduction for compaction)

> Stop compactions from polluting OS FS cache 
> 
>
> Key: HBASE-6351
> URL: https://issues.apache.org/jira/browse/HBASE-6351
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhihong Ted Yu
>
> The following came from Otis via http://search-hadoop.com/m/MGVqgZJ4Mj2 :
> Lucene 4.0.0-Alpha was recently released.  Mike McCandless, one of the Lucene 
> developers, wrote a really nice post about new things in this version of 
> Lucene.  The part that I think is interesting for HBase, and that HBase devs 
> may want to look at (and borrow to use with compactions) is this:
> Reducing merge IO impact 
> Merging (consolidating many small segments into a single big one) is a very 
> IO and CPU intensive operation which can easily interfere with ongoing 
> searches. In 4.0.0 we now have two ways to reduce this impact:
> * Rate-limit the IO caused by ongoing merging, by calling 
> FSDirectory.setMaxMergeWriteMBPerSec. 
> * Use the new NativeUnixDirectory which bypasses the OS's IO cache 
> for all merge IO, by using direct IO. This ensures that a merge won't evict 
> hot pages used by searches. (Note that there is also a native 
> WindowsDirectory, but it does not yet use direct IO during merging... patches 
> welcome!). 
> Remember to also set swappiness to 0 on Linux if you want to maximize search 
> responsiveness. 
> More generally, the APIs that open an input or output file 
> (Directory.openInput and Directory.createOutput) now take an IOContext 
> describing what's being done (e.g., flush vs merge), so you can create a 
> custom Directory that changes its behavior depending on the context. 





[jira] [Created] (HBASE-6574) Add HBase Ref Card pointer to Ref Guide

2012-08-13 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created HBASE-6574:
---

 Summary: Add HBase Ref Card pointer to Ref Guide
 Key: HBASE-6574
 URL: https://issues.apache.org/jira/browse/HBASE-6574
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Otis Gospodnetic
Priority: Minor


The HBase Refcard is at http://refcardz.dzone.com/refcardz/hbase
Maybe it belongs to Appendix F?






[jira] [Created] (HBASE-6575) Add SPM for HBase to Ref Guide

2012-08-13 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created HBASE-6575:
---

 Summary: Add SPM for HBase to Ref Guide
 Key: HBASE-6575
 URL: https://issues.apache.org/jira/browse/HBASE-6575
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Reporter: Otis Gospodnetic
Priority: Minor


Ref Guide should point users to SPM for HBase in monitoring section(s).





[jira] [Updated] (HBASE-6575) Add SPM for HBase to Ref Guide

2012-08-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-6575:


Attachment: HBASE-6575.patch

Patch for troubleshooting.xml and ops_mgt.xml

> Add SPM for HBase to Ref Guide
> --
>
> Key: HBASE-6575
> URL: https://issues.apache.org/jira/browse/HBASE-6575
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Otis Gospodnetic
>Priority: Minor
> Attachments: HBASE-6575.patch
>
>
> Ref Guide should point users to SPM for HBase in monitoring section(s).





[jira] [Updated] (HBASE-6574) Add HBase Ref Card pointer to Ref Guide

2012-08-13 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-6574:


Description: 
The HBase Refcard is at http://refcardz.dzone.com/refcardz/hbase
Maybe it belongs to Appendix F? [~dmeil]?


  was:
The HBase Refcard is at http://refcardz.dzone.com/refcardz/hbase
Maybe it belongs to Appendix F?



> Add HBase Ref Card pointer to Ref Guide
> ---
>
> Key: HBASE-6574
> URL: https://issues.apache.org/jira/browse/HBASE-6574
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Otis Gospodnetic
>Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The HBase Refcard is at http://refcardz.dzone.com/refcardz/hbase
> Maybe it belongs to Appendix F? [~dmeil]?





[jira] [Commented] (HBASE-6575) Add SPM for HBase to Ref Guide

2012-08-16 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435936#comment-13435936
 ] 

Otis Gospodnetic commented on HBASE-6575:
-

@todd SPM is free, at least for now.
Another option is to link to a Wiki page and let people list HBase monitoring 
tools and services there, so HBase users have one place to look for their 
monitoring options.


> Add SPM for HBase to Ref Guide
> --
>
> Key: HBASE-6575
> URL: https://issues.apache.org/jira/browse/HBASE-6575
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Otis Gospodnetic
>Priority: Minor
> Attachments: HBASE-6575.patch
>
>
> Ref Guide should point users to SPM for HBase in monitoring section(s).





[jira] [Commented] (HBASE-5776) HTableMultiplexer

2012-05-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13270533#comment-13270533
 ] 

Otis Gospodnetic commented on HBASE-5776:
-

What happens when some of the puts fail even after N attempts?  Does the caller 
get notified that a failure happened and which puts failed?  If not, how should 
one deal with such situations?

What happens to puts that are still in memory, not yet written to the RS, if the app 
dies/stops for whatever reason?  Are those puts lost?


> HTableMultiplexer 
> --
>
> Key: HBASE-5776
> URL: https://issues.apache.org/jira/browse/HBASE-5776
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D2775.1.patch, D2775.1.patch, D2775.2.patch, 
> D2775.2.patch
>
>
> There is a known issue in the HBase client where a single slow/dead region server 
> can slow down the multiput operations across all the region servers, so the 
> HBase client will be as slow as the slowest region server in the cluster. 
>  
> To solve this problem, HTableMultiplexer will separate the multiput 
> submitting threads from the flush threads, which means the multiput operation 
> will be a nonblocking operation. 
> The submitting thread will shard all the puts into different queues based on 
> their destination region server and return immediately. The flush threads will 
> flush these puts from each queue to its destination region server. 
> Currently the HTableMultiplexer only supports the put operation.
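A minimal sketch of the queueing scheme described above, assuming a hypothetical `PerServerMultiplexer` class with one bounded queue per destination server; it is not the HTableMultiplexer API. `submit` never blocks, and each server's queue is drained independently, so a slow server only backs up its own queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BiConsumer;

// Hypothetical sketch of per-destination queues; P is whatever a "put" is.
final class PerServerMultiplexer<P> {
    private final Map<String, LinkedBlockingQueue<P>> queues = new ConcurrentHashMap<>();
    private final int perServerCapacity;

    PerServerMultiplexer(int perServerCapacity) {
        this.perServerCapacity = perServerCapacity;
    }

    /** Non-blocking: enqueues for the destination server, or returns false if its queue is full. */
    boolean submit(String server, P put) {
        return queues
                .computeIfAbsent(server, s -> new LinkedBlockingQueue<>(perServerCapacity))
                .offer(put);
    }

    /**
     * Drains up to maxBatch queued puts for one server and hands them to writer.
     * A real client would run one flush thread per server calling this in a loop.
     */
    int flushOnce(String server, int maxBatch, BiConsumer<String, List<P>> writer) {
        LinkedBlockingQueue<P> queue = queues.get(server);
        if (queue == null) {
            return 0;
        }
        List<P> batch = new ArrayList<>(maxBatch);
        queue.drainTo(batch, maxBatch);
        if (!batch.isEmpty()) {
            writer.accept(server, batch); // if this throws, the drained batch is lost
        }
        return batch.size();
    }

    /** Number of puts still waiting for the given server. */
    int pending(String server) {
        LinkedBlockingQueue<P> queue = queues.get(server);
        return queue == null ? 0 : queue.size();
    }
}
```

This sketch leaves open exactly the questions raised in the comment: if `writer` throws, the drained batch is gone, and anything still queued disappears if the process dies. A real client would need retries, a way to report failed puts back to the caller, and a bound on how much unacknowledged data it holds.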





[jira] [Created] (HBASE-5960) Expose HBase HDFS disk space usage via JMX

2012-05-08 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created HBASE-5960:
---

 Summary: Expose HBase HDFS disk space usage via JMX
 Key: HBASE-5960
 URL: https://issues.apache.org/jira/browse/HBASE-5960
 Project: HBase
  Issue Type: Improvement
  Components: metrics, monitoring
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 0.96.0, 0.94.1


HBase should expose via JMX how much HDFS disk space it is using.  See 
http://search-hadoop.com/m/s9VIx4Hhjz .






[jira] [Commented] (HBASE-5539) asynchbase PerformanceEvaluation

2012-05-10 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272964#comment-13272964
 ] 

Otis Gospodnetic commented on HBASE-5539:
-

@Benoit, have you succeeded in running the benchmarks?  We are considering 
using asynchbase and would love to see your numbers from the comparison.

> asynchbase PerformanceEvaluation
> 
>
> Key: HBASE-5539
> URL: https://issues.apache.org/jira/browse/HBASE-5539
> Project: HBase
>  Issue Type: New Feature
>  Components: performance
>Reporter: Benoit Sigoure
>Assignee: Benoit Sigoure
>Priority: Minor
>  Labels: benchmark
> Attachments: 0001-asynchbase-PerformanceEvaluation.patch
>
>
> I plugged [asynchbase|https://github.com/stumbleupon/asynchbase] into 
> {{PerformanceEvaluation}}.  This enables testing asynchbase from 
> {{PerformanceEvaluation}} and comparing its performance to {{HTable}}.  Also 
> asynchbase doesn't come with any benchmark, so it was good that I was able to 
> plug it into {{PerformanceEvaluation}} relatively easily.
> I am in the process of collecting results on a dev cluster running 0.92.1 
> and will publish them once they're ready.





[jira] [Commented] (HBASE-5776) HTableMultiplexer

2012-05-11 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273631#comment-13273631
 ] 

Otis Gospodnetic commented on HBASE-5776:
-

I read up on asynchbase yesterday.  Doesn't asynchbase already solve the 
problem this issue is aimed at?
See:
http://search-hadoop.com/m/J6olJ11Idb
http://search-hadoop.com/m/4fogb27wKWC


> HTableMultiplexer 
> --
>
> Key: HBASE-5776
> URL: https://issues.apache.org/jira/browse/HBASE-5776
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D2775.1.patch, D2775.1.patch, D2775.2.patch, 
> D2775.2.patch
>
>
> There is a known issue in the HBase client where a single slow/dead region server 
> can slow down the multiput operations across all the region servers, so the 
> HBase client will be as slow as the slowest region server in the cluster. 
>  
> To solve this problem, HTableMultiplexer will separate the multiput 
> submitting threads from the flush threads, which means the multiput operation 
> will be a nonblocking operation. 
> The submitting thread will shard all the puts into different queues based on 
> their destination region server and return immediately. The flush threads will 
> flush these puts from each queue to its destination region server. 
> Currently the HTableMultiplexer only supports the put operation.





[jira] [Commented] (HBASE-5776) HTableMultiplexer

2012-05-13 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274411#comment-13274411
 ] 

Otis Gospodnetic commented on HBASE-5776:
-

I think asynchbase does the same thing: it has a queue for each RS.
Compatibility: I don't know offhand; check its repo on GitHub.

> HTableMultiplexer 
> --
>
> Key: HBASE-5776
> URL: https://issues.apache.org/jira/browse/HBASE-5776
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: D2775.1.patch, D2775.1.patch, D2775.2.patch, 
> D2775.2.patch
>
>
> There is a known issue in the HBase client where a single slow/dead region server 
> can slow down the multiput operations across all the region servers, so the 
> HBase client will be as slow as the slowest region server in the cluster. 
>  
> To solve this problem, HTableMultiplexer will separate the multiput 
> submitting threads from the flush threads, which means the multiput operation 
> will be a nonblocking operation. 
> The submitting thread will shard all the puts into different queues based on 
> their destination region server and return immediately. The flush threads will 
> flush these puts from each queue to its destination region server. 
> Currently the HTableMultiplexer only supports the put operation.





[jira] [Created] (HBASE-6125) Expose HBase config properties via JMX

2012-05-29 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created HBASE-6125:
---

 Summary: Expose HBase config properties via JMX
 Key: HBASE-6125
 URL: https://issues.apache.org/jira/browse/HBASE-6125
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.0
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 0.96.0


It would make sense to expose HBase config properties via JMX so one can 
understand how HBase was configured by looking at JMX.

See:
http://search-hadoop.com/m/siI2o1rGyAj2&subj=Exposing+config+properties+via+JMX
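One way to do this with only the JDK is a `javax.management.DynamicMBean` that publishes each configuration property as a read-only string attribute, since the set of keys is not known at compile time. This is a sketch under assumptions: `ConfigPropertiesJmx` is a hypothetical name, the map would be filled from the server's own configuration object, and the `ObjectName` is chosen by the caller.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.Attribute;
import javax.management.AttributeList;
import javax.management.AttributeNotFoundException;
import javax.management.DynamicMBean;
import javax.management.JMException;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.ReflectionException;

// Hypothetical sketch: one read-only JMX attribute per configuration property.
final class ConfigPropertiesJmx implements DynamicMBean {
    private final Map<String, String> props;

    ConfigPropertiesJmx(Map<String, String> props) {
        this.props = new TreeMap<>(props); // sorted snapshot, decoupled from the caller's map
    }

    @Override
    public Object getAttribute(String name) throws AttributeNotFoundException {
        String value = props.get(name);
        if (value == null) {
            throw new AttributeNotFoundException(name);
        }
        return value;
    }

    @Override
    public AttributeList getAttributes(String[] names) {
        AttributeList list = new AttributeList();
        for (String n : names) {
            String v = props.get(n);
            if (v != null) {
                list.add(new Attribute(n, v)); // unknown names are skipped, per the JMX contract
            }
        }
        return list;
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only: " + attribute.getName());
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable, so nothing was set
    }

    @Override
    public Object invoke(String actionName, Object[] params, String[] signature) throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(actionName), "no operations exposed");
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = props.keySet().stream()
                .map(k -> new MBeanAttributeInfo(k, String.class.getName(), "configuration property",
                        true, false, false))
                .toArray(MBeanAttributeInfo[]::new);
        return new MBeanInfo(getClass().getName(), "Configuration properties", attrs, null, null, null);
    }

    /** Registers a snapshot of props with the platform MBean server under objectName. */
    static ObjectName register(Map<String, String> props, String objectName) throws JMException {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName(objectName);
        server.registerMBean(new ConfigPropertiesJmx(props), name);
        return name;
    }
}
```

For example, `ConfigPropertiesJmx.register(props, "example.hbase:type=Configuration")` (an illustrative name) would make the properties browsable in JConsole under that bean. Because the bean holds a snapshot, properties changed later would require re-registering it.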





[jira] [Commented] (HBASE-4050) Update HBase metrics framework to metrics2 framework

2012-05-31 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286759#comment-13286759
 ] 

Otis Gospodnetic commented on HBASE-4050:
-

Stack mentioned this is critical for 0.94, but Fix Version/s says "None".
Is anyone working on this by any chance?


> Update HBase metrics framework to metrics2 framework
> 
>
> Key: HBASE-4050
> URL: https://issues.apache.org/jira/browse/HBASE-4050
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 0.90.4
> Environment: Java 6
>Reporter: Eric Yang
>Assignee: Shaneal Manek
>Priority: Critical
>
> Metrics Framework has been marked deprecated in Hadoop 0.20.203+ and 0.22+, 
> and it might get removed in a future Hadoop release.  Hence, HBase needs to 
> revise its dependency on MetricsContext to use the Metrics2 framework.





[jira] [Created] (HBASE-6170) Timeouts for row lock and scan should be separate

2012-06-06 Thread Otis Gospodnetic (JIRA)
Otis Gospodnetic created HBASE-6170:
---

 Summary: Timeouts for row lock and scan should be separate
 Key: HBASE-6170
 URL: https://issues.apache.org/jira/browse/HBASE-6170
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Otis Gospodnetic
Priority: Minor
 Fix For: 0.96.0


Apparently the timeout used for row locking and for scanning is global.  It 
would be better to have two separate timeouts.
(opening the issue to make Lars George happy)
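A minimal sketch of how such a split could stay backward compatible: each new timeout falls back to the existing shared value when it is not set, so current configurations behave exactly as before. All property names and the 60-second default here are hypothetical placeholders, not HBase's actual configuration keys.

```java
import java.util.Properties;

// Hypothetical sketch: two independent timeouts that default to a shared one.
final class LeaseTimeouts {
    static final String SHARED_KEY = "example.shared.timeout.ms";     // hypothetical name
    static final String ROW_LOCK_KEY = "example.rowlock.timeout.ms"; // hypothetical name
    static final String SCANNER_KEY = "example.scanner.timeout.ms";  // hypothetical name
    static final long DEFAULT_MS = 60_000L;                          // illustrative default

    final long rowLockTimeoutMs;
    final long scannerTimeoutMs;

    LeaseTimeouts(Properties conf) {
        long shared = parse(conf.getProperty(SHARED_KEY), DEFAULT_MS);
        // Each specific key wins when set; otherwise the old shared value still applies.
        this.rowLockTimeoutMs = parse(conf.getProperty(ROW_LOCK_KEY), shared);
        this.scannerTimeoutMs = parse(conf.getProperty(SCANNER_KEY), shared);
    }

    private static long parse(String value, long fallback) {
        return value == null ? fallback : Long.parseLong(value.trim());
    }
}
```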






[jira] [Updated] (HBASE-6211) Latencies not in jmx

2012-06-15 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-6211:


Component/s: monitoring
 metrics

> Latencies not in jmx
> 
>
> Key: HBASE-6211
> URL: https://issues.apache.org/jira/browse/HBASE-6211
> Project: HBase
>  Issue Type: Bug
>  Components: metrics, monitoring
> Environment: RegionServerMetrics pushes latency histograms to hadoop 
> metrics, but they are not getting into jmx.
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6211-0.patch, HBASE-6211-1.patch
>
>






[jira] [Updated] (HBASE-5601) Add per-column-family data block cache hit ratios

2012-06-15 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-5601:


Component/s: monitoring
 metrics

> Add per-column-family data block cache hit ratios
> -
>
> Key: HBASE-5601
> URL: https://issues.apache.org/jira/browse/HBASE-5601
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics, monitoring
>Reporter: Mikhail Bautin
>Assignee: Enis Soztutar
>
> In addition to the overall block cache hit ratio it would be extremely useful 
> to have per-column-family data block cache hit ratio metrics.
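A sketch of per-column-family hit/miss accounting, assuming a hypothetical `PerFamilyCacheStats` class that the block cache's lookup path would call with the column family of each requested block. `LongAdder` keeps contention low on a hot path; because the ratio is computed from two separate sums, it is approximate while updates are in flight.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: hit/miss counters keyed by column family name.
final class PerFamilyCacheStats {
    private static final class Counts {
        final LongAdder hits = new LongAdder();
        final LongAdder misses = new LongAdder();
    }

    private final Map<String, Counts> byFamily = new ConcurrentHashMap<>();

    void hit(String family) {
        counts(family).hits.increment();
    }

    void miss(String family) {
        counts(family).misses.increment();
    }

    /** Hit ratio in [0, 1], or NaN when the family has not been accessed yet. */
    double hitRatio(String family) {
        Counts c = byFamily.get(family);
        if (c == null) {
            return Double.NaN;
        }
        long h = c.hits.sum();
        long m = c.misses.sum();
        return (h + m) == 0 ? Double.NaN : (double) h / (h + m);
    }

    private Counts counts(String family) {
        return byFamily.computeIfAbsent(family, f -> new Counts());
    }
}
```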





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-25 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401178#comment-13401178
 ] 

Otis Gospodnetic commented on HBASE-6261:
-

@Andrew - Ted Dunning may have thoughts on this and/or pointers to Mahout math 
or something else.

> Better approximate high-percentile percentile latency metrics
> -
>
> Key: HBASE-6261
> URL: https://issues.apache.org/jira/browse/HBASE-6261
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Wang
>  Labels: metrics
>
> The existing reservoir-sampling based latency metrics in HBase are not 
> well-suited for providing accurate estimates of high-percentile (e.g. 90th, 
> 95th, or 99th) latency. This is a well-studied problem in the literature (see 
> [1] and [2]); the question is determining which methods best suit our needs 
> and then implementing them.
> Ideally, we should be able to estimate these high percentiles with minimal 
> memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or 0.1% 
> on 99th). It's also desirable to provide this over different time-based 
> sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour.
> I'll note that this would also be useful in HDFS, or really anywhere latency 
> metrics are kept.
> [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
> [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf
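To make the memory/error trade-off concrete, here is a sketch of one simple alternative to reservoir sampling: a log-bucketed histogram. It is a different, simpler technique from the streaming algorithms in [1] and [2]. Buckets grow geometrically by a factor of 1 + relErr, so a reported percentile overestimates the true sample by at most that relative error (plus rounding up to a whole number), and memory depends on the value range rather than on the number of samples. `LogHistogram` is a hypothetical name, and the class is not thread-safe as written.

```java
// Hypothetical sketch: geometric buckets [g^i, g^(i+1)) with g = 1 + relErr.
final class LogHistogram {
    private final double logGrowth; // ln(g)
    private final long[] counts;
    private long total;

    LogHistogram(double relErr, long maxValue) {
        this.logGrowth = Math.log1p(relErr);
        this.counts = new long[bucketOf(maxValue) + 1];
    }

    private int bucketOf(long value) {
        long v = Math.max(1L, value); // values below 1 share bucket 0
        return (int) Math.floor(Math.log(v) / logGrowth);
    }

    void record(long value) {
        int b = Math.min(bucketOf(value), counts.length - 1); // values above maxValue are clamped
        counts[b]++;
        total++;
    }

    /** Upper-bound estimate of the q-quantile for 0 < q <= 1; returns 0 when empty. */
    long quantile(double q) {
        if (total == 0) {
            return 0L;
        }
        long rank = Math.max(1L, (long) Math.ceil(q * total)); // 1-based rank of the target sample
        long seen = 0;
        for (int b = 0; b < counts.length; b++) {
            seen += counts[b];
            if (seen >= rank) {
                return (long) Math.ceil(Math.exp((b + 1) * logGrowth)); // bucket's upper edge
            }
        }
        return (long) Math.ceil(Math.exp(counts.length * logGrowth)); // only reached if q > 1
    }
}
```

With relErr = 0.01 and maxValue = 1,000,000, this uses fewer than 1,400 counters regardless of how many latencies are recorded. Sliding windows could be approximated by keeping one histogram per time slice (say, per minute) and summing bucket counts across the slices in the window; values above `maxValue` are clamped into the last bucket and would be underestimated.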





[jira] [Commented] (HBASE-6261) Better approximate high-percentile percentile latency metrics

2012-06-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13401463#comment-13401463
 ] 

Otis Gospodnetic commented on HBASE-6261:
-

@Andrew See https://twitter.com/otisg/status/217487624804376576

> Better approximate high-percentile percentile latency metrics
> -
>
> Key: HBASE-6261
> URL: https://issues.apache.org/jira/browse/HBASE-6261
> Project: HBase
>  Issue Type: New Feature
>Reporter: Andrew Wang
>  Labels: metrics
>
> The existing reservoir-sampling based latency metrics in HBase are not 
> well-suited for providing accurate estimates of high-percentile (e.g. 90th, 
> 95th, or 99th) latency. This is a well-studied problem in the literature (see 
> [1] and [2]); the question is determining which methods best suit our needs 
> and then implementing them.
> Ideally, we should be able to estimate these high percentiles with minimal 
> memory and CPU usage as well as minimal error (e.g. 1% error on 90th, or 0.1% 
> on 99th). It's also desirable to provide this over different time-based 
> sliding windows, e.g. last 1 min, 5 mins, 15 mins, and 1 hour.
> I'll note that this would also be useful in HDFS, or really anywhere latency 
> metrics are kept.
> [1] http://www.cs.rutgers.edu/~muthu/bquant.pdf
> [2] http://infolab.stanford.edu/~manku/papers/04pods-sliding.pdf





[jira] [Commented] (HBASE-3743) Throttle major compaction

2012-11-06 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491694#comment-13491694
 ] 

Otis Gospodnetic commented on HBASE-3743:
-

Is this issue still needed or did HBASE-5867 take care of compaction throttling?


> Throttle major compaction
> -
>
> Key: HBASE-3743
> URL: https://issues.apache.org/jira/browse/HBASE-3743
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Joep Rottinghuis
>
> Add the ability to throttle major compaction.
> For those use cases when a stop-the-world approach is not practical, it is 
> useful to be able to throttle the impact that major compaction has on the 
> cluster.



[jira] [Commented] (HBASE-5776) HTableMultiplexer

2012-11-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493477#comment-13493477
 ] 

Otis Gospodnetic commented on HBASE-5776:
-

[~liangly] Any plans/ETA for getting this in trunk?

> HTableMultiplexer 
> --
>
> Key: HBASE-5776
> URL: https://issues.apache.org/jira/browse/HBASE-5776
> Project: HBase
>  Issue Type: Improvement
>Reporter: Liyin Tang
>Assignee: Liyin Tang
> Attachments: ASF.LICENSE.NOT.GRANTED--D2775.1.patch, 
> ASF.LICENSE.NOT.GRANTED--D2775.1.patch, 
> ASF.LICENSE.NOT.GRANTED--D2775.2.patch, 
> ASF.LICENSE.NOT.GRANTED--D2775.2.patch, 
> ASF.LICENSE.NOT.GRANTED--D2775.3.patch, 
> ASF.LICENSE.NOT.GRANTED--D2775.4.patch, ASF.LICENSE.NOT.GRANTED--D2775.5.patch
>
>
> There is a known issue in the HBase client where a single slow/dead region server 
> can slow down the multiput operations across all the region servers, so the 
> HBase client will be as slow as the slowest region server in the cluster. 
>  
> To solve this problem, HTableMultiplexer will separate the multiput 
> submitting threads from the flush threads, which means the multiput operation 
> will be a nonblocking operation. 
> The submitting thread will shard all the puts into different queues based on 
> their destination region server and return immediately. The flush threads will 
> flush these puts from each queue to its destination region server. 
> Currently the HTableMultiplexer only supports the put operation.



[jira] [Commented] (HBASE-1307) Threading writes and reads into HBase.

2012-11-12 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495806#comment-13495806
 ] 

Otis Gospodnetic commented on HBASE-1307:
-

Does HBASE-5776 (HTableMultiplexer) make this 3.5-year-old issue obsolete?

> Threading writes and reads into HBase.
> --
>
> Key: HBASE-1307
> URL: https://issues.apache.org/jira/browse/HBASE-1307
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Affects Versions: 0.90.0
>Reporter: Erik Holstad
>
> I created this issue to be the overall issue for threading to increase read 
> and write performance in HBase, and to keep it as a discussion place about 
> threading of these elements in general. Today we batch writes, and from 0.20 
> you will be able to batch reads too. The thing is that the batching 
> procedure doesn't use the ability to run these different queries at the 
> same time; it runs them more like a series of queries. I think that after 
> getting a good, stable 0.20 system down we should try to add threading to 
> increase throughput for both reading and writing. At the top level of these 
> calls I don't think it is going to be too hard to do this in parallel; it 
> gets a little more complicated when you get down to running a get query on 
> memcache and all the storefiles at the same time, but above that I don't see 
> it being too hard. I do think that this should not be a part of 0.20 but 
> rather an optimization in 0.21 or so.
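The description contrasts running a batch "as a series of queries" with running its queries at the same time. A minimal Java sketch of the latter, assuming each query in the batch is independent and can be wrapped as a `Callable`; `ParallelBatch` is a hypothetical helper, not an HBase API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class ParallelBatch {
    // Runs every query of the batch on the pool and returns results in submission order.
    static <T> List<T> runAll(List<Callable<T>> queries, ExecutorService pool) {
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> f : pool.invokeAll(queries)) { // invokeAll waits for all tasks
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a query in the batch failed", e.getCause());
        }
    }
}
```

Because `invokeAll` waits for every task, the batch takes roughly as long as its slowest query rather than the sum of all of them.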



[jira] [Commented] (HBASE-1307) Threading writes and reads into HBase.

2012-11-12 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495807#comment-13495807
 ] 

Otis Gospodnetic commented on HBASE-1307:
-

And maybe HBASE-1306 is then obsolete, too?



[jira] [Commented] (HBASE-74) [performance] When a get or scan request spans multiple columns, execute the reads in parallel

2012-11-20 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501677#comment-13501677
 ] 

Otis Gospodnetic commented on HBASE-74:
---

Let's dig up this 5+-year-old issue, whose last comment is from 4.5+ years ago :)
Maybe this has actually been implemented by now?


> [performance] When a get or scan request spans multiple columns, execute the 
> reads in parallel
> --
>
> Key: HBASE-74
> URL: https://issues.apache.org/jira/browse/HBASE-74
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: Jim Kellerman
>
> When a get or scan request spans multiple columns, execute the reads in 
> parallel and use a CountDownLatch to wait for them to complete before 
> returning the results.
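A minimal Java sketch of the approach named in the description: one read task per column on an executor, with a `CountDownLatch` the caller awaits before returning. `ParallelColumnGet` and the `readColumn` function are hypothetical placeholders for the per-column store read, not regionserver code.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

class ParallelColumnGet {
    // Issues one read per column on the pool and waits on a latch for all of them.
    // readColumn is a hypothetical per-column read and must not return null
    // (ConcurrentHashMap rejects null values); a failed read leaves its column out.
    static Map<String, String> get(List<String> columns,
                                   Function<String, String> readColumn,
                                   ExecutorService pool) {
        Map<String, String> results = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(columns.size());
        for (String column : columns) {
            pool.execute(() -> {
                try {
                    results.put(column, readColumn.apply(column));
                } finally {
                    done.countDown(); // count down even if the read fails
                }
            });
        }
        try {
            done.await(); // block until every per-column read has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for column reads", e);
        }
        return results;
    }
}
```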



[jira] [Commented] (HBASE-4120) isolation and allocation

2012-11-20 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501678#comment-13501678
 ] 

Otis Gospodnetic commented on HBASE-4120:
-

[~liujia_ict] It looks like this got stuck last year.  I quickly looked at the 
comments here in JIRA and see this went through some reviews and was generally 
welcomed.  Would it be possible for you to upload a fresh patch to revive this?
TIP: please use the same/single name for the patch.  JIRA will overwrite the 
old one.  This way devs reviewing this JIRA won't have to figure out which 
patch to use (yes, one can look at dates, but when there are >10 patches 
attached, even that becomes painful)


> isolation and allocation
> 
>
> Key: HBASE-4120
> URL: https://issues.apache.org/jira/browse/HBASE-4120
> Project: HBase
>  Issue Type: New Feature
>  Components: master, regionserver
>Affects Versions: 0.90.2, 0.90.3, 0.90.4, 0.92.0
>Reporter: Liu Jia
>Assignee: Liu Jia
> Attachments: Design_document_for_HBase_isolation_and_allocation.pdf, 
> Design_document_for_HBase_isolation_and_allocation_Revised.pdf, 
> HBase_isolation_and_allocation_user_guide.pdf, 
> Performance_of_Table_priority.pdf, 
> Simple_YCSB_Tests_For_TablePriority_Trunk_and_0.90.4.pdf, System 
> Structure.jpg, TablePriority.patch, TablePriority_v12.patch, 
> TablePriority_v12.patch, TablePriority_v15_with_coprocessor.patch, 
> TablePriority_v16_with_coprocessor.patch, TablePriority_v17.patch, 
> TablePriority_v17.patch, TablePriority_v8_for_trunk.patch, 
> TablePriority_v8.patch, TablePriority_v8.patch, TablePrioriy_v9.patch
>
>
> The HBase isolation and allocation tool is designed to help users manage 
> cluster resources among different applications and tables.
> When a large HBase cluster has many applications running on it, lots of 
> problems arise. In Taobao there is a cluster that many departments use to 
> test the performance of their HBase-based applications. With one cluster of 
> 12 servers, only one application can run on it exclusively at a time, and 
> many other applications must wait until the previous test has finished.
> After we added an allocation management function to the cluster, 
> applications can share the cluster and run concurrently. Also, if the test 
> engineer wants to make sure there is no interference, he/she can move other 
> tables out of this group.
> Within a group we use table priority to allocate resources: when the system 
> is busy, we can make sure high-priority tables are not affected by 
> lower-priority tables.
> Different groups can have different region server configurations: some groups 
> optimized for reading can have a large block cache size, and others optimized 
> for writing can have a large memstore size. 
> Tables and region servers can be moved easily between groups; after changing 
> the configuration, a group can be restarted alone instead of restarting the 
> whole cluster.
> Git repository: https://github.com/ICT-Ope/HBase_allocation
> We hope our work is helpful.
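The description does not say exactly how table priority is applied, so the following is only a sketch under one assumption: pending requests are served strictly by their table's priority, first-come-first-served within the same priority. `TableRequest` and `PriorityDispatcher` are hypothetical names, not classes from the attached patches.

```java
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical request: target table, that table's priority, and arrival order.
final class TableRequest implements Comparable<TableRequest> {
    final String table;
    final int priority; // assumption: a higher value means more important
    final long seq;     // arrival number, used to keep FIFO order within a priority

    TableRequest(String table, int priority, long seq) {
        this.table = table;
        this.priority = priority;
        this.seq = seq;
    }

    @Override
    public int compareTo(TableRequest other) {
        int byPriority = Integer.compare(other.priority, priority); // higher priority first
        return byPriority != 0 ? byPriority : Long.compare(seq, other.seq);
    }
}

class PriorityDispatcher {
    private final PriorityBlockingQueue<TableRequest> queue = new PriorityBlockingQueue<>();
    private final AtomicLong arrivals = new AtomicLong();

    void submit(String table, int priority) {
        queue.add(new TableRequest(table, priority, arrivals.getAndIncrement()));
    }

    // A handler thread would call queue.take(); poll() keeps the sketch non-blocking.
    TableRequest next() {
        return queue.poll();
    }
}
```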



[jira] [Commented] (HBASE-74) [performance] When a get or scan request spans multiple columns, execute the reads in parallel

2012-11-24 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-74?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503460#comment-13503460
 ] 

Otis Gospodnetic commented on HBASE-74:
---

@stack & [~sershe] - thanks.  I linked HBASE-5416, but thought it would also be 
good to set Fix Version to 0.96 so this issue gets some visibility: it seems 
popular in terms of votes and watchers, and HBASE-5416 is also set for 0.96.  
However, I don't seem to have enough HBase JIRA karma for this, so if you think 
setting Fix Version would make sense, could you please do it?



[jira] Commented: (HBASE-2888) Review all our metrics

2010-12-09 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12969847#action_12969847
 ] 

Otis Gospodnetic commented on HBASE-2888:
-

+1 for emitting events and letting other systems store/render them.

> Review all our metrics
> --
>
> Key: HBASE-2888
> URL: https://issues.apache.org/jira/browse/HBASE-2888
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.0
>
>
> HBase publishes a bunch of metrics, some useful, some wasteful, that should be 
> improved to deliver a better ops experience. Examples:
>  - Block cache hit ratio converges at some point and stops moving
>  - fsReadLatency goes down when compactions are running
>  - storefileIndexSizeMB is the exact same number once a system is serving 
> production load
> We could use new metrics too.
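One plausible reading of the first example above is that the hit ratio is computed from lifetime counters, so once millions of requests have been counted, a new interval barely moves it. Assuming that, here is a sketch (a hypothetical class, not HBase's metrics code) that also reports the ratio over the most recent interval by differencing counter snapshots.

```java
// Tracks the lifetime hit ratio and the hit ratio since the last snapshot.
// Not thread-safe; a real metric would use atomic or per-thread counters.
class HitRatioTracker {
    private long hits, requests;         // lifetime counters
    private long lastHits, lastRequests; // counters at the previous snapshot

    void record(boolean hit) {
        requests++;
        if (hit) {
            hits++;
        }
    }

    double lifetimeRatio() {
        return requests == 0 ? 0.0 : (double) hits / requests;
    }

    // Ratio over the interval since the previous call; then starts a new interval.
    double intervalRatioAndReset() {
        long dh = hits - lastHits, dr = requests - lastRequests;
        lastHits = hits;
        lastRequests = requests;
        return dr == 0 ? 0.0 : (double) dh / dr;
    }
}
```

After a long run of hits followed by a burst of misses, the lifetime ratio stays high while the interval ratio drops to zero, which is the movement an operator would want to see.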

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686800#comment-13686800
 ] 

Otis Gospodnetic commented on HBASE-8755:
-

[~frankfenghua] - this is about improving *writes*, but your table shows QPS 
(Queries Per Second).  Is that QPS really Writes Per Second?  Thanks.


> A new write thread model for HLog to improve the overall HBase write 
> throughput
> ---
>
> Key: HBASE-8755
> URL: https://issues.apache.org/jira/browse/HBASE-8755
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Reporter: Feng Honghua
> Attachments: HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, 
> HBASE-8755-trunk-V0.patch
>
>
> In the current write model, each write handler thread (executing put()) 
> individually goes through a full 'append (hlog local buffer) => HLog writer 
> append (write to hdfs) => HLog writer sync (sync hdfs)' cycle for each write, 
> which causes heavy contention on updateLock and flushLock.
> The only existing optimization, checking whether the current syncTillHere > txid 
> in the expectation that another thread has already written/synced this txid 
> to hdfs and then skipping the write/sync, actually helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi 
> proposed a new write thread model for writing hdfs sequence files, and the 
> prototype implementation shows a 4X improvement in throughput (from 17000 to 
> 7+). 
> I applied this new write thread model to HLog, and the performance test in our 
> test cluster shows about a 3X throughput improvement (from 12150 to 31520 for 1 
> RS, from 22000 to 7 for 5 RS); the 1 RS write throughput (1K row-size) 
> even beats BigTable's (the Percolator paper, published in 2011, says Bigtable's 
> write throughput then was 31002). I can provide the detailed performance test 
> results if anyone is interested.
> The change for the new write thread model is as follows:
>  1> All put handler threads append the edits to HLog's local pending buffer 
> (this notifies the AsyncWriter thread that there are new edits in the local buffer);
>  2> All put handler threads wait in the HLog.syncer() function for the underlying 
> threads to finish the sync that contains their txid;
>  3> A single AsyncWriter thread is responsible for retrieving all the buffered 
> edits in HLog's local pending buffer and writing them to hdfs 
> (hlog.writer.append) (this notifies the AsyncFlusher thread that there are new 
> writes to hdfs that need a sync);
>  4> A single AsyncFlusher thread is responsible for issuing a sync to hdfs 
> to persist the writes by AsyncWriter (this notifies the AsyncNotifier thread 
> that the sync watermark has increased);
>  5> A single AsyncNotifier thread is responsible for notifying all pending 
> put handler threads which are waiting in the HLog.syncer() function;
>  6> There is no LogSyncer thread any more (since the AsyncWriter/AsyncFlusher 
> threads always do the same job it did).
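A simplified Java sketch of the group-commit idea in steps 1 to 5, assuming a hypothetical `writeAndSync` callback in place of the HDFS append and sync calls. For brevity it collapses the AsyncWriter, AsyncFlusher and AsyncNotifier threads into one background thread, so it is not the patch's actual design, only the core handoff: handlers append and wait on a sync watermark, and one sync covers a whole batch of edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Group-commit sketch: many handler threads append, one background thread writes and syncs.
class GroupCommitLog {
    private final List<String> pending = new ArrayList<>(); // local pending buffer
    private long lastTxid = 0;   // txid of the most recent append
    private long syncedTxid = 0; // sync watermark: every txid <= this is durable
    private final Consumer<List<String>> writeAndSync; // hypothetical HDFS append + sync

    GroupCommitLog(Consumer<List<String>> writeAndSync) {
        this.writeAndSync = writeAndSync;
    }

    void start() {
        Thread writer = new Thread(this::runWriter, "async-writer");
        writer.setDaemon(true);
        writer.start();
    }

    // Step 1: a handler appends its edit, gets a txid, and wakes the writer.
    synchronized long append(String edit) {
        pending.add(edit);
        notifyAll();
        return ++lastTxid;
    }

    // Step 2: the handler blocks until the sync watermark covers its txid.
    synchronized void sync(long txid) {
        try {
            while (syncedTxid < txid) {
                wait();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted waiting for sync", e);
        }
    }

    // Steps 3-5 in one thread: drain the buffer, write and sync it outside the lock,
    // then advance the watermark and wake every waiting handler.
    private void runWriter() {
        try {
            while (true) {
                List<String> batch;
                long batchTxid;
                synchronized (this) {
                    while (pending.isEmpty()) {
                        wait();
                    }
                    batch = new ArrayList<>(pending);
                    pending.clear();
                    batchTxid = lastTxid;
                }
                writeAndSync.accept(batch); // one sync covers every edit in the batch
                synchronized (this) {
                    syncedTxid = batchTxid;
                    notifyAll();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly on shutdown
        }
    }
}
```

Error handling is omitted: if `writeAndSync` throws, the writer thread dies and waiting handlers would block forever.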



[jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput

2013-06-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686854#comment-13686854
 ] 

Otis Gospodnetic commented on HBASE-8755:
-

Thanks and I'm sorry about the name messup.  Feel free to mess up mine - you've 
got 15 characters to play with.  And thanks for this patch.  Crazy improvement!



[jira] [Commented] (HBASE-8836) Separate reader and writer thread pool in RegionServer, so that write throughput will not be impacted when the read load is very high

2013-06-28 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695924#comment-13695924
 ] 

Otis Gospodnetic commented on HBASE-8836:
-

Have you run any tests, and do you have any numbers showing how much impact 
this has? I'm wondering how well this works if the underlying server/IO is 
actually maxed out.  In such a situation, can separating writes from reads help?


> Separate reader and writer thread pool in RegionServer, so that write 
> throughput will not be impacted when the read load is very high
> -
>
> Key: HBASE-8836
> URL: https://issues.apache.org/jira/browse/HBASE-8836
> Project: HBase
>  Issue Type: New Feature
>  Components: Performance, regionserver
>Affects Versions: 0.94.8
>Reporter: Tianying Chang
>Assignee: Tianying Chang
> Fix For: 0.94.8
>
>
> We found that when the read load on a specific RS is high, the write 
> throughput also gets impacted dramatically, and this sometimes even causes 
> write data loss. We want to prioritize writes by putting them in a separate 
> queue from the read requests, so that slower reads do not make fast writes 
> wait unnecessarily long.
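A minimal Java sketch of the proposed separation, assuming each incoming call can be classified as a read or a write; `ReadWriteDispatcher` is a hypothetical class, not the regionserver's RPC scheduler. With separate fixed-size pools, a backlog of slow reads occupies only the read handlers.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical dispatcher: reads and writes are served by independent handler pools.
class ReadWriteDispatcher implements AutoCloseable {
    private final ExecutorService readPool;
    private final ExecutorService writePool;

    ReadWriteDispatcher(int readHandlers, int writeHandlers) {
        readPool = Executors.newFixedThreadPool(readHandlers);
        writePool = Executors.newFixedThreadPool(writeHandlers);
    }

    // Route the call to the pool matching its type.
    <T> Future<T> submit(boolean isWrite, Callable<T> call) {
        return (isWrite ? writePool : readPool).submit(call);
    }

    @Override
    public void close() {
        readPool.shutdownNow();
        writePool.shutdownNow();
    }
}
```

This only isolates handler threads; it adds no I/O capacity, which is relevant to the question above about a server whose I/O is already maxed out.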



[jira] [Commented] (HBASE-7667) Support stripe compaction

2013-12-08 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13842713#comment-13842713
 ] 

Otis Gospodnetic commented on HBASE-7667:
-

Btw. is this going to get into any 0.96.x releases by any chance?  Thanks.

> Support stripe compaction
> -
>
> Key: HBASE-7667
> URL: https://issues.apache.org/jira/browse/HBASE-7667
> Project: HBase
>  Issue Type: New Feature
>  Components: Compaction
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 0.98.0, 0.99.0
>
> Attachments: Stripe compaction perf evaluation.pdf, Stripe compaction 
> perf evaluation.pdf, Stripe compaction perf evaluation.pdf, Stripe 
> compactions.pdf, Stripe compactions.pdf, Stripe compactions.pdf, Stripe 
> compactions.pdf, Using stripe compactions.pdf, Using stripe compactions.pdf, 
> Using stripe compactions.pdf, stripe-cdf.pdf
>
>
> So I was thinking about having many regions as the way to make compactions 
> more manageable, and writing the level db doc about how level db range 
> overlap and data mixing breaks seqNum sorting, and discussing it with Jimmy, 
> Matteo and Ted, and thinking about how to avoid Level DB I/O multiplication 
> factor.
> I suggest the following idea; let's call it stripe compactions. It's a 
> mix between level db ideas and having many small regions.
> It allows us to have a subset of benefits of many regions (wrt reads and 
> compactions) without many of the drawbacks (managing and current 
> memstore/etc. limitation).
> It also doesn't break seqNum-based file sorting for any one key.
> It works like this.
> The region key space is separated into a configurable number of fixed-boundary 
> stripes (determined the first time we stripe the data, see below).
> All the data from memstores is written to normal files with all keys present 
> (not striped), similar to L0 in LevelDb, or current files.
> Compaction policy does 3 types of compactions.
> First is L0 compaction, which takes all L0 files and breaks them down by 
> stripe. It may be optimized by adding more small files from different 
> stripes, but the main logical outcome is that there are no more L0 files and 
> all data is striped.
> Second is exactly similar to current compaction, but compacting one single 
> stripe. In future, nothing prevents us from applying compaction rules and 
> compacting part of the stripe (e.g. similar to current policy with ratios 
> and stuff, tiers, whatever), but for the first cut I'd argue let it "major 
> compact" the entire stripe. Or just have the ratio and no more complexity.
> Finally, the third addresses the concern of the fixed boundaries causing 
> stripes to be very unbalanced.
> It's exactly like the 2nd, except it takes 2+ adjacent stripes and writes the 
> results out with different boundaries.
> There's a tradeoff here - if we always take 2 adjacent stripes, compactions 
> will be smaller but rebalancing will take a ridiculous amount of I/O.
> If we take many stripes we are essentially getting into the 
> epic-major-compaction problem again. Some heuristics will have to be in place.
> In general, if we initially let L0 grow before determining the stripes, we 
> will get better boundaries.
> Also, unless the imbalance is really large, we don't really need to rebalance.
> Obviously this scheme (as well as level) is not applicable for all scenarios, 
> e.g. if timestamp is your key it completely falls apart.
> The end result:
> - many small compactions that can be spread out in time.
> - reads still read from a small number of files (one stripe + L0).
> - region splits become marvelously simple (if we could move files between 
> regions, no references would be needed).
> Main advantage over Level (for HBase) is that default store can still open 
> the files and get correct results - there are no range overlap shenanigans.
> It also needs no metadata, although we may record some for convenience.
> It also would appear to not cause as much I/O.
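A minimal Java sketch of the fixed-boundary stripe assignment and of the first compaction type (breaking an L0 file down by stripe). It assumes `String` keys compared lexicographically instead of HBase's byte-array ordering, and sorted, distinct, non-empty inner boundaries; `StripeLayout` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Maps row keys to fixed-boundary stripes and splits an unstriped (L0) file by stripe.
class StripeLayout {
    // Start key of each stripe -> stripe index; the first stripe has an open start ("").
    private final NavigableMap<String, Integer> starts = new TreeMap<>();

    // innerBoundaries must be sorted, distinct and non-empty (assumption of this sketch).
    StripeLayout(List<String> innerBoundaries) {
        starts.put("", 0);
        for (int i = 0; i < innerBoundaries.size(); i++) {
            starts.put(innerBoundaries.get(i), i + 1);
        }
    }

    // The stripe whose start key is the greatest one <= rowKey.
    int stripeFor(String rowKey) {
        return starts.floorEntry(rowKey).getValue();
    }

    // "L0 compaction": one output list of keys per stripe, preserving input order.
    List<List<String>> splitL0(List<String> sortedKeys) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            out.add(new ArrayList<>());
        }
        for (String key : sortedKeys) {
            out.get(stripeFor(key)).add(key);
        }
        return out;
    }
}
```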



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-8480) Embed HDFS into HBase

2013-05-02 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648045#comment-13648045
 ] 

Otis Gospodnetic commented on HBASE-8480:
-

+1
This is getting close to "HBase in a box" - 
http://search-hadoop.com/m/p68C12nb7Hn (2010):
"HBase in a box" is like "dynamic equilibrium", or "virtual reality", or "jumbo 
shrimp"... :-)


> Embed HDFS into HBase
> -
>
> Key: HBASE-8480
> URL: https://issues.apache.org/jira/browse/HBASE-8480
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars George
>
> HBase is often a bit more involved to get going. We already have the option 
> to host ZooKeeper for very small clusters. We should have the same for HDFS. 
> The idea is that it adjusts replication based on the number of nodes, i.e. 
> from 1 to 3 (the default), so that you could start with a single node and 
> grow the cluster from there. Once the cluster reaches a certain size, and the 
> admin decides to split the components, we should have a way to export the 
> proper configs/settings so that you can easily start up an external HDFS 
> and/or ZooKeeper, while updating the HBase config as well to point to the new 
> "locations".
> The goal is to start a fully operational HBase that can grow from a single 
> machine to multi-machine clusters with just a single daemon on each machine.
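A sketch of the replication rule described above, assuming the target factor is simply the number of live nodes capped at HDFS's default `dfs.replication` value of 3; `EmbeddedReplication` is a hypothetical helper.

```java
// Replication factor that grows with the cluster: 1 node -> 1 copy, capped at 3.
class EmbeddedReplication {
    static final int DEFAULT_REPLICATION = 3; // HDFS default for dfs.replication

    static int replicationFor(int liveDataNodes) {
        if (liveDataNodes < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        return Math.min(liveDataNodes, DEFAULT_REPLICATION);
    }
}
```

Files written before the cluster grew would keep their old factor; raising it would need something like Hadoop's `FileSystem#setReplication(Path, short)` for each existing file.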



[jira] [Updated] (HBASE-7633) Add a metric that tracks the current number of used RPC threads on the regionservers

2013-01-22 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-7633:


Component/s: metrics

> Add a metric that tracks the current number of used RPC threads on the 
> regionservers
> 
>
> Key: HBASE-7633
> URL: https://issues.apache.org/jira/browse/HBASE-7633
> Project: HBase
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Joey Echeverria
>Assignee: Elliott Clark
>
> One way to detect that you're hitting a "John Wayne" disk[1] would be if we 
> could see when region servers exhausted their RPC handlers. This would also 
> be useful when tuning the cluster for your workload to make sure that reads 
> or writes were not starving the other operations out.
> [1] http://hbase.apache.org/book.html#bad.disk
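A minimal Java sketch of such a metric, assuming each RPC call can be wrapped at the handler; `HandlerGauge` is hypothetical, and exporting the value to a metrics system is not shown. A gauge that sits at the handler count signals that handlers are exhausted.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Gauge of RPC handlers currently executing a call, plus a saturation check.
class HandlerGauge {
    private final int totalHandlers;
    private final AtomicInteger active = new AtomicInteger();

    HandlerGauge(int totalHandlers) {
        this.totalHandlers = totalHandlers;
    }

    // Wrap each call so the gauge is decremented even if the call throws.
    <T> T track(Supplier<T> call) {
        active.incrementAndGet();
        try {
            return call.get();
        } finally {
            active.decrementAndGet();
        }
    }

    int activeHandlers() {
        return active.get();
    }

    boolean saturated() {
        return active.get() >= totalHandlers;
    }
}
```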



[jira] [Updated] (HBASE-7837) Add new metrics to better monitor recovery process

2013-02-12 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-7837:


Component/s: metrics

> Add new metrics to better monitor recovery process 
> ---
>
> Key: HBASE-7837
> URL: https://issues.apache.org/jira/browse/HBASE-7837
> Project: HBase
>  Issue Type: Sub-task
>  Components: metrics
>Reporter: Jeffrey Zhong
> Fix For: 0.96.0
>
>




[jira] [Updated] (HBASE-7340) Allow user-specified actions following region movement

2012-12-12 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-7340:


Description: 
Sometimes users perform a compaction after a region is moved (by the 
balancer). We should provide a 'hook' that lets users specify what follow-on 
actions to take after region movement.

See the discussion on the user mailing list under the thread 'How to know it's 
time for a major compaction?' for background information: 
http://search-hadoop.com/m/BDx4S1jMjF92&subj=How+to+know+it+s+time+for+a+major+compaction+

  was:
Sometimes users perform a compaction after a region is moved (by the 
balancer). We should provide a 'hook' that lets users specify what follow-on 
actions to take after region movement.

See the discussion on the user mailing list under the thread 'How to know it's 
time for a major compaction?' for background information


> Allow user-specified actions following region movement
> --
>
> Key: HBASE-7340
> URL: https://issues.apache.org/jira/browse/HBASE-7340
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>
> Sometimes users perform a compaction after a region is moved (by the 
> balancer). We should provide a 'hook' that lets users specify what follow-on 
> actions to take after region movement.
> See the discussion on the user mailing list under the thread 'How to know 
> it's time for a major compaction?' for background information: 
> http://search-hadoop.com/m/BDx4S1jMjF92&subj=How+to+know+it+s+time+for+a+major+compaction+
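A minimal sketch of the kind of hook being requested, assuming a hypothetical `RegionMoveListener` interface that user code registers and that is invoked after each completed move; none of these names are HBase APIs. A user could, for example, queue a major compaction of the moved region from the callback.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical hook: user code that runs after the balancer moves a region.
interface RegionMoveListener {
    void afterRegionMoved(String regionName, String fromServer, String toServer);
}

class RegionMoveHooks {
    private final List<RegionMoveListener> listeners = new CopyOnWriteArrayList<>();

    void register(RegionMoveListener listener) {
        listeners.add(listener);
    }

    // Called once a move has completed; one failing hook does not block the others.
    void fireMoved(String region, String from, String to) {
        for (RegionMoveListener listener : listeners) {
            try {
                listener.afterRegionMoved(region, from, to);
            } catch (RuntimeException e) {
                System.err.println("region move hook failed for " + region + ": " + e);
            }
        }
    }
}
```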



[jira] [Commented] (HBASE-8329) Limit compaction speed

2013-04-12 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630151#comment-13630151
 ] 

Otis Gospodnetic commented on HBASE-8329:
-

Dupe of HBASE-3743 or HBASE-5867 ?

> Limit compaction speed
> --
>
> Key: HBASE-8329
> URL: https://issues.apache.org/jira/browse/HBASE-8329
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: binlijin
>
> There is no speed or resource limit for compaction. I think we should add 
> this feature, especially for when requests burst.
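A minimal Java sketch of one way to limit compaction speed, assuming the compactor writes in chunks and can pause between them; `CompactionThrottle` and its single bytes-per-second budget are hypothetical and not taken from any HBase patch. The pause keeps the average rate since the compaction started at or below the limit.

```java
// Sketch of a compaction throughput limit based on average rate since start.
class CompactionThrottle {
    private final double maxBytesPerSec;
    private final long startNanos = System.nanoTime();
    private long bytesSoFar;

    CompactionThrottle(double maxBytesPerSec) {
        if (maxBytesPerSec <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxBytesPerSec = maxBytesPerSec;
    }

    // Records a written chunk and returns the pause (ms) needed to respect the limit.
    long pauseMillisAfter(long chunkBytes) {
        bytesSoFar += chunkBytes;
        double earliestDoneSec = bytesSoFar / maxBytesPerSec; // time these bytes should take
        double elapsedSec = (System.nanoTime() - startNanos) / 1e9;
        return (long) Math.max(0.0, (earliestDoneSec - elapsedSec) * 1000.0);
    }

    // Convenience wrapper that actually sleeps between chunks.
    void throttle(long chunkBytes) throws InterruptedException {
        long pause = pauseMillisAfter(chunkBytes);
        if (pause > 0) {
            Thread.sleep(pause);
        }
    }
}
```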



[jira] [Commented] (HBASE-2888) Review all our metrics

2015-02-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326976#comment-14326976
 ] 

Otis Gospodnetic commented on HBASE-2888:
-

This issue is >4 years old and there are so many metrics I doubt anyone 
will ever review all of them systematically... Won't Fix?

> Review all our metrics
> --
>
> Key: HBASE-2888
> URL: https://issues.apache.org/jira/browse/HBASE-2888
> Project: HBase
>  Issue Type: Improvement
>  Components: master, metrics
>Reporter: Jean-Daniel Cryans
>
> HBase publishes a bunch of metrics, some useful, some wasteful, that should be 
> improved to deliver a better ops experience. Examples:
>  - Block cache hit ratio converges at some point and stops moving
>  - fsReadLatency goes down when compactions are running
>  - storefileIndexSizeMB is the exact same number once a system is serving 
> production load
> We could use new metrics too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-9604) Add metric on short-circuit reads

2015-02-18 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326984#comment-14326984
 ] 

Otis Gospodnetic commented on HBASE-9604:
-

Isn't this a dupe of HBASE-8868?

> Add metric on short-circuit reads
> -
>
> Key: HBASE-9604
> URL: https://issues.apache.org/jira/browse/HBASE-9604
> Project: HBase
>  Issue Type: Task
>  Components: metrics
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.1.0
>
>
> Got this from a Colin message this afternoon:
> "There are HDFS statistics that HBase could be checking by calling 
> DFSInputStream#getReadStatistics.  This tells you how many of your reads have 
> been remote, local, short-circuit, etc.  You could file an HBase JIRA for 
> them to roll those up into the HBase stats. Seems like a good idea to me."
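A minimal sketch of the suggested roll-up, assuming per-stream counters of total, local and short-circuit bytes like those HDFS read statistics expose; `StreamReadStats` is a hypothetical stand-in, so the sketch runs without Hadoop on the classpath. It also assumes short-circuit bytes are counted as a subset of local bytes.

```java
// Hypothetical stand-in for one stream's HDFS read counters (bytes).
final class StreamReadStats {
    final long totalBytes;
    final long localBytes;        // assumed to include short-circuit bytes
    final long shortCircuitBytes;

    StreamReadStats(long totalBytes, long localBytes, long shortCircuitBytes) {
        this.totalBytes = totalBytes;
        this.localBytes = localBytes;
        this.shortCircuitBytes = shortCircuitBytes;
    }
}

// Rolls per-stream counters up into server-level totals and ratios.
class ReadStatsRollup {
    private long total, local, shortCircuit;

    synchronized void add(StreamReadStats s) {
        total += s.totalBytes;
        local += s.localBytes;
        shortCircuit += s.shortCircuitBytes;
    }

    synchronized double localFraction() {
        return total == 0 ? 0.0 : (double) local / total;
    }

    synchronized double shortCircuitFraction() {
        return total == 0 ? 0.0 : (double) shortCircuit / total;
    }

    synchronized long remoteBytes() {
        return total - local;
    }
}
```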





[jira] [Updated] (HBASE-14459) Add request and response sizes metrics

2015-09-21 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-14459:
-
Issue Type: New Feature  (was: Bug)

> Add request and response sizes metrics
> --
>
> Key: HBASE-14459
> URL: https://issues.apache.org/jira/browse/HBASE-14459
> Project: HBase
>  Issue Type: New Feature
>  Components: metrics
>Affects Versions: 1.2.0
>Reporter: Sanjeev Srivatsa
>Assignee: Sanjeev Srivatsa
> Attachments: HBASE-14459-v1.patch
>
>
> Adding metrics that should be useful:
>  - Request size
>  - Response size





[jira] [Commented] (HBASE-11144) Filter to support scan multiple row key ranges

2014-05-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008789#comment-14008789
 ] 

Otis Gospodnetic commented on HBASE-11144:
--

# Which version is this for? 0.99? I can't tell; the Fix Version field is not set.
# Can you quantify how much more efficient this is?


> Filter to support scan multiple row key ranges
> --
>
> Key: HBASE-11144
> URL: https://issues.apache.org/jira/browse/HBASE-11144
> Project: HBase
>  Issue Type: Improvement
>  Components: Filters
>Reporter: Li Jiajia
> Attachments: MultiRowRangeFilter.patch, MultiRowRangeFilter2.patch, 
> MultiRowRangeFilter3.patch
>
>
> HBase is quite efficient when scanning only one small row key range. If a user 
> needs to specify multiple row key ranges in one scan, the typical solutions 
> are: 1. use a FilterList containing a list of row key Filters, or 2. use a 
> SQL layer over HBase, such as Hive or Phoenix, to join with a second table. 
> However, both solutions are inefficient: neither can use the range 
> information to fast-forward past unwanted rows during the scan, which makes 
> the scan time-consuming. If the number of ranges is very large (e.g. 
> millions), a join is an appropriate solution, though it is slow. However, 
> there are cases where a user wants to scan a small number of ranges (e.g. 
> <1000 ranges), and neither solution provides satisfactory performance in 
> such cases. 
> We provide this filter (MultiRowRangeFilter) to support that use case 
> (scanning multiple row key ranges). It constructs the row key ranges from a 
> user-specified sorted list and fast-forwards during the scan, so the scan 
> is quite efficient. 
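The fast-forwarding idea can be sketched in Java as follows. This is a minimal illustration under assumptions, not the attached MultiRowRangeFilter patch. `RowRangeSeeker`, `RowRange`, `Decision`, and `Kind` are hypothetical names. Ranges are assumed to be sorted, non-overlapping, and half-open `[start, stop)` with a non-empty stop key. Row keys are compared as unsigned bytes, which matches HBase's row ordering.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Sketch: for each row the scan reaches, decide whether to include it,
 * seek directly to the next range's start key, or end the scan.
 * Hypothetical names; not HBase's Filter API.
 */
public class RowRangeSeeker {

    /** Half-open row range [start, stop); stop is exclusive. */
    public record RowRange(byte[] start, byte[] stop) {}

    public enum Kind { INCLUDE, SEEK, DONE }

    /** Outcome for one row; hint is the seek target when kind == SEEK. */
    public record Decision(Kind kind, byte[] hint) {}

    private final List<RowRange> ranges; // assumed sorted and non-overlapping

    public RowRangeSeeker(List<RowRange> sortedRanges) {
        this.ranges = List.copyOf(sortedRanges);
    }

    /** Unsigned lexicographic comparison of row keys. */
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public Decision decide(byte[] row) {
        // Binary search for the first range whose (exclusive) stop key is > row.
        int lo = 0, hi = ranges.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compare(ranges.get(mid).stop(), row) <= 0) lo = mid + 1;
            else hi = mid;
        }
        if (lo == ranges.size()) {
            return new Decision(Kind.DONE, null); // past the last range
        }
        RowRange r = ranges.get(lo);
        if (compare(r.start(), row) <= 0) {
            return new Decision(Kind.INCLUDE, null); // row is inside r
        }
        // Row falls in a gap: skip every row up to r's start key in one seek.
        return new Decision(Kind.SEEK, r.start());
    }
}
```

The SEEK result is what makes this cheaper than a plain FilterList: the scanner can jump straight to the next range's start key instead of reading and rejecting every row in the gap. Binary search keeps each decision at O(log n) in the number of ranges. Because rows arrive in order, a real filter could instead keep a cursor into the range list.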





[jira] [Commented] (HBASE-11220) Add listeners to ServerManager and AssignmentManager

2014-05-26 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009229#comment-14009229
 ] 

Otis Gospodnetic commented on HBASE-11220:
--

Is this for 0.98 or 0.99?


> Add listeners to ServerManager and AssignmentManager
> 
>
> Key: HBASE-11220
> URL: https://issues.apache.org/jira/browse/HBASE-11220
> Project: HBase
>  Issue Type: Improvement
>  Components: master
>Affects Versions: 0.99.0
>Reporter: Matteo Bertozzi
>Assignee: Matteo Bertozzi
>Priority: Minor
> Attachments: HBASE-11220-v0.patch
>
>
> Add support for listeners to ServerManager and AssignmentManager.
> This will allow callers to be notified about servers being added/removed or 
> regions being added/removed/moved.
> I'm planning to use this in the MasterProcedureManager. Since we are starting 
> to use Procedures for distributed operations, we must add support for RSs 
> joining and regions moving. At the moment, the operation on the "moving" set 
> of RSs is "lost".
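A listener hook of the kind described could be shaped roughly like the Java sketch below. It shows only the server half. `ServerTracker`, `ServerListener`, and the `onServerJoined`/`onServerLeft` entry points are hypothetical names, not the attached patch or HBase's actual API. Servers are identified by plain strings instead of HBase's `ServerName` to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/**
 * Sketch of the observer pattern for region server membership changes.
 * Hypothetical names throughout; not HBase's ServerManager.
 */
public class ServerTracker {

    /** Callbacks fired when region servers join or leave the cluster. */
    public interface ServerListener {
        void serverAdded(String serverName);
        void serverRemoved(String serverName);
    }

    // Copy-on-write list: listeners are registered rarely but notified often,
    // and iteration stays safe if a listener is added or removed concurrently.
    private final List<ServerListener> listeners = new CopyOnWriteArrayList<>();

    public void registerListener(ServerListener listener) {
        listeners.add(listener);
    }

    public boolean unregisterListener(ServerListener listener) {
        return listeners.remove(listener);
    }

    /** Hypothetical hook: called when a region server checks in. */
    public void onServerJoined(String serverName) {
        for (ServerListener l : listeners) l.serverAdded(serverName);
    }

    /** Hypothetical hook: called when a region server expires or shuts down. */
    public void onServerLeft(String serverName) {
        for (ServerListener l : listeners) l.serverRemoved(serverName);
    }
}
```

A procedure manager would register once at startup and react in `serverAdded`/`serverRemoved`, for example by re-dispatching work that was in flight on a departed server. Region added/removed/moved notifications from AssignmentManager would follow the same pattern with a second listener interface.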


