[jira] [Commented] (HBASE-11777) Find a way to use KV.setSequenceId() on Cells on the server-side read path

2014-08-19 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101901#comment-14101901
 ] 

ramkrishna.s.vasudevan commented on HBASE-11777:


bq.Server side read path to pass new interface type than Cell type
Yes.  Either we should change everywhere to MutableCell (the new impl) on the 
read path, or is it enough if we just create an instance of the new cell type 
wherever needed and call setSequenceId on that? 
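The second option above — a server-side cell type that carries the mutator, set only where needed — can be sketched as follows. All names here (SettableSequenceId, ServerCell, SeqIdUtil) are hypothetical illustrations, not the actual HBase interfaces under discussion:

```java
/** Minimal stand-in for the read-path Cell view. */
interface Cell {
    long getSequenceId();
}

/** Server-side-only mixin exposing the mutator the read path needs. */
interface SettableSequenceId {
    void setSequenceId(long seqId);
}

/** A server-side Cell impl carrying a mutable sequence id. */
class ServerCell implements Cell, SettableSequenceId {
    private long seqId;

    ServerCell(long seqId) { this.seqId = seqId; }

    @Override public long getSequenceId() { return seqId; }
    @Override public void setSequenceId(long seqId) { this.seqId = seqId; }
}

final class SeqIdUtil {
    private SeqIdUtil() {}

    /** Set the id only where the concrete type supports it, instead of
     *  converting every Cell to a KeyValue. */
    static void setSequenceId(Cell cell, long seqId) {
        if (cell instanceof SettableSequenceId) {
            ((SettableSequenceId) cell).setSequenceId(seqId);
        } else {
            throw new IllegalArgumentException("Cell does not support setSequenceId");
        }
    }
}
```

This avoids rewriting the whole read path to a new concrete type: only the call sites that need the setter do an instanceof check.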

 Find a way to use KV.setSequenceId() on Cells on the server-side read path
 --

 Key: HBASE-11777
 URL: https://issues.apache.org/jira/browse/HBASE-11777
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan

 Over in HBASE-11591 there was a need to set the sequenceId of the HFile on 
 the bulk loaded KVs.  Since we are trying to use the concept of Cells in the 
 read path, if we need to use setSequenceId() then the Cell has to be 
 converted to a KeyValue, as only the KeyValue impl has the accessor 
 setSequenceId().
 [~anoop.hbase] suggested that we use a Server side impl of Cell and have 
 these accessors on it.
 This JIRA aims to solve this and track the related code changes that need to 
 be carried out for it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles

2014-08-19 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101904#comment-14101904
 ] 

ramkrishna.s.vasudevan commented on HBASE-11772:


In 0.98, TestBulkLoad should be working fine because of
{code}
if (kv.getMvccVersion() <= smallestReadPoint) {
  kv.setMvccVersion(0);
}
{code}
We set the kv's mvcc version to 0, and so the latest kv from the bulk loaded 
file should be returned back. Let me check once again.
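The intent of the quoted check can be sketched in isolation. This is a simplified stand-in for the behavior described, not the actual HBase compaction code:

```java
// Cells whose mvcc version is at or below the smallest read point across
// all open scanners are already visible to every scanner, so their mvcc
// can safely be zeroed out (making them visible regardless of readpoint).
final class MvccSketch {
    private MvccSketch() {}

    static long maybeResetMvcc(long mvccVersion, long smallestReadPoint) {
        return (mvccVersion <= smallestReadPoint) ? 0L : mvccVersion;
    }
}
```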

 Bulk load mvcc and seqId issues with native hfiles
 --

 Key: HBASE-11772
 URL: https://issues.apache.org/jira/browse/HBASE-11772
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-11772-0.98.patch


 There are mvcc and seqId issues when bulk loading native hfiles -- meaning 
 hfiles that are direct file copy-outs from hbase, not from an HFileOutputFormat 
 job.
 There are differences between these two types of hfiles.
 Native hfiles have possible non-zero MAX_MEMSTORE_TS_KEY value and non-zero 
 mvcc values in cells. 
 Native hfiles also have MAX_SEQ_ID_KEY.
 Native hfiles do not have BULKLOAD_TIME_KEY.
 Here are a couple of problems I observed when bulk loading native hfiles.
 1.  Cells in newly bulk loaded hfiles can be invisible to scan.
 It is easy to re-create.
 Bulk load a native hfile that has a larger mvcc value in cells, e.g. 10.
 If the current readpoint when initiating a scan is less than 10, the cells in 
 the new hfile are skipped, and thus become invisible.
 We don't reset the readpoint of a region after bulk load.
 2. The current StoreFile.isBulkLoadResult() is implemented as:
 {code}
 return metadataMap.containsKey(BULKLOAD_TIME_KEY)
 {code}
 which does not detect bulkloaded native hfiles.
 3. Another observed problem is possible data loss during log recovery. 
 It is similar to HBASE-10958 reported by [~jdcryans]. Borrow the re-create 
 steps from HBASE-10958.
 1) Create an empty table
 2) Put one row in it (let's say it gets seqid 1)
 3) Bulk load one native hfile with large seqId ( e.g. 100). The native hfile 
 can be obtained by copying out from existing table.
 4) Kill the region server that holds the table's region.
 Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 100 makes us believe that 
 everything that came before it was flushed. 
 Problem 3 is probably related to problem 2. We will be OK if we use the 
 appended seqId during bulk load instead of the 100 from inside the file.
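The detection gap in problem 2 can be sketched as follows. The "_SeqId_" path marker and the helper are illustrative assumptions of the idea (a second signal beyond file metadata), not the actual fix committed for this issue:

```java
import java.util.Map;

final class BulkLoadCheck {
    private BulkLoadCheck() {}

    // Assumed key name; hfiles produced by a bulk-load job carry it,
    // native (copied-out) hfiles do not.
    static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIMESTAMP";

    static boolean isBulkLoadResult(Map<String, byte[]> metadataMap, String path) {
        if (metadataMap.containsKey(BULKLOAD_TIME_KEY)) {
            return true;  // written by a bulk-load job
        }
        // Native hfiles lack BULKLOAD_TIME_KEY, so the metadata-only check
        // quoted above misses them; fall back to a marker the bulk-load
        // path could stamp into the file name.
        return path.contains("_SeqId_");
    }
}
```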





[jira] [Commented] (HBASE-4368) Expose processlist in shell (per regionserver and perhaps by cluster)

2014-08-19 Thread Shahin Saneinejad (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101903#comment-14101903
 ] 

Shahin Saneinejad commented on HBASE-4368:
--

[~busbey]: Nope, I'm no longer working with HBase. Thanks for the heads up.

 Expose processlist in shell (per regionserver and perhaps by cluster)
 -

 Key: HBASE-4368
 URL: https://issues.apache.org/jira/browse/HBASE-4368
 Project: HBase
  Issue Type: Task
  Components: shell
Reporter: stack
  Labels: beginner
 Attachments: HBASE-4368.patch


 HBASE-4057 adds processlist and it shows in the RS UI.  This issue is about 
 getting the processlist to show in the shell, like it does in mysql.
 Labelling it noob; this is a pretty substantial issue but it shouldn't be too 
 hard -- it'd mostly be plumbing from RS into the shell.





[jira] [Commented] (HBASE-11776) All RegionServers crash when compact when setting TTL

2014-08-19 Thread wuchengzhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101922#comment-14101922
 ] 

wuchengzhi commented on HBASE-11776:


[~ted_yu] [~anoop.hbase]
Yes, it's the same problem. I was just testing in 0.98.5 right now, and it 
seems to have been fixed.

 All RegionServers crash when compact when setting TTL
 -

 Key: HBASE-11776
 URL: https://issues.apache.org/jira/browse/HBASE-11776
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.96.1.1
 Environment: ubuntu 12.04
 jdk1.7.0_06
Reporter: wuchengzhi
Priority: Critical
   Original Estimate: 72h
  Remaining Estimate: 72h

 We create the table with TTL on a columnFamily. When the files selected for 
 compaction have all their KVs expired, the compaction generates a file that 
 just contains some meta-info such as the trailer, but no KVs (size: 564 
 bytes), and storeFile.getReader().getMaxTimestamp() == -1.
 Then, when we put data into this table very fast, the memStore flushes to 
 storefiles and triggers compaction tasks, and the unexpected thing happens: 
 the storefile count keeps increasing all the time.
 See the debug log: 
 {code:title=hbase-regionServer.log|borderStyle=solid}
 2014-08-17 15:41:02,689 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] 
 regionserver.CompactSplitThread: CompactSplitThread Status: 
 compaction_queue=(0:1), split_queue=0, merge_queue=0
 2014-08-17 15:41:02,689 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] 
 compactions.RatioBasedCompactionPolicy: Selecting compaction from 9 store 
 files, 0 compacting, 9 eligible, 10 blocking
 2014-08-17 15:41:02,689 INFO  
 [regionserver60020-smallCompactions-1408258247722] 
 compactions.RatioBasedCompactionPolicy: Deleting the expired store file by 
 compaction: 
 hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/s/c6392d54411a46cbb19350d706a298be
  whose maxTimeStamp is -1 while the max expired timestamp is 1408257662689
 2014-08-17 15:41:02,689 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] regionserver.HStore: 
 0b47596c0bff1a60cf749cf1101eb642 - s: Initiating minor compaction
 2014-08-17 15:41:02,689 INFO  
 [regionserver60020-smallCompactions-1408258247722] regionserver.HRegion: 
 Starting compaction on s in region 
 top_subchannel_2,,1407982287422.0b47596c0bff1a60cf749cf1101eb642.
 2014-08-17 15:41:02,689 INFO  
 [regionserver60020-smallCompactions-1408258247722] regionserver.HStore: 
 Starting compaction of 1 file(s) in s of 
 top_subchannel_2,,1407982287422.0b47596c0bff1a60cf749cf1101eb642. into 
 tmpdir=hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/.tmp,
  totalSize=564
 2014-08-17 15:41:02,689 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] compactions.Compactor: 
 Compacting 
 hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/s/c6392d54411a46cbb19350d706a298be,
  keycount=0, bloomtype=NONE, size=564, encoding=FAST_DIFF, seqNum=45561
 2014-08-17 15:41:02,711 INFO  
 [regionserver60020-smallCompactions-1408258247722] regionserver.StoreFile: 
 HFile Bloom filter type for f2e60ae4574a4d6eb89745d43582e9b4: NONE, but ROW 
 specified in column family configuration
 2014-08-17 15:41:02,713 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] 
 regionserver.HRegionFileSystem: Committing store file 
 hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/.tmp/f2e60ae4574a4d6eb89745d43582e9b4
  as 
 hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/s/f2e60ae4574a4d6eb89745d43582e9b4
 2014-08-17 15:41:02,726 INFO  
 [regionserver60020-smallCompactions-1408258247722] regionserver.StoreFile: 
 HFile Bloom filter type for f2e60ae4574a4d6eb89745d43582e9b4: NONE, but ROW 
 specified in column family configuration
 2014-08-17 15:41:02,727 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] regionserver.HStore: 
 Removing store files after compaction...
 2014-08-17 15:41:02,731 DEBUG 
 [regionserver60020-smallCompactions-1408258247722] backup.HFileArchiver: 
 Finished archiving from class 
 org.apache.hadoop.hbase.backup.HFileArchiver$FileableStoreFile, 
 file:hdfs://hbase:9000/hbase/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/s/c6392d54411a46cbb19350d706a298be,
  to 
 hdfs://hbase:9000/hbase/archive/data/default/top_subchannel_2/0b47596c0bff1a60cf749cf1101eb642/s/c6392d54411a46cbb19350d706a298be
 2014-08-17 15:41:02,731 INFO  
 [regionserver60020-smallCompactions-1408258247722] regionserver.HStore: 
 Completed compaction of 1 file(s) in s of 
 top_subchannel_2,,1407982287422.0b47596c0bff1a60cf749cf1101eb642. into 
 f2e60ae4574a4d6eb89745d43582e9b4(size=564), total size for store is 25.8 M. 
 This selection was in queue for 0sec, and took 0sec to execute.
 {code}

[jira] [Updated] (HBASE-11645) Snapshot for MOB

2014-08-19 Thread Jonathan Hsieh (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-11645:
---

Description:  Add snapshot support for MOB.  In the initial implementation, 
taking a table snapshot does not preserve the mob data.  This issue will make 
sure that when a snapshot is taken, mob data is properly preserved and is 
restorable.  (was: Add snapshot support for MOB.  In the initial 
implementation, taking a table snapshot does not preserve the mob data.  This 
issue will make sure that when a snapshot is taken, mob data is properly 
preserved and is restorable.)

 Snapshot for MOB
 

 Key: HBASE-11645
 URL: https://issues.apache.org/jira/browse/HBASE-11645
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: HBASE-11645-V2.diff, HBASE-11645.diff


  Add snapshot support for MOB.  In the initial implementation, taking a table 
 snapshot does not preserve the mob data.  This issue will make sure that when 
 a snapshot is taken, mob data is properly preserved and is restorable.





[jira] [Resolved] (HBASE-11776) All RegionServers crash when compact when setting TTL

2014-08-19 Thread wuchengzhi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchengzhi resolved HBASE-11776.


Resolution: Duplicate

duplicate issue

 All RegionServers crash when compact when setting TTL
 -

 Key: HBASE-11776
 URL: https://issues.apache.org/jira/browse/HBASE-11776
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.96.1.1
 Environment: ubuntu 12.04
 jdk1.7.0_06
Reporter: wuchengzhi
Priority: Critical
   Original Estimate: 72h
  Remaining Estimate: 72h


[jira] [Updated] (HBASE-11339) HBase MOB

2014-08-19 Thread Li Jiajia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jiajia updated HBASE-11339:
--

Attachment: MOB user guide .docx

Updated the MOB user guide. 

 HBase MOB
 -

 Key: HBASE-11339
 URL: https://issues.apache.org/jira/browse/HBASE-11339
 Project: HBase
  Issue Type: Umbrella
  Components: regionserver, Scanners
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
 MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide .docx, 
 hbase-11339-in-dev.patch


   It's quite useful to save medium-sized binary data like images and documents 
 into Apache HBase. Unfortunately, directly saving binary MOBs (medium 
 objects) to HBase leads to worse performance because of the frequent splits 
 and compactions.
   In this design, the MOB data are stored in a more efficient way, which 
 keeps a high write/read performance and guarantees the data consistency in 
 Apache HBase.





[jira] [Updated] (HBASE-11339) HBase MOB

2014-08-19 Thread Li Jiajia (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Jiajia updated HBASE-11339:
--

Attachment: (was: MOB user guide .docx)

 HBase MOB
 -

 Key: HBASE-11339
 URL: https://issues.apache.org/jira/browse/HBASE-11339
 Project: HBase
  Issue Type: Umbrella
  Components: regionserver, Scanners
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
 MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide .docx, 
 hbase-11339-in-dev.patch







[jira] [Updated] (HBASE-11757) Provide a common base abstract class for both RegionObserver and MasterObserver

2014-08-19 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-11757:


Fix Version/s: (was: 1.0.0)
   0.99.0

 Provide a common base abstract class for both RegionObserver and 
 MasterObserver
 ---

 Key: HBASE-11757
 URL: https://issues.apache.org/jira/browse/HBASE-11757
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Matteo Bertozzi
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11757-0.98-v0.patch, HBASE-11757-v0.patch


 Some security coprocessors extend both RegionObserver and MasterObserver, 
 unfortunately only one of the two can use the available base abstract class 
 implementations. Provide a common base abstract class for both the 
 RegionObserver and MasterObserver interfaces. Update current coprocessors 
 that extend both interfaces to use the new common base abstract class.





[jira] [Commented] (HBASE-11762) Record the class name of Codec in WAL header

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101985#comment-14101985
 ] 

Hudson commented on HBASE-11762:


FAILURE: Integrated in HBase-1.0 #110 (See 
[https://builds.apache.org/job/HBase-1.0/110/])
HBASE-11762 Record the class name of Codec in WAL header (tedyu: rev 
12478cded70bfe375411e110deeca26db3484b2b)
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCustomWALCellCodec.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogWriter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ReaderBase.java
* hbase-protocol/src/main/protobuf/WAL.proto
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogReaderOnSecureHLog.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogReader.java


 Record the class name of Codec in WAL header
 

 Key: HBASE-11762
 URL: https://issues.apache.org/jira/browse/HBASE-11762
 Project: HBase
  Issue Type: Task
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 1.0.0, 2.0.0, 0.98.6

 Attachments: 11762-0.98.txt, 11762-v1.txt, 11762-v2.txt, 
 11762-v4.txt, 11762-v5.txt, 11762-v6.txt


 In follow-up discussion to HBASE-11620, Enis brought up this point:
 Related to this, should not we also write the CellCodec that we use in the 
 WAL header. Right now, the codec comes from the configuration which means 
 that you cannot read back the WAL files if you change the codec.
 This JIRA is to implement the above suggestion.
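The suggestion can be sketched roughly as follows: record the codec class name in the WAL file header at write time and resolve it at read time, instead of trusting the current configuration. The header key and helper names are assumptions for illustration, not the actual WALCellCodec/ProtobufLogWriter code:

```java
import java.util.HashMap;
import java.util.Map;

final class WalCodecHeader {
    private WalCodecHeader() {}

    static final String CODEC_CLASS_KEY = "codec.class";

    /** Writer side: record which codec produced this WAL file. */
    static Map<String, String> buildHeader(Class<?> codecClass) {
        Map<String, String> header = new HashMap<>();
        header.put(CODEC_CLASS_KEY, codecClass.getName());
        return header;
    }

    /** Reader side: instantiate the codec named in the header, falling
     *  back to the configured default when the header predates the change. */
    static Object loadCodec(Map<String, String> header, String configuredDefault) {
        String name = header.getOrDefault(CODEC_CLASS_KEY, configuredDefault);
        try {
            return Class.forName(name).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load WAL codec " + name, e);
        }
    }
}
```

With this shape, changing the configured codec no longer makes previously written WAL files unreadable, because each file names the codec it was written with.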





[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101987#comment-14101987
 ] 

Nicolas Liochon commented on HBASE-4955:


I suppose by deal breaker you mean fork vs. threads? It's not at all a 
deal breaker. And maybe it's actually the same thing. Surefire renamed its 
configuration parameters, and the new names are better. Previously, threads 
could mean fork.
Here is what we have today:
 - small tests are executed in a single jvm, single thread. This could be 
multithreaded. The idea here is that these tests are very small, so a fork is 
expensive; if something else works, it's fine. The issue I had initially with 
forks here was OOM because I had too many JVMs (the fork happens while the 
previous process is still alive), but it was a long time ago.
 - all other tests are executed with a fork per test class (even if the 
parameter says thread, it's actually a fork). 

[~posix4e], if you have something working, that's just great :-). 
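The two-tier setup described above might look roughly like this with the renamed Surefire parameters (forkCount/reuseForks). This is a hedged sketch: the execution ids and category class names are illustrative, not HBase's actual pom:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <executions>
    <!-- small tests: one JVM, reused across test classes (cheap, no fork per class) -->
    <execution>
      <id>default-test</id>
      <configuration>
        <groups>org.apache.hadoop.hbase.SmallTests</groups>
        <forkCount>1</forkCount>
        <reuseForks>true</reuseForks>
      </configuration>
    </execution>
    <!-- all other tests: a fresh fork per test class -->
    <execution>
      <id>secondPartTestsExecution</id>
      <goals><goal>test</goal></goals>
      <configuration>
        <groups>org.apache.hadoop.hbase.MediumTests</groups>
        <forkCount>1</forkCount>
        <reuseForks>false</reuseForks>
      </configuration>
    </execution>
  </executions>
</plugin>
```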


 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch


 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need in order to move to the official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain all 
 that we need.
 JUnit: could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk
 Surefire: could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed in 
 our version
 760 (does not take into account the test method): fixed in trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed in trunk, not fixed in 
 our version
 799 (allow test parallelization when forkMode=always): not fixed in trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, 
 fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.





[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101993#comment-14101993
 ] 

Alex Newman commented on HBASE-4955:


I am testing it on my local buildserver and I seem to be having some issues.
 org.apache.hadoop.hbase.http.TestServletFilter.testServletFilter fails with
https://gist.github.com/posix4e/4512c3e6ca49ed1a04ac

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch







[jira] [Commented] (HBASE-10092) Move up on to log4j2

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101994#comment-14101994
 ] 

Nicolas Liochon commented on HBASE-10092:
-

The issue I had with commons-logging in the past was the lack of MDC (I wanted 
to log the htrace id). All the other wrappers support MDC.

But a naive question: would it make sense to use log4j2 directly instead of 
using a wrapper?
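For readers unfamiliar with MDC (mapped diagnostic context): it is a per-thread key/value map that a log layout can interpolate, e.g. %X{htraceId} in a pattern. The class below is a toy stand-in to illustrate the mechanism, not the slf4j or log4j2 implementation:

```java
import java.util.HashMap;
import java.util.Map;

final class MiniMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.get().clear(); }

    /** Roughly what a pattern layout renders for "%X{htraceId} %m". */
    static String format(String message) {
        String traceId = get("htraceId");
        return (traceId == null ? "-" : traceId) + " " + message;
    }
}
```

Because the context lives in a thread-local, every log line emitted by that thread can carry the trace id without threading it through each call site; that is what a facade without MDC cannot offer.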


 Move up on to log4j2
 

 Key: HBASE-10092
 URL: https://issues.apache.org/jira/browse/HBASE-10092
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch


 Allows logging with less friction.  See http://logging.apache.org/log4j/2.x/
 This rather radical transition can be done w/ minor change given they have an 
 adapter for apache's logging, the one we use.  They also have an adapter for 
 slf4j, so we can likely remove at least some of the 4 versions of this module 
 our dependencies make use of.
 I made a start in the attached patch but am currently stuck in maven dependency 
 resolve hell, courtesy of our slf4j.  Fixing will take some concentration and 
 a good net connection, an item I currently lack.  Other TODOs: we will need 
 to fix our little log-level-setting jsp page -- we will likely have to undo 
 our use of hadoop's tool here -- and the config system changes a little.
 I will return to this project soon.  Will bring numbers.





[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101991#comment-14101991
 ] 

Alex Newman commented on HBASE-4955:


You can watch my progress at https://github.com/Ohmdata/hbase-public/tree/4955

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101992#comment-14101992
 ] 

Alex Newman commented on HBASE-4955:


I am testing it on my local build server and I seem to be having some issues:
 org.apache.hadoop.hbase.http.TestServletFilter.testServletFilter fails with
https://gist.github.com/posix4e/4512c3e6ca49ed1a04ac



 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch







[jira] [Commented] (HBASE-10092) Move up on to log4j2

2014-08-19 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101997#comment-14101997
 ] 

Alex Newman commented on HBASE-10092:
-

The good news is that I am fairly sure unit tests will not be an issue with 
log4j2. As far as using it directly, I am game. But it's a much larger change. 
I think I am very close on this one.

 Move up on to log4j2
 

 Key: HBASE-10092
 URL: https://issues.apache.org/jira/browse/HBASE-10092
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch


 Allows logging with less friction.  See http://logging.apache.org/log4j/2.x/  
 This rather radical transition can be done w/ minor change given they have an 
 adapter for apache's logging, the one we use.  They also have an adapter for 
 slf4j so we likely can remove at least some of the 4 versions of this module 
 our dependencies make use of.
 I made a start in the attached patch but am currently stuck in maven dependency 
 resolve hell courtesy of our slf4j.  Fixing will take some concentration and 
 a good net connection, an item I currently lack.  Other TODOs: we will need to 
 fix our little log level setting jsp page -- will likely have to undo 
 our use of hadoop's tool here -- and the config system changes a little.
 I will return to this project soon.  Will bring numbers.
  





[jira] [Commented] (HBASE-11613) get_counter shell command is not displaying the result for counter columns.

2014-08-19 Thread Y. SREENIVASULU REDDY (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102003#comment-14102003
 ] 

Y. SREENIVASULU REDDY commented on HBASE-11613:
---

[~jmspaggi]
get_counter 't', 'r1', 'f:c1', 'dummy'
The above works correctly.

The get_counter help text needs to be updated to match, for better understanding.

 get_counter shell command is not displaying the result for counter columns.
 -

 Key: HBASE-11613
 URL: https://issues.apache.org/jira/browse/HBASE-11613
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.98.3
Reporter: Y. SREENIVASULU REDDY
Priority: Minor

 Perform the following operations in the HBase shell prompt:
 1. Create a table with one column family.
 2. Insert some amount of data into the table.
 3. Perform an increment operation on any column qualifier,
 e.g.: incr 't', 'r1', 'f:c1'
 4. Then run the get_counter query;
 it throws a "No counter found" message to the user.
 {code}
 eg:
  hbase(main):010:0> get_counter 't', 'r1', 'f', 'c1'
  No counter found at specified coordinates
 {code}
 =
 Also, a wrong message is shown to the user while executing the get_counter query.
 {code}
 hbase(main):009:0> get_counter 't', 'r1', 'f'
 ERROR: wrong number of arguments (3 for 4)
 Here is some help for this command:
 Return a counter cell value at specified table/row/column coordinates.
 A cell cell should be managed with atomic increment function oh HBase
 and the data should be binary encoded. Example:
   hbase> get_counter 'ns1:t1', 'r1', 'c1'
   hbase> get_counter 't1', 'r1', 'c1'
 The same commands also can be run on a table reference. Suppose you had a 
 reference t to table 't1', the corresponding command would be:
   hbase> t.get_counter 'r1', 'c1'
 {code}
 {code}
 problem:
The help example gives 3 arguments, but the command asks for 4.
If run with 3 arguments, it throws an error.
If run with 4 arguments, a "No counter found at specified coordinates" 
 message is shown even though the counter is specified.
 {code}
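As the quoted help text notes, a counter cell's value must be binary encoded: HBase counters are stored as 8-byte big-endian signed longs, the format written by incr. A minimal sketch of that encoding outside HBase (illustrative Python; the helper names are made up, not HBase API):

```python
import struct

def encode_counter(value):
    # HBase counters are stored as 8-byte big-endian signed longs,
    # the format written by incr / Increment.
    return struct.pack(">q", value)

def decode_counter(raw):
    # The shell reports "No counter found" when the cell value cannot be
    # decoded as a long, e.g. a string value written with a plain put.
    if len(raw) != 8:
        raise ValueError("No counter found at specified coordinates")
    return struct.unpack(">q", raw)[0]

print(decode_counter(encode_counter(42)))  # 42
```

This is why incrementing an existing non-counter cell, or reading a normal cell with get_counter, fails: the value bytes are not a valid 8-byte long.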





[jira] [Commented] (HBASE-11610) Enhance remote meta updates

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102008#comment-14102008
 ] 

Nicolas Liochon commented on HBASE-11610:
-

The perf improvement is great.
I'm not a big fan of the ThreadLocalHTableInterface threadLocalHTable. It's 
often difficult to maintain and test.

If I understand correctly, the issue is that the put is synchronous, and all the 
threads are queueing?

Should we use something like

  HConnection#processBatchCallback(List<? extends Row> list,
  final TableName tableName,
  ExecutorService pool,
  Object[] results,
  Batch.Callback<R> callback) throws IOException, InterruptedException;

instead of using HTable? HConnection is thread safe, so there is no sync needed.
(OK, it's deprecated, but if it saves this kind of hack maybe we need to review 
our point of view).
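The difference between serializing every caller on one HTable and funneling them through a thread-safe batching entry point can be sketched outside HBase. This is an illustrative Python model, not HBase code (`BatchWriter` and its methods are hypothetical): callers enqueue rows without blocking each other, and a single worker drains them, in the spirit of processBatchCallback.

```python
import queue
import threading

class BatchWriter:
    """Illustrative thread-safe batcher: callers enqueue rows without
    sharing a lock on a table object; one worker drains the queue,
    standing in for batched meta updates."""

    _STOP = object()  # sentinel telling the worker to exit

    def __init__(self):
        self._queue = queue.Queue()
        self._results = []
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def put(self, row):
        # Queue.put is thread safe; callers never block on each other
        # the way they would on a synchronized HTable instance.
        self._queue.put(row)

    def _drain(self):
        while True:
            row = self._queue.get()
            if row is BatchWriter._STOP:
                return
            self._results.append(row)  # stand-in for one batched RPC

    def close(self):
        self._queue.put(BatchWriter._STOP)
        self._worker.join()
        return self._results

writer = BatchWriter()
threads = [threading.Thread(target=writer.put, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
results = writer.close()
print(len(results))  # 100
```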


 Enhance remote meta updates
 ---

 Key: HBASE-11610
 URL: https://issues.apache.org/jira/browse/HBASE-11610
 Project: HBase
  Issue Type: Sub-task
Reporter: Jimmy Xiang
Assignee: Virag Kothari
 Attachments: HBASE-11610.patch


 Currently, if the meta region is on a regionserver instead of the master, 
 meta update is synchronized on one HTable instance. We should be able to do 
 better.





[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102006#comment-14102006
 ] 

Matteo Bertozzi commented on HBASE-11742:
-

+1 looks good to me

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.





[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102017#comment-14102017
 ] 

Nicolas Liochon commented on HBASE-4955:


It's different from what I was seeing a few months ago on the Apache build: my 
tests were just hanging. If this is the only test that fails, it's good progress 
already.

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch







[jira] [Commented] (HBASE-11762) Record the class name of Codec in WAL header

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102037#comment-14102037
 ] 

Hudson commented on HBASE-11762:


FAILURE: Integrated in HBase-TRUNK #5409 (See 
[https://builds.apache.org/job/HBase-TRUNK/5409/])
HBASE-11762 Record the class name of Codec in WAL header (tedyu: rev 
fd4dfb489aa4100b9bd204ad70e4ae590db93b32)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogReader.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCustomWALCellCodec.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogReaderOnSecureHLog.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogWriter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ReaderBase.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
* hbase-protocol/src/main/protobuf/WAL.proto
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java


 Record the class name of Codec in WAL header
 

 Key: HBASE-11762
 URL: https://issues.apache.org/jira/browse/HBASE-11762
 Project: HBase
  Issue Type: Task
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 1.0.0, 2.0.0, 0.98.6

 Attachments: 11762-0.98.txt, 11762-v1.txt, 11762-v2.txt, 
 11762-v4.txt, 11762-v5.txt, 11762-v6.txt


 In a follow-up discussion to HBASE-11620, Enis brought up this point:
 "Related to this, should not we also write the CellCodec that we use in the 
 WAL header. Right now, the codec comes from the configuration, which means 
 that you cannot read back the WAL files if you change the codec."
 This JIRA is to implement the above suggestion.
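The suggestion amounts to persisting the codec's class name in the file header so a reader can pick the right codec without consulting its own configuration. A minimal illustrative sketch of such a self-describing header (plain Python, not the HBase WAL format; the length-prefixed layout is an assumption for illustration):

```python
import io

def write_wal(stream, codec_class, payload):
    # Record the codec class name in the header, length-prefixed, so a
    # reader does not depend on reader-side configuration to decode.
    name = codec_class.encode("utf-8")
    stream.write(len(name).to_bytes(2, "big"))
    stream.write(name)
    stream.write(payload)

def read_wal(stream):
    # The reader recovers the codec class name from the header itself.
    n = int.from_bytes(stream.read(2), "big")
    codec_class = stream.read(n).decode("utf-8")
    payload = stream.read()
    return codec_class, payload

buf = io.BytesIO()
write_wal(buf, "org.apache.hadoop.hbase.regionserver.wal.WALCellCodec", b"cells")
buf.seek(0)
codec, data = read_wal(buf)
print(codec)
```

With the codec name in the header, changing the configured codec later does not break reads of older files, which is the problem Enis raised.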





[jira] [Commented] (HBASE-10728) get_counter value is never used.

2014-08-19 Thread Y. SREENIVASULU REDDY (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102051#comment-14102051
 ] 

Y. SREENIVASULU REDDY commented on HBASE-10728:
---

+1 for the 0.98 patch.

While committing, please handle this comment:
https://issues.apache.org/jira/browse/HBASE-11613?focusedCommentId=14092624page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14092624

 get_counter value is never used.
 

 Key: HBASE-10728
 URL: https://issues.apache.org/jira/browse/HBASE-10728
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.99.0
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-10728-v0-0.96.patch, HBASE-10728-v0-0.98.patch, 
 HBASE-10728-v0-trunk.patch, HBASE-10728-v1-0.96.patch, 
 HBASE-10728-v1-0.98.patch, HBASE-10728-v1-trunk.patch, 
 HBASE-10728-v2-trunk.patch








[jira] [Commented] (HBASE-11613) get_counter shell command is not displaying the result for counter columns.

2014-08-19 Thread Y. SREENIVASULU REDDY (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102052#comment-14102052
 ] 

Y. SREENIVASULU REDDY commented on HBASE-11613:
---

 I verified the patch for HBASE-10728 on the 0.98.x version; it is working fine.
Resolving this issue as a duplicate.

 get_counter shell command is not displaying the result for counter columns.
 -

 Key: HBASE-11613
 URL: https://issues.apache.org/jira/browse/HBASE-11613
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.98.3
Reporter: Y. SREENIVASULU REDDY
Priority: Minor






[jira] [Assigned] (HBASE-11613) get_counter shell command is not displaying the result for counter columns.

2014-08-19 Thread Y. SREENIVASULU REDDY (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y. SREENIVASULU REDDY reassigned HBASE-11613:
-

Assignee: Y. SREENIVASULU REDDY

 get_counter shell command is not displaying the result for counter columns.
 -

 Key: HBASE-11613
 URL: https://issues.apache.org/jira/browse/HBASE-11613
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.98.3
Reporter: Y. SREENIVASULU REDDY
Assignee: Y. SREENIVASULU REDDY
Priority: Minor






[jira] [Resolved] (HBASE-11613) get_counter shell command is not displaying the result for counter columns.

2014-08-19 Thread Y. SREENIVASULU REDDY (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Y. SREENIVASULU REDDY resolved HBASE-11613.
---

Resolution: Duplicate

 get_counter shell command is not displaying the result for counter columns.
 -

 Key: HBASE-11613
 URL: https://issues.apache.org/jira/browse/HBASE-11613
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.98.3
Reporter: Y. SREENIVASULU REDDY
Assignee: Y. SREENIVASULU REDDY
Priority: Minor






[jira] [Commented] (HBASE-11728) Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING

2014-08-19 Thread wuchengzhi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102063#comment-14102063
 ] 

wuchengzhi commented on HBASE-11728:


[~ram_krish]
OK, I got it. We will try to upgrade the version; thanks for the reminder.
If I just replace the latest prefix-tree-xxx.jar, will it work?

 Data loss while scanning using PREFIX_TREE DATA-BLOCK-ENCODING
 --

 Key: HBASE-11728
 URL: https://issues.apache.org/jira/browse/HBASE-11728
 Project: HBase
  Issue Type: Bug
  Components: Scanners
Affects Versions: 0.96.1.1, 0.98.4
 Environment: ubuntu12 
 hadoop-2.2.0
 Hbase-0.96.1.1
 SUN-JDK(1.7.0_06-b24)
Reporter: wuchengzhi
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: 29cb562fad564b468ea9d61a2d60e8b0, HBASE-11728.patch, 
 HBASE-11728_1.patch, HBASE-11728_2.patch, HBASE-11728_3.patch, 
 HBASE-11728_4.patch, HFileAnalys.java, TestPrefixTree.java

   Original Estimate: 72h
  Remaining Estimate: 72h

 In a Scan case, I prepared some data as below:
 Table desc (using the prefix-tree encoding):
 'prefix_tree_test', {NAME => 'cf_1', DATA_BLOCK_ENCODING => 'PREFIX_TREE', 
 TTL => '15552000'}
 and I put 5 rows as:
 (RowKey, Qualifier, Value)
 'a-b-0-0', 'qf_1', 'c1-value'
 'a-b-A-1', 'qf_1', 'c1-value'
 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
 'a-b-B-2-1402397300-1402416535', 'qf_2', 'c2-value-3'
 So I tried to scan the rowkeys between 'a-b-A-1' and 'a-b-A-1:', and got the 
 correct result:
 Test 1: 
 Scan scan = new Scan();
 scan.setStartRow("a-b-A-1".getBytes());
 scan.setStopRow("a-b-A-1:".getBytes());
 --
 'a-b-A-1', 'qf_1', 'c1-value'
 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
 Then I tried next, adding a column to the scan:
 Test 2:
 Scan scan = new Scan();
 scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_2"));
 scan.setStartRow("a-b-A-1".getBytes());
 scan.setStopRow("a-b-A-1:".getBytes());
 --
 expected:
 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
 but actually I got nothing. Then I changed the addColumn to 
 scan.addColumn(Bytes.toBytes("cf_1"), Bytes.toBytes("qf_1")); and I got the 
 expected result 'a-b-A-1', 'qf_1', 'c1-value' as well.
 Then I did more testing... I updated the case to make the startRow greater 
 than 'a-b-A-1':
 Test 3:
 Scan scan = new Scan();
 scan.setStartRow("a-b-A-1-".getBytes());
 scan.setStopRow("a-b-A-1:".getBytes());
 --
 expected:
 'a-b-A-1-1402329600-1402396277', 'qf_2', 'c2-value'
 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
 but actually I got nothing again. I changed the start row to be greater than 
 'a-b-A-1-1402329600-1402396277':
 Scan scan = new Scan();
 scan.setStartRow("a-b-A-1-140239".getBytes());
 scan.setStopRow("a-b-A-1:".getBytes());
 and I got the expected row:
 'a-b-A-1-1402397227-1402415999', 'qf_2', 'c2-value-2'
 So I think it may be a bug in the prefix-tree encoding. It happens after the 
 data is flushed to the storefile, and it is OK while the data is in the 
 memstore.





[jira] [Commented] (HBASE-11762) Record the class name of Codec in WAL header

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102087#comment-14102087
 ] 

Hudson commented on HBASE-11762:


SUCCESS: Integrated in HBase-0.98 #457 (See 
[https://builds.apache.org/job/HBase-0.98/457/])
HBASE-11762 Record the class name of Codec in WAL header (tedyu: rev 
3f38af605f3bcd3f61babc5e75b05ac7490e839e)
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestCustomWALCellCodec.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogReader.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogWriter.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SequenceFileLogReader.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ProtobufLogWriter.java
* 
hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/WALProtos.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/SecureProtobufLogReader.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/ReaderBase.java
* 
hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogReaderOnSecureHLog.java
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/WALCellCodec.java
* hbase-protocol/src/main/protobuf/WAL.proto


 Record the class name of Codec in WAL header
 

 Key: HBASE-11762
 URL: https://issues.apache.org/jira/browse/HBASE-11762
 Project: HBase
  Issue Type: Task
  Components: wal
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 1.0.0, 2.0.0, 0.98.6

 Attachments: 11762-0.98.txt, 11762-v1.txt, 11762-v2.txt, 
 11762-v4.txt, 11762-v5.txt, 11762-v6.txt







[jira] [Updated] (HBASE-11696) Make CombinedBlockCache resizable.

2014-08-19 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-11696:
---

   Resolution: Fixed
Fix Version/s: 0.99.0
 Release Note: CombinedBlockCache is made resizable. See HBASE-5349 for the 
auto resizing feature. On resize of this block cache, the L1 cache (i.e. the LRU 
cache) will get resized.
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

 Make CombinedBlockCache resizable.
 --

 Key: HBASE-11696
 URL: https://issues.apache.org/jira/browse/HBASE-11696
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11696.patch


 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The 
 block cache needs to be resizable for this to work. CombinedBlockCache is not 
 marked resizable now; we can make it so. On resize, the L1 cache (i.e. the LRU 
 cache) can get resized.
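The delegation described above can be sketched as follows: resizing the combined cache forwards only to the on-heap L1 (LRU) cache, while the L2 capacity stays fixed. Illustrative Python, not the HBase implementation (class and method names are made up):

```python
class LruCache:
    """Stand-in for the on-heap L1 cache, whose size tracks the heap."""
    def __init__(self, max_size):
        self.max_size = max_size
    def set_max_size(self, size):
        self.max_size = size

class BucketCache:
    """Stand-in for the L2 cache; typically off-heap, so heap tuning
    does not touch it."""
    def __init__(self, max_size):
        self.max_size = max_size

class CombinedCache:
    """Resizable facade: a resize delegates only to the L1 cache."""
    def __init__(self, l1, l2):
        self.l1, self.l2 = l1, l2
    def set_max_size(self, size):
        self.l1.set_max_size(size)  # L2 capacity stays independent

l1, l2 = LruCache(100), BucketCache(1000)
cache = CombinedCache(l1, l2)
cache.set_max_size(50)
print(l1.max_size, l2.max_size)  # 50 1000
```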





[jira] [Commented] (HBASE-11339) HBase MOB

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102218#comment-14102218
 ] 

Jonathan Hsieh commented on HBASE-11339:


[~jiajia], thanks for the update to the user guide.  I think it has the key 
detail points (the whats) needed for a user who already understands what a MOB 
is and is for.  We should add some context (the whys and the bigger picture) 
for users who aren't familiar with it, though, by adding some background to 
this user doc. We'll eventually fold it into the ref guide here [1].

Let me provide a quick draft that we could build off of.

Before the bullets we should have some intro (this is a paraphrased version of the 
design doc's intro).

{quote}
Data comes in many sizes, and it is convenient to save binary data like 
images and documents into HBase. While HBase can handle binary objects with 
cells that are 1 byte to 10MB long, HBase's normal read and write paths are 
optimized for values smaller than 100KB in size.  When HBase deals with large 
numbers of values over 100KB and up to ~10MB of data, it encounters performance 
degradations due to write amplification caused by splits and compactions.  
HBase 2.0+ has added support for better managing large numbers of *Medium 
Objects* (MOBs) that maintains the same high performance and strongly 
consistent characteristics with low operational overhead.

To enable the feature, one must enable and configure the mob components on each 
region server and enable the mob feature on particular column families during 
table creation or table alter.  Also, in the preview version of this feature, 
the admin must set up periodic processes that re-optimize the layout of mob 
data.

Section: Enabling and configuring the mob feature on region servers.

We need to enable the feature in flushes and compactions, and tune settings on caches.

user doc bullet 1. edit hbase-site...
user doc bullet 7. mob cache

Would be nice to have examples of doing this from the shell -- an example of 
creating a table with mob on a cf, and an example of a table alter that changes 
a cf to use the mob path.
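A hedged sketch of what such shell examples could look like, using the IS_MOB and MOB_THRESHOLD column-family attributes from the MOB patches (the exact attribute names and syntax in the preview may differ):

```
hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
hbase> alter 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 102400}
```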

Section: Mob management

The mob feature introduces a new read and write path to hbase and, in its 
current incarnation, requires external tools for housekeeping and 
re-optimization.  Two tools are introduced -- the expiredMobFileCleaner 
for handling TTLs and time-based expiry of data, and the sweep tool for 
coalescing small mob files or mob files with many deletions or updates.

user doc bullet 8.

Section: Enabling the mob feature on user tables

This can be done when creating a table or when altering a table

user doc bullet 2 (set cf with mob)
user doc bullet 6 (threshold size)

To a client, mob cells act just like normal cells.

user doc bullet 3 put
user doc bullet 4 scan

There is a special scanner mode users can use to read the raw values

user doc bullet 5.

{quote}

[1] http://hbase.apache.org/book.html

 HBase MOB
 -

 Key: HBASE-11339
 URL: https://issues.apache.org/jira/browse/HBASE-11339
 Project: HBase
  Issue Type: Umbrella
  Components: regionserver, Scanners
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
 MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide .docx, 
 hbase-11339-in-dev.patch


   It's quite useful to save medium binary data like images and documents 
 into Apache HBase. Unfortunately, directly saving binary MOB (medium 
 object) data to HBase leads to worse performance due to the frequent splits 
 and compactions.
   In this design, the MOB data are stored in a more efficient way, which 
 keeps a high write/read performance and guarantees the data consistency in 
 Apache HBase.





[jira] [Comment Edited] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles

2014-08-19 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102226#comment-14102226
 ] 

Jean-Marc Spaggiari edited comment on HBASE-11772 at 8/19/14 2:09 PM:
--

{code}
   /**
-   * @return true if this storefile was created by HFileOutputFormat
+   * @return true if this storefile was created by bulk load.
* for a bulk load.
*/
{code}

You might want to remove the "for a bulk load" line too.


was (Author: jmspaggi):
{quote}
   /**
-   * @return true if this storefile was created by HFileOutputFormat
+   * @return true if this storefile was created by bulk load.
* for a bulk load.
*/
{quote}

You might want to remove the "for a bulk load" line too.

 Bulk load mvcc and seqId issues with native hfiles
 --

 Key: HBASE-11772
 URL: https://issues.apache.org/jira/browse/HBASE-11772
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-11772-0.98.patch


 There are mvcc and seqId issues when bulk loading native hfiles -- meaning 
 hfiles that are a direct file copy-out from hbase, not from an HFileOutputFormat 
 job.
 There are differences between these two types of hfiles.
 Native hfiles have possible non-zero MAX_MEMSTORE_TS_KEY value and non-zero 
 mvcc values in cells. 
 Native hfiles also have MAX_SEQ_ID_KEY.
 Native hfiles do not have BULKLOAD_TIME_KEY.
 Here are a couple of problems I observed when bulk loading native hfiles.
 1.  Cells in newly bulk loaded hfiles can be invisible to scan.
 It is easy to re-create.
 Bulk load a native hfile that has a larger mvcc value in its cells, e.g. 10.
 If the current readpoint when initiating a scan is less than 10, the cells in 
 the new hfile are skipped, thus become invisible.
 We don't reset the readpoint of a region after bulk load.
 2. The current StoreFile.isBulkLoadResult() is implemented as:
 {code}
 return metadataMap.containsKey(BULKLOAD_TIME_KEY)
 {code}
 which does not detect bulkloaded native hfiles.
 3. Another observed problem is possible data loss during log recovery. 
 It is similar to HBASE-10958 reported by [~jdcryans]. Borrow the re-create 
 steps from HBASE-10958.
 1) Create an empty table
 2) Put one row in it (let's say it gets seqid 1)
 3) Bulk load one native hfile with a large seqId (e.g. 100). The native hfile 
 can be obtained by copying out from an existing table.
 4) Kill the region server that holds the table's region.
 Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 100 makes us believe that 
 everything that came before it was flushed. 
 The problem 3 is probably related to 2. We will be ok if we get the appended 
 seqId during bulk load instead of 100 from inside the file.
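The detection gap in point 2 can be illustrated with a small sketch. The metadata map below is a simplified stand-in for StoreFile's file-info map, and the extra bulk-load sequence-id marker is hypothetical -- one way a loader could tag native hfiles so they are recognized:

```java
import java.util.HashMap;
import java.util.Map;

public class BulkLoadDetect {
    static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIMESTAMP";
    // Hypothetical marker a loader could stamp onto native hfiles so they
    // are recognized as bulk loaded even without BULKLOAD_TIME_KEY.
    static final String BULKLOAD_SEQ_KEY = "BULKLOAD_SEQ_ID";

    // Current behavior: only HFileOutputFormat-produced files carry the
    // time key, so native hfiles slip through this check.
    static boolean isBulkLoadResult(Map<String, byte[]> meta) {
        return meta.containsKey(BULKLOAD_TIME_KEY);
    }

    // Sketch of a broader check that also accepts the hypothetical marker.
    static boolean isBulkLoadResultFixed(Map<String, byte[]> meta) {
        return meta.containsKey(BULKLOAD_TIME_KEY) || meta.containsKey(BULKLOAD_SEQ_KEY);
    }

    public static void main(String[] args) {
        Map<String, byte[]> nativeMeta = new HashMap<>();
        nativeMeta.put(BULKLOAD_SEQ_KEY, new byte[] { 100 });
        System.out.println(isBulkLoadResult(nativeMeta));      // prints false
        System.out.println(isBulkLoadResultFixed(nativeMeta)); // prints true
    }
}
```

The sketch shows why problem 2 feeds problem 3: a file that is not detected as bulk loaded will have its internal MAX_SEQ_ID_KEY trusted.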





[jira] [Commented] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles

2014-08-19 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102226#comment-14102226
 ] 

Jean-Marc Spaggiari commented on HBASE-11772:
-

{quote}
   /**
-   * @return true if this storefile was created by HFileOutputFormat
+   * @return true if this storefile was created by bulk load.
* for a bulk load.
*/
{quote}

You might want to remove the "for a bulk load" line too.

 Bulk load mvcc and seqId issues with native hfiles
 --

 Key: HBASE-11772
 URL: https://issues.apache.org/jira/browse/HBASE-11772
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-11772-0.98.patch


 There are mvcc and seqId issues when bulk loading native hfiles -- meaning 
 hfiles that are a direct file copy-out from hbase, not from an HFileOutputFormat 
 job.
 There are differences between these two types of hfiles.
 Native hfiles have possible non-zero MAX_MEMSTORE_TS_KEY value and non-zero 
 mvcc values in cells. 
 Native hfiles also have MAX_SEQ_ID_KEY.
 Native hfiles do not have BULKLOAD_TIME_KEY.
 Here are a couple of problems I observed when bulk loading native hfiles.
 1.  Cells in newly bulk loaded hfiles can be invisible to scan.
 It is easy to re-create.
 Bulk load a native hfile that has a larger mvcc value in its cells, e.g. 10.
 If the current readpoint when initiating a scan is less than 10, the cells in 
 the new hfile are skipped, thus become invisible.
 We don't reset the readpoint of a region after bulk load.
 2. The current StoreFile.isBulkLoadResult() is implemented as:
 {code}
 return metadataMap.containsKey(BULKLOAD_TIME_KEY)
 {code}
 which does not detect bulkloaded native hfiles.
 3. Another observed problem is possible data loss during log recovery. 
 It is similar to HBASE-10958 reported by [~jdcryans]. Borrow the re-create 
 steps from HBASE-10958.
 1) Create an empty table
 2) Put one row in it (let's say it gets seqid 1)
 3) Bulk load one native hfile with a large seqId (e.g. 100). The native hfile 
 can be obtained by copying out from an existing table.
 4) Kill the region server that holds the table's region.
 Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 100 makes us believe that 
 everything that came before it was flushed. 
 The problem 3 is probably related to 2. We will be ok if we get the appended 
 seqId during bulk load instead of 100 from inside the file.





[jira] [Commented] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102282#comment-14102282
 ] 

Anoop Sam John commented on HBASE-11774:


FYI..  I will commit HBASE-11553 in some time. I am incorporating the changes 
from this patch to visibility classes. 
bq.CellUtil.tagsIterator() used without the tags length check in 
VisibilityUtils also. Might be good to fix there also?
This part also I am fixing.
You can just avoid the visibility classes from the patch.

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.
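A minimal sketch of the optimization described above; the list stands in for a cell's tag byte range (in HBase the real check is on the cell's tags length before calling CellUtil.tagsIterator()):

```java
import java.util.Iterator;
import java.util.List;

public class TagIterDemo {
    // Count tags, but only allocate an iterator when tags are present.
    // On tag-free cells (the common case) this does zero allocation.
    static int countTags(List<String> tags) {
        if (tags == null || tags.isEmpty()) {
            return 0; // skip iterator allocation entirely
        }
        int n = 0;
        for (Iterator<String> it = tags.iterator(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }
}
```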





[jira] [Commented] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102283#comment-14102283
 ] 

Anoop Sam John commented on HBASE-11774:


FYI..  I will commit HBASE-11553 in some time. I am incorporating the changes 
from this patch to visibility classes. 
bq.CellUtil.tagsIterator() used without the tags length check in 
VisibilityUtils also. Might be good to fix there also?
This part also I am fixing.
You can just avoid the visibility classes from the patch.

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.





[jira] [Commented] (HBASE-11683) Metrics for MOB

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102310#comment-14102310
 ] 

Jonathan Hsieh commented on HBASE-11683:


{quote}
I'm thinking how to implement the #2 mob reads, is it okay to record how many 
times the scanner read from the mob files? I don't see HBase has metrics in the 
normal scanner, is it necessary for the mob read? Please advise. Thanks.
{quote}

I'm thinking about this from the point of view of someone trying to decide if 
they should use the mob or an operator verifying that the mobs are working.  

Flushes should cover the write side metrics.  Ideally I'd want to know how much 
IO I'm saving or would save by using the mob feature, and this helps me 
understand that. We'd probably want some compaction-related mob counts as well 
(# cells converted to mob, # converted from mob).

However, I really do care about the read side as well. It would actually be 
great if we got general size statistics for the cells when reading and 
stats on the mob caches as well.  There are two places I'm thinking the data 
could be collected:

* Adding a counter every time the mob path dereferences a cell (specific to mob)
* Adding cell size count buckets that the server tracks when a Result is sent 
from a get/scan.
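The second bullet could be realized with logarithmic size buckets. This is only a sketch under assumed names -- nothing below is an existing HBase metric class:

```java
import java.util.concurrent.atomic.LongAdder;

public class CellSizeBuckets {
    // One counter per power-of-two size bucket, updated as Results are sent.
    private final LongAdder[] buckets = new LongAdder[64];

    public CellSizeBuckets() {
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = new LongAdder();
        }
    }

    // Bucket index = position of the highest set bit (floor of log2 of the size).
    static int bucketFor(long sizeBytes) {
        return 63 - Long.numberOfLeadingZeros(Math.max(1, sizeBytes));
    }

    public void record(long cellSizeBytes) {
        buckets[bucketFor(cellSizeBytes)].increment();
    }

    public long count(int bucket) {
        return buckets[bucket].sum();
    }
}
```

LongAdder keeps the hot-path cost to a striped increment, which matters if this runs on every Result sent from a get/scan.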


 Metrics for MOB
 ---

 Key: HBASE-11683
 URL: https://issues.apache.org/jira/browse/HBASE-11683
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, Scanners
Affects Versions: 2.0.0
Reporter: Jonathan Hsieh
Assignee: Jingcheng Du
 Attachments: HBASE-11683.diff


 We need to make sure to capture metrics about mobs.
 Some basic ones include:
 # of mob writes
 # of mob reads
 # avg size of mob (?)
 # mob files
 # of mob compactions / sweeps





[jira] [Commented] (HBASE-10092) Move up on to log4j2

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102324#comment-14102324
 ] 

Nicolas Liochon commented on HBASE-10092:
-

Yeah, using it directly is more in the scope of HBASE-11334. Putting a comment 
there.


 Move up on to log4j2
 

 Key: HBASE-10092
 URL: https://issues.apache.org/jira/browse/HBASE-10092
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch


 Allows logging with less friction.  See http://logging.apache.org/log4j/2.x/  
 This rather radical transition can be done w/ minor change given they have an 
 adapter for apache's logging, the one we use.  They also have an adapter for 
 slf4j so we likely can remove at least some of the 4 versions of this module 
 our dependencies make use of.
 I made a start in the attached patch but am currently stuck in maven dependency 
 resolve hell courtesy of our slf4j.  Fixing will take some concentration and 
 a good net connection, an item I currently lack.  Other TODOs: we will need to 
 fix our little log-level-setting jsp page -- will likely have to undo 
 our use of hadoop's tool here -- and the config system changes a little.
 I will return to this project soon.  Will bring numbers.
  





[jira] [Commented] (HBASE-11334) Migrate to SLF4J as logging interface

2014-08-19 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102325#comment-14102325
 ] 

Nicolas Liochon commented on HBASE-11334:
-

Would it make sense to use directly log4j2?

 Migrate to SLF4J as logging interface
 -

 Key: HBASE-11334
 URL: https://issues.apache.org/jira/browse/HBASE-11334
 Project: HBase
  Issue Type: Improvement
Reporter: jay vyas

 Migrating to new log implementations is underway as in HBASE-10092. 
 Next step would be to abstract them so that the hadoop community can 
 standardize on a logging layer that is easy for end users to tune.
 Simplest way to do this is use SLF4j APIs as the main interface and binding/ 
 implementation details in the docs as necessary.





[jira] [Commented] (HBASE-11696) Make CombinedBlockCache resizable.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102379#comment-14102379
 ] 

Hudson commented on HBASE-11696:


SUCCESS: Integrated in HBase-TRUNK #5410 (See 
[https://builds.apache.org/job/HBase-TRUNK/5410/])
HBASE-11696 Make CombinedBlockCache resizable. (anoopsamjohn: rev 
3c13e8f3ced049431cff1f9f2c0baa92a1ca5c24)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.java


 Make CombinedBlockCache resizable.
 --

 Key: HBASE-11696
 URL: https://issues.apache.org/jira/browse/HBASE-11696
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11696.patch


 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The 
 block cache needs to be resizable for this to work. CombinedBlockCache is not 
 marked resizable now. We can make it so. On resize, the L1 cache (i.e. the LRU 
 cache) can get resized.





[jira] [Issue Comment Deleted] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Anoop Sam John (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anoop Sam John updated HBASE-11774:
---

Comment: was deleted

(was: FYI..  I will commit HBASE-11553 in some time. I am incorporating the 
changes from this patch to visibility classes. 
bq.CellUtil.tagsIterator() used without the tags length check in 
VisibilityUtils also. Might be good to fix there also?
This part also I am fixing.
You can just avoid the visibility classes from the patch.)

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.





[jira] [Updated] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.

2014-08-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11773:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to 0.98+. Thanks for the patch [~octo47]!

 Wrong field used for protobuf construction in RegionStates.
 ---

 Key: HBASE-11773
 URL: https://issues.apache.org/jira/browse/HBASE-11773
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11773-0.98.patch, HBASE-11773.patch


 Protobuf to Java POJO converter uses the wrong field for converted enum 
 construction (actually the default value of the protobuf message is used).





[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Status: Open  (was: Patch Available)

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should they go?  MapReduce looks up the region 
 boundaries, so this needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?
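The straw man could look roughly like the sketch below. All type and method names simply restate the proposal and are not an existing HBase API:

```java
public class RegionSketchDemo {
    // Sketch of the proposed region-metadata interface (straw man name kept;
    // trimmed to the key-boundary methods for brevity).
    interface HRegionInterfaceSketch {
        byte[][] getStartKeys();
        byte[][] getEndKeys();
    }

    // A trivial fixed-boundary implementation for illustration.
    static class FixedRegions implements HRegionInterfaceSketch {
        private final byte[][] starts;
        private final byte[][] ends;

        FixedRegions(byte[][] starts, byte[][] ends) {
            this.starts = starts;
            this.ends = ends;
        }

        @Override public byte[][] getStartKeys() { return starts; }
        @Override public byte[][] getEndKeys() { return ends; }
    }

    public static void main(String[] args) {
        // Two regions: (-inf, 'm') and ['m', +inf), boundaries as byte arrays.
        HRegionInterfaceSketch r = new FixedRegions(
            new byte[][] { {}, { 'm' } },
            new byte[][] { { 'm' }, {} });
        System.out.println(r.getStartKeys().length); // prints 2
    }
}
```

Callers like MapReduce splits would then depend only on this narrow interface instead of the concrete HTable.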





[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Attachment: HBASE_11657_v5.patch

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should they go?  MapReduce looks up the region 
 boundaries, so this needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?





[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Status: Patch Available  (was: Open)

Submitted new patch in v5.  Made the HRL interface public, removed 
{{clearRegionCache()}}, added {{TableName getName()}}.

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should they go?  MapReduce looks up the region 
 boundaries, so this needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?





[jira] [Commented] (HBASE-11696) Make CombinedBlockCache resizable.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102439#comment-14102439
 ] 

Hudson commented on HBASE-11696:


SUCCESS: Integrated in HBase-1.0 #111 (See 
[https://builds.apache.org/job/HBase-1.0/111/])
HBASE-11696 Make CombinedBlockCache resizable. (anoopsamjohn: rev 
d502bafad2592e83672f3bbe3bae2e2fb48a19cc)
* 
hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/CombinedBlockCache.java


 Make CombinedBlockCache resizable.
 --

 Key: HBASE-11696
 URL: https://issues.apache.org/jira/browse/HBASE-11696
 Project: HBase
  Issue Type: Improvement
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.99.0, 2.0.0

 Attachments: HBASE-11696.patch


 HBASE-5349 adds auto tuning of the memstore heap and block cache heap. The 
 block cache needs to be resizable for this to work. CombinedBlockCache is not 
 marked resizable now. We can make it so. On resize, the L1 cache (i.e. the LRU 
 cache) can get resized.





[jira] [Updated] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11774:
---

Attachment: HBASE-11774_v2.patch

bq. You can just avoid the visibility classes from the patch.

Would save a bit of work maybe but I think each change should stand on its own 
and be complete. But yeah we will need these changes in HBASE-11553 also or 
that patch would regress on this point. 

bq. CellUtil.tagsIterator() used without the tags length check in 
VisibilityUtils also

Attached v2 patch that includes this. Will commit shortly unless objection. 
Thanks for the reviews!

All o.a.h.h.security.*.* tests pass locally.

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch, HBASE-11774_v2.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.





[jira] [Commented] (HBASE-11334) Migrate to SLF4J as logging interface

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102457#comment-14102457
 ] 

stack commented on HBASE-11334:
---

bq. Next step would be to abstract them so that the hadoop community can 
standardize on a logging layer that is easy for end users to tune.

That sounds grand given all the different logging machines afoot in hadoop.  
The aim is to have all use the same?  What about the 3rd parties that are 
outside of the hadoop umbrella?  e.g. jetty?

HBase is up on an abstraction already, apache commons logging. Why do we need 
to move to another [~jayunit100]?

 Migrate to SLF4J as logging interface
 -

 Key: HBASE-11334
 URL: https://issues.apache.org/jira/browse/HBASE-11334
 Project: HBase
  Issue Type: Improvement
Reporter: jay vyas

 Migrating to new log implementations is underway as in HBASE-10092. 
 Next step would be to abstract them so that the hadoop community can 
 standardize on a logging layer that is easy for end users to tune.
 Simplest way to do this is use SLF4j APIs as the main interface and binding/ 
 implementation details in the docs as necessary.





[jira] [Commented] (HBASE-11334) Migrate to SLF4J as logging interface

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102468#comment-14102468
 ] 

stack commented on HBASE-11334:
---

bq. Why do we need to move to another [~jayunit100]?

Smile = 
http://jayunit100.blogspot.com/2013/10/simplifying-distinction-between-sl4j.html

 Migrate to SLF4J as logging interface
 -

 Key: HBASE-11334
 URL: https://issues.apache.org/jira/browse/HBASE-11334
 Project: HBase
  Issue Type: Improvement
Reporter: jay vyas

 Migrating to new log implementations is underway as in HBASE-10092. 
 Next step would be to abstract them so that the hadoop community can 
 standardize on a logging layer that is easy for end users to tune.
 Simplest way to do this is use SLF4j APIs as the main interface and binding/ 
 implementation details in the docs as necessary.





[jira] [Commented] (HBASE-11334) Migrate to SLF4J as logging interface

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102473#comment-14102473
 ] 

stack commented on HBASE-11334:
---

[~jayunit100] What do you suggest for reconciling logging engines in hbase?  We 
bundle a bunch of third parties -- hadoop and non-hadoop -- with conflicting 
logging setups, and then we ourselves are on the classpath of other apps/containers.

 Migrate to SLF4J as logging interface
 -

 Key: HBASE-11334
 URL: https://issues.apache.org/jira/browse/HBASE-11334
 Project: HBase
  Issue Type: Improvement
Reporter: jay vyas

 Migrating to new log implementations is underway as in HBASE-10092. 
 Next step would be to abstract them so that the hadoop community can 
 standardize on a logging layer that is easy for end users to tune.
 Simplest way to do this is use SLF4j APIs as the main interface and binding/ 
 implementation details in the docs as necessary.





[jira] [Updated] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11774:
---

Attachment: HBASE-11774_v2-0.98.patch

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch, HBASE-11774_v2-0.98.patch, 
 HBASE-11774_v2.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.





[jira] [Updated] (HBASE-11763) Move TTL handling into ScanQueryMatcher

2014-08-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11763:
---

Attachment: HBASE-11763.patch

Updated patch drops a junk change in ScanDeleteTracker that would cause a 
javadoc warning.

 Move TTL handling into ScanQueryMatcher
 ---

 Key: HBASE-11763
 URL: https://issues.apache.org/jira/browse/HBASE-11763
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11763.patch, HBASE-11763.patch








[jira] [Commented] (HBASE-11764) Support per cell TTLs

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102496#comment-14102496
 ] 

Andrew Purtell commented on HBASE-11764:


bq. So you will add setter in Mutation (Non Delete) to pass the per cell TTL 
right?

Yes. 

Also, I realized this morning that this part of HBASE-11763 will need to be 
changed in the patch on this issue:
{code}
@@ -362,9 +360,16 @@ public class ScanQueryMatcher {
   }
   // note the following next else if...
   // delete marker are not subject to other delete markers
-} else if (!this.deletes.isEmpty()) {
-  DeleteResult deleteResult = deletes.isDeleted(cell);
-  switch (deleteResult) {
+} else {
+  // If the cell is expired and we have enough versions, skip
+  if (columns.hasMinVersions() && HStore.isExpired(cell, oldestUnexpiredTS)) {
+return columns.getNextRowOrNextColumn(cell.getQualifierArray(), 
qualifierOffset,
+qualifierLength);
+  }
+  // Check deletes
+  if (!this.deletes.isEmpty()) {
+DeleteResult deleteResult = deletes.isDeleted(cell);
+switch (deleteResult) {
 case FAMILY_DELETED:
 case COLUMN_DELETED:
   return columns.getNextRowOrNextColumn(cell.getQualifierArray(),
{code}

We can't assume based on a cell TTL that we can skip to the next column. We can 
only skip the current cell. This may affect scanning performance 
unconditionally. Up to now additional costs like the tag iterator would be 
avoided wherever cells do not have tags.
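The distinction being made -- a cell-level TTL only justifies skipping that single cell, while a column/family-level expiry lets the scanner seek past the whole column -- can be sketched with match codes. The names mirror but do not reproduce ScanQueryMatcher:

```java
public class TtlMatchSketch {
    enum MatchCode { SKIP, SEEK_NEXT_COL, INCLUDE }

    // Hypothetical decision point: on an expired cell, a column-wide TTL
    // permits seeking to the next column, but a per-cell TTL may only skip
    // this one cell, since later cells in the same column can still be live.
    static MatchCode onExpired(boolean perCellTtl) {
        return perCellTtl ? MatchCode.SKIP : MatchCode.SEEK_NEXT_COL;
    }
}
```

This is why per-cell TTL checks land on the cell-by-cell path and can cost more than the column-level short-circuit.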

 Support per cell TTLs
 -

 Key: HBASE-11764
 URL: https://issues.apache.org/jira/browse/HBASE-11764
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.99.0, 0.98.6

 Attachments: HBASE-11764.patch








[jira] [Commented] (HBASE-11646) Handle the MOB in compaction

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102497#comment-14102497
 ] 

Jonathan Hsieh commented on HBASE-11646:


note -- the patch is being reviewed here https://reviews.apache.org/r/24736/

 Handle the MOB in compaction
 

 Key: HBASE-11646
 URL: https://issues.apache.org/jira/browse/HBASE-11646
 Project: HBase
  Issue Type: Sub-task
  Components: Compaction
Reporter: Jingcheng Du
Assignee: Jingcheng Du
 Attachments: HBASE-11646.diff


 In the updated MOB design however, admins can set CF level thresholds that 
 would force cell values > the threshold to use the MOB write path instead of 
 the traditional path.  There are two cases where mobs need to interact with 
 this threshold:
 1) How do we handle the case when the threshold size is changed?
 2) Today, you can bulkload hfiles that contain MOBs.  These cells will work 
 as normal inside hbase.  Unfortunately the cells with MOBs in them will never 
 benefit from the MOB write path.
 The proposal here is to modify compaction in mob enabled cf's such that the 
 threshold value is honored during compactions.  This handles case #1 -- 
 elements that should be moved out of the normal hfiles get 'compacted' into 
 refs and mob hfiles, and values that should be pulled back into the cf get 
 dereferenced and written out wholly in the compaction.  For case #2, we can 
 maintain the same behavior and compaction would move data into the mob 
 writepath/lifecycle.
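The compaction-time routing the proposal describes can be sketched as below. All names here (`mobThreshold`, `CellDestination`, `route`) are illustrative assumptions, not the actual HBase MOB API; the point is only that each rewritten cell is re-tested against the current CF-level threshold.

```java
// Hypothetical sketch of re-applying the CF-level MOB threshold during
// compaction; not the actual HBase MOB code.
enum CellDestination { NORMAL_HFILE, MOB_HFILE }

class MobCompactionSketch {
    // e.g. a CF-level threshold of 100KB, changeable by the admin at any time
    static long mobThreshold = 100 * 1024;

    // During compaction each cell value is re-routed against the current
    // threshold: oversized values move to a MOB hfile (leaving a reference
    // cell behind), undersized ones are written back inline.
    static CellDestination route(int valueLength) {
        return valueLength > mobThreshold
                ? CellDestination.MOB_HFILE
                : CellDestination.NORMAL_HFILE;
    }
}
```

This handles both cases at once: a changed threshold and bulk-loaded MOB cells are normalized by the next compaction.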



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Esteban Gutierrez (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102502#comment-14102502
 ] 

Esteban Gutierrez commented on HBASE-11742:
---

Thanks [~mbertozzi]. [~apurtell] does it look good to you? 

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HBASE-11764) Support per cell TTLs

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102496#comment-14102496
 ] 

Andrew Purtell edited comment on HBASE-11764 at 8/19/14 5:30 PM:
-

bq. So you will add setter in Mutation (Non Delete) to pass the per cell TTL 
right?

Yes. 

Also, I realized this morning that this part of HBASE-11763 will need to be 
changed in the patch on this issue:
{code}
@@ -362,9 +360,16 @@ public class ScanQueryMatcher {
   }
   // note the following next else if...
   // delete marker are not subject to other delete markers
-} else if (!this.deletes.isEmpty()) {
-  DeleteResult deleteResult = deletes.isDeleted(cell);
-  switch (deleteResult) {
+} else {
+  // If the cell is expired and we have enough versions, skip
+  if (columns.hasMinVersions() && HStore.isExpired(cell, oldestUnexpiredTS)) {
+    return columns.getNextRowOrNextColumn(cell.getQualifierArray(), qualifierOffset,
+        qualifierLength);
+  }
+  // Check deletes
+  if (!this.deletes.isEmpty()) {
+DeleteResult deleteResult = deletes.isDeleted(cell);
+switch (deleteResult) {
 case FAMILY_DELETED:
 case COLUMN_DELETED:
   return columns.getNextRowOrNextColumn(cell.getQualifierArray(),
{code}

HStore#isExpired has been changed to do, if a cell TTL is available, a 
comparison of the cell's timestamp with its TTL tag, ignoring the family 
setting (oldestUnexpiredTS). We can't assume based on a cell TTL that we can 
skip to the next column. We can only skip the current cell, because a cell TTL 
overrides any family setting in the narrowest scope of a single cell. The 
earlier assumption that once we hit an expired cell no earlier cell is alive 
is no longer true. This may affect scanning performance unconditionally. Up to 
now additional costs like the tag iterator would be avoided wherever cells do 
not have tags.
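A minimal sketch of the split expiration test described above, assuming hypothetical method and enum names (this is not the ScanQueryMatcher patch itself, only an illustration of the two scopes):

```java
// Illustrative two-step expiration test: a per-cell TTL expires only that
// one cell (SKIP), while the family TTL still justifies seeking ahead to
// the next column. Not the actual ScanQueryMatcher code.
enum MatchCode { INCLUDE, SKIP, SEEK_NEXT_COL }

class CellTtlSketch {
    // cellTtlMs == 0 means the cell carries no TTL tag
    static MatchCode checkExpiration(long cellTs, long cellTtlMs,
                                     long now, long oldestUnexpiredTs) {
        if (cellTtlMs > 0) {
            // Cell TTL overrides the family setting in the narrowest scope:
            // expired or not, it tells us nothing about earlier cells.
            return (cellTs + cellTtlMs <= now) ? MatchCode.SKIP
                                               : MatchCode.INCLUDE;
        }
        if (cellTs < oldestUnexpiredTs) {
            // Family TTL: once one cell is expired, all earlier cells in
            // this column are too, so we may seek to the next column.
            return MatchCode.SEEK_NEXT_COL;
        }
        return MatchCode.INCLUDE;
    }
}
```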


was (Author: apurtell):
bq. So you will add setter in Mutation (Non Delete) to pass the per cell TTL 
right?

Yes. 

Also, I realized this morning that this part of HBASE-11763 will need to be 
changed in the patch on this issue:
{code}
@@ -362,9 +360,16 @@ public class ScanQueryMatcher {
   }
   // note the following next else if...
   // delete marker are not subject to other delete markers
-} else if (!this.deletes.isEmpty()) {
-  DeleteResult deleteResult = deletes.isDeleted(cell);
-  switch (deleteResult) {
+} else {
+  // If the cell is expired and we have enough versions, skip
+  if (columns.hasMinVersions() && HStore.isExpired(cell, oldestUnexpiredTS)) {
+    return columns.getNextRowOrNextColumn(cell.getQualifierArray(), qualifierOffset,
+        qualifierLength);
+  }
+  // Check deletes
+  if (!this.deletes.isEmpty()) {
+DeleteResult deleteResult = deletes.isDeleted(cell);
+switch (deleteResult) {
 case FAMILY_DELETED:
 case COLUMN_DELETED:
   return columns.getNextRowOrNextColumn(cell.getQualifierArray(),
{code}

We can't assume based on a cell TTL that we can skip to the next column. We can 
only skip the current cell. This may affect scanning performance 
unconditionally. Up to now additional costs like the tag iterator would be 
avoided wherever cells do not have tags.

 Support per cell TTLs
 -

 Key: HBASE-11764
 URL: https://issues.apache.org/jira/browse/HBASE-11764
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.99.0, 0.98.6

 Attachments: HBASE-11764.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11764) Support per cell TTLs

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102508#comment-14102508
 ] 

Andrew Purtell commented on HBASE-11764:


Anyway, the above is easy to deal with; I just have to break out the 
expiration tests into one check for the cell TTL and another for the family 
TTL, and skip or move to the next column based on each test individually. 
Just calling your attention to the issue with the current patch.

 Support per cell TTLs
 -

 Key: HBASE-11764
 URL: https://issues.apache.org/jira/browse/HBASE-11764
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell
 Fix For: 0.99.0, 0.98.6

 Attachments: HBASE-11764.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102516#comment-14102516
 ] 

stack commented on HBASE-4955:
--

It's a 'foreign' test, one that came in from hadoop when we copy/pasted http.  
It's second class. Could comment it out if it's the only failing test (as per 
@nkeywal -- sort of).

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch


 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain all 
 that we need.
 JUnit. Could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk
 Surefire: Could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): Not fixed (reopen) on trunk, fixed on 
 our version.
 760 (does not take into account the test method): fixed in trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed in trunk, not fixed in 
 our version
 799 (Allow test parallelization when forkMode=always): not fixed in trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on 
 trunk, fixed on our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11772) Bulk load mvcc and seqId issues with native hfiles

2014-08-19 Thread Jerry He (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102519#comment-14102519
 ] 

Jerry He commented on HBASE-11772:
--

Hi, [~jmspaggi]

Will do.

 Bulk load mvcc and seqId issues with native hfiles
 --

 Key: HBASE-11772
 URL: https://issues.apache.org/jira/browse/HBASE-11772
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.5
Reporter: Jerry He
Assignee: Jerry He
Priority: Critical
 Fix For: 0.98.6

 Attachments: HBASE-11772-0.98.patch


 There are mvcc and seqId issues when bulk loading native hfiles -- meaning 
 hfiles that are a direct file copy-out from hbase, not from an 
 HFileOutputFormat job.
 There are differences between these two types of hfiles.
 Native hfiles have possible non-zero MAX_MEMSTORE_TS_KEY value and non-zero 
 mvcc values in cells. 
 Native hfiles also have MAX_SEQ_ID_KEY.
 Native hfiles do not have BULKLOAD_TIME_KEY.
 Here are a couple of problems I observed when bulk load native hfiles.
 1.  Cells in newly bulk loaded hfiles can be invisible to scan.
 It is easy to re-create.
 Bulk load a native hfile that has a larger mvcc value in cells, e.g. 10.
 If the current readpoint when initiating a scan is less than 10, the cells in 
 the new hfile are skipped, and thus become invisible.
 We don't reset the readpoint of a region after bulk load.
 2. The current StoreFile.isBulkLoadResult() is implemented as:
 {code}
 return metadataMap.containsKey(BULKLOAD_TIME_KEY)
 {code}
 which does not detect bulkloaded native hfiles.
 3. Another observed problem is possible data loss during log recovery. 
 It is similar to HBASE-10958 reported by [~jdcryans]. Borrow the re-create 
 steps from HBASE-10958.
 1) Create an empty table
 2) Put one row in it (let's say it gets seqid 1)
 3) Bulk load one native hfile with a large seqId (e.g. 100). The native 
 hfile can be obtained by copying out from an existing table.
 4) Kill the region server that holds the table's region.
 Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 100 makes us believe that 
 everything that came before it was flushed. 
 Problem 3 is probably related to problem 2. We will be ok if we get the 
 appended seqId during bulk load instead of the 100 from inside the file.
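One way to close the detection gap in point 2 could look like the sketch below: besides the `BULKLOAD_TIME_KEY` metadata, also recognize the `_SeqId_` marker that bulk load appends to file names when the assign-seqnum property is set. This combined check is an assumption for illustration, not the committed fix.

```java
import java.util.Map;

// Hedged sketch of a broader StoreFile.isBulkLoadResult(); the combined
// metadata + filename check is illustrative, not the actual patch.
class BulkLoadDetectSketch {
    static final String BULKLOAD_TIME_KEY = "BULKLOAD_TIMESTAMP";

    static boolean isBulkLoadResult(Map<String, byte[]> metadataMap,
                                    String fileName) {
        // Current check: metadata written by HFileOutputFormat-based loads,
        // absent from natively copied hfiles.
        boolean byMetadata = metadataMap.containsKey(BULKLOAD_TIME_KEY);
        // Additional check: bulk load renames files with a _SeqId_<n>_
        // suffix when a sequence id is assigned at load time.
        boolean byName = fileName.contains("_SeqId_");
        return byMetadata || byName;
    }
}
```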



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11591) Scanner fails to retrieve KV from bulk loaded file with highest sequence id than the cell's mvcc in a non-bulk loaded file

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102535#comment-14102535
 ] 

stack commented on HBASE-11591:
---

There is the SequenceNumber Interface but that is only about getting a 
SequenceNumber.

As per you fellows, I don't think we need to add the method to Cell.  There 
are no setters in Cell currently.  Why start now?

A marker Interface that allows you set sequence id on the hosting object seems 
fine.  MutableCell is a little ugly since it tarnishes our nice 'Cell' notion.

What about adding setter on SequenceNumber? One of the implementors is HLogKey. 
 It has a:

  void setLogSeqNum(final long sequence) {
this.logSeqNum = sequence;
this.seqNumAssignedLatch.countDown();
  }
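The marker-interface idea being discussed might look like the following sketch: keep `Cell` itself setter-free, and let server-side implementations opt into mutability through a small interface. The names here are illustrative, not the final API.

```java
// Illustrative server-side marker interface: Cell stays immutable to
// clients; only hosting objects that implement this expose a setter.
interface SequenceIdSettable {
    long getSequenceId();
    void setSequenceId(long seqId); // server-side only
}

// A hypothetical server-side cell wrapper opting into the interface.
class ServerCellSketch implements SequenceIdSettable {
    private long seqId;
    @Override public long getSequenceId() { return seqId; }
    @Override public void setSequenceId(long seqId) { this.seqId = seqId; }
}
```

On the read path the server would then do an `instanceof` check against the interface before setting the seq id, rather than casting to KeyValue.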



 Scanner fails to retrieve KV from bulk loaded file with highest sequence id 
 than the cell's mvcc in a non-bulk loaded file
 ---

 Key: HBASE-11591
 URL: https://issues.apache.org/jira/browse/HBASE-11591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.99.0

 Attachments: HBASE-11591.patch, HBASE-11591_1.patch, 
 HBASE-11591_2.patch, HBASE-11591_3.patch, TestBulkload.java


 See discussion in HBASE-11339.
 When we have a case where there are same KVs in two files one produced by 
 flush/compaction and the other thro the bulk load.
 Both the files have some same kvs which matches even in timestamp.
 Steps:
 Add some rows with a specific timestamp and flush the same.  
 Bulk load a file with the same data.. Enusre that assign seqnum property is 
 set.
 The bulk load should use HFileOutputFormat2 (or ensure that we write the 
 bulk_time_output key).
 This would ensure that the bulk loaded file has the highest seq num.
 Assume the cell in the flushed/compacted store file is 
 row1,cf,cq,ts1, value1  and the cell in the bulk loaded file is
 row1,cf,cq,ts1,value2 
 (There are no parallel scans).
 Issue a scan on the table in 0.96. The retrieved value is 
 row1,cf1,cq,ts1,value2
 But the same in 0.98 will retrieve row1,cf1,cq,ts2,value1. 
 This is a behaviour change.  This is because of this code 
 {code}
 public int compare(KeyValueScanner left, KeyValueScanner right) {
   int comparison = compare(left.peek(), right.peek());
   if (comparison != 0) {
 return comparison;
   } else {
 // Since both the keys are exactly the same, we break the tie in favor
 // of the key which came latest.
 long leftSequenceID = left.getSequenceID();
 long rightSequenceID = right.getSequenceID();
  if (leftSequenceID > rightSequenceID) {
    return -1;
  } else if (leftSequenceID < rightSequenceID) {
    return 1;
 } else {
   return 0;
 }
   }
 }
 {code}
 Here, in the 0.96 case, the mvcc of the cell in both files will be 0 and so 
 the comparison will happen in the else condition.  There the seq id of the 
 bulk loaded file is greater, so it sorts first, ensuring that the scan 
 happens from that bulk loaded file.
 In the 0.98+ case, as we are retaining the mvcc+seqid, we are not making the 
 mvcc 0 (it remains a non-zero positive value).  Hence the compare() sorts 
 the cell in the flushed/compacted file first.  Which means that though we 
 know the latest file is the bulk loaded file, we don't scan its data.
 Seems to be a behaviour change.  Will check other corner cases also, but we 
 are trying to understand the behaviour of bulk load because we are 
 evaluating whether it can be used for the MOB design.
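Extracting just this comparison into a runnable form makes the behaviour change visible: the seq-id tie-break only runs when the key comparison (including mvcc) returns 0, which happens in 0.96 because both mvccs are zeroed. This is a simplified sketch for illustration; mvcc stands in for the full key comparison.

```java
// Minimal runnable extraction of the seq-id tie-break; not the actual
// KeyValueHeap comparator.
class TieBreakSketch {
    static int compare(long leftMvcc, long rightMvcc,
                       long leftSeqId, long rightSeqId) {
        // 0.98+ retains non-zero mvcc, so this comparison already decides
        // (the later mvcc sorts first).
        int mvccCmp = Long.compare(rightMvcc, leftMvcc);
        if (mvccCmp != 0) return mvccCmp;
        // 0.96 zeroes mvcc, so ties fall through to the file seq id:
        // the bulk loaded file (higher seq id) sorts first.
        if (leftSeqId > rightSeqId) return -1;
        if (leftSeqId < rightSeqId) return 1;
        return 0;
    }
}
```

With both mvccs 0, the bulk loaded file (seq id 100) wins the tie; with a retained mvcc on the flushed cell, the flushed cell sorts first regardless of the bulk loaded file's higher seq id.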



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-11574) hbase:meta's regions can be replicated

2014-08-19 Thread Devaraj Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das reassigned HBASE-11574:
---

Assignee: Devaraj Das

 hbase:meta's regions can be replicated
 --

 Key: HBASE-11574
 URL: https://issues.apache.org/jira/browse/HBASE-11574
 Project: HBase
  Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Devaraj Das

 As mentioned elsewhere, we can leverage hbase-10070 features to create 
 replicas for the meta tables regions so that: 
  1. meta hotspotting can be circumvented 
  2. meta becomes highly available for reading 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)

2014-08-19 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102555#comment-14102555
 ] 

Francis Liu commented on HBASE-11165:
-

{quote}
Can I have some pointers on how to read the above. Zk-less AM is better because 
you scan a table – you don't have to ls znodes? What is the 1M znodes vs 1M 
rows about in above?
{quote}
Essentially the apis are better. I.e. with 1M rows we can iterate over the 
rows instead of doing an ls and getting back one huge chunk of data; and 
deleting 1M znodes takes too long, while deletes could be parallelized 
against an hbase table.

For 2.a, the response is below. For 2.b, it's mainly a concern whether we'll 
hit other ZK issues when having that many child znodes (1M and beyond). The 
HDFS guys are already looking into scaling the number of child directories 
for the NN.

Will update doc.

{quote}
Francis Liu Is the above the basis for your ...As our experiments shows 
splitting is a must for scaling.? If split meta, then more read/write 
throughput? 
{quote}
If we split meta, then: 1) less write amplification (i.e. no large 
compactions), better W throughput; 2) more disks, more R/W throughput; 3) 
more heap to fit meta, better R throughput.

{quote}
Because the meta table could be served by many machines so field more 
reads/writes? The reads/writes are needed at starttime or during cluster 
lifetime in your judgement? Thanks.
{quote}
Yep needed for startup. We need to do experiments for 1 rack and 2 rack failure 
for cluster lifetime case. Though large compactions would creep up on you. So 
splitting would still be motivating for cluster lifetime IMHO.  


 Scaling so cluster can host 1M regions and beyond (50M regions?)
 

 Key: HBASE-11165
 URL: https://issues.apache.org/jira/browse/HBASE-11165
 Project: HBase
  Issue Type: Brainstorming
Reporter: stack
 Attachments: HBASE-11165.zip, Region Scalability test.pdf, 
 zk_less_assignment_comparison_2.pdf


 This discussion issue comes out of Co-locate Meta And Master HBASE-10569 
 and comments on the doc posted there.
 A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M 
 regions maybe even 50M later.  This issue is about discussing how we will do 
 that (or if not 50M on a cluster, how otherwise we can attain same end).
 More detail to follow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102569#comment-14102569
 ] 

stack commented on HBASE-11657:
---

bq. For RegionLocations, we may need an immutable version.

Sounds good.

Looking at patch, would I ever have to clean up a RegionLocator... call close 
on it when done?  Otherwise +1



 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should it go?  MapReduce looks up the region 
 boundaries, so it needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102581#comment-14102581
 ] 

Hudson commented on HBASE-11773:


FAILURE: Integrated in HBase-1.0 #112 (See 
[https://builds.apache.org/job/HBase-1.0/112/])
HBASE-11773 Wrong field used for protobuf construction in RegionStates (Andrey 
Stepachev) (apurtell: rev 4901e649b64700e6796c2ba2da24ac2b906273ec)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionState.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java


 Wrong field used for protobuf construction in RegionStates.
 ---

 Key: HBASE-11773
 URL: https://issues.apache.org/jira/browse/HBASE-11773
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11773-0.98.patch, HBASE-11773.patch


 The Protobuf <-> Java POJO converter uses the wrong field for converted enum 
 construction (actually the default value of the protobuf message is used).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Status: Open  (was: Patch Available)

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch, HBASE_11657_v6.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should it go?  MapReduce looks up the region 
 boundaries, so it needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Attachment: HBASE_11657_v6.patch

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch, HBASE_11657_v6.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should it go?  MapReduce looks up the region 
 boundaries, so it needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread Carter (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carter updated HBASE-11657:
---

Status: Patch Available  (was: Open)

Of course.  Added extends Closeable to interface.

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch, HBASE_11657_v6.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair<byte[][],byte[][]> getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List<RegionLocations> listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should it go?  MapReduce looks up the region 
 boundaries, so it needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

2014-08-19 Thread Paul Fleetwood (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102583#comment-14102583
 ] 

Paul Fleetwood commented on HBASE-11625:


I may have a reproduction of this issue.  I've generated a bulk loadable HFile 
(which I've attached) using the HFileOutputFormat, and am unable to perform 
scans on it.

Here is what I do:

Running 0.98.5 on Mac in single instance mode...
- Use the hbase shell to create a table
- Use the bulkload tool to load the attached file into the table:  ./bin/hbase 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
- Use the hbase shell to count the rows in the new table - see an exception
- Scan the table completely in the hbase shell
- Attempt to count the table again, see it succeed

Something about the scan fixes things.

The exception that I see when running the count is this:

ERROR: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: 
Expected nextCallSeq: 2 But the nextCallSeq got from client: 1; 
request=scanner_id: 547 number_of_rows: 10 close_scanner: false next_call_seq: 1
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3110)
at 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2026)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:98)
at 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
at java.lang.Thread.run(Thread.java:695)

But, I've learned that this is a red herring.  This issue is the result of a 
client retry (causing the sequence numbers to mismatch).  The retry itself is 
caused by another failure, which looks like this (please note the following 
callstacks were captured while running my own application, but I believe they 
are the same in the count):

java.io.IOException: Could not iterate StoreFileScanner[HFileScanner for reader 
reader=file:/var/folders/bn/qpypwv8s3r7g3ksgxdj0hlw8gn/T/hbase-paulfleetwood/hbase/data/paul2_lxlv1_prod/events/d2697b9d34be632e481ab33433a28699/common/69823512bede4392be352761adc669e6_SeqId_26_,
 compression=none, cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=false][prefetchOnOpen=false], 
firstKey=\x00Sa\xC8Dw\xE3\xE8i\x9C\xD2\xDB\x1C\xC3Mk\xF4\x99!1\xD1\xBF\x99w/common:entity\x00id\x00\x03/1398917188791/Put,
 
lastKey=\x0FS\x88\xAD\xE1\xFA\x11:\x85\x88\xDE\x12\x12\xF0\xD8s\x9A\x06X\x1B\x84\x1A\x8B\xA7\xC1/common:txn\x00timeoutPolicy\x00\x03/1401466337205/Put,
 avgKeyLen=58, avgValueLen=11, entries=345444, length=27516812, 
cur=\x00Sc'M\x5C\xC8\x0A\xD5\xC0P\xA53U\x01,\xDF=\x8D\x0F\xA6\x00\xCC\xCB/common:entity\x00type\x00\x03/1399007053829/Put/vlen=4/mvcc=0]

This is caused by something like the following:

java.io.IOException: Failed to read compressed block at 65621, 
onDiskSizeWithoutHeader=65620, preReadHeaderSize=33, header.length=33, header 
bytes: 
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00

Which, is caused by:

java.io.IOException: Invalid HFile block magic: FJ\xA8Yt\x04@$

That last exception is thrown with the following call stack:

BlockType.parse(byte[], int, int) line: 154 
BlockType.read(ByteBuffer) line: 165
HFileBlock.<init>(ByteBuffer, boolean) line: 239
HFileBlock$FSReaderV2.readBlockDataInternal(FSDataInputStream, long, long, int, 
boolean, boolean) line: 1446
HFileBlock$FSReaderV2.readBlockData(long, long, int, boolean) line: 1312
HFileReaderV2.readBlock(long, long, boolean, boolean, boolean, boolean, 
BlockType) line: 387
HFileReaderV2$ScannerV2(HFileReaderV2$AbstractScannerV2).readNextDataBlock() 
line: 635  
HFileReaderV2$ScannerV2.next() line: 749
StoreFileScanner.next() line: 136   
KeyValueHeap.next() line: 108   
StoreScanner.next(List&lt;Cell&gt;, int) line: 537
KeyValueHeap.next(List&lt;Cell&gt;, int) line: 140
HRegion$RegionScannerImpl.populateResult(List&lt;Cell&gt;, KeyValueHeap, int, byte[], 
int, short) line: 3937  
HRegion$RegionScannerImpl.nextInternal(List&lt;Cell&gt;, int) line: 4017  
HRegion$RegionScannerImpl.nextRaw(List&lt;Cell&gt;, int) line: 3885   
HRegion$RegionScannerImpl.nextRaw(List&lt;Cell&gt;) line: 3876
HRegionServer.scan(RpcController, ClientProtos$ScanRequest) line: 3158  
ClientProtos$ClientService$2.callBlockingMethod(Descriptors$MethodDescriptor, 
RpcController, Message) line: 29587   
RpcServer.call(BlockingService, MethodDescriptor, Message, CellScanner, long, 
MonitoredRPCHandler) line: 2026   
CallRunner.run() line: 98   
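The "Invalid HFile block magic" at the root of this stack comes from the magic-prefix check on each block header. A minimal sketch of that check, assuming the 8-byte DATA-block magic "DATABLK*" (the real BlockType.parse compares against every known block type, not just DATA):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BlockMagic {
    // Assumed constant: the 8-byte magic that prefixes HFile DATA blocks.
    static final byte[] DATA_MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    // Throws the same style of exception seen in the stack above when the
    // first 8 bytes of a block do not match the expected magic.
    static void checkMagic(byte[] block) throws IOException {
        byte[] head = Arrays.copyOfRange(block, 0, DATA_MAGIC.length);
        if (!Arrays.equals(head, DATA_MAGIC)) {
            throw new IOException("Invalid HFile block magic: "
                + new String(head, StandardCharsets.US_ASCII));
        }
    }
}
```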

[jira] [Updated] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

2014-08-19 Thread Paul Fleetwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Fleetwood updated HBASE-11625:
---

Affects Version/s: 0.98.5

 Reading datablock throws Invalid HFile block magic and can not switch to 
 hdfs checksum 
 -

 Key: HBASE-11625
 URL: https://issues.apache.org/jira/browse/HBASE-11625
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.94.21, 0.98.4, 0.98.5
Reporter: qian wang
 Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz


 When using HBase checksums, a call to readBlockDataInternal() in HFileBlock.java 
 can encounter file corruption, but it can only switch to the HDFS checksum 
 input stream at validateBlockChecksum(). If the data block's header is corrupted 
 when b = new HFileBlock(), it throws the Invalid HFile block magic exception 
 and the RPC call fails.
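The ordering problem in the description can be shown with a toy model (this is not the actual HFileBlock code; the method and its return values are invented for illustration): because the header/magic check runs before checksum validation, a corrupted header throws before the HDFS-checksum fallback is ever reached.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ChecksumFallbackSketch {
    static final byte[] MAGIC = "DATABLK*".getBytes(StandardCharsets.US_ASCII);

    // Toy model of the ordering described above: the magic check comes first,
    // so a corrupt header raises IOException and the checksum-based fallback
    // path ("retry-with-hdfs-checksum") is never reached for it.
    static String readBlock(byte[] block, boolean hbaseChecksumOk) throws IOException {
        if (!Arrays.equals(Arrays.copyOfRange(block, 0, MAGIC.length), MAGIC)) {
            throw new IOException("Invalid HFile block magic"); // no fallback possible here
        }
        if (!hbaseChecksumOk) {
            return "retry-with-hdfs-checksum"; // fallback only triggers after the magic check
        }
        return "ok";
    }
}
```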



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

2014-08-19 Thread Paul Fleetwood (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Fleetwood updated HBASE-11625:
---

Attachment: 2711de1fdf73419d9f8afc6a8b86ce64.gz

I gzip'ed the file so that it would be under the upload size constraint.

 Reading datablock throws Invalid HFile block magic and can not switch to 
 hdfs checksum 
 -

 Key: HBASE-11625
 URL: https://issues.apache.org/jira/browse/HBASE-11625
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.94.21, 0.98.4, 0.98.5
Reporter: qian wang
 Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz


 When using HBase checksums, a call to readBlockDataInternal() in HFileBlock.java 
 can encounter file corruption, but it can only switch to the HDFS checksum 
 input stream at validateBlockChecksum(). If the data block's header is corrupted 
 when b = new HFileBlock(), it throws the Invalid HFile block magic exception 
 and the RPC call fails.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10378) Divide HLog interface into User and Implementor specific interfaces

2014-08-19 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102596#comment-14102596
 ] 

Sean Busbey commented on HBASE-10378:
-

This has turned into a bit of a time sink. Now that I have small tests passing, 
I'm posting some WIP to get suggestions early about parts that are worth 
breaking off into different tickets. Note that the RB patch is definitely not 
something that's ready to go in.

* [github branch|https://github.com/busbey/hbase/tree/HBASE-10378] - has both 
the original work from this ticket rebased onto master and then follow on 
changes.
* [reviewboard for the net changes on top of 
master|https://reviews.apache.org/r/24857/] (warning: it's 10 pages)

RB description has the major compatibility goals and current failings. The 
github WIP commit message has questions that I'm trying to work out in my head. 
I'd love to get feedback on the compatibility stuff and suggestions on the open 
questions.

 Divide HLog interface into User and Implementor specific interfaces
 ---

 Key: HBASE-10378
 URL: https://issues.apache.org/jira/browse/HBASE-10378
 Project: HBase
  Issue Type: Sub-task
  Components: wal
Reporter: Himanshu Vashishtha
Assignee: Sean Busbey
 Attachments: 10378-1.patch, 10378-2.patch


 HBASE-5937 introduces the HLog interface as a first step to support multiple 
 WAL implementations. This interface is a good start, but has some 
 limitations/drawbacks in its current state, such as:
 1) There is no clear distinction b/w User and Implementor APIs, and it 
 provides APIs both for WAL users (append, sync, etc) and also WAL 
 implementors (Reader/Writer interfaces, etc). There are APIs which are very 
 much implementation specific (getFileNum, etc) and a user such as a 
 RegionServer shouldn't know about it.
 2) There are about 14 methods in FSHLog which are not present in HLog 
 interface but are used at several places in the unit test code. These tests 
 typecast HLog to FSHLog, which makes it very difficult to test multiple WAL 
 implementations without doing some ugly checks.
 I'd like to propose some changes in HLog interface that would ease the multi 
 WAL story:
 1) Have two interfaces WAL and WALService. WAL provides APIs for 
 implementors. WALService provides APIs for users (such as RegionServer).
 2) A skeleton implementation of the above two interface as the base class for 
 other WAL implementations (AbstractWAL). It provides required fields for all 
 subclasses (fs, conf, log dir, etc). Make a minimal set of test only methods 
 and add this set in AbstractWAL.
 3) HLogFactory returns a WALService reference when creating a WAL instance; 
 if a user needs to access impl-specific APIs (there are unit tests which get 
 WAL from a HRegionServer and then call impl-specific APIs), use AbstractWAL 
 type casting.
 4) Make TestHLog abstract and let all implementors provide their respective 
 test class which extends TestHLog (TestFSHLog, for example).
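The proposed split could look roughly like the following (a sketch only: the interface names come from the proposal above, but every signature here is an assumption, not the actual patch):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// WALService is what users (e.g. a RegionServer) see; WAL adds the
// implementor-facing details that should stay off the user-facing API.
interface WALService {
    long append(byte[] regionName, byte[] entry) throws IOException;
    void sync() throws IOException;
}

interface WAL extends WALService {
    long getFilenum(); // impl-specific; users should not need this
}

// Minimal in-memory stand-in for an AbstractWAL-style skeleton implementation.
class InMemoryWAL implements WAL {
    private final List<byte[]> entries = new ArrayList<>();
    private long synced = 0;

    public long append(byte[] regionName, byte[] entry) {
        entries.add(entry);
        return entries.size(); // pretend sequence id
    }

    public void sync() { synced = entries.size(); }

    public long getFilenum() { return 0; }

    long syncedCount() { return synced; }
}
```

With this split, test code can cast to AbstractWAL (here, InMemoryWAL) for impl-specific calls while production callers hold only the WALService type.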



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11657) Put HTable region methods in an interface

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102613#comment-14102613
 ] 

stack commented on HBASE-11657:
---

+1 on v6 (Waiting on Pope @enis to give his blessing before commit)

 Put HTable region methods in an interface
 -

 Key: HBASE-11657
 URL: https://issues.apache.org/jira/browse/HBASE-11657
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.99.0
Reporter: Carter
Assignee: Carter
 Fix For: 0.99.0

 Attachments: HBASE_11657.patch, HBASE_11657_v2.patch, 
 HBASE_11657_v3.patch, HBASE_11657_v3.patch, HBASE_11657_v4.patch, 
 HBASE_11657_v5.patch, HBASE_11657_v6.patch


 Most of the HTable methods are now abstracted by HTableInterface, with the 
 notable exception of the following methods that pertain to region metadata:
 {code}
 HRegionLocation getRegionLocation(final String row)
 HRegionLocation getRegionLocation(final byte [] row)
 HRegionLocation getRegionLocation(final byte [] row, boolean reload)
 byte [][] getStartKeys()
 byte[][] getEndKeys()
 Pair&lt;byte[][],byte[][]&gt; getStartEndKeys()
 void clearRegionCache()
 {code}
 and a default scope method which maybe should be bundled with the others:
 {code}
 List&lt;RegionLocations&gt; listRegionLocations()
 {code}
 Since the consensus seems to be that these would muddy HTableInterface with 
 non-core functionality, where should it go?  MapReduce looks up the region 
 boundaries, so it needs to be exposed somewhere.
 Let me throw out a straw man to start the conversation.  I propose:
 {code}
 org.apache.hadoop.hbase.client.HRegionInterface
 {code}
 Have HTable implement this interface.  Also add these methods to HConnection:
 {code}
 HRegionInterface getTableRegion(TableName tableName)
 HRegionInterface getTableRegion(TableName tableName, ExecutorService pool)
 {code}
 [~stack], [~ndimiduk], [~enis], thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102680#comment-14102680
 ] 

Hudson commented on HBASE-11773:


SUCCESS: Integrated in HBase-0.98 #458 (See 
[https://builds.apache.org/job/HBase-0.98/458/])
HBASE-11773 Wrong field used for protobuf construction in RegionStates (Andrey 
Stepachev) (apurtell: rev dbda5c38feb28aef2ee3829264cbe39af54c958d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionState.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java


 Wrong field used for protobuf construction in RegionStates.
 ---

 Key: HBASE-11773
 URL: https://issues.apache.org/jira/browse/HBASE-11773
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11773-0.98.patch, HBASE-11773.patch


 The Protobuf &lt;-&gt; Java POJO converter uses the wrong field for converted enum 
 construction (actually the default value of the protobuf message is used).
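As a generic illustration of this bug class (all names below are invented, not the actual RegionState code): the converter reads a default-valued local instead of the POJO's own field, so every converted message carries the default enum value.

```java
// Invented names; this only illustrates the bug pattern described above.
final class EnumConversionSketch {
    enum State { OFFLINE, OPENING, OPEN }

    // Buggy converter: uses a default-valued local instead of the argument.
    static int toProtoBuggy(State s) {
        State value = State.OFFLINE; // default value of the protobuf message
        return value.ordinal();      // BUG: 's' is never consulted
    }

    // Fixed converter: serializes the actual field.
    static int toProtoFixed(State s) {
        return s.ordinal();
    }
}
```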



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10092) Move up on to log4j2

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102712#comment-14102712
 ] 

Andrew Purtell commented on HBASE-10092:


bq. I don't think properties is a blocker one way or another.

See Stack's comment above. The change will not be palatable for current or 
anticipated releases (0.98, 1.0) without properties file compatibility. A trunk 
only change would still be a great contribution but with less impact. I would 
really like to see async logging as a possibility in 0.98, so we will need to do 
this work at least in that branch. 

bq. Any reason why we should not consider logback? It looks like supporting 
unit testing with the new version of surefire is going to be very hard with 
log4j2

Hadoop is looking at moving up to log4j2 also. What kind of hell will we be in 
if Hadoop is on log4j2 and we are on logback? Isn't log4j2 the continuation of 
logback? 

 Move up on to log4j2
 

 Key: HBASE-10092
 URL: https://issues.apache.org/jira/browse/HBASE-10092
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch


 Allows logging with less friction.  See http://logging.apache.org/log4j/2.x/  
 This rather radical transition can be done w/ minor change given they have an 
 adapter for apache's logging, the one we use.  They also have an adapter for 
 slf4j so we likely can remove at least some of the 4 versions of this module 
 our dependencies make use of.
 I made a start in attached patch but am currently stuck in maven dependency 
 resolve hell courtesy of our slf4j.  Fixing will take some concentration and 
 a good net connection, an item I currently lack.  Other TODOs are that will 
 need to fix our little log level setting jsp page -- will likely have to undo 
 our use of hadoop's tool here -- and the config system changes a little.
 I will return to this project soon.  Will bring numbers.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10092) Move up on to log4j2

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102713#comment-14102713
 ] 

Andrew Purtell commented on HBASE-10092:


bq. If support for properties files is that important, vote for LOG4J2-635.

As requested, [~jvz]! 

 Move up on to log4j2
 

 Key: HBASE-10092
 URL: https://issues.apache.org/jira/browse/HBASE-10092
 Project: HBase
  Issue Type: Sub-task
Reporter: stack
Assignee: Alex Newman
 Fix For: 2.0.0

 Attachments: 10092.txt, 10092v2.txt, HBASE-10092.patch


 Allows logging with less friction.  See http://logging.apache.org/log4j/2.x/  
 This rather radical transition can be done w/ minor change given they have an 
 adapter for apache's logging, the one we use.  They also have an adapter for 
 slf4j so we likely can remove at least some of the 4 versions of this module 
 our dependencies make use of.
 I made a start in attached patch but am currently stuck in maven dependency 
 resolve hell courtesy of our slf4j.  Fixing will take some concentration and 
 a good net connection, an item I currently lack.  Other TODOs are that will 
 need to fix our little log level setting jsp page -- will likely have to undo 
 our use of hadoop's tool here -- and the config system changes a little.
 I will return to this project soon.  Will bring numbers.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11774) Avoid allocating unnecessary tag iterators

2014-08-19 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-11774:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed v2 patch to 0.98+. Thanks for the reviews!

 Avoid allocating unnecessary tag iterators
 --

 Key: HBASE-11774
 URL: https://issues.apache.org/jira/browse/HBASE-11774
 Project: HBase
  Issue Type: Improvement
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11774.patch, HBASE-11774_v2-0.98.patch, 
 HBASE-11774_v2.patch


 We can avoid an unnecessary object allocation, sometimes in hot code paths, 
 by not creating a tag iterator if the cell's tag area is of length zero, 
 signifying no tags present.
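The optimization reads roughly like this (a sketch with invented names, not the actual cell/tag utility code): when the cell's tag area has length zero, return the shared empty iterator instead of allocating a new one.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

final class TagIteratorSketch {
    // Sketch of the optimization: zero-length tag area means no tags, so no
    // iterator object needs to be allocated on the hot path.
    static Iterator<byte[]> tagsIterator(byte[] tags, int offset, int length) {
        if (length == 0) {
            return Collections.emptyIterator(); // shared instance, no allocation
        }
        // Invented placeholder parsing: treat the whole area as a single tag.
        return Collections.singletonList(
            Arrays.copyOfRange(tags, offset, offset + length)).iterator();
    }
}
```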



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102753#comment-14102753
 ] 

Hudson commented on HBASE-11773:


SUCCESS: Integrated in HBase-TRUNK #5411 (See 
[https://builds.apache.org/job/HBase-TRUNK/5411/])
HBASE-11773 Wrong field used for protobuf construction in RegionStates (Andrey 
Stepachev) (apurtell: rev 393a2a3814a85e4b985aba89243101b23220eed1)
* hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionState.java


 Wrong field used for protobuf construction in RegionStates.
 ---

 Key: HBASE-11773
 URL: https://issues.apache.org/jira/browse/HBASE-11773
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11773-0.98.patch, HBASE-11773.patch


 The Protobuf &lt;-&gt; Java POJO converter uses the wrong field for converted enum 
 construction (actually the default value of the protobuf message is used).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11778) Scale timestamps by 1000

2014-08-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102756#comment-14102756
 ] 

Lars Hofhansl commented on HBASE-11778:
---

[~giacomotay...@gmail.com], FYI.

 Scale timestamps by 1000
 

 Key: HBASE-11778
 URL: https://issues.apache.org/jira/browse/HBASE-11778
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl

 The KV timestamps are used for various reasons:
 # ordering of KVs
 # resolving conflicts
 # enforce TTL
 Currently we assume that the timestamps have a resolution of 1ms, and because 
 of that we made the resolution at which we can determine time identical to 
 the resolution at which we can store time.
 I think it is time to disentangle the two... At least allow a higher 
 resolution of time to be stored. That way we could have a centralized 
 transaction oracle that produces ids that relate to wall clock time, and at 
 the same time allow producing more than 1000/s.
 The simplest way is to just store time in us (microseconds). I.e. we'd still 
 collect time in ms by default and just multiply that with 1000 before we 
 store it. With 8 bytes that still gives us a range of 292471 years.
 We'd have to grandfather in old data. Could write a metadata entry into each 
 HFile declaring what the TS resolution is if it is different from ms.
 Not sure, yet, how this would relate to using the TS for things like seqIds.
 Let's do some brainstorming. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11778) Scale timestamps by 1000

2014-08-19 Thread Lars Hofhansl (JIRA)
Lars Hofhansl created HBASE-11778:
-

 Summary: Scale timestamps by 1000
 Key: HBASE-11778
 URL: https://issues.apache.org/jira/browse/HBASE-11778
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl


The KV timestamps are used for various reasons:
# ordering of KVs
# resolving conflicts
# enforce TTL

Currently we assume that the timestamps have a resolution of 1ms, and because 
of that we made the resolution at which we can determine time identical to the 
resolution at which we can store time.
I think it is time to disentangle the two... At least allow a higher resolution 
of time to be stored. That way we could have a centralized transaction oracle 
that produces ids that relate to wall clock time, and at the same time allow 
producing more than 1000/s.

The simplest way is to just store time in us (microseconds). I.e. we'd still 
collect time in ms by default and just multiply that with 1000 before we store 
it. With 8 bytes that still gives us a range of 292471 years.

We'd have to grandfather in old data. Could write a metadata entry into each HFile 
declaring what the TS resolution is if it is different from ms.

Not sure, yet, how this would relate to using the TS for things like seqIds.

Let's do some brainstorming. 
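The arithmetic in the proposal checks out; a sketch (helper names invented, 365-day years assumed as in the figure above):

```java
final class MicroTs {
    // Store wall-clock milliseconds as microseconds, per the proposal above.
    static long toStored(long millis) {
        return millis * 1000L;
    }

    // Range left in a signed 64-bit microsecond counter, in 365-day years.
    static long rangeYears() {
        long seconds = Long.MAX_VALUE / 1_000_000L;
        return seconds / (365L * 24 * 60 * 60);
    }
}
```

Old-style millisecond data would be distinguishable only via the per-HFile metadata entry the proposal mentions, since the stored long itself carries no resolution marker.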



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11778) Scale timestamps by 1000

2014-08-19 Thread James Taylor (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Taylor updated HBASE-11778:
-

Tags: Phoenix

 Scale timestamps by 1000
 

 Key: HBASE-11778
 URL: https://issues.apache.org/jira/browse/HBASE-11778
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl

 The KV timestamps are used for various reasons:
 # ordering of KVs
 # resolving conflicts
 # enforce TTL
 Currently we assume that the timestamps have a resolution of 1ms, and because 
 of that we made the resolution at which we can determine time identical to 
 the resolution at which we can store time.
 I think it is time to disentangle the two... At least allow a higher 
 resolution of time to be stored. That way we could have a centralized 
 transaction oracle that produces ids that relate to wall clock time, and at 
 the same time allow producing more than 1000/s.
 The simplest way is to just store time in us (microseconds). I.e. we'd still 
 collect time in ms by default and just multiply that with 1000 before we 
 store it. With 8 bytes that still gives us a range of 292471 years.
 We'd have to grandfather in old data. Could write a metadata entry into each 
 HFile declaring what the TS resolution is if it is different from ms.
 Not sure, yet, how this would relate to using the TS for things like seqIds.
 Let's do some brainstorming. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11165) Scaling so cluster can host 1M regions and beyond (50M regions?)

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102764#comment-14102764
 ] 

stack commented on HBASE-11165:
---

bq. If split meta, then 1) Less write amplification (ie no large compactions) 
...

Good point. i.e. if we want to move to lots of small regions, it would be odd 
if there was an "except for meta" clause.

bq. Better W throughput.

If Master is the only writer, we'd need to ensure we are writing in parallel (i.e. 
Virag's recent patches).

bq. 2) More disks, more R/W throughput.

Yes.

bq. More heap to fit meta...

More heap to cache meta, yes.

bq. ...We need to do experiments for 1 rack and 2 rack failure...

Agreed that in time of catastrophic part-failure, we'd need the better R/W 
throughput a split meta can give you.

Other pluses are we would treat meta like any other table. Negatives are we 
need our root back and startup is more complicated (but at least all inside 
single master in this case).

In 
https://docs.google.com/document/d/1xC-bCzAAKO59Xo3XN-Cl6p-5CM_4DMoR-WpnkmYZgpw/edit#
 I (and others) argue for colocated meta and master going forward looking at 
options. Let me freshen it with arguments made here.

Colocating meta and master has nice properties. The in-memory image of the 
cluster layout -- probably a severe sub-set of what is actually in meta -- 
would need to fit a single-server's RAM in either model.  When colocated, 
operations are faster, less prone-to-error when less RPC involved (We'd still 
be subject to 
http://writings.quilt.org/2014/05/12/distributed-systems-and-the-end-of-the-api/
 if persisting meta in hdfs as francis notes above).  A single machine hosting 
single meta would not be able to service a 50M region startup with hundreds of 
regionservers as well as a deploy with split meta.  It could. It'd just be 
slower. Colocated meta and master implies single meta forever and that single 
meta is served by one server only -- a 50M meta region would be an anomaly in 
the cluster being bigger than all the rest -- and until we have HBASE-10295 
Refactor the replication implementation to eliminate permanent zk node and/or 
HBASE-11467 New impl of Registry interface not using ZK + new RPCs on master 
protocol (Maybe a later phase of HBASE-10070 when followers can run closer in 
to the leader state would work here) or a new master layout where we partition 
meta across multiple master servers.

A plus split meta has over colocated master and meta is that master currently 
can be down for some period of time and the cluster keeps working; no splits 
and no merges and if a machine crashes while master is down, data is offline 
till master comes back (needs more exercise).  This is less the case when 
colocated master and meta.

Please pile on all with thoughts. We need to put stake in grounds soon for 
hbase 2.0 cluster topology.  Francis needs something in 0.98 timeframe.  If the 
0.98 is different to what folks want for 2.0, as per Andy lets split this issue.

Thoughts-for-the-day:

+ HBase is supposed to be able to scale
+ Single meta came about because way back, we were too lazy to fix issues that 
arose when meta was split (at the time, we didn't need to scale as much).

 Scaling so cluster can host 1M regions and beyond (50M regions?)
 

 Key: HBASE-11165
 URL: https://issues.apache.org/jira/browse/HBASE-11165
 Project: HBase
  Issue Type: Brainstorming
Reporter: stack
 Attachments: HBASE-11165.zip, Region Scalability test.pdf, 
 zk_less_assignment_comparison_2.pdf


 This discussion issue comes out of Co-locate Meta And Master HBASE-10569 
 and comments on the doc posted there.
 A user -- our Francis Liu -- needs to be able to scale a cluster to do 1M 
 regions maybe even 50M later.  This issue is about discussing how we will do 
 that (or if not 50M on a cluster, how otherwise we can attain same end).
 More detail to follow.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11778) Scale timestamps by 1000

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102771#comment-14102771
 ] 

stack commented on HBASE-11778:
---

[~lhofhansl] see https://issues.apache.org/jira/browse/HBASE-8927  Could we do 
it for 1.0?

 Scale timestamps by 1000
 

 Key: HBASE-11778
 URL: https://issues.apache.org/jira/browse/HBASE-11778
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl

 The KV timestamps are used for various reasons:
 # ordering of KVs
 # resolving conflicts
 # enforce TTL
 Currently we assume that the timestamps have a resolution of 1ms, and because 
 of that we made the resolution at which we can determine time identical to 
 the resolution at which we can store time.
 I think it is time to disentangle the two... At least allow a higher 
 resolution of time to be stored. That way we could have a centralized 
 transaction oracle that produces ids that relate to wall clock time, and at 
 the same time allow producing more than 1000/s.
 The simplest way is to just store time in us (microseconds). I.e. we'd still 
 collect time in ms by default and just multiply that with 1000 before we 
 store it. With 8 bytes that still gives us a range of 292471 years.
 We'd have to grandfather in old data. Could write a metadata entry into each 
 HFile declaring what the TS resolution is if it is different from ms.
 Not sure, yet, how this would relate to using the TS for things like seqIds.
 Let's do some brainstorming. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HBASE-11625) Reading datablock throws Invalid HFile block magic and can not switch to hdfs checksum

2014-08-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-11625:
---


Reopening because Paul loaded up what was asked for.

 Reading datablock throws Invalid HFile block magic and can not switch to 
 hdfs checksum 
 -

 Key: HBASE-11625
 URL: https://issues.apache.org/jira/browse/HBASE-11625
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.94.21, 0.98.4, 0.98.5
Reporter: qian wang
 Attachments: 2711de1fdf73419d9f8afc6a8b86ce64.gz


 When using HBase checksums, a call to readBlockDataInternal() in HFileBlock.java 
 can encounter file corruption, but it can only switch to the HDFS checksum 
 input stream at validateBlockChecksum(). If the data block's header is corrupted 
 when b = new HFileBlock(), it throws the Invalid HFile block magic exception 
 and the RPC call fails.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102809#comment-14102809
 ] 

Andrew Purtell commented on HBASE-11742:


What happens if there is an older client library that wants to run a MR-over 
snapshots job against snapshots dropped by newer servers? The older library 
will be looking for table descriptors and snapshot region names in the FS 
instead of SnapshotFileInfo/SnapshotRegionManifest.

Do we continue to handle TableSnapshotRegionSplit messages that have 
RegionSpecifiers (field #1) instead of the new TableSchema and RegionInfo 
fields (3 and 4)? 

What happens if someone uses the newer client in a MR-over-snapshots job where 
there is no SnapshotFileInfo/SnapshotRegionManifest data available because the 
servers are older?

What happens when we have a mix of snapshots dropped by an older server side 
versus a newer server side?

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360 since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11607) Document HBase metrics

2014-08-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11607:
--

   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Excellent. Committed w/ minor fixes [~misty] Thanks. Can figure how to dump out 
metrics at later date.

 Document HBase metrics
 --

 Key: HBASE-11607
 URL: https://issues.apache.org/jira/browse/HBASE-11607
 Project: HBase
  Issue Type: Sub-task
  Components: documentation, metrics
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
  Labels: beginner
 Fix For: 2.0.0

 Attachments: HBASE-11607.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-4920) We need a mascot, a totem

2014-08-19 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102822#comment-14102822
 ] 

stack commented on HBASE-4920:
--

Let me stick something up on the list [~jmspaggi]  I'm responsible for the mess 
here.

 We need a mascot, a totem
 -

 Key: HBASE-4920
 URL: https://issues.apache.org/jira/browse/HBASE-4920
 Project: HBase
  Issue Type: Task
Reporter: stack
 Attachments: Apache_HBase_Orca_Logo_1.jpg, 
 Apache_HBase_Orca_Logo_Mean_version-3.pdf, 
 Apache_HBase_Orca_Logo_Mean_version-4.pdf, Apache_HBase_Orca_Logo_round5.pdf, 
 HBase Orca Logo.jpg, Orca_479990801.jpg, Screen shot 2011-11-30 at 4.06.17 
 PM.png, apache hbase orca logo_Proof 3.pdf, apache logo_Proof 8.pdf, 
 jumping-orca_rotated.xcf, jumping-orca_rotated_right.png, krake.zip, 
 more_orcas.png, more_orcas2.png, orca_clipart_freevector_lhs.jpeg, 
 orca_free_vector_on_top_66percent_levelled.png, 
 orca_free_vector_sheared_rotated_rhs.png, 
 orca_free_vector_some_selections.png, photo (2).JPG, plus_orca.png, 
 proposal_1_logo.png, proposal_1_logo.xcf, proposal_2_logo.png, 
 proposal_2_logo.xcf, proposal_3_logo.png, proposal_3_logo.xcf


 We need a totem for our t-shirt that is yet to be printed.  O'Reilly owns the 
 Clydesdale.  We need something else.
 We could have a fluffy little duck that quacks 'hbase!' when you squeeze it 
 and we could order boxes of them from some off-shore sweatshop that 
 subcontracts to a contractor who employs child labor only.
 Or we could have an Orca (Big!, Fast!, Killer!, and in a poem that Marcy from 
 Salesforce showed me, that was a bit too spiritual for me to be seen quoting 
 here, it had the Orca as the 'Guardian of the Cosmic Memory': i.e. in 
 translation, bigdata).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102826#comment-14102826
 ] 

Matteo Bertozzi commented on HBASE-11742:
-

{quote}What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.{quote}
you can't use old jars to read the new format.

{quote}Do we continue to handle TableSnapshotRegionSplit messages that have 
RegionSpecifiers (field #1) instead of the new TableSchema and RegionInfo 
fields (3 and 4)?{quote}
That proto is internal to the MapReduce job and used just for message passing, so 
there is no need for compatibility there, unless you have different jars 
executing the MR job.

{quote}What happens if someone uses the newer client in a MR-over-snapshots job 
where there is no SnapshotFileInfo/SnapshotRegionManifest data available 
because the servers are older?{quote}
The new code is able to read the old format.

{quote}What happens when we have a mix of snapshots dropped by an older server 
side versus a newer server side?{quote}
The new code supports taking snapshots during rolling upgrades, which means that 
the older RSs write in the old format and the new ones in the new format. That 
is fine, since both formats are readable and aggregated on read if necessary 
(requires the new jars). If the master is already updated, it will merge the 
RS results, converting to the single manifest if necessary.

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102850#comment-14102850
 ] 

Andrew Purtell edited comment on HBASE-11742 at 8/19/14 9:12 PM:
-

{quote}
bq. What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.

you can't use old jars to read the new format.
{quote}

We can't force an upgrade of an older client with a minor server bump. The 
server side needs to support older clients until the client fleet can be 
upgraded independent of server side. Thanks for clarifying elsewhere, so this 
is the only issue with the current patch. Can we keep server side support for 
writing both formats with the backwards compatible option the default? Some new 
configuration setting will have a default of false (or 1). 


was (Author: apurtell):
{quote}
bq, What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.

you can't use old jars to read the new format.
{quote}

We can't force an upgrade of an older client with a minor server bump. The 
server side needs to support older clients until the client fleet can be 
upgraded independent of server side. Thanks for clarifying elsewhere, so this 
is the only issue with the current patch. Can we keep server side support for 
writing both formats with the backwards compatible option the default? Some new 
configuration setting will have a default of false (or 1). 

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102850#comment-14102850
 ] 

Andrew Purtell commented on HBASE-11742:


{quote}
bq, What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.

you can't use old jars to read the new format.
{quote}

We can't force an upgrade of an older client with a minor server bump. The 
server side needs to support older clients until the client fleet can be 
upgraded independent of server side. Thanks for clarifying elsewhere, so this 
is the only issue with the current patch. Can we keep server side support for 
writing both formats with the backwards compatible option the default? Some new 
configuration setting will have a default of false (or 1). 

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Comment Edited] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102850#comment-14102850
 ] 

Andrew Purtell edited comment on HBASE-11742 at 8/19/14 9:14 PM:
-

{quote}
bq. What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.

you can't use old jars to read the new format.
{quote}

We can't force an upgrade of an older client with a minor server bump. The 
server side needs to support older clients until the client fleet can be 
upgraded independent of server side. Thanks for clarifying elsewhere, so this 
is the only issue with the current patch. Can we keep server side support for 
writing both formats with the backwards compatible option the default? Some new 
configuration setting will have a default of false (or 1). We can add a 
release note describing the necessary steps to move to the improved 
functionality.


was (Author: apurtell):
{quote}
bq. What happens if there is an older client library that wants to run a 
MR-over snapshots job against snapshots dropped by newer servers? The older 
library will be looking for table descriptors and snapshot region names in the 
FS instead of SnapshotFileInfo/SnapshotRegionManifest.

you can't use old jars to read the new format.
{quote}

We can't force an upgrade of an older client with a minor server bump. The 
server side needs to support older clients until the client fleet can be 
upgraded independent of server side. Thanks for clarifying elsewhere, so this 
is the only issue with the current patch. Can we keep server side support for 
writing both formats with the backwards compatible option the default? Some new 
configuration setting will have a default of false (or 1). 

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360, since it introduces alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11742) Backport HBASE-7987 and HBASE-11185 to 0.98

2014-08-19 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102860#comment-14102860
 ] 

Matteo Bertozzi commented on HBASE-11742:
-

Yeah, that should be easy. It just needs a conf setting that sets this property in 
SnapshotDescriptionUtil.java:
{code}
public static final int SNAPSHOT_LAYOUT_VERSION = 
SnapshotManifestV2.DESCRIPTOR_VERSION;
{code}

let me do that
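
The conf-driven toggle suggested above could take a shape like the following. This is a sketch only: the key name {{hbase.snapshot.format.version}} and the Map-based stand-in for Hadoop's Configuration are assumptions for illustration, not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

public class SnapshotLayoutConf {
    static final int DESCRIPTOR_VERSION_V1 = 0;  // old fs-layout format (assumed value)
    static final int DESCRIPTOR_VERSION_V2 = 2;  // manifest format from HBASE-7987 (assumed value)

    // Stand-in for Configuration#getInt: the compatible old format is the default,
    // and the new manifest format is opt-in via the (hypothetical) conf key.
    static int layoutVersion(Map<String, String> conf) {
        String v = conf.get("hbase.snapshot.format.version");
        return v == null ? DESCRIPTOR_VERSION_V1 : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(layoutVersion(conf));           // default: old format
        conf.put("hbase.snapshot.format.version", "2");
        System.out.println(layoutVersion(conf));           // opt-in: manifest format
    }
}
```

This matches Andrew's ask: the backwards-compatible layout stays the default until operators explicitly opt in.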

 Backport HBASE-7987 and HBASE-11185 to 0.98
 ---

 Key: HBASE-11742
 URL: https://issues.apache.org/jira/browse/HBASE-11742
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, snapshots
Affects Versions: 0.98.5
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
 Fix For: 0.98.6

 Attachments: HBASE-11742.v0.patch, HBASE-11742.v1.patch


 HBASE-7987 improves how snapshots are handled via a manifest file. This 
 requires reverting HBASE-11360 since introduces an alternate functionality 
 that is not compatible with HBASE-7987.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102867#comment-14102867
 ] 

Nick Dimiduk commented on HBASE-11682:
--

Very well-articulated example, I like it! [~jmhsieh], you're right that I 
don't think of using random data for a prefix, because the nondeterminism makes 
gets ineffective. It is, however, a valid approach.

{noformat}
+<para>Suppose you have the following list of row keys:</para>
{noformat}

This example assumes the table is split in a way such that f* would be in a 
single region but a-, b-, c-, d- are in different regions. Be explicit about 
the region splits; include a sentence like "assume your table is split by 
letter, so the rowkey prefix {{a}} is on one region, {{b}} is on a second, 
{{c}} on a third, &c." In that topology, all the foo rows would be in the 
same region, and the prefixed rows are in different regions.

{noformat}
+<title>Hashing</title>
{noformat}

For this bit, you can add something like "using a deterministic hash allows the 
client to reconstruct the complete rowkey and use a get operation to retrieve 
that row as normal." The current text alludes to this, but maybe we can come 
out and say it explicitly.

For reference, you could also link off to Phoenix's Salted Tables 
description: http://phoenix.apache.org/salted.html
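
The deterministic-hash idea can be sketched in plain Java (no HBase API). Everything here is illustrative: the bucket count, the {{NN-}} key format, and the helper name are assumptions, not the documented example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedRowKey {
    static final int BUCKETS = 16;  // illustrative bucket count

    // Deterministic prefix: the same logical key always maps to the same
    // bucket, so a reader can rebuild the stored key from the logical key
    // alone and issue a normal point Get.
    static String saltedKey(String logicalKey) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(logicalKey.getBytes(StandardCharsets.UTF_8));
            int bucket = (d[0] & 0xff) % BUCKETS;
            return String.format("%02d-%s", bucket, logicalKey);
        } catch (NoSuchAlgorithmException e) {
            throw new AssertionError(e);  // MD5 is always available in the JDK
        }
    }

    public static void main(String[] args) {
        // Sequential keys scatter across buckets, breaking up the hotspot,
        // while each key's bucket stays stable across calls.
        for (String k : new String[] {"foo0001", "foo0002", "foo0003"}) {
            System.out.println(saltedKey(k));
        }
    }
}
```

Unlike a random salt, nothing here prevents a straight Get: the client recomputes the prefix from the logical key.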

 Explain hotspotting
 ---

 Key: HBASE-11682
 URL: https://issues.apache.org/jira/browse/HBASE-11682
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11682-1.patch, HBASE-11682.patch, 
 HBASE-11682.patch, HBASE-11682.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102875#comment-14102875
 ] 

Jonathan Hsieh commented on HBASE-11682:


+1 to Nick's clarifications

 Explain hotspotting
 ---

 Key: HBASE-11682
 URL: https://issues.apache.org/jira/browse/HBASE-11682
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11682-1.patch, HBASE-11682.patch, 
 HBASE-11682.patch, HBASE-11682.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11735) Document Configurable Bucket Sizes in bucketCache

2014-08-19 Thread Misty Stanley-Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102883#comment-14102883
 ] 

Misty Stanley-Jones commented on HBASE-11735:
-

What do you think, [~stack]?

 Document Configurable Bucket Sizes in bucketCache
 -

 Key: HBASE-11735
 URL: https://issues.apache.org/jira/browse/HBASE-11735
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.99.0, 0.98.4, 0.98.5
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 0.99.0, 0.98.6

 Attachments: HBASE-11735.patch, HBASE-11735.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Misty Stanley-Jones (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misty Stanley-Jones updated HBASE-11682:


Attachment: HBASE-11682.patch

Thanks [~ndimiduk], how's this?

 Explain hotspotting
 ---

 Key: HBASE-11682
 URL: https://issues.apache.org/jira/browse/HBASE-11682
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11682-1.patch, HBASE-11682.patch, 
 HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11737) Document callQueue improvements from HBASE-11355 and HBASE-11724

2014-08-19 Thread Misty Stanley-Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102885#comment-14102885
 ] 

Misty Stanley-Jones commented on HBASE-11737:
-

What do you think, [~mbertozzi]?

 Document callQueue improvements from HBASE-11355 and HBASE-11724
 

 Key: HBASE-11737
 URL: https://issues.apache.org/jira/browse/HBASE-11737
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 0.99.0, 0.98.4

 Attachments: HBASE-11737.patch, HBASE-11737.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11752) Document blockcache prefetch option

2014-08-19 Thread Misty Stanley-Jones (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102887#comment-14102887
 ] 

Misty Stanley-Jones commented on HBASE-11752:
-

Maybe [~stack] can have a look as well?

 Document blockcache prefetch option
 ---

 Key: HBASE-11752
 URL: https://issues.apache.org/jira/browse/HBASE-11752
 Project: HBase
  Issue Type: Sub-task
  Components: BlockCache, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 0.99.0, 0.98.3

 Attachments: HBASE-11752.patch, HBASE-11752.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11773) Wrong field used for protobuf construction in RegionStates.

2014-08-19 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102906#comment-14102906
 ] 

Hudson commented on HBASE-11773:


SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #431 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/431/])
HBASE-11773 Wrong field used for protobuf construction in RegionStates (Andrey 
Stepachev) (apurtell: rev dbda5c38feb28aef2ee3829264cbe39af54c958d)
* hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestRegionState.java
* hbase-client/src/main/java/org/apache/hadoop/hbase/master/RegionState.java


 Wrong field used for protobuf construction in RegionStates.
 ---

 Key: HBASE-11773
 URL: https://issues.apache.org/jira/browse/HBASE-11773
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment
Reporter: Andrey Stepachev
Assignee: Andrey Stepachev
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11773-0.98.patch, HBASE-11773.patch


 The Protobuf to Java POJO converter uses the wrong field for converted enum 
 construction (actually, the default value of the protobuf message is used).
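
 The bug class described above can be sketched outside of HBase. This is illustrative only; the names below are stand-ins, not the real RegionState or generated protobuf classes.

```java
public class EnumConversionBug {
    enum State { OFFLINE, OPEN, CLOSED }

    // Stand-in for a protobuf message carrying an enum field.
    static class StateMessage {
        static final State DEFAULT_STATE = State.OFFLINE;  // proto default
        final State state;
        StateMessage(State state) { this.state = state; }
    }

    // Buggy conversion: reads the message type's default value, so every
    // converted object comes back OFFLINE regardless of what was set.
    static State convertBuggy(StateMessage msg) {
        return StateMessage.DEFAULT_STATE;
    }

    // Fixed conversion: uses the field actually carried by the message.
    static State convertFixed(StateMessage msg) {
        return msg.state;
    }

    public static void main(String[] args) {
        StateMessage msg = new StateMessage(State.OPEN);
        System.out.println(convertBuggy(msg)); // OFFLINE (wrong)
        System.out.println(convertFixed(msg)); // OPEN (right)
    }
}
```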



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11735) Document Configurable Bucket Sizes in bucketCache

2014-08-19 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-11735:
--

   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I removed the bit about hbase.offheapcache.percentage. It has been removed, and even 
where it is present, it is a PITA. Nice job, [~misty]. Thanks.

 Document Configurable Bucket Sizes in bucketCache
 -

 Key: HBASE-11735
 URL: https://issues.apache.org/jira/browse/HBASE-11735
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 0.99.0, 0.98.4, 0.98.5
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 0.99.0, 2.0.0, 0.98.6

 Attachments: HBASE-11735.patch, HBASE-11735.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman updated HBASE-4955:
---

Attachment: HBASE-4955-v10.patch

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch, HBASE-4955-v10.patch


 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain 
 everything we need.
 JUnit: could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk.
 Surefire: could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed 
 in our version
 760 (does not take into account the test method): fixed on trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed on trunk, not fixed in 
 our version
 799 (allow test parallelization when forkMode=always): not fixed on trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, 
 fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11682) Explain hotspotting

2014-08-19 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102921#comment-14102921
 ] 

Nick Dimiduk commented on HBASE-11682:
--

A little nit-picky, but... (now you know what Aman went through ;) )

{noformat}
+<para>Suppose you have the following list of row keys, and your table is split in such a way
+  that all the rows starting with foo are in the same region.</para>
{noformat}

I would say ... and your table is split such that there is one region for each 
letter of the alphabet -- prefix 'a' is one region, prefix 'b' is another. In 
this table, all rows starting with 'f' are in the same region.

That is, be explicitly clear about the region split for the example.

{noformat}
+an <link xlink:href="http://phoenix.apache.org/salted.html">article on 
Salted Tables</link>
{noformat}

"an" should be "and"?
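
The split-by-letter topology suggested above can be sketched without the HBase API. The split computation below is an illustration only; in real code, an HBaseAdmin.createTable(desc, splitKeys) call would consume such split points as byte arrays.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplits {
    // Split points "b".."z": keys below "b" (all 'a'-prefixed rows) land in
    // the first region, ["b","c") in the second, and so on.
    static List<String> splitKeys() {
        List<String> splits = new ArrayList<>();
        for (char c = 'b'; c <= 'z'; c++) {
            splits.add(String.valueOf(c));
        }
        return splits;
    }

    // Region index of a row key under this topology: count how many split
    // points the key is at or beyond.
    static int regionOf(String rowKey, List<String> splits) {
        int i = 0;
        while (i < splits.size() && rowKey.compareTo(splits.get(i)) >= 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        List<String> splits = splitKeys();
        System.out.println(regionOf("foo0001", splits));   // 5: the 'f' region
        System.out.println(regionOf("foo0002", splits));   // 5: same region (hotspot)
        System.out.println(regionOf("a-foo0003", splits)); // 0: spread by prefix
    }
}
```

All the foo* rows collide in region 5, while the letter-prefixed rows scatter across regions, which is exactly the point the doc example is making.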

 Explain hotspotting
 ---

 Key: HBASE-11682
 URL: https://issues.apache.org/jira/browse/HBASE-11682
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11682-1.patch, HBASE-11682.patch, 
 HBASE-11682.patch, HBASE-11682.patch, HBASE-11682.patch






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102924#comment-14102924
 ] 

Alex Newman commented on HBASE-4955:


OK, I disabled that. It seems to work OK on our build server. I'd be interested 
to see how the Apache build is doing.

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Nicolas Liochon
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch, HBASE-4955-v10.patch


 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain 
 everything we need.
 JUnit: could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk.
 Surefire: could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed 
 in our version
 760 (does not take into account the test method): fixed on trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed on trunk, not fixed in 
 our version
 799 (allow test parallelization when forkMode=always): not fixed on trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, 
 fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-4955) Use the official versions of surefire & junit

2014-08-19 Thread Alex Newman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Newman reassigned HBASE-4955:
--

Assignee: Alex Newman  (was: Nicolas Liochon)

 Use the official versions of surefire & junit
 -

 Key: HBASE-4955
 URL: https://issues.apache.org/jira/browse/HBASE-4955
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.0, 0.98.0, 0.96.0, 0.99.0
 Environment: all
Reporter: Nicolas Liochon
Assignee: Alex Newman
Priority: Critical
 Attachments: 4955.v1.patch, 4955.v2.patch, 4955.v2.patch, 
 4955.v2.patch, 4955.v2.patch, 4955.v3.patch, 4955.v3.patch, 4955.v3.patch, 
 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 4955.v4.patch, 
 4955.v4.patch, 4955.v5.patch, 4955.v6.patch, 4955.v7.patch, 4955.v7.patch, 
 4955.v8.patch, 4955.v9.patch, 8204.v4.patch, HBASE-4955-v10.patch


 We currently use private versions of Surefire & JUnit since HBASE-4763.
 This JIRA tracks what we need to move to official versions.
 Surefire 2.11 is just out but, after some tests, it does not contain 
 everything we need.
 JUnit: could be for JUnit 4.11. Issue to monitor:
 https://github.com/KentBeck/junit/issues/359: fixed in our version, no 
 feedback for an integration on trunk.
 Surefire: could be for Surefire 2.12. Issues to monitor are:
 329 (category support): fixed, we use the official implementation from the 
 trunk
 786 (@Category with forkMode=always): fixed, we use the official 
 implementation from the trunk
 791 (incorrect elapsed time on test failure): fixed, we use the official 
 implementation from the trunk
 793 (incorrect time in the XML report): not fixed (reopened) on trunk, fixed 
 in our version
 760 (does not take into account the test method): fixed on trunk, not fixed 
 in our version
 798 (print immediately the test class name): not fixed on trunk, not fixed in 
 our version
 799 (allow test parallelization when forkMode=always): not fixed on trunk, 
 not fixed in our version
 800 (redirectTestOutputToFile not taken into account): not yet fixed on trunk, 
 fixed in our version
 800 & 793 are the most important to monitor; they are the only ones that are 
 fixed in our version but not on trunk.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

