[jira] [Created] (HBASE-11326) Use an InputFormat for ExportSnapshot

2014-06-11 Thread Matteo Bertozzi (JIRA)
Matteo Bertozzi created HBASE-11326:
---

 Summary: Use an InputFormat for ExportSnapshot
 Key: HBASE-11326
 URL: https://issues.apache.org/jira/browse/HBASE-11326
 Project: HBase
  Issue Type: Improvement
  Components: snapshots
Affects Versions: 0.99.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.99.0


Use an InputFormat instead of uploading a set of input files, so that progress 
can be reported based on file size.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Anyone to have a quick look at 11313?

2014-06-11 Thread Jonathan Hsieh
I took a look.  While the patch is trivial enough, I think we should do
something more drastic.  We should deprecate/remove the Reusable
and ThreadLocal PoolMaps for the rpc clients.  Reusable PoolMap in
conjunction with the code becomes resource unbounded, and ThreadLocal is
tricky to use (difficult to close or interrupt since we have to hunt down
all the threads).

For now we should deprecate the other two, and once 1.0 gets branched we
should remove Reusable and ThreadLocal and possibly just get rid of the
extra config infrastructure.

Jon.


On Tue, Jun 10, 2014 at 5:27 PM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 Hi Konstantin, I'm talking about this one:
 https://issues.apache.org/jira/browse/HBASE-11313

 Thanks,

 JM


 2014-06-10 20:14 GMT-04:00 Konstantin Boudnik c...@apache.org:

  Seems to be a private one? Unless I am not in the correct group or
  something...
 
  On Tue, Jun 10, 2014 at 07:58PM, Jean-Marc Spaggiari wrote:
   Pretty trivial patch.
  
   Thanks,
  
   JM
 




-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// j...@cloudera.com // @jmhsieh
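
To illustrate the concern about resource-unbounded pools raised in the message
above, here is a minimal sketch of a size-capped round-robin pool. It is not
HBase's PoolMap; the class name RoundRobinPoolSketch and its methods are
hypothetical, and the only point is that the number of pooled resources never
exceeds the cap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration only; not the HBase PoolMap implementation.
class RoundRobinPoolSketch<T> {
    private final int maxSize;
    private final List<T> resources = new ArrayList<>();
    private int nextIndex = 0;

    RoundRobinPoolSketch(int maxSize) {
        if (maxSize < 1) {
            throw new IllegalArgumentException("maxSize must be >= 1");
        }
        this.maxSize = maxSize;
    }

    // Creates resources until the cap is reached, then hands out the
    // existing ones in rotation, so the pool never grows past maxSize.
    synchronized T getOrCreate(Supplier<T> factory) {
        if (resources.size() < maxSize) {
            T created = factory.get();
            resources.add(created);
            return created;
        }
        T reused = resources.get(nextIndex);
        nextIndex = (nextIndex + 1) % maxSize;
        return reused;
    }

    synchronized int size() {
        return resources.size();
    }
}
```

With a cap of 2, five requests create only two resources and then cycle
through them; a pool without such a cap would keep allocating.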


[jira] [Created] (HBASE-11327) ExportSnapshot hit stackoverflow error when target snapshotDir doesn't contain uri

2014-06-11 Thread Demai Ni (JIRA)
Demai Ni created HBASE-11327:


 Summary: ExportSnapshot hit stackoverflow error when target 
snapshotDir doesn't contain uri
 Key: HBASE-11327
 URL: https://issues.apache.org/jira/browse/HBASE-11327
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.98.2
Reporter: Demai Ni
Assignee: Demai Ni
Priority: Minor
 Fix For: 0.99.0, 0.98.4


{code}
$hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn 
-copy-to /user/demai/backup1

Exception in thread "main" java.lang.StackOverflowError
at java.util.regex.Pattern$Slice.match(Pattern.java:3490)
at java.util.regex.Pattern$Start.match(Pattern.java:3066)
at java.util.regex.Matcher.search(Matcher.java:1116)
at java.util.regex.Matcher.find(Matcher.java:546)
at org.apache.hadoop.conf.Configuration.substituteVars(Configuration.java:681)
at org.apache.hadoop.conf.Configuration.get(Configuration.java:893)
at org.apache.hadoop.fs.FileSystem.getDefaultUri(FileSystem.java:175)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:360)

{code}

The following command, which includes the URI, works:
{code}
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshotT1_dn 
-copy-to hdfs://hdtest014.svl.ibm.com:9000/user/demai/backup2
{code}

The bug is the same as 
[HADOOP-9069|https://issues.apache.org/jira/browse/HADOOP-9069]. Since the 
Hadoop JIRA has been open for more than a year, this JIRA tracks a 
local HBase fix for now.

Many thanks to [~mbertozzi] for the help on this one.
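
The difference between the failing and the working command is whether the
target carries a scheme and authority. As a rough illustration only (this is
not the HBase fix, and the names TargetUriSketch, qualify and defaultFs are
hypothetical), a scheme-less target can be resolved against a known default
filesystem URI using plain java.net.URI:

```java
import java.net.URI;

// Hypothetical illustration only; it does not use the Hadoop FileSystem API.
class TargetUriSketch {
    // Returns the target unchanged when it already has a scheme; otherwise
    // borrows the scheme and authority from defaultFs.
    static URI qualify(URI defaultFs, String target) {
        URI uri = URI.create(target);
        if (uri.getScheme() != null) {
            return uri;
        }
        return defaultFs.resolve(uri);
    }
}
```

Under that assumption, /user/demai/backup1 qualified against
hdfs://hdtest014.svl.ibm.com:9000/ yields a target of the same shape as the
one in the working command.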





[jira] [Resolved] (HBASE-11313) RpcClient should allow Reusable pool option.

2014-06-11 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari resolved HBASE-11313.
-

Resolution: Won't Fix

As Jon said, we will need a bigger modification. It will be addressed in a 
subsequent patch.

 RpcClient should allow Reusable pool option.
 

 Key: HBASE-11313
 URL: https://issues.apache.org/jira/browse/HBASE-11313
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 0.98.3
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Minor
 Attachments: HBASE-11313-v0-trunk.patch


 RpcClient.getPoolType checks the pool choice against ThreadLocal and 
 RoundRobin. However, we already have a third pool type, Reusable. We should 
 allow it in this getPoolType check.





Re: Anyone to have a quick look at 11313?

2014-06-11 Thread Ted Yu
bq. we should deprecate the other two

+1





Re: Anyone to have a quick look at 11313?

2014-06-11 Thread Jean-Marc Spaggiari
Thanks for looking at it Jon.

I will close and open a new one for Reusable and ThreadLocal
removal/deprecation.

JM





Fork protobuf?

2014-06-11 Thread Nick Dimiduk
FYI.

There's a fairly serious thread about forking protobuf happening over
on HBASE-8. Should be socialized a little wider, I think.


[jira] [Created] (HBASE-11328) testMoveRegion could fail

2014-06-11 Thread Jimmy Xiang (JIRA)
Jimmy Xiang created HBASE-11328:
---

 Summary: testMoveRegion could fail
 Key: HBASE-11328
 URL: https://issues.apache.org/jira/browse/HBASE-11328
 Project: HBase
  Issue Type: Test
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor


TestAssignmentManagerOnCluster#testMoveRegion can try to move a region to a 
server that is not online, and then fail.





[jira] [Created] (HBASE-11329) Minor fixup of new blockcache tab number formatting

2014-06-11 Thread stack (JIRA)
stack created HBASE-11329:
-

 Summary: Minor fixup of new blockcache tab number formatting
 Key: HBASE-11329
 URL: https://issues.apache.org/jira/browse/HBASE-11329
 Project: HBase
  Issue Type: Bug
Reporter: stack
Priority: Trivial


Counts are showing as MB/KB.  Let me fix.





[jira] [Resolved] (HBASE-11329) Minor fixup of new blockcache tab number formatting

2014-06-11 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-11329.
---

   Resolution: Fixed
Fix Version/s: 0.99.0
 Assignee: stack

Committed to master.

 Minor fixup of new blockcache tab number formatting
 ---

 Key: HBASE-11329
 URL: https://issues.apache.org/jira/browse/HBASE-11329
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Trivial
 Fix For: 0.99.0

 Attachments: 11329.txt


 Counts are showing as MB/KB.  Let me fix.





[jira] [Created] (HBASE-11330) Deprecate ThreadLocal and Reusable from PoolMap

2014-06-11 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-11330:
---

 Summary: Deprecate ThreadLocal and Reusable from PoolMap
 Key: HBASE-11330
 URL: https://issues.apache.org/jira/browse/HBASE-11330
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari


Reusable is not used anywhere and ThreadLocal is not recommended. See HBASE-11313 
for other details.

So let's remove Reusable and deprecate ThreadLocal.





[jira] [Created] (HBASE-11331) [blockcache] lazy block decompression

2014-06-11 Thread Nick Dimiduk (JIRA)
Nick Dimiduk created HBASE-11331:


 Summary: [blockcache] lazy block decompression
 Key: HBASE-11331
 URL: https://issues.apache.org/jira/browse/HBASE-11331
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk


Maintaining data in its compressed form in the block cache will greatly 
increase our effective blockcache size and should show a meaningful improvement in 
cache hit rates in well-designed applications. The idea here is to lazily 
decompress/decrypt blocks when they're consumed, rather than as soon as they're 
pulled off of disk.

This is related to but less invasive than HBASE-8894.
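
A minimal sketch of the idea, assuming GZIP from the JDK as a stand-in codec:
the cached object holds only the compressed bytes and decompresses on each
read. The class LazyBlockSketch and its methods are hypothetical and do not
reflect the HBase block cache API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not HBase's block cache or HFile block.
class LazyBlockSketch {
    private final byte[] compressed; // the only copy kept in the "cache"

    private LazyBlockSketch(byte[] compressed) {
        this.compressed = compressed;
    }

    // Compresses the raw bytes once, as if the block came off disk compressed.
    static LazyBlockSketch fromUncompressed(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LazyBlockSketch(bos.toByteArray());
    }

    // Memory the cache would account for: the compressed size.
    int cachedSizeBytes() {
        return compressed.length;
    }

    // Decompresses only when the block is actually consumed.
    byte[] read() {
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off in this sketch is CPU on every read in exchange for a smaller
cache footprint; whether that pays off depends on the workload.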





Re: Timestamp resolution

2014-06-11 Thread Michael Segel
Weirdly enough I find that I have to agree with Andrew. 

First, how do you get time in units smaller than a ms? 
Second, clock skew becomes an issue. 
Third, which clock are you using? The client machine? The RS? And then how do 
you synchronize each RS to be within a ms of the others? 
Correct me if I’m wrong, but NTP doesn’t give that close a sync.  

Sorry, but really, not a good idea. 

If you want this… you can store the temporal data as a column. 

Time really is relative. 

On May 25, 2014, at 12:53 AM, Stack st...@duboce.net wrote:

 On Fri, May 23, 2014 at 5:27 PM, lars hofhansl la...@apache.org wrote:
 
 We have discussed this in the past. It just came up again during an
 internal discussion.
 Currently we simply store a Java timestamp (millisec since epoch), i.e. we
 have ms resolution.
 
 We do have 8 bytes for the TS, though. Not enough to store nanosecs (that
 would only cover 2^63/10^9/3600/24/365.24 = 292.279 years), but enough for
 microseconds (292279 years).
 Should we just store the TS in microseconds? We could do that right now
 (and just keep the ms resolution for now - i.e. the us part would always be
 0 for now).
 Existing data must be in ms of course, so we'd grandfather that in, but
 new tables could store by default in us.
 
 We'd need to make this configurable at both the column family level and
 client level, so clients could still opt to see data in ms.
 
 Comments? Too much to bite off?
 
 -- Lars
 
 
 I'm a fan.  As Enis cites, HBASE-8927 has good discussion.  No
 configuration I'd say.  Just move to the new regime (though I suppose we
 should let you turn it off).
 
 I think it was Liu Shaohui (IIRC) who made a suggestion that had us put
 together ms and nanos under a synchronized block stamping the ts on Cells
 (left-shift the currentTimeMillis and fill in the bottom bytes with as much
 of the nanos as fits; i.e. your micros).  Rather than nanos/micros, we
 could use a counter instead if a Cell arrives in the same ms.  Would be
 costly having all ops go via one code block to get 'time' across cores and
 handlers.
 
 St.Ack
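
The range arithmetic quoted above (2^63/10^9/3600/24/365.24) can be checked
with a few lines of Java. This is only a sanity check of the quoted figures;
the class and method names are made up for the example.

```java
// Hypothetical helper for checking the quoted arithmetic.
class TimestampRangeSketch {
    // Years representable by a signed 64-bit counter that ticks
    // unitsPerSecond times per second, using the divisors from the message.
    static double yearsCovered(double unitsPerSecond) {
        double seconds = Math.pow(2, 63) / unitsPerSecond;
        return seconds / 3600 / 24 / 365.24;
    }

    public static void main(String[] args) {
        System.out.printf("nanoseconds:  %.3f years%n", yearsCovered(1e9));
        System.out.printf("microseconds: %.0f years%n", yearsCovered(1e6));
    }
}
```

Both results agree with the figures in the quoted message: roughly 292 years
at nanosecond resolution and roughly 292 thousand years at microsecond
resolution.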



[jira] [Created] (HBASE-11332) Fix for metas location cache from HBASE-10785

2014-06-11 Thread Enis Soztutar (JIRA)
Enis Soztutar created HBASE-11332:
-

 Summary: Fix for metas location cache from HBASE-10785 
 Key: HBASE-11332
 URL: https://issues.apache.org/jira/browse/HBASE-11332
 Project: HBase
  Issue Type: Sub-task
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: hbase-10070


In HBASE-10785, we removed the invalidation of the cached location between patch 
v2 and v3. As a result, if there is a cached location for meta, 
it is never invalidated. 

Since we do a second check from cache for the location after acquiring the 
lock, meta's location can be wrongly cached forever, which leaves 
clients blocking indefinitely. 
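
The interaction between the missing invalidation and the second cache check
can be sketched as follows. This is a simplified model, not the HBase
connection code: MetaLocationCacheSketch, locate, and the useCache flag are
hypothetical names chosen for the example.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not the HBase client's location cache.
class MetaLocationCacheSketch {
    private final Object lock = new Object();
    private volatile String cachedLocation; // null means nothing is cached

    String locate(boolean useCache, Supplier<String> lookup) {
        if (useCache) {
            String loc = cachedLocation;
            if (loc != null) {
                return loc;
            }
        }
        synchronized (lock) {
            if (!useCache) {
                // The invalidation step. Without it, the re-check below
                // keeps returning the stale entry and lookup never runs.
                cachedLocation = null;
            }
            String loc = cachedLocation; // second check, after taking the lock
            if (loc != null) {
                return loc;
            }
            loc = lookup.get();
            cachedLocation = loc;
            return loc;
        }
    }
}
```

Deleting the invalidation branch in this model reproduces the symptom
described above: a caller that asks to bypass the cache still gets the old
location back from the second check.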







[jira] [Created] (HBASE-11333) Remove deprecated class MetaMigrationConvertingToPB

2014-06-11 Thread Mikhail Antonov (JIRA)
Mikhail Antonov created HBASE-11333:
---

 Summary: Remove deprecated class MetaMigrationConvertingToPB
 Key: HBASE-11333
 URL: https://issues.apache.org/jira/browse/HBASE-11333
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.99.0
Reporter: Mikhail Antonov
Assignee: Mikhail Antonov
Priority: Trivial
 Fix For: 0.99.0


MetaMigrationConvertingToPB is marked deprecated, to be deleted in the next major 
release after 0.96. Is now the time?





[jira] [Created] (HBASE-11334) Migrate to SLF4J as logging interface

2014-06-11 Thread jay vyas (JIRA)
jay vyas created HBASE-11334:


 Summary: Migrate to SLF4J as logging interface
 Key: HBASE-11334
 URL: https://issues.apache.org/jira/browse/HBASE-11334
 Project: HBase
  Issue Type: Improvement
Reporter: jay vyas


Migration to new log implementations is underway, as in HBASE-10092. 
The next step would be to abstract them so that the Hadoop community can 
standardize on a logging layer that is easy for end users to tune.

The simplest way to do this is to use the SLF4J APIs as the main interface and 
to cover binding/implementation details in the docs as necessary.







[jira] [Created] (HBASE-11335) Fix the TABLE_DIR param in TableSnapshotInputFormat

2014-06-11 Thread deepankar (JIRA)
deepankar created HBASE-11335:
-

 Summary: Fix the TABLE_DIR param in TableSnapshotInputFormat
 Key: HBASE-11335
 URL: https://issues.apache.org/jira/browse/HBASE-11335
 Project: HBase
  Issue Type: Bug
  Components: mapreduce, snapshots
Affects Versions: 0.98.3, 0.96.2
Reporter: deepankar


In class *TableSnapshotInputFormat* or *TableSnapshotInputFormatImpl*,
in the function 
{code}
public static void setInput(Job job, String snapshotName, Path restoreDir)
    throws IOException {
{code}
we set the table dir to restoreDir (the temporary root):
{code}
conf.set(TABLE_DIR_KEY, restoreDir.toString());
{code}

This parameter is used when computing the InputSplits, specifically for 
calculating the preferred hosts:
{code}
Path tableDir = new Path(conf.get(TABLE_DIR_KEY));

List<String> hosts = getBestLocations(conf,
  HRegion.computeHDFSBlocksDistribution(conf, htd, hri, tableDir));
{code}

This returns an empty *HDFSBlocksDistribution*, because the restored 
root directory contains no directory named after the region in hri, 
and that in turn leads to scheduling of non-local tasks.

The change is simple: call 
{code}FSUtils.getTableDir(rootDir, tableDesc.getTableName()) {code}
in the getSplits function.

More discussion is in the comment below:

https://issues.apache.org/jira/browse/HBASE-8369?focusedCommentId=14012085&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14012085






[jira] [Created] (HBASE-11336) Show region sizes in table page of master info server

2014-06-11 Thread Yi Deng (JIRA)
Yi Deng created HBASE-11336:
---

 Summary: Show region sizes in table page of master info server
 Key: HBASE-11336
 URL: https://issues.apache.org/jira/browse/HBASE-11336
 Project: HBase
  Issue Type: Bug
  Components: Admin
Affects Versions: 0.89-fb
Reporter: Yi Deng
Priority: Minor


Show region sizes in table page of master info server





[jira] [Created] (HBASE-11337) Document how to create a table using Java

2014-06-11 Thread Misty Stanley-Jones (JIRA)
Misty Stanley-Jones created HBASE-11337:
---

 Summary: Document how to create a table using Java
 Key: HBASE-11337
 URL: https://issues.apache.org/jira/browse/HBASE-11337
 Project: HBase
  Issue Type: Bug
  Components: Admin, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones


Example code from [~jmspaggi]

{quote}
package com.cloudera.sa.hp.hbase.admin;

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.io.compress.Compression.Algorithm;
import org.apache.hadoop.conf.Configuration;

import static com.cloudera.sa.hp.hbase.Constants.*;

public class CreateSchema {

  // Note: despite its name, this drops and recreates the table if it
  // already exists. It is not called from main() below.
  public static void createIfNotExist(HBaseAdmin admin, HTableDescriptor table)
      throws IOException {
    if (admin.tableExists(table.getName())) {
      admin.disableTable(table.getName());
      admin.deleteTable(table.getName());
    }
    admin.createTable(table);
  }

  public static void main(String[] args) {
    // Create application schema.

    Configuration config = HBaseConfiguration.create();
    // Here we are running zookeeper locally
    config.set("hbase.zookeeper.quorum", "192.168.56.102");

    try {
      final HBaseAdmin admin = new HBaseAdmin(config);
      HTableDescriptor table_assetmeta =
          new HTableDescriptor(TableName.valueOf(TABLE_ASSETMETA));
      table_assetmeta.addFamily(
          new HColumnDescriptor(CF_DEFAULT).setCompressionType(Algorithm.GZ));

      System.out.print("Creating table_assetmeta. ");
      admin.createTable(table_assetmeta);
      System.out.println(" Done.");

      admin.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }

}
{quote}





Re: Fork protobuf?

2014-06-11 Thread Andrew Purtell
Agreed, it's radical, but something arrived at by force


On Wed, Jun 11, 2014 at 8:53 AM, Nick Dimiduk ndimi...@gmail.com wrote:

 FYI.

 There's a fairly serious thread about forking protobuf happening over
 on HBASE-8. Should be socialized a little wider, I think.




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Fork protobuf?

2014-06-11 Thread Ted Yu
I think we should do this - it opens the door for further optimizations.

Cheers





Re: Thinking of branching for 1.0 by June 23

2014-06-11 Thread Konstantin Boudnik
I would be happy to provide an overview of the variations of this model that I've
been using on different projects and at different companies. Please let me know
if one is needed.

Thanks,
  Cos

On Tue, Jun 10, 2014 at 06:24PM, Enis Söztutar wrote:
 I think we can accept the patches for the zk abstraction. Let me link the
 parent issues together.
 
 For the git workflow, it would be good to hear from somebody with
 experience on different models (esp with the accumulo model).
 
 Enis
 
 
 On Tue, Jun 10, 2014 at 4:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
  I like Cos' idea.
 
  Cheers
 
 
  On Tue, Jun 10, 2014 at 4:34 PM, Konstantin Boudnik c...@apache.org
  wrote:
 
   I am +1 on b) because it will naturally allow for a continuation of 1.x
   development.
  
   In all honesty, I found that notorious git branching model works
   perfectly for such situations. One thing to mention: unlike
   http://nvie.com/posts/a-successful-git-branching-model/ it'll force a
   significant number of cherry-picking from the master (and SHA1 changes on
   such commits).
   Perhaps it might be a good time to reconsider what has been working ok for
   Hadoop on SVN and look into something that's more natural for Git
   branching?
  
   Cos
  
   On Tue, Jun 10, 2014 at 07:16PM, Jean-Marc Spaggiari wrote:
For people voting, can you please put small comments regarding why you
prefer a solution versus the other one? Just for knowledge sharing...
   
Thanks,
   
JM
   
   
2014-06-10 19:05 GMT-04:00 Mikhail Antonov olorinb...@gmail.com:
   
 I think jiras on ZK abstraction can still get committed (I'll make sure to
 have all non-trivial patches posted on RB for discussion to make sure we
 don't accidentally introduce any instability).

 On jiras.

 Under HBASE-10909:
  -  HBASE-11069 (region merge transaction) is close to completion, just
     needs rebasing/merging, so we should have the new patch soon
  -  HBASE-11072 (WAL splitting) - there's progress going on here, I think
     we're going to have patch up for reviews pretty soon.
  -  HBASE-11073 (abstract Zk Watcher and listeners) - should have first
     patch up for review in a week or two

 Besides that, we should have HBASE-4495 (get rid of CatalogTracker) too.

 Further steps on abstraction (involving changing/simplifying the way we
 keep state in ZK) require coordination engine (as described in consensus
 design doc), which has been proposed in hadoop-common (for the time being I
 guess we can add this engine directly to HBase to speedup development?).

 Mikhail




 2014-06-10 15:46 GMT-07:00 Stack st...@duboce.net:

  +1 on option b)
 
  On Tue, Jun 10, 2014 at 3:28 PM, Konstantin Boudnik 
  c...@apache.org
  wrote:
 
   +1 on the #2.
  
   One question though: do you envision that the work around coordinated
   replication won't be able to go into branch-1 anymore?
  
 
  It's not done, and it is far along with Mikhail making good progress. I'd be
  up for keeping up reviews and commit (if that's OK w/ you Mr. RM).
 
  How much do you think could make 1.0, Cos/Mikhail?  Which issues?
 
  St.Ack
 



 --
 Thanks,
 Michael Antonov

  
 


[jira] [Created] (HBASE-11338) Expand documentation on bloom filters

2014-06-11 Thread Misty Stanley-Jones (JIRA)
Misty Stanley-Jones created HBASE-11338:
---

 Summary: Expand documentation on bloom filters
 Key: HBASE-11338
 URL: https://issues.apache.org/jira/browse/HBASE-11338
 Project: HBase
  Issue Type: Bug
  Components: documentation, Filters
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones


The Ref Guide could use more info on bloom filters.





Re: Timestamp resolution

2014-06-11 Thread lars hofhansl
The issues you cite are all orthogonal. We have client/RS time now, we have 
clock skew now, that is completely independent from the time resolution.


I explained the need I saw for this before. Lemme include:

On Fri, May 23, 2014 at 06:16PM, lars hofhansl wrote:
 The specific discussion here was a transaction engine doing snapshot
 isolation using the HBase timestamps, but still be close to wall clock time
 as much as possible.
 In that scenario, with ms resolution you can only do 1000 transactions/sec,
 and so you need to turn the timestamp into something that is not wall clock
 time as HBase understands it (and hence TTL, etc, will no longer work, as
 well as any other tools you've written that use the HBase timestamp).
 1m transactions/sec are good enough (for now, I envision in a few years
 we'll be sitting here wondering how we could ever think that 1m
 transaction/sec are sufficient) :)
 


The point is: Even if you had a timestamp oracle (one that can resolve ms and fill 
in the sub-ms resolution with a counter), there'd currently be no way to use this as the 
HBase timestamp while staying close to wall clock (so that TTL, etc, still works).
So specifically I was not advocating an automatic higher time resolution (as 
far as I know that cannot be done reliably in Java across multiple cores). 
I was advocating allowing clients with access to a (perhaps, 
but not necessarily single threaded) timestamp oracle to store those timestamps 
and still make use of all HBase optimizations (filtering HFiles, TTL, etc).


-- Lars
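
For concreteness, here is one possible shape of such a client-side oracle: it
takes wall-clock milliseconds, scales them to microseconds, and uses the
sub-millisecond digits as a counter so that every call returns a strictly
larger value. This is a sketch under those assumptions, not something proposed
in the thread; MicrosTimestampOracleSketch is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration only; not part of HBase.
class MicrosTimestampOracleSketch {
    private final AtomicLong last = new AtomicLong(0L);

    // Returns a strictly increasing microsecond timestamp that is never
    // behind the supplied wall-clock millisecond.
    long next(long wallClockMillis) {
        long floorMicros = wallClockMillis * 1000L;
        return last.updateAndGet(prev -> Math.max(prev + 1, floorMicros));
    }

    // Convenience overload using the local clock.
    long next() {
        return next(System.currentTimeMillis());
    }
}
```

If more than 1000 timestamps are requested within one millisecond, the values
in this sketch run ahead of wall clock until the clock catches up, which is
the 1m transactions/sec ceiling mentioned in the quoted message.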




 From: Michael Segel michael_se...@hotmail.com
To: dev@hbase.apache.org 
Cc: lars hofhansl la...@apache.org 
Sent: Wednesday, June 11, 2014 2:03 PM
Subject: Re: Timestamp resolution
 
