Re: Retiring empty regions

2016-04-01 Thread Jean-Marc Spaggiari
;) That was not the question ;)

So Nick, is merge on 1.1 not recommended? It was working very well on
previous versions. Does ProcV2 really impact it that badly?

JMS

2016-04-01 13:49 GMT-04:00 Vladimir Rodionov :

> >> This is something
> >> which makes it far less useful for time-series databases with short TTL on
> >> the tables.
>
> With the right row-key design you will never have empty regions due to TTL.
>
> -Vlad
>
> On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov 
> wrote:
>
> > Crazy idea, but you might be able to take a stripped-down version of the
> > region normalizer code and make a Tool to run? Requesting a split or merge
> > is done through the client API, and the only weighing information you need
> > is whether a region is empty or not, which you could find out too?
> >
> >
> > "Short of upgrading to 1.2 for the region normalizer,"
> >
> > A bit off topic, but I think unfortunately the region normalizer now ignores
> > empty regions to avoid undoing a pre-split on the table. This is something
> > which makes it far less useful for time-series databases with short TTL on
> > the tables. We'll need to address that.
> >
> > -Mikhail
> >
> > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk 
> wrote:
> >
> > > Hi folks,
> > >
> > > I have a table with TTL enabled. It's been receiving data for a while
> > > beyond the TTL and I now have a number of empty regions. I'd like to drop
> > > those empty regions to free up heap space on the region servers and reduce
> > > master load. I'm running a 1.1 derivative.
> > >
> > > The only threads I found on this topic are from around the 0.92 timeframe.
> > >
> > > Short of upgrading to 1.2 for the region normalizer, what's the recommended
> > > method of cleaning up this cruft? Should I be merging empty regions into
> > > their neighbors? It looks like region merge hasn't been migrated to ProcV2
> > > yet, so it would be wise to reduce online table activity, or at least aim
> > > for a "quiet period"? Is there a documented process for off-lining and
> > > deleting a region by name? I don't see anything in the book about it.
> > >
> > > I experimented with online merge on a pseudo-distributed setup; it looks
> > > like it's working fine for the most basic case. I'll probably pursue this
> > > unless someone has some other ideas.
> > >
> > > Thanks,
> > > Nick
> > >
> >
> >
> >
> > --
> > Thanks,
> > Michael Antonov
> >
>


Re: Retiring empty regions

2016-04-01 Thread Vladimir Rodionov
>> This is something
>> which makes it far less useful for time-series databases with short TTL on
>> the tables.

With the right row-key design you will never have empty regions due to TTL.
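
For example, a salted/bucketed time-series key keeps writes cycling through a
fixed set of key ranges instead of marching forward in time and stranding dead
regions behind the TTL. A minimal sketch (the bucket count and key layout here
are illustrative assumptions, not a prescription):

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SaltedTimeSeriesKey {
  // Assumption: the table is pre-split into BUCKETS regions on the salt byte.
  private static final int BUCKETS = 16;

  public static byte[] rowKey(String entityId, long timestampMillis) {
    // The leading salt byte pins each series to one of a fixed set of key
    // ranges, so expired cells are replaced in place instead of leaving
    // whole regions empty as time moves on.
    byte bucket = (byte) ((entityId.hashCode() & 0x7fffffff) % BUCKETS);
    byte[] entity = entityId.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(1 + entity.length + 8);
    buf.put(bucket);
    buf.put(entity);              // series id groups rows within the bucket
    buf.putLong(timestampMillis); // time last, so per-series scans stay ordered
    return buf.array();
  }
}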

-Vlad

On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov 
wrote:

> Crazy idea, but you might be able to take a stripped-down version of the
> region normalizer code and make a Tool to run? Requesting a split or merge
> is done through the client API, and the only weighing information you need
> is whether a region is empty or not, which you could find out too?
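
A rough sketch of such a Tool against the 1.x client API (hedged: this is not
the normalizer's actual code; the pair-at-a-time walk and the zero-size
heuristic are illustrative assumptions):

import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class MergeEmptyRegions {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf(args[0]);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Collect per-region store file sizes from the cluster status.
      Map<byte[], RegionLoad> loads = new TreeMap<>(Bytes.BYTES_COMPARATOR);
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        loads.putAll(status.getLoad(sn).getRegionsLoad());
      }
      // Walk the table's regions two at a time so no region is merged twice
      // in one pass.
      List<HRegionInfo> regions = admin.getTableRegions(table);
      for (int i = 0; i + 1 < regions.size(); i += 2) {
        HRegionInfo a = regions.get(i), b = regions.get(i + 1);
        if (isEmpty(loads.get(a.getRegionName()))
            && isEmpty(loads.get(b.getRegionName()))) {
          // Asynchronous request to the master; false = don't force a merge
          // of non-adjacent regions.
          admin.mergeRegions(a.getEncodedNameAsBytes(),
              b.getEncodedNameAsBytes(), false);
        }
      }
    }
  }

  // Heuristic for "empty": no store files and nothing in the memstore.
  private static boolean isEmpty(RegionLoad load) {
    return load != null && load.getStorefileSizeMB() == 0
        && load.getMemStoreSizeMB() == 0;
  }
}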
>
>
> "Short of upgrading to 1.2 for the region normalizer,"
>
> A bit off topic, but I think unfortunately the region normalizer now ignores
> empty regions to avoid undoing a pre-split on the table. This is something
> which makes it far less useful for time-series databases with short TTL on
> the tables. We'll need to address that.
>
> -Mikhail
>
> On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk  wrote:
>
> > Hi folks,
> >
> > I have a table with TTL enabled. It's been receiving data for a while
> > beyond the TTL and I now have a number of empty regions. I'd like to drop
> > those empty regions to free up heap space on the region servers and reduce
> > master load. I'm running a 1.1 derivative.
> >
> > The only threads I found on this topic are from around the 0.92 timeframe.
> >
> > Short of upgrading to 1.2 for the region normalizer, what's the recommended
> > method of cleaning up this cruft? Should I be merging empty regions into
> > their neighbors? It looks like region merge hasn't been migrated to ProcV2
> > yet, so it would be wise to reduce online table activity, or at least aim
> > for a "quiet period"? Is there a documented process for off-lining and
> > deleting a region by name? I don't see anything in the book about it.
> >
> > I experimented with online merge on a pseudo-distributed setup; it looks
> > like it's working fine for the most basic case. I'll probably pursue this
> > unless someone has some other ideas.
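
For reference, the online merge described above can be driven from the HBase
shell; the encoded region names below are placeholders:

  hbase> merge_region 'ENCODED_REGIONNAME_A', 'ENCODED_REGIONNAME_B'

Passing true as a third argument forces a merge of non-adjacent regions.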
> >
> > Thanks,
> > Nick
> >
>
>
>
> --
> Thanks,
> Michael Antonov
>


Re: Store Large files on HBase/HDFS

2016-04-01 Thread Stack
On Thu, Mar 31, 2016 at 6:42 PM, Arun Patel  wrote:

> Since there are millions of files (with sizes from 1 MB to 15 MB), I would
> like to store them in a sequence file.  How do I store the location of each
> of these files in HBase?
>
> I see lots of blogs and books talking about storing large files on HDFS and
> storing file paths in HBase. But I don't see any real examples. I was
> wondering if anybody has implemented this in production.
>
I don't know of any open implementation that I could point you at.

There is some consideration of what would be involved spanning HDFS and
HBase in this blog [1].

St.Ack

1.
http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/
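
Short of MOB, one common pattern is to append each document to a SequenceFile
on HDFS and index its location in HBase. A hedged sketch (the file path, the
table name "doc_index", and the column family "f" are made up for
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DocStore {
  public static void store(String docId, byte[] docBytes) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path seqPath = new Path("/data/docs/bucket-00001.seq"); // illustrative path
    try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
            SequenceFile.Writer.file(seqPath),
            SequenceFile.Writer.keyClass(Text.class),
            SequenceFile.Writer.valueClass(BytesWritable.class));
         Connection conn = ConnectionFactory.createConnection(conf);
         Table index = conn.getTable(TableName.valueOf("doc_index"))) {
      long offset = writer.getLength(); // record start, captured before append
      writer.append(new Text(docId), new BytesWritable(docBytes));
      // Index the document's location under its id.
      Put put = new Put(Bytes.toBytes(docId));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("path"),
          Bytes.toBytes(seqPath.toString()));
      put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("offset"),
          Bytes.toBytes(offset));
      index.put(put);
    }
  }
}

A GET then does the reverse: read the path and offset from HBase, open a
SequenceFile.Reader on that path, seek(offset), and read one record. This
sketch assumes the default uncompressed record format, where the writer's
getLength() before an append is the record's start offset.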


> Looking forward to replies from the community experts.  Thanks.
>
> Regards,
> Arun
>
> On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu  wrote:
>
> > For #1, please take a look
> > at
> >
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
> >
> > e.g. the following methods:
> >
> >   public DFSInputStream open(String src) throws IOException {
> >
> >   public HdfsDataOutputStream append(final String src, final int buffersize,
> >       EnumSet<CreateFlag> flag, final Progressable progress,
> >       final FileSystem.Statistics statistics) throws IOException {
> >
> >
> > Cheers
> >
> > On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel 
> > wrote:
> >
> > > I would like to store large documents (over 100 MB) on HDFS and insert
> > > metadata in HBase.
> > >
> > > 1) Users will use the HBase REST API for PUT and GET requests to store and
> > > retrieve documents. In this case, how do we PUT and GET documents to/from
> > > HDFS? What are the recommended ways of storing and accessing documents
> > > to/from HDFS that provide optimum performance?
> > >
> > > Can you please share any sample code, or a GitHub project?
> > >
> > > 2) What are the performance issues I need to know about?
> > >
> > > Regards,
> > > Arun
> > >
> >
>


Re: Back up HBase tables before hadoop upgrade

2016-04-01 Thread Ted Yu
bq. copy whole hbase directory to my local disk

I doubt your local disk has enough space for all your data.
Plus, what if some part of the local disk goes bad?
With HDFS, the chance of data loss is very low.

w.r.t. hbase snapshot, you can refer to
http://hbase.apache.org/book.html#ops.snapshots

For Export, see http://hbase.apache.org/book.html#export

Note that an hbase snapshot doesn't generate sequence files (Export does).
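
For example (the table name, snapshot name, and output path are placeholders):

  hbase> snapshot 'my_table', 'my_table-snap-20160401'

  hbase org.apache.hadoop.hbase.mapreduce.Export my_table /backup/my_table

A snapshot just records references to the table's HFiles plus metadata, so it
is cheap to take; Export runs a MapReduce job that scans the table and writes
sequence files to the output directory.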

On Fri, Apr 1, 2016 at 7:10 AM, Chathuri Wimalasena 
wrote:

> Thank you. What if I stop HBase and copy the whole hbase directory to my
> local disk? Will that work if something goes wrong with the upgrade?
>
> Also, could you please tell me what the difference is between export and
> snapshot?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 30, 2016 at 10:01 AM, Ted Yu  wrote:
>
> > You can also snapshot each of the 647 tables.
> > In case something unexpected happens, you can restore any of them.
> >
> > FYI
> >
> > On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena <kamalas...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > We have a production system using hadoop 2.5.1 and HBase 0.94.23. We have
> > > around 200 TB of data in HDFS and we are planning to upgrade to the
> > > newer hadoop version 2.7.2. In HBase we have roughly 647 tables. Before the
> > > upgrade, we want to back up the HBase tables in case of data loss or
> > > corruption during the upgrade. We are thinking of using the export and
> > > import functionality to export each table. Is there any other recommended
> > > way to back up hbase tables?
> > >
> > > Thanks,
> > > Chathuri
> > >
> >
>


Re: Back up HBase tables before hadoop upgrade

2016-04-01 Thread Chathuri Wimalasena
Thank you. What if I stop HBase and copy the whole hbase directory to my local
disk? Will that work if something goes wrong with the upgrade?

Also, could you please tell me what the difference is between export and
snapshot?

Thanks,
Chathuri

On Wed, Mar 30, 2016 at 10:01 AM, Ted Yu  wrote:

> You can also snapshot each of the 647 tables.
> In case something unexpected happens, you can restore any of them.
>
> FYI
>
> On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena 
> wrote:
>
> > Hi All,
> >
> > We have a production system using hadoop 2.5.1 and HBase 0.94.23. We have
> > around 200 TB of data in HDFS and we are planning to upgrade to the
> > newer hadoop version 2.7.2. In HBase we have roughly 647 tables. Before the
> > upgrade, we want to back up the HBase tables in case of data loss or
> > corruption during the upgrade. We are thinking of using the export and
> > import functionality to export each table. Is there any other recommended
> > way to back up hbase tables?
> >
> > Thanks,
> > Chathuri
> >
>


Re: build error

2016-04-01 Thread Micha


On 01.04.2016 at 11:23, Ted Yu wrote:
> In refguide, I don't see -Dsnappy mentioned.
> I didn't find snappy in pom.xml either.

Indeed, it isn't mentioned. Maybe I used an older build guide.

So snappy works just out of the box?

By the way, the option "-Dhadoop-two.version=2.7.2" also isn't mentioned
in the ref guide; it mentions "-Dhadoop.profile=.." instead.
Is there a difference between those two?


> Have you tried building without this -D ?

No, I'll try it.

cheers,
Michael



Re: build error

2016-04-01 Thread Ted Yu
In the refguide, I don't see -Dsnappy mentioned.
I didn't find snappy in pom.xml either.

Have you tried building without this -D ?
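
That is, the same invocation minus that flag:

MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
-Dhadoop-two.version=2.7.2

(Snappy support at runtime comes through the native Hadoop compression
libraries, so a hadoop-snappy Maven artifact shouldn't be needed for the
build.)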

On Fri, Apr 1, 2016 at 12:40 AM, Micha  wrote:

> Hi,
>
> this is my first maven build; I thought it would just work :-)
>
> after calling:
>
> MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
> -Dhadoop-two.version=2.7.2   -Dsnappy
>
>
> I get:
>
> Downloading:
>
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> Downloading:
>
> http://repository.apache.org/snapshots/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> [WARNING] The POM for org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT
> is missing, no dependency information available
>
>
> this leads to:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on
> project hbase: failed to get report for
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
> project hbase-server: Could not resolve dependencies for project
> org.apache.hbase:hbase-server:jar:1.1.4: Could not find artifact
> org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT in apache release
> (https://repository.apache.org/content/repositories/releases/) -> [Help 1]
> [ERROR]
>
>
>
> How do I fix this?
>
> thanks,
>  Michael
>
>


build error

2016-04-01 Thread Micha
Hi,

this is my first maven build; I thought it would just work :-)

after calling:

MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
-Dhadoop-two.version=2.7.2   -Dsnappy


I get:

Downloading:
http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
Downloading:
http://repository.apache.org/snapshots/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
[WARNING] The POM for org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT
is missing, no dependency information available


this leads to:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on
project hbase: failed to get report for
org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
project hbase-server: Could not resolve dependencies for project
org.apache.hbase:hbase-server:jar:1.1.4: Could not find artifact
org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT in apache release
(https://repository.apache.org/content/repositories/releases/) -> [Help 1]
[ERROR]



How do I fix this?

thanks,
 Michael