Re: Retiring empty regions
;) That was not the question ;) So Nick, merge on 1.1 is not recommended?
It was working very well on previous versions. Does ProcV2 really impact it
that badly?

JMS

2016-04-01 13:49 GMT-04:00 Vladimir Rodionov :

> >> This is something which makes it far less useful for time-series
> >> databases with short TTL on the tables.
>
> With a right row-key design you will never have empty regions due to TTL.
>
> -Vlad
>
> On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov wrote:
>
> > Crazy idea, but you might be able to take a stripped-down version of the
> > region normalizer code and make a Tool to run? Requesting a split or
> > merge is done through the client API, and the only weighing information
> > you need is whether a region is empty or not, which you could find out
> > too?
> >
> > "Short of upgrading to 1.2 for the region normalizer,"
> >
> > A bit off topic, but I think unfortunately the region normalizer now
> > ignores empty regions, to avoid undoing a pre-split on the table. This
> > is something which makes it far less useful for time-series databases
> > with short TTL on the tables. We'll need to address that.
> >
> > -Mikhail
> >
> > On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk wrote:
> >
> > > Hi folks,
> > >
> > > I have a table with TTL enabled. It's been receiving data for a while
> > > beyond the TTL and I now have a number of empty regions. I'd like to
> > > drop those empty regions to free up heap space on the region servers
> > > and reduce master load. I'm running a 1.1 derivative.
> > >
> > > The only threads I found on this topic are from circa the 0.92
> > > timeframe.
> > >
> > > Short of upgrading to 1.2 for the region normalizer, what's the
> > > recommended method of cleaning up this cruft? Should I be merging
> > > empty regions into their neighbors? It looks like region merge hasn't
> > > been migrated to ProcV2 yet, so it would be wise to reduce online
> > > table activity, or at least aim for a "quiet period"? Is there a
> > > documented process for off-lining and deleting a region by name? I
> > > don't see anything in the book about it.
> > >
> > > I experimented with online merge on a pseudo-distributed setup, and it
> > > looks like it's working fine for the most basic case. I'll probably
> > > pursue this unless someone has some other ideas.
> > >
> > > Thanks,
> > > Nick
> >
> > --
> > Thanks,
> > Michael Antonov
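[Editor's note] Mikhail's stripped-down-normalizer idea is small enough to
sketch. What follows is an illustration only, not a vetted tool: it assumes
the 1.1 Admin API, takes the table name as its single argument, and treats a
region as empty when cluster status reports zero store file and memstore
megabytes (storefileSizeMB is rounded, so sub-megabyte regions count as
empty here too).

import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class MergeEmptyRegions {  // invented name, illustration only
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    TableName table = TableName.valueOf(args[0]);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Gather per-region sizes from cluster status, the same weighing
      // information the normalizer uses.
      ClusterStatus status = admin.getClusterStatus();
      Map<byte[], RegionLoad> loads = new TreeMap<>(Bytes.BYTES_COMPARATOR);
      for (ServerName sn : status.getServers()) {
        loads.putAll(status.getLoad(sn).getRegionsLoad());
      }
      // Walk adjacent region pairs and merge each empty region into its
      // right-hand neighbor. Merge requests are asynchronous, so rerun
      // the tool until the region count stops dropping.
      List<HRegionInfo> regions = admin.getTableRegions(table);
      for (int i = 0; i < regions.size() - 1; i++) {
        RegionLoad load = loads.get(regions.get(i).getRegionName());
        if (load != null && load.getStorefileSizeMB() == 0
            && load.getMemStoreSizeMB() == 0) {
          admin.mergeRegions(regions.get(i).getEncodedNameAsBytes(),
              regions.get(i + 1).getEncodedNameAsBytes(), false);
          i++; // the neighbor is now part of a pending merge; skip it
        }
      }
    }
  }
}

Nick's "quiet period" caveat still applies while the merges run.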
Re: Retiring empty regions
>> This is something which makes it far less useful for time-series
>> databases with short TTL on the tables.

With a right row-key design you will never have empty regions due to TTL.

-Vlad

On Thu, Mar 31, 2016 at 10:31 PM, Mikhail Antonov wrote:

> Crazy idea, but you might be able to take a stripped-down version of the
> region normalizer code and make a Tool to run? Requesting a split or merge
> is done through the client API, and the only weighing information you need
> is whether a region is empty or not, which you could find out too?
>
> "Short of upgrading to 1.2 for the region normalizer,"
>
> A bit off topic, but I think unfortunately the region normalizer now
> ignores empty regions, to avoid undoing a pre-split on the table. This is
> something which makes it far less useful for time-series databases with
> short TTL on the tables. We'll need to address that.
>
> -Mikhail
>
> On Thu, Mar 31, 2016 at 9:56 PM, Nick Dimiduk wrote:
>
> > Hi folks,
> >
> > I have a table with TTL enabled. It's been receiving data for a while
> > beyond the TTL and I now have a number of empty regions. I'd like to
> > drop those empty regions to free up heap space on the region servers
> > and reduce master load. I'm running a 1.1 derivative.
> >
> > The only threads I found on this topic are from circa the 0.92
> > timeframe.
> >
> > Short of upgrading to 1.2 for the region normalizer, what's the
> > recommended method of cleaning up this cruft? Should I be merging empty
> > regions into their neighbors? It looks like region merge hasn't been
> > migrated to ProcV2 yet, so it would be wise to reduce online table
> > activity, or at least aim for a "quiet period"? Is there a documented
> > process for off-lining and deleting a region by name? I don't see
> > anything in the book about it.
> >
> > I experimented with online merge on a pseudo-distributed setup, and it
> > looks like it's working fine for the most basic case. I'll probably
> > pursue this unless someone has some other ideas.
> >
> > Thanks,
> > Nick
>
> --
> Thanks,
> Michael Antonov
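[Editor's note] To make Vlad's point concrete, one common key design salts
the row key so writes keep landing in every region; TTL expiry then thins
all regions instead of emptying the oldest ones. A minimal sketch, with an
invented class name and an illustrative bucket count:

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTimeSeriesKey {

  private static final int BUCKETS = 16; // illustrative; size to your region count

  /**
   * Key layout: [1-byte salt][metric][timestamp]. The salt is derived from
   * the metric, so all cells for one metric stay contiguous within a bucket.
   */
  static byte[] rowKey(String metric, long timestamp) {
    byte salt = (byte) ((metric.hashCode() & 0x7fffffff) % BUCKETS);
    return Bytes.add(new byte[] { salt }, Bytes.toBytes(metric),
        Bytes.toBytes(timestamp));
  }
}

The trade-off is the usual salting one: point reads stay cheap because the
salt is recomputable from the metric, while a pure time-range scan across
all metrics has to fan out over the BUCKETS prefixes.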
Re: Store Large files on HBase/HDFS
On Thu, Mar 31, 2016 at 6:42 PM, Arun Patel wrote:

> Since there are millions of files (with sizes from 1 MB to 15 MB), I would
> like to store them in a sequence file. How do I store the location of each
> of these files in HBase?
>
> I see lots of blogs and books talking about storing large files on HDFS
> and storing file paths in HBase. But I don't see any real examples. I was
> wondering if anybody has implemented this in production.

I don't know of any open implementation that I could point you at. There is
some consideration of what would be involved spanning HDFS and HBase in
this blog [1].

St.Ack

1. http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/

> Looking forward to a reply from the community experts. Thanks.
>
> Regards,
> Arun
>
> On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu wrote:
>
> > For #1, please take a look at
> > hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
> >
> > e.g. the following methods:
> >
> > public DFSInputStream open(String src) throws IOException
> >
> > public HdfsDataOutputStream append(final String src, final int buffersize,
> >     EnumSet<CreateFlag> flag, final Progressable progress,
> >     final FileSystem.Statistics statistics) throws IOException
> >
> > Cheers
> >
> > On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel wrote:
> >
> > > I would like to store large documents (over 100 MB) on HDFS and insert
> > > metadata in HBase.
> > >
> > > 1) Users will use the HBase REST API for PUT and GET requests for
> > > storing and retrieving documents. In this case, how to PUT and GET
> > > documents to/from HDFS? What are the recommended ways for storing and
> > > accessing documents to/from HDFS that provide optimum performance?
> > >
> > > Can you please share any sample code? Or a GitHub project?
> > >
> > > 2) What are the performance issues I need to know?
> > >
> > > Regards,
> > > Arun
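[Editor's note] The path-in-HBase pattern the thread keeps circling is small
enough to sketch. This is an illustration only, not a known production
implementation: it assumes the 1.x Connection/Table client API, the table
name, column family, and /docs path are all invented, and it stores one HDFS
file per document rather than packing them into sequence files.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IOUtils;

public class DocumentStore {
  private static final TableName DOCS = TableName.valueOf("documents"); // invented
  private static final byte[] CF = Bytes.toBytes("m");                  // invented
  private static final byte[] PATH = Bytes.toBytes("path");             // invented

  /** Write the body to HDFS, then record its location in HBase. */
  public static void store(Configuration conf, Connection conn,
      String docId, byte[] content) throws Exception {
    Path p = new Path("/docs/" + docId);
    FileSystem fs = FileSystem.get(conf); // cached instance; don't close it
    try (FSDataOutputStream out = fs.create(p)) {
      out.write(content);
    }
    try (Table t = conn.getTable(DOCS)) {
      t.put(new Put(Bytes.toBytes(docId))
          .addColumn(CF, PATH, Bytes.toBytes(p.toString())));
    }
  }

  /** Look up the HDFS location in HBase, then read the body from HDFS. */
  public static byte[] fetch(Configuration conf, Connection conn,
      String docId) throws Exception {
    Path p;
    try (Table t = conn.getTable(DOCS)) {
      Result r = t.get(new Get(Bytes.toBytes(docId)));
      p = new Path(Bytes.toString(r.getValue(CF, PATH)));
    }
    FileSystem fs = FileSystem.get(conf);
    byte[] buf = new byte[(int) fs.getFileStatus(p).getLen()];
    try (FSDataInputStream in = fs.open(p)) {
      IOUtils.readFully(in, buf, 0, buf.length);
    }
    return buf;
  }
}

If the documents were packed into sequence files as Arun suggests, the HBase
value would instead hold the container path plus the record's offset, and
the reader would seek to that offset before deserializing.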
Re: Back up HBase tables before hadoop upgrade
bq. copy whole hbase directory to my local disk

I doubt your local disk has enough space for all your data. Plus, what if
some part of the local disk goes bad? With hdfs, the chance of data loss is
very low.

w.r.t. hbase snapshot, you can refer to
http://hbase.apache.org/book.html#ops.snapshots

For Export, see http://hbase.apache.org/book.html#export

Note: unlike Export, hbase snapshot doesn't generate sequence files.

On Fri, Apr 1, 2016 at 7:10 AM, Chathuri Wimalasena wrote:

> Thank you. What if I stop HBase and copy the whole hbase directory to my
> local disk? Will that work if something goes wrong with the upgrade?
>
> Also, could you please tell me what's the difference between export and
> snapshot?
>
> Thanks,
> Chathuri
>
> On Wed, Mar 30, 2016 at 10:01 AM, Ted Yu wrote:
>
> > You can also snapshot each of the 647 tables.
> > In case something unexpected happens, you can restore any of them.
> >
> > FYI
> >
> > On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena <
> > kamalas...@gmail.com> wrote:
> >
> > > Hi All,
> > >
> > > We have a production system using hadoop 2.5.1 and HBase 0.94.23. We
> > > have nearly 200 TB of data in HDFS and we are planning to upgrade to
> > > the newer hadoop version 2.7.2. In HBase we have roughly 647 tables.
> > > Before the upgrade, we want to back up the HBase tables in case of
> > > data loss or corruption during the upgrade. We are thinking of using
> > > the export and import functionality to export each table. Is there
> > > any other recommended way to back up hbase tables?
> > >
> > > Thanks,
> > > Chathuri
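[Editor's note] To make the snapshot route concrete, here is a minimal
sketch assuming the 0.94-era HBaseAdmin API the thread is running (and that
snapshots are available, i.e. 0.94.6 or later); the class name and snapshot
name prefix are invented.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotAllTables {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // One snapshot per table, so any single table can be restored
      // individually if the upgrade goes wrong.
      for (HTableDescriptor desc : admin.listTables()) {
        admin.snapshot("pre-upgrade-" + desc.getNameAsString(),
            desc.getNameAsString());
      }
    } finally {
      admin.close();
    }
  }
}

Ted's caveat still applies: snapshots live in the same HDFS as the table
data, so they guard against HBase-level mistakes rather than filesystem
loss; copying data off-cluster is what Export (or the ExportSnapshot tool)
is for.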
Re: Back up HBase tables before hadoop upgrade
Thank you. What if I stop HBase and copy the whole hbase directory to my
local disk? Will that work if something goes wrong with the upgrade?

Also, could you please tell me what's the difference between export and
snapshot?

Thanks,
Chathuri

On Wed, Mar 30, 2016 at 10:01 AM, Ted Yu wrote:

> You can also snapshot each of the 647 tables.
> In case something unexpected happens, you can restore any of them.
>
> FYI
>
> On Wed, Mar 30, 2016 at 6:46 AM, Chathuri Wimalasena wrote:
>
> > Hi All,
> >
> > We have a production system using hadoop 2.5.1 and HBase 0.94.23. We
> > have nearly 200 TB of data in HDFS and we are planning to upgrade to
> > the newer hadoop version 2.7.2. In HBase we have roughly 647 tables.
> > Before the upgrade, we want to back up the HBase tables in case of data
> > loss or corruption during the upgrade. We are thinking of using the
> > export and import functionality to export each table. Is there any
> > other recommended way to back up hbase tables?
> >
> > Thanks,
> > Chathuri
Re: build error
On 01.04.2016 at 11:23, Ted Yu wrote:
> In refguide, I don't see -Dsnappy mentioned.
> I didn't find snappy in pom.xml either.

Indeed, it isn't mentioned. Maybe I used an older build guide. So snappy
works just out of the box?

By the way, the option "-Dhadoop-two.version=2.7.2" also isn't mentioned in
the ref guide; it has "-Dhadoop.profile=.." instead. Is there a difference
between those two?

> Have you tried building without this -D ?

No, I'll try it.

cheers,
Michael
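[Editor's note] My reading of the 1.1.x pom, offered as an assumption to
verify against your checkout: the hadoop-2.0 profile is active by default,
so -Dhadoop-two.version=2.7.2 only overrides which Hadoop 2 release the
build pulls in, while -Dhadoop.profile selects a different profile
altogether. Per Ted's suggestion, the same invocation minus the snappy flag
would then be:

MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
-Dhadoop-two.version=2.7.2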
Re: build error
In refguide, I don't see -Dsnappy mentioned.
I didn't find snappy in pom.xml either.

Have you tried building without this -D ?

On Fri, Apr 1, 2016 at 12:40 AM, Micha wrote:

> Hi,
>
> this is my first maven build, thought this should just work :-)
>
> after calling:
>
> MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
> -Dhadoop-two.version=2.7.2 -Dsnappy
>
> I get:
>
> Downloading:
> http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> Downloading:
> http://repository.apache.org/snapshots/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
> [WARNING] The POM for org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT
> is missing, no dependency information available
>
> this leads to:
>
> [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on
> project hbase: failed to get report for
> org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
> project hbase-server: Could not resolve dependencies for project
> org.apache.hbase:hbase-server:jar:1.1.4: Could not find artifact
> org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT in apache release
> (https://repository.apache.org/content/repositories/releases/) -> [Help 1]
> [ERROR]
>
> how to fix this?
>
> thanks,
> Michael
build error
Hi,

this is my first maven build, thought this should just work :-)

after calling:

MAVEN_OPTS="-Xmx2g" mvn site install assembly:assembly -DskipTests
-Dhadoop-two.version=2.7.2 -Dsnappy

I get:

Downloading:
http://people.apache.org/~garyh/mvn/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
Downloading:
http://repository.apache.org/snapshots/org/apache/hadoop/hadoop-snappy/0.0.1-SNAPSHOT/hadoop-snappy-0.0.1-SNAPSHOT.pom
[WARNING] The POM for org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT
is missing, no dependency information available

this leads to:

[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-site-plugin:3.4:site (default-site) on
project hbase: failed to get report for
org.apache.maven.plugins:maven-javadoc-plugin: Failed to execute goal on
project hbase-server: Could not resolve dependencies for project
org.apache.hbase:hbase-server:jar:1.1.4: Could not find artifact
org.apache.hadoop:hadoop-snappy:jar:0.0.1-SNAPSHOT in apache release
(https://repository.apache.org/content/repositories/releases/) -> [Help 1]
[ERROR]

how to fix this?

thanks,
Michael