Re: Store Large files on HBase/HDFS
On Thu, Mar 31, 2016 at 6:42 PM, Arun Patel wrote:

> Since there are millions of files (with sizes from 1 MB to 15 MB), I would
> like to store them in a sequence file. How do I store the location of each
> of these files in HBase?
>
> I see lots of blogs and books talking about storing large files on HDFS
> and storing file paths in HBase. But I don't see any real examples. I was
> wondering if anybody has implemented this in production.

I don't know of any open implementation that I could point you at. There is
some consideration of what would be involved spanning HDFS and HBase in this
blog [1].

St.Ack

1. http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/

> Looking forward to a reply from the community experts. Thanks.
>
> Regards,
> Arun
>
> On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu wrote:
>
> > For #1, please take a look at
> > hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
> >
> > e.g. the following methods:
> >
> > public DFSInputStream open(String src) throws IOException {
> >
> > public HdfsDataOutputStream append(final String src, final int buffersize,
> >     EnumSet<CreateFlag> flag, final Progressable progress,
> >     final FileSystem.Statistics statistics) throws IOException {
> >
> > Cheers
> >
> > On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel wrote:
> >
> > > I would like to store large documents (over 100 MB) on HDFS and insert
> > > metadata in HBase.
> > >
> > > 1) Users will use the HBase REST API for PUT and GET requests for
> > > storing and retrieving documents. In this case, how do I PUT and GET
> > > documents to/from HDFS? What are the recommended ways of storing and
> > > accessing documents on HDFS that provide optimum performance?
> > >
> > > Can you please share any sample code or a GitHub project?
> > >
> > > 2) What performance issues do I need to know about?
> > >
> > > Regards,
> > > Arun
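The SequenceFile-plus-pointer pattern the thread is circling around can be sketched concretely. Below is a minimal, pure-Java sketch of the locator record one might store in an HBase cell: the container SequenceFile's HDFS path, the byte offset captured from SequenceFile.Writer.getLength() immediately before appending the record, and the document's size. The class name, encoding, and column layout are illustrative assumptions, not from any existing library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical locator for one document packed into a SequenceFile on HDFS.
// Writer side: call SequenceFile.Writer.getLength() to capture `offset`
// just before appending the record, then store encode() in HBase (e.g.
// column family "f", qualifier "loc", row key = document id).
final class DocLocator {
    final String seqFilePath;  // HDFS path of the container SequenceFile
    final long offset;         // byte offset of the record within that file
    final long length;         // document size in bytes

    DocLocator(String seqFilePath, long offset, long length) {
        this.seqFilePath = seqFilePath;
        this.offset = offset;
        this.length = length;
    }

    // Encode as: [4-byte path length][path bytes][8-byte offset][8-byte length]
    byte[] encode() {
        byte[] path = seqFilePath.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + path.length + 16);
        buf.putInt(path.length).put(path).putLong(offset).putLong(length);
        return buf.array();
    }

    static DocLocator decode(byte[] cell) {
        ByteBuffer buf = ByteBuffer.wrap(cell);
        byte[] path = new byte[buf.getInt()];
        buf.get(path);
        return new DocLocator(new String(path, StandardCharsets.UTF_8),
                              buf.getLong(), buf.getLong());
    }
}
```

On read, the locator tells the service which SequenceFile to open and where to SequenceFile.Reader.seek() before pulling the single record, so a GET costs one HBase row lookup plus one HDFS seek rather than a scan.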
Re: Store Large files on HBase/HDFS
Since there are millions of files (with sizes from 1 MB to 15 MB), I would like to store them in a sequence file. How do I store the location of each of these files in HBase?

I see lots of blogs and books talking about storing large files on HDFS and storing file paths in HBase. But I don't see any real examples. I was wondering if anybody has implemented this in production.

Looking forward to a reply from the community experts. Thanks.

Regards,
Arun

On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu wrote:

> For #1, please take a look at
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
>
> e.g. the following methods:
>
> public DFSInputStream open(String src) throws IOException {
>
> public HdfsDataOutputStream append(final String src, final int buffersize,
>     EnumSet<CreateFlag> flag, final Progressable progress,
>     final FileSystem.Statistics statistics) throws IOException {
>
> Cheers
>
> On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel wrote:
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > and retrieving documents. In this case, how do I PUT and GET documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents on HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What performance issues do I need to know about?
> >
> > Regards,
> > Arun
Re: Store Large files on HBase/HDFS
For #1, please take a look at
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

e.g. the following methods:

public DFSInputStream open(String src) throws IOException {

public HdfsDataOutputStream append(final String src, final int buffersize,
    EnumSet<CreateFlag> flag, final Progressable progress,
    final FileSystem.Statistics statistics) throws IOException {

Cheers

On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel wrote:

> I would like to store large documents (over 100 MB) on HDFS and insert
> metadata in HBase.
>
> 1) Users will use the HBase REST API for PUT and GET requests for storing
> and retrieving documents. In this case, how do I PUT and GET documents
> to/from HDFS? What are the recommended ways of storing and accessing
> documents on HDFS that provide optimum performance?
>
> Can you please share any sample code or a GitHub project?
>
> 2) What performance issues do I need to know about?
>
> Regards,
> Arun
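For the PUT/GET question itself, application code usually goes through the org.apache.hadoop.fs.FileSystem facade (fs.open(path) / fs.create(path)) rather than calling DFSClient directly; the DFSClient methods above are what FileSystem delegates to. Either way, the body of a PUT or GET is a plain stream copy. The sketch below shows that copy loop against java.io streams so it is self-contained and runnable; against HDFS one would obtain `in`/`out` from FileSystem.open and FileSystem.create instead. File names here are hypothetical.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;

class StreamCopyDemo {
    // Generic PUT/GET body: copy between streams in fixed-size chunks.
    // With HDFS, `in` would come from FileSystem.open(path) for a GET and
    // `out` from FileSystem.create(path) for a PUT; the loop is identical.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a document upload/download round trip.
        Path src = Files.createTempFile("doc", ".bin");
        Path dst = Files.createTempFile("copy", ".bin");
        Files.write(src, "hello hdfs".getBytes());
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dst.toFile())) {
            long copied = copy(in, out);
            System.out.println(copied);  // 10
        }
        System.out.println(new String(Files.readAllBytes(dst)));  // hello hdfs
    }
}
```

A larger buffer (the HDFS default io.file.buffer.size is tunable) mainly matters for the 100 MB documents in the original question, where per-call overhead adds up.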
Re: Store Large files on HBase/HDFS
But the whole idea of storing large files on HDFS would be defeated, right? Why do you think we need to bring them back into HBase?

On Thu, Feb 18, 2016 at 10:23 PM, Jameson Li wrote:

> Maybe you can parse the HDFS image file, then transform the entries into
> HFiles and bulk-load them into HBase tables.
> -- remember to pre-partition the HBase table
>
> 2016-02-18 7:40 GMT+08:00 Arun Patel:
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > and retrieving documents. In this case, how do I PUT and GET documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents on HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What performance issues do I need to know about?
> >
> > Regards,
> > Arun
>
> --
> Thanks & Regards,
> 李剑 Jameson Li
> Focus on Hadoop, MySQL
Re: Store Large files on HBase/HDFS
Maybe you can parse the HDFS image file, then transform the entries into HFiles and bulk-load them into HBase tables.
-- remember to pre-partition the HBase table

2016-02-18 7:40 GMT+08:00 Arun Patel:

> I would like to store large documents (over 100 MB) on HDFS and insert
> metadata in HBase.
>
> 1) Users will use the HBase REST API for PUT and GET requests for storing
> and retrieving documents. In this case, how do I PUT and GET documents
> to/from HDFS? What are the recommended ways of storing and accessing
> documents on HDFS that provide optimum performance?
>
> Can you please share any sample code or a GitHub project?
>
> 2) What performance issues do I need to know about?
>
> Regards,
> Arun

--
Thanks & Regards,
李剑 Jameson Li
Focus on Hadoop, MySQL
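The "remember to partition" advice means pre-splitting the table at creation time, so that millions of bulk-loaded rows don't all land in one region and hot-spot a single region server. One common scheme is to prefix each row key with a hash byte (e.g. the first byte of MD5(documentId)) and create the table with evenly spaced split points over that byte. The generator below is a pure-Java, illustrative sketch; the byte[][] it returns is the shape expected by Admin.createTable(tableDescriptor, splitKeys) in the HBase client API.

```java
// Generate (numRegions - 1) single-byte split points spread evenly over
// the 0x00..0xFF keyspace, for a table whose row keys begin with one
// hash byte. Passing these to Admin.createTable pre-creates the regions.
class SplitPoints {
    static byte[][] evenSplits(int numRegions) {
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            // Evenly spaced boundaries: 256/numRegions, 2*256/numRegions, ...
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] s : evenSplits(16)) {
            System.out.println(s[0] & 0xFF);  // 16, 32, ..., 240
        }
    }
}
```

The trade-off of hash-prefixing is that range scans over natural document ids are no longer possible; for a pointer-lookup workload like this thread's (one GET per document id), that is usually acceptable.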