Re: Store Large files on HBase/HDFS

2016-04-01 Thread Stack
On Thu, Mar 31, 2016 at 6:42 PM, Arun Patel  wrote:

> Since there are millions of files (with sizes from 1 MB to 15 MB), I would
> like to store them in a sequence file.  How do I store the location of each
> of these files in HBase?
>
> I see lots of blogs and books talking about storing large files on HDFS and
> storing file paths in HBase.  But I don't see any real examples. I was
> wondering whether anybody has implemented this in production.
>
I don't know of any open implementation that I could point you at.

There is some consideration of what would be involved in spanning HDFS and
HBase in this blog post [1].

St.Ack

1.
http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/
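
For the "file paths in HBase" half of the pattern, the main design question is
the row key. A minimal sketch (the RowKeys class and the two-hex-char prefix
layout are illustrative assumptions, not anything from this thread): prefixing
each HDFS path with a couple of hash characters spreads millions of keys across
regions instead of hotspotting on a shared directory prefix.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative row-key helper: a short hash prefix in front of the HDFS
// path spreads millions of similar-looking paths across regions instead
// of hotspotting on a common directory prefix.
public class RowKeys {
    public static String rowKey(String hdfsPath) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(hdfsPath.getBytes(StandardCharsets.UTF_8));
            // Two hex chars of the first digest byte give 256 buckets.
            return String.format("%02x", digest[0] & 0xff) + "|" + hdfsPath;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always available
        }
    }
}
```

Because the prefix is recomputed from the path, a reader can rebuild the exact
key at Get time without any secondary index.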


> Looking forward to a reply from the community experts.  Thanks.
>
> Regards,
> Arun
>
> On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu  wrote:
>
> > For #1, please take a look
> > at
> >
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
> >
> > e.g. the following methods:
> >
> >   public DFSInputStream open(String src) throws IOException {
> >
> >   public HdfsDataOutputStream append(final String src, final int buffersize,
> >       EnumSet<CreateFlag> flag, final Progressable progress,
> >       final FileSystem.Statistics statistics) throws IOException {
> >
> >
> > Cheers
> >
> > On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel 
> > wrote:
> >
> > > I would like to store large documents (over 100 MB) on HDFS and insert
> > > metadata in HBase.
> > >
> > > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > > and retrieving documents. In this case, how do I PUT and GET documents
> > > to/from HDFS? What are the recommended ways of storing and accessing
> > > documents to/from HDFS that provide optimum performance?
> > >
> > > Can you please share any sample code or a GitHub project?
> > >
> > > 2) What performance issues do I need to be aware of?
> > >
> > > Regards,
> > > Arun
> > >
> >
>


Re: Store Large files on HBase/HDFS

2016-03-31 Thread Arun Patel
Since there are millions of files (with sizes from 1 MB to 15 MB), I would
like to store them in a sequence file.  How do I store the location of each
of these files in HBase?
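
One hedged sketch of what the HBase cell for each file could hold, assuming
the sequence-file approach: the container file's path plus the record's byte
offset and length (the offset would be captured from
SequenceFile.Writer.getLength() just before appending each record). The class
and byte layout here are illustrative, not an established format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative cell-value layout for "this file lives inside a container
// SequenceFile": 8-byte offset, 8-byte length, then the container path
// in UTF-8.
public class SeqFilePointer {
    public final String containerPath;
    public final long offset;
    public final long length;

    public SeqFilePointer(String containerPath, long offset, long length) {
        this.containerPath = containerPath;
        this.offset = offset;
        this.length = length;
    }

    // Pack the pointer into the byte[] that would be stored as the cell value.
    public byte[] encode() {
        byte[] path = containerPath.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(16 + path.length);
        buf.putLong(offset).putLong(length).put(path);
        return buf.array();
    }

    // Rebuild the pointer from a cell value fetched at read time.
    public static SeqFilePointer decode(byte[] value) {
        ByteBuffer buf = ByteBuffer.wrap(value);
        long off = buf.getLong();
        long len = buf.getLong();
        byte[] path = new byte[buf.remaining()];
        buf.get(path);
        return new SeqFilePointer(new String(path, StandardCharsets.UTF_8), off, len);
    }
}
```

On the read path the decoded pointer tells the client which container to open
and where to seek, so a single 1-15 MB record can be served without scanning
the whole container file.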

I see lots of blogs and books talking about storing large files on HDFS and
storing file paths in HBase.  But I don't see any real examples. I was
wondering whether anybody has implemented this in production.

Looking forward to a reply from the community experts.  Thanks.

Regards,
Arun

On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu  wrote:

> For #1, please take a look
> at
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
>
> e.g. the following methods:
>
>   public DFSInputStream open(String src) throws IOException {
>
>   public HdfsDataOutputStream append(final String src, final int buffersize,
>       EnumSet<CreateFlag> flag, final Progressable progress,
>       final FileSystem.Statistics statistics) throws IOException {
>
>
> Cheers
>
> On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel 
> wrote:
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > and retrieving documents. In this case, how do I PUT and GET documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents to/from HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What performance issues do I need to be aware of?
> >
> > Regards,
> > Arun
> >
>


Re: Store Large files on HBase/HDFS

2016-02-21 Thread Ted Yu
For #1, please take a look
at 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java

e.g. the following methods:

  public DFSInputStream open(String src) throws IOException {

  public HdfsDataOutputStream append(final String src, final int buffersize,
      EnumSet<CreateFlag> flag, final Progressable progress,
      final FileSystem.Statistics statistics) throws IOException {
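
Those DFSClient methods are normally reached through the public FileSystem API
(FileSystem.open / FileSystem.create), and moving a 100 MB document through a
REST layer then comes down to a fixed-buffer copy loop so the whole document is
never held in memory at once. A self-contained sketch, using java.io streams as
stand-ins for the HDFS FSDataInputStream/FSDataOutputStream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// The fixed-buffer copy loop you would run between an HDFS stream and
// the REST request/response stream; shown with java.io types so the
// sketch stays self-contained.
public class StreamCopy {
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[64 * 1024]; // 64 KB keeps heap use flat for 100 MB+ docs
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Convenience for in-memory use (ByteArray streams never actually throw).
    public static byte[] copyBytes(byte[] data) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copy(new ByteArrayInputStream(data), out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The same copy() would be pointed at the stream returned by FileSystem.open on
the GET path, or at the stream returned by FileSystem.create on the PUT path.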


Cheers

On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel  wrote:

> I would like to store large documents (over 100 MB) on HDFS and insert
> metadata in HBase.
>
> 1) Users will use the HBase REST API for PUT and GET requests for storing
> and retrieving documents. In this case, how do I PUT and GET documents
> to/from HDFS? What are the recommended ways of storing and accessing
> documents to/from HDFS that provide optimum performance?
>
> Can you please share any sample code or a GitHub project?
>
> 2) What performance issues do I need to be aware of?
>
> Regards,
> Arun
>


Re: Store Large files on HBase/HDFS

2016-02-19 Thread Arun Patel
But the whole idea of storing large files on HDFS would be defeated, right?
Why do you think we need to bring them back into HBase?

On Thu, Feb 18, 2016 at 10:23 PM, Jameson Li  wrote:

> Maybe you can parse the HDFS image file, then transform the entries into
> HFiles and load them into HBase tables.
> -- remember to partition the HBase table
>
> 2016-02-18 7:40 GMT+08:00 Arun Patel :
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > and retrieving documents. In this case, how do I PUT and GET documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents to/from HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What performance issues do I need to be aware of?
> >
> > Regards,
> > Arun
> >
>
>
>
> --
>
>
> Thanks & Regards,
> 李剑 Jameson Li
> Focus on Hadoop, MySQL
>


Re: Store Large files on HBase/HDFS

2016-02-18 Thread Jameson Li
Maybe you can parse the HDFS image file, then transform the entries into
HFiles and load them into HBase tables.
-- remember to partition the HBase table
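
On the "remember to partition" point: pre-splitting means handing createTable()
a set of split keys up front instead of letting one region take all the initial
load. A hedged sketch, assuming row keys that begin with a single hash byte
(the helper class is illustrative, not HBase API):

```java
// Illustrative split-point generator for pre-partitioning an HBase table
// whose row keys start with one hash byte: evenly spaced one-byte
// prefixes make good region boundaries. numRegions should divide 256
// evenly for uniform spacing.
public class SplitPoints {
    public static byte[][] splits(int numRegions) {
        byte[][] points = new byte[numRegions - 1][];
        int step = 256 / numRegions;
        for (int i = 1; i < numRegions; i++) {
            // N regions need N-1 boundary keys.
            points[i - 1] = new byte[] { (byte) (i * step) };
        }
        return points;
    }
}
```

The returned byte[][] would be passed as the splitKeys argument of the HBase
Admin createTable overload that accepts split keys.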

2016-02-18 7:40 GMT+08:00 Arun Patel :

> I would like to store large documents (over 100 MB) on HDFS and insert
> metadata in HBase.
>
> > 1) Users will use the HBase REST API for PUT and GET requests for storing
> > and retrieving documents. In this case, how do I PUT and GET documents
> > to/from HDFS? What are the recommended ways of storing and accessing
> > documents to/from HDFS that provide optimum performance?
> >
> > Can you please share any sample code or a GitHub project?
> >
> > 2) What performance issues do I need to be aware of?
>
> Regards,
> Arun
>



-- 


Thanks & Regards,
李剑 Jameson Li
Focus on Hadoop, MySQL