Re: Ignite as distributed file storage

Dmitriy Setrakyan Mon, 02 Jul 2018 15:08:30 -0700

To be honest, I am not sure if we need to kick off another file system
storage discussion in Ignite. It sounds like a huge effort and likely will
not be productive.


However, I think the ability to store large objects would make sense. For
example, how do I store a 10GB blob in an Ignite cache? Most likely we have
to have a separate memory or disk region allocated for blobs only. We also
need to be able to efficiently transfer a 10GB blob over the network and
store it off-heap right away, without bringing it into main heap memory
(otherwise we would run out of memory).
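
To make the transfer requirement concrete, here is a minimal Java sketch,
not an Ignite API. BlobSpool, spool and the chunkBytes parameter are
hypothetical names introduced for illustration only. It assumes the blob
arrives on a blocking channel (for example a SocketChannel) and spools it
straight to a file with FileChannel.transferFrom, so only a small bounded
transfer buffer is in memory at any time and heap usage does not grow with
the blob size.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: spools a large blob from a channel to disk. */
final class BlobSpool {
    private BlobSpool() {}

    /**
     * Copies everything from src into target, at most chunkBytes per call.
     * Assumes src is a blocking channel, so a zero-byte transfer means EOF.
     * Returns the total number of bytes written.
     */
    static long spool(ReadableByteChannel src, Path target, long chunkBytes)
            throws IOException {
        if (chunkBytes <= 0)
            throw new IllegalArgumentException("chunkBytes must be positive");

        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long pos = 0;
            while (true) {
                // The JDK moves the bytes through its own bounded buffer;
                // the blob is never materialized as one byte[] on the heap.
                long n = out.transferFrom(src, pos, chunkBytes);
                if (n <= 0)
                    break; // source exhausted
                pos += n;
            }
            return pos;
        }
    }
}
```

A receiver could call BlobSpool.spool(socketChannel, path, 8L * 1024 * 1024)
for each incoming blob. A real implementation would also need checksums,
resume after failure, and a way to register the resulting file with the
cache metadata, none of which is shown here.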

I suggest that we create an IEP about this use case alone and leave the
file system for future discussions.

D.

On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <[email protected]>
wrote:

> Pavel,
>
> Thank you. I'll wait for the feature comparison and concrete use cases,
> because to me this feature still sounds too abstract to judge whether the
> product would benefit from it.
>
> On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <[email protected]> wrote:
>
> > Dmitriy,
> >
> > I think we have a little miscommunication here. Of course, I meant
> > supporting large entries / chunks of binary data. Internally it will be
> > BLOB storage, which can be accessed through various interfaces.
> > "File" is just an abstraction for the end user's convenience, a wrapper
> > layer that provides a user-friendly API for storing BLOBs directly. We
> > shouldn't support a full file protocol with file system capabilities. It
> > can be added later, but for now it is unnecessary and introduces extra
> > complexity.
> >
> > We can implement our BLOB storage step by step. The first step is the
> > core functionality and support for saving large pieces of binary data to
> > it. The "File" layer, Web layer, etc. can be added later.
> >
> > The initial IGFS design is not well suited to having a persistence
> > layer. I think we shouldn't make any changes to it; in my opinion this
> > project is almost outdated. We will drop IGFS after implementing a File
> > System layer over our BLOB storage.
> >
> > Vladimir,
> >
> > I will prepare a comparison with other existing distributed file storages
> > and file systems in a few days.
> >
> > About using the data grid: I never said that we need transactions, sync
> > backups, etc. We need just a few core things: Atomic cache with
> > persistence, Discovery, Baseline, Affinity, and Communication.
> > We can implement the other things ourselves, so this feature can be
> > developed independently of other non-core features.
> > For me, the Ignite way is to give our users a fast and convenient way to
> > solve their problems with good performance and durability. We have a
> > problem with storing large data, and we should solve it.
> > For the other points, see my message to Dmitriy above.
> >
> > On Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <[email protected]> wrote:
> >
> > > Pavel,
> > >
> > > I have actually misunderstood the use case. To be honest, I thought that
> > > you were talking about the support of large values in Ignite caches,
> > > e.g. objects in the cache that are several megabytes in size.
> > >
> > > If we are tackling the distributed file system, then in my view, we
> > > should be talking about IGFS and adding persistence support to IGFS
> > > (which is based on HDFS API). It is not clear to me that you are talking
> > > about IGFS. Can you confirm?
> > >
> > > D.
> > >
> > >
> > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <[email protected]>
> > > wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > Yes, I have an approximate design in mind. The main idea is that we
> > > > already have a distributed cache for file metadata (our Atomic cache),
> > > > and the data flow and distribution will be controlled by our
> > > > AffinityFunction and Baseline. We already have discovery and
> > > > communication to keep such local file storages in sync. The file data
> > > > will be split into large blocks (64-128 MB), which looks very similar
> > > > to our WAL. Each block can contain one or more file chunks. The
> > > > tablespace (segment ids, offsets, etc.) will be stored in our regular
> > > > page memory. These are the key ideas for implementing the first
> > > > version of such storage. We already have similar components in our
> > > > persistence, so this experience can be reused to develop such storage.
> > > >
> > > > Denis,
> > > >
> > > > Nothing significant should change at our memory level. It will be a
> > > > separate, pluggable component over the cache. Most of the functions
> > > > that give a performance boost can be delegated to the OS level
> > > > (memory-mapped files, DMA, direct writes from socket to disk and vice
> > > > versa). Ignite and File Storage can develop independently of each
> > > > other.
> > > >
> > > > Alexey Stelmak, who has great experience with developing such systems,
> > > > can provide more low-level information about how it should look.
> > > >
> > > > On Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <[email protected]>
> > > > wrote:
> > > >
> > > > > Pavel, it definitely makes sense. Do you have a design in mind?
> > > > >
> > > > > D.
> > > > >
> > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <[email protected]> wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I would like to start a discussion about designing a new feature,
> > > > > > because I think it's time to start taking steps towards it.
> > > > > > I have noticed that some of our users have tried to store large
> > > > > > homogeneous entries (> 1, 10, 100 MB/GB/TB) in our caches, but
> > > > > > without much success.
> > > > > >
> > > > > > The IGFS project makes this possible, but in my view it has one
> > > > > > big disadvantage: it is in-memory only, so users have a strict
> > > > > > limit on the size of their data and face the risk of data loss.
> > > > > >
> > > > > > Our durable memory can persist data that doesn't fit in RAM to
> > > > > > disk, but its page structure is not designed to store large
> > > > > > pieces of data.
> > > > > >
> > > > > > There are a lot of distributed file system projects, like HDFS,
> > > > > > GlusterFS, etc. But all of them concentrate on implementing a
> > > > > > high-grade file protocol rather than a user-friendly API, which
> > > > > > leads to a high entry threshold for building something on top of
> > > > > > them.
> > > > > > We shouldn't go this way. Our main goal should be to give users
> > > > > > an easy and fast way to use file storage and processing here and
> > > > > > now.
> > > > > >
> > > > > > If we take HDFS as the closest project by functionality, we have
> > > > > > one big advantage over it. We can use our caches as file metadata
> > > > > > storage and have virtually unlimited ability to scale it, while
> > > > > > HDFS is bounded by Namenode capacity and has big problems with
> > > > > > keeping a large number of files in the system.
> > > > > >
> > > > > > We gained very good experience with persistence when we developed
> > > > > > our durable memory, and we can combine it with our experience
> > > > > > with services, binary protocol, and I/O and start designing a new
> > > > > > IEP.
> > > > > >
> > > > > > Use cases and features of the project:
> > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text, etc.
> > > > > > without overhead and without the possibility of data loss.
> > > > > > 2) Easy, pluggable, fast and distributed file processing,
> > > > > > transformation and analysis (e.g. an ImageMagick processor for
> > > > > > image transformation, LuceneIndex for texts, whatever; it's
> > > > > > bounded only by your imagination).
> > > > > > 3) Scalability out of the box.
> > > > > > 4) User-friendly API and minimal steps to start using this
> > > > > > storage in production.
> > > > > >
> > > > > > I repeat, this project is not supposed to be a high-grade
> > > > > > distributed file system with full file protocol support.
> > > > > > This project should primarily focus on target users who would
> > > > > > like to use it without complex preparation.
> > > > > >
> > > > > > For example, a user can deploy Ignite with such storage and a
> > > > > > web server with a REST API as an Ignite service, and get a
> > > > > > scalable, performant image server out of the box that can be
> > > > > > accessed from any programming language.
> > > > > >
> > > > > > As a long-term goal, we should focus on storing and processing
> > > > > > very large amounts of data, like movies and streaming, which is a
> > > > > > big trend today.
> > > > > >
> > > > > > I would like to say special thanks to our community members
> > > > > > Alexey Stelmak and Dmitriy Govorukhin, who significantly helped
> > > > > > me put together all the pieces of this puzzle.
> > > > > >
> > > > > > So, I want to hear your opinions about this proposal.
> > > > > >
> > > > >
> > > >
> > >
> >
>
