To be honest, I am not sure if we need to kick off another file system storage discussion in Ignite. It sounds like a huge effort and likely will not be productive.
However, I think the ability to store large objects does make sense. For example, how do I store a 10 GB blob in an Ignite cache? Most likely we need a separate memory or disk space allocated for blobs only. We also need to be able to efficiently transfer a 10 GB blob over the network and store it off-heap right away, without bringing it into main heap memory (otherwise we would run out of memory).

I suggest that we create an IEP for this use case alone and leave the file system for future discussions.

D.

On Mon, Jul 2, 2018 at 6:50 AM, Vladimir Ozerov <voze...@gridgain.com> wrote:

> Pavel,
>
> Thank you. I'll wait for the feature comparison and concrete use cases,
> because for me this feature still sounds too abstract to judge whether the
> product would benefit from it.
>
> On Mon, Jul 2, 2018 at 3:15 PM Pavel Kovalenko <jokse...@gmail.com> wrote:
>
> > Dmitriy,
> >
> > I think we have a little miscommunication here. Of course, I meant
> > supporting large entries / chunks of binary data. Internally it will be
> > BLOB storage, which can be accessed through various interfaces.
> > "File" is just an abstraction for the end user's convenience, a wrapper
> > layer that provides a user-friendly API to store BLOBs directly. We
> > shouldn't implement a full file protocol with file system capabilities.
> > It can be added later, but for now it's absolutely unnecessary and
> > introduces extra complexity.
> >
> > We can implement our BLOB storage step by step. The first thing is the
> > core functionality and support for saving large pieces of binary objects
> > to it. The "File" layer, Web layer, etc. can be added later.
> >
> > The initial IGFS design is not well suited to having a persistence
> > layer. I think we shouldn't make any changes to it; to me this project
> > is almost outdated. We will drop IGFS after implementing a file system
> > layer over our BLOB storage.
> >
> > Vladimir,
> >
> > I will prepare a comparison with other existing distributed file storages
> > and file systems in a few days.
> >
> > As for using the data grid, I never said that we need transactions, sync
> > backups, etc. We need just a few core things: an Atomic cache with
> > persistence, Discovery, Baseline, Affinity, and Communication.
> > Other things we can implement ourselves, so this feature can be developed
> > independently of other non-core features.
> > For me, the Ignite way is to give our users a fast and convenient way to
> > solve their problems with good performance and durability. We have a
> > problem with storing large data, and we should solve it.
> > For the other points, see my message to Dmitriy above.
> >
> > Sun, Jul 1, 2018 at 9:48, Dmitriy Setrakyan <dsetrak...@apache.org>:
> >
> > > Pavel,
> > >
> > > I have actually misunderstood the use case. To be honest, I thought that
> > > you were talking about the support of large values in Ignite caches,
> > > e.g. objects that are several megabytes in size.
> > >
> > > If we are tackling a distributed file system, then in my view we should
> > > be talking about IGFS and adding persistence support to IGFS (which is
> > > based on the HDFS API). It is not clear to me that you are talking about
> > > IGFS. Can you confirm?
> > >
> > > D.
> > >
> > >
> > > On Sat, Jun 30, 2018 at 10:59 AM, Pavel Kovalenko <jokse...@gmail.com>
> > > wrote:
> > >
> > > > Dmitriy,
> > > >
> > > > Yes, I have an approximate design in mind. The main idea is that we
> > > > already have a distributed cache for file metadata (our Atomic cache);
> > > > the data flow and distribution will be controlled by our
> > > > AffinityFunction and Baseline. We already have discovery and
> > > > communication to keep such local file storages in sync. The file data
> > > > will be split into large blocks (64-128 MB), which looks very similar
> > > > to our WAL.
> > > > Each block can contain
> > > > one or more file chunks. The tablespace (segment ids, offsets, etc.)
> > > > will be stored in our regular page memory. These are the key ideas for
> > > > implementing the first version of such storage. We already have
> > > > similar components in our persistence, so this experience can be
> > > > reused to develop such storage.
> > > >
> > > > Denis,
> > > >
> > > > Nothing significant should change at our memory level. It will be a
> > > > separate, pluggable component over the cache. Most of the functions
> > > > that give a performance boost can be delegated to the OS level
> > > > (memory-mapped files, DMA, direct writes from socket to disk and vice
> > > > versa). Ignite and File Storage can develop independently of each
> > > > other.
> > > >
> > > > Alexey Stelmak, who has great experience developing such systems,
> > > > can provide more low-level information about how it should look.
> > > >
> > > > Sat, Jun 30, 2018 at 19:40, Dmitriy Setrakyan <dsetrak...@apache.org>:
> > > >
> > > > > Pavel, it definitely makes sense. Do you have a design in mind?
> > > > >
> > > > > D.
> > > > >
> > > > > On Sat, Jun 30, 2018, 07:24 Pavel Kovalenko <jokse...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Igniters,
> > > > > >
> > > > > > I would like to start a discussion about designing a new feature
> > > > > > because I think it's time to start taking steps towards it.
> > > > > > I have noticed that some of our users have tried to store large
> > > > > > homogeneous entries (> 1, 10, 100 MB/GB/TB) in our caches, but
> > > > > > without much success.
> > > > > >
> > > > > > The IGFS project makes this possible, but to me it has one big
> > > > > > disadvantage: it's in-memory only, so users have a strict size
> > > > > > limit on their data and a data loss problem.
> > > > > >
> > > > > > Our durable memory can persist data that doesn't fit in RAM to
> > > > > > disk, but its page structure is not designed to store large
> > > > > > pieces of data.
> > > > > >
> > > > > > There are a lot of distributed file system projects, like HDFS,
> > > > > > GlusterFS, etc. But all of them concentrate on implementing a
> > > > > > high-grade file protocol rather than a user-friendly API, which
> > > > > > leads to a high entry threshold for anyone who wants to build
> > > > > > something on top of them.
> > > > > > We shouldn't go this way. Our main goal should be to give the user
> > > > > > an easy and fast way to use file storage and processing here and
> > > > > > now.
> > > > > >
> > > > > > If we take HDFS as the closest project by functionality, we have
> > > > > > one big advantage over it. We can use our caches as file metadata
> > > > > > storage and scale it without limit, while HDFS is bounded by
> > > > > > NameNode capacity and has big problems with keeping a large number
> > > > > > of files in the system.
> > > > > >
> > > > > > We gained very good experience with persistence when we developed
> > > > > > our durable memory, and we can combine it with our experience with
> > > > > > services, the binary protocol, and I/O, and start designing a new
> > > > > > IEP.
> > > > > >
> > > > > > Use cases and features of the project:
> > > > > > 1) Storing XML, JSON, BLOB, CLOB, images, videos, text, etc.
> > > > > > without overhead or the possibility of data loss.
> > > > > > 2) Easy, pluggable, fast, and distributed file processing,
> > > > > > transformation, and analysis (e.g. an ImageMagick processor for
> > > > > > image transformation, LuceneIndex for text, whatever; it's bounded
> > > > > > only by your imagination).
> > > > > > 3) Scalability out of the box.
> > > > > > 4) A user-friendly API and minimal steps to start using this
> > > > > > storage in production.
> > > > > >
> > > > > > I repeat, this project is not supposed to be a high-grade
> > > > > > distributed file system with full file protocol support.
> > > > > > This project should primarily focus on target users who would like
> > > > > > to use it without complex preparation.
> > > > > >
> > > > > > For example, a user can deploy Ignite with such storage and a web
> > > > > > server with a REST API as an Ignite service and get a scalable,
> > > > > > performant image server out of the box that can be accessed from
> > > > > > any programming language.
> > > > > >
> > > > > > As a long-term goal, we should focus on storing and processing
> > > > > > very large amounts of data such as movies and streaming, which is
> > > > > > a big trend today.
> > > > > >
> > > > > > I would like to say special thanks to our community members Alexey
> > > > > > Stelmak and Dmitriy Govorukhin, who significantly helped me put
> > > > > > together all the pieces of this puzzle.
> > > > > >
> > > > > > So, I want to hear your opinions about this proposal.
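
For reference, the block layout described earlier in the thread (file data packed into large fixed-size blocks, with a tablespace of segment ids and offsets) could look roughly like the sketch below. This is only an illustration under assumptions, not Ignite code: the names BlockStore, ChunkRef, put and get are hypothetical, the blocks are in-memory bytearrays standing in for on-disk segment files, and the tablespace is a plain dict rather than page memory.

```python
from dataclasses import dataclass, field

# 64 MB, the lower bound of the block size mentioned in the thread.
BLOCK_SIZE = 64 * 1024 * 1024


@dataclass(frozen=True)
class ChunkRef:
    """Hypothetical tablespace entry: where one chunk of a file lives."""
    segment_id: int
    offset: int
    length: int


@dataclass
class BlockStore:
    """Toy block store; every name here is an assumption for illustration."""
    block_size: int = BLOCK_SIZE
    segments: list = field(default_factory=list)    # stand-in for block files
    tablespace: dict = field(default_factory=dict)  # name -> list of ChunkRef

    def put(self, name, data):
        """Append data to the current block, spilling into new blocks."""
        refs = []
        view = memoryview(data)
        pos = 0
        while pos < len(view):
            # Open a new block when there is none or the last one is full.
            if not self.segments or len(self.segments[-1]) >= self.block_size:
                self.segments.append(bytearray())
            seg = self.segments[-1]
            room = self.block_size - len(seg)
            part = view[pos:pos + room]
            refs.append(ChunkRef(len(self.segments) - 1, len(seg), len(part)))
            seg.extend(part)
            pos += len(part)
        self.tablespace[name] = refs

    def get(self, name):
        """Reassemble a file from its chunks using the tablespace."""
        return b"".join(
            bytes(self.segments[r.segment_id][r.offset:r.offset + r.length])
            for r in self.tablespace[name]
        )
```

With a small block size such as BlockStore(block_size=8), two short values end up sharing the first block and the second value spills into a new one, which is the "one or more file chunks per block" behaviour from the design. A real implementation would also have to stream the data instead of holding it in memory.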