On 5 March 2018 at 16:04, Michael Dürig <mic...@gmail.com> wrote:

> > How does it perform compared to TarMK
> > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > b) when the working set doesn't fit into RAM allocated to the container ?
>
> I think these are some of the things we need to find out along the way.
> Currently my thinking is to move from off-heap caching (mmap) to on-heap
> caching (leveraging the segment cache). For that to work we likely need
> to better understand the locality of the working set (see
> https://issues.apache.org/jira/browse/OAK-5655) and rethink the
> granularity of the cached items. There will likely be many more issues
> coming through Jira on this.
>

Agreed.
All of that will help minimise the IO in this case. Or are you saying that
if the IO is managed explicitly, rather than left to the OS via mmap, it may
be possible to use a network disk cached by the OS VFS disk cache, provided
TarMK has been optimised for that type of disk?

@Tomek
I assume that the patch deals with the 50,000 limit[1] on the number of
blocks per Azure block blob?
With a compacted TarEntry size averaging 230KB, the max repo size per blob
will be about 10GB.
I checked the patch but didn't see anything to indicate that the size of
each tar entry was increased.
Azure Blobs are also limited to 500 IOPS (API requests/s) each, which is
about the same as a magnetic disk.
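As a back-of-the-envelope check (assuming one compacted TarEntry maps to one block, which I'm inferring rather than confirming from the patch):

```python
# Per-blob size ceiling implied by the 50,000-block limit, assuming
# one compacted TarEntry per block; 230KB is the average entry size
# quoted above.
BLOCK_LIMIT = 50_000          # Azure block blob: max blocks per blob
AVG_ENTRY_BYTES = 230 * 1024  # average compacted TarEntry size

max_bytes = BLOCK_LIMIT * AVG_ENTRY_BYTES
print(f"max repo size per blob: {max_bytes / 2**30:.1f} GiB")
# -> max repo size per blob: 11.0 GiB
```

So unless entries get bigger or a repository is spread across multiple blobs, that ceiling bites quickly.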

Best Regards
Ian

1 https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits



>
> Michael
>
> On 2 March 2018 at 09:45, Ian Boston <i...@tfd.co.uk> wrote:
> > Hi Tomek,
> > Thank you for the pointers and the description in OAK-6922. It all makes
> > sense and seems like a reasonable approach. I assume the description is
> > up to date.
> >
> > How does it perform compared to TarMK
> > a) when the entire repo doesn't fit into RAM allocated to the container ?
> > b) when the working set doesn't fit into RAM allocated to the container ?
> >
> > Since you mentioned cost, have you done a cost based analysis of RAM vs
> > attached disk, assuming that TarMK has already been highly optimised to
> > cope with deployments where the working set may only just fit into RAM ?
> >
> > IIRC the Azure attached disks mount Azure Blobs behind a kernel block
> > device driver and use a local SSD to optimise caching (in read and
> > write-through mode). Since they are kernel block devices, they also
> > benefit from the Linux kernel VFS disk cache and support memory mapping
> > via the page cache. So an Azure attached disk often behaves like a local
> > SSD (IIUC). I realise that some containerisation frameworks in Azure
> > don't yet support easy native Azure disk mounting (e.g. Mesos), but
> > others do (e.g. AKS[1]).
> >
> > Best regards
> > Ian
> >
> >
> > 1 https://azure.microsoft.com/en-us/services/container-service/
> > https://docs.microsoft.com/en-us/azure/aks/azure-files-dynamic-pv
> >
> >
> >
> > On 1 March 2018 at 18:40, Matt Ryan <o...@mvryan.org> wrote:
> >
> >> Hi Tomek,
> >>
> >> Some time ago (November 2016 Oakathon IIRC) some people explored a
> similar
> >> concept using AWS (S3) instead of Azure.  If you haven’t discussed with
> >> them already it may be worth doing so.  IIRC Stefan Egli and I believe
> >> Michael Duerig were involved and probably some others as well.
> >>
> >> -MR
> >>
> >>
> >> On March 1, 2018 at 5:42:07 AM, Tomek Rekawek (reka...@adobe.com.invalid
> )
> >> wrote:
> >>
> >> Hi Tommaso,
> >>
> >> So, the goal is to run Oak in a cloud, in this case Azure. In order to
> >> do this in a scalable way (e.g. multiple instances on a single VM,
> >> containerized), we need to take care of provisioning a sufficient
> >> amount of space for the segmentstore. Mounting physical SSD/HDD disks
> >> (in Azure they’re called “Managed Disks”, analogous to EBS in Amazon)
> >> has two drawbacks:
> >>
> >> * it’s expensive,
> >> * it’s complex (each disk is a separate /dev/sdX that has to be
> >> formatted, mounted, etc.)
> >>
> >> The point of the Azure Segment Store is to deal with these two issues
> >> by replacing the need for local file system space with a remote
> >> service, which will be (a) cheaper and (b) easier to provision (as
> >> it’ll be configured at the application layer rather than the VM layer).
> >>
> >> Another option would be using Azure File Storage (which mounts an SMB
> >> file system, not a “physical” disk). However, in that case we’d have
> >> remote storage that emulates local storage, which SegmentMK doesn’t
> >> really expect. It’s better to create a full-fledged remote storage
> >> implementation instead, so we can work out the issues caused by the
> >> higher latency, etc.
> >>
> >> Regards,
> >> Tomek
> >>
> >> --
> >> Tomek Rękawek | Adobe Research | www.adobe.com
> >> reka...@adobe.com
> >>
> >> > On 1 Mar 2018, at 11:16, Tommaso Teofili <tommaso.teof...@gmail.com>
> >> wrote:
> >> >
> >> > Hi Tomek,
> >> >
> >> > While I think it's an interesting feature, I'd also be interested to
> >> > hear about the user story behind your prototype.
> >> >
> >> > Regards,
> >> > Tommaso
> >> >
> >> >
> >> > On Thu, 1 Mar 2018 at 10:31, Tomek Rękawek <tom...@apache.org>
> >> > wrote:
> >> >
> >> >> Hello,
> >> >>
> >> >> I prepared a prototype for the Azure-based Segment Store, which
> >> >> allows persisting all the SegmentMK-related resources (segments,
> >> >> journal, manifest, etc.) on a remote service, namely the Azure Blob
> >> >> Storage [1]. The whole description of the approach, data structures,
> >> >> etc., as well as the patch, can be found in OAK-6922. It uses the
> >> >> extension points introduced in OAK-6921.
> >> >>
> >> >> While it’s still experimental code, I’d like to commit it to trunk
> >> >> sooner rather than later. The patch is already pretty big and I’d
> >> >> like to avoid developing it “privately” on my own branch. It’s a
> >> >> new, optional Maven module, which doesn’t change any existing
> >> >> behaviour of Oak or SegmentMK. The only external change it makes is
> >> >> adding a few exports to oak-segment-tar, so it can use the SPI
> >> >> introduced in OAK-6921. We may narrow these exports to a single
> >> >> package if you think it’d be good for encapsulation.
> >> >>
> >> >> There’s a related issue, OAK-7297, which introduces a new fixture
> >> >> for benchmarks and ITs. After merging it, all the Oak integration
> >> >> tests pass on the Azure Segment Store.
> >> >>
> >> >> Looking forward to your feedback.
> >> >>
> >> >> Regards,
> >> >> Tomek
> >> >>
> >> >> [1] https://azure.microsoft.com/en-us/services/storage/blobs/
> >> >>
> >> >> --
> >> >> Tomek Rękawek | Adobe Research | www.adobe.com
> >> >> reka...@adobe.com
> >> >>
> >> >>
> >>
>