Implementing an S3 file system for the Python SDK - Updated

2019-04-08 Thread Pasan Kamburugamuwa
Hi,

I have updated the project proposal according to the feedback I received.
Could you please check it again and give me your feedback on the
corrections I have made?

Here is the link to the updated project proposal
https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing

Thank you
Pasan Kamburugamuwa


Re: Implementing an S3 file system for the Python SDK - Updated

2019-04-08 Thread Ahmet Altay
+dev +Pablo Estrada +Chamikara Jayalath +Udi Meiri

Thank you Pasan. I quickly looked at the proposal and it looks good. Added
a few folks who could offer additional feedback.

On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa <
pasankamburugamu...@gmail.com> wrote:

> Hi,
>
> I have updated the project proposal according to the feedback I received.
> Could you please check it again and give me your feedback on the
> corrections I have made?
>
> Here is the link to the updated project proposal
>
> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>
> Thank you
> Pasan Kamburugamuwa
>


Re: Implementing an S3 file system for the Python SDK - Updated

2019-04-08 Thread Pablo Estrada
Currently, Pasan is working on a design for adding a couple of
implementations of the Filesystem interface in Python, and IMHO it's not
necessary to consider SDF here.

On the other hand, Python's fileio[1] could probably use SDF-based
improvements to split when many files are being matched.
Best
-P.

On Mon, Apr 8, 2019 at 10:00 AM Alex Amato wrote:

> +Lukasz Cwik, +Boyuan Zhang, +Lara Schmidt
>
> Should splittable DoFn be considered in this design, in order to split and
> scale the source step properly?
>
> On Mon, Apr 8, 2019 at 9:11 AM Ahmet Altay wrote:
>
>> +dev +Pablo Estrada +Chamikara Jayalath +Udi Meiri
>>
>> Thank you Pasan. I quickly looked at the proposal and it looks good.
>> Added a few folks who could offer additional feedback.
>>
>> On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa <
>> pasankamburugamu...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have updated the project proposal according to the feedback I received.
>>> Could you please check it again and give me your feedback on the
>>> corrections I have made?
>>>
>>> Here is the link to the updated project proposal
>>>
>>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>>
>>> Thank you
>>> Pasan Kamburugamuwa
>>>
>>


Re: Implementing an S3 file system for the Python SDK - Updated

2019-04-08 Thread Lukasz Cwik
A filesystem is a lower-level abstraction that a PTransform can use, so
there is no need to consider SDF when creating the S3 filesystem.
If we were redesigning the interface to all filesystems, then SDF should be
considered.
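
To make the layering concrete, here is a minimal sketch in plain Python. It
is not Beam's actual API: the FileSystem base class, InMemoryS3FileSystem,
and read_lines are hypothetical names made up for this example, and a dict
stands in for a real S3 client. It only shows that the reading layer talks
to the filesystem through a small interface and never needs to know how the
bytes are fetched.

```python
import io
from abc import ABC, abstractmethod


class FileSystem(ABC):
    """Hypothetical, simplified stand-in for a filesystem interface."""

    @classmethod
    @abstractmethod
    def scheme(cls):
        """Return the URI scheme handled by this filesystem, e.g. 's3'."""

    @abstractmethod
    def open(self, path):
        """Return a readable binary file object for the given path."""

    @abstractmethod
    def match(self, prefix):
        """Return the sorted list of paths that start with prefix."""


class InMemoryS3FileSystem(FileSystem):
    """Toy 's3://' filesystem backed by a dict (assumption: no real S3)."""

    def __init__(self, objects):
        self._objects = dict(objects)

    @classmethod
    def scheme(cls):
        return "s3"

    def open(self, path):
        return io.BytesIO(self._objects[path])

    def match(self, prefix):
        return sorted(p for p in self._objects if p.startswith(prefix))


def read_lines(filesystems, prefix):
    """Higher-level reader, playing the role of the transform layer.

    It picks a filesystem by URI scheme and yields (path, line) pairs.
    Any splitting or scaling logic would live at this level, not inside
    the filesystem implementation.
    """
    scheme = prefix.split("://", 1)[0]
    fs = filesystems[scheme]
    for path in fs.match(prefix):
        with fs.open(path) as handle:
            for line in handle.read().decode("utf-8").splitlines():
                yield path, line
```

Swapping the dict-backed class for one that calls a real S3 client would
leave read_lines unchanged, which is the sense in which the filesystem sits
below the transform.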

On Mon, Apr 8, 2019 at 10:54 AM Lara Schmidt wrote:

> I'd push towards waiting until SDF is working end to end before we begin
> converting things, unless it's something like the Text.ReadAll batch API,
> which benefits even without an SDF implementation. I don't have a lot of
> context on which file APIs Python already supports.
>
> On Mon, Apr 8, 2019 at 10:06 AM Pablo Estrada wrote:
>
>> Currently, Pasan is working on a design for adding a couple of
>> implementations of the Filesystem interface in Python, and IMHO it's not
>> necessary to consider SDF here.
>>
>> On the other hand, Python's fileio[1] could probably use SDF-based
>> improvements to split when many files are being matched.
>> Best
>> -P.
>>
>> On Mon, Apr 8, 2019 at 10:00 AM Alex Amato wrote:
>>
>>> +Lukasz Cwik, +Boyuan Zhang, +Lara Schmidt
>>>
>>> Should splittable DoFn be considered in this design, in order to split
>>> and scale the source step properly?
>>>
>>> On Mon, Apr 8, 2019 at 9:11 AM Ahmet Altay wrote:
>>>
>>>> +dev +Pablo Estrada +Chamikara Jayalath +Udi Meiri
>>>>
>>>> Thank you Pasan. I quickly looked at the proposal and it looks good.
>>>> Added a few folks who could offer additional feedback.
>>>>
>>>> On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa <
>>>> pasankamburugamu...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have updated the project proposal according to the feedback I
>>>>> received. Could you please check it again and give me your feedback
>>>>> on the corrections I have made?
>>>>>
>>>>> Here is the link to the updated project proposal
>>>>>
>>>>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>>>>
>>>>> Thank you
>>>>> Pasan Kamburugamuwa
>>>>>



Re: Implementing an S3 file system for the Python SDK - Updated

2019-04-08 Thread Chamikara Jayalath
Thanks for the proposal, Pasan. Added some comments.

As others mentioned, the FileSystem interface is orthogonal to SDF (it
abstracts the storage system rather than the source format), so there is no
need to wait for SDF.

- Cham

On Mon, Apr 8, 2019 at 10:57 AM Lukasz Cwik wrote:

> A filesystem is a lower-level abstraction that a PTransform can use, so
> there is no need to consider SDF when creating the S3 filesystem.
> If we were redesigning the interface to all filesystems, then SDF should
> be considered.
>
> On Mon, Apr 8, 2019 at 10:54 AM Lara Schmidt wrote:
>
>> I'd push towards waiting until SDF is working end to end before we begin
>> converting things, unless it's something like the Text.ReadAll batch API,
>> which benefits even without an SDF implementation. I don't have a lot of
>> context on which file APIs Python already supports.
>>
>> On Mon, Apr 8, 2019 at 10:06 AM Pablo Estrada wrote:
>>
>>> Currently, Pasan is working on a design for adding a couple of
>>> implementations of the Filesystem interface in Python, and IMHO it's not
>>> necessary to consider SDF here.
>>>
>>> On the other hand, Python's fileio[1] could probably use SDF-based
>>> improvements to split when many files are being matched.
>>> Best
>>> -P.
>>>
>>> On Mon, Apr 8, 2019 at 10:00 AM Alex Amato wrote:
>>>
>>>> +Lukasz Cwik, +Boyuan Zhang, +Lara Schmidt
>>>>
>>>> Should splittable DoFn be considered in this design, in order to split
>>>> and scale the source step properly?
>>>>
>>>> On Mon, Apr 8, 2019 at 9:11 AM Ahmet Altay wrote:
>>>>
>>>>> +dev +Pablo Estrada +Chamikara Jayalath +Udi Meiri
>>>>>
>>>>> Thank you Pasan. I quickly looked at the proposal and it looks good.
>>>>> Added a few folks who could offer additional feedback.
>>>>>
>>>>> On Mon, Apr 8, 2019 at 12:13 AM Pasan Kamburugamuwa <
>>>>> pasankamburugamu...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have updated the project proposal according to the feedback I
>>>>>> received. Could you please check it again and give me your feedback
>>>>>> on the corrections I have made?
>>>>>>
>>>>>> Here is the link to the updated project proposal
>>>>>>
>>>>>> https://docs.google.com/document/d/1i_PoIrbmhNgwKCS1TYWC28A9RsyZQFsQCJic3aCXO-8/edit?usp=sharing
>>>>>>
>>>>>> Thank you
>>>>>> Pasan Kamburugamuwa
>>>>>>
>>>>>