Re: Question about Iceberg release cadence

2020-08-27 Thread Saisai Shao
Would like to get the structured streaming reader into the next release :).
Will spend time on addressing the new feedback.

Thanks
Saisai

On Thu, 27 Aug 2020 at 22:36, Mass Dosage wrote:

> I'm all for a release. The only thing still required for basic Hive read
> support (other than documentation of course!) is producing a *single* jar
> that can be added to Hive's classpath, the PR for that is at
> https://github.com/apache/iceberg/pull/1267.
>
> Thanks,
>
> Adrian
>
> On Thu, 27 Aug 2020 at 01:26, Anton Okolnychyi wrote:
>
>> +1 on releasing structured streaming source. I should be able to do one
>> more review round tomorrow.
>>
>> - Anton
>>
>> On 26 Aug 2020, at 17:12, Jungtaek Lim wrote:
>>
>> I hope we include Spark structured streaming read as well in the next
>> release; that was proposed in Feb this year and still around. Quoting my
>> comment on benefit of the streaming read on Spark;
>>
>>> This would be the major feature to cover the gap on use case for
>>> structured streaming between Delta Lake and Iceberg. There's a technical
>>> limitation on Spark structured streaming itself (global watermark), which
>>> requires workaround via splitting query into multiple queries &
>>> intermediate storage supporting end-to-end exactly once. Delta Lake covers
>>> the case, and I really would like to see the case also covered by Iceberg.
>>> I see there're lots of works in progress on the milestone (and these are
>>> great features which should be done), but after this we cover both batch
>>> and streaming workloads being done with Spark, which is a huge step forward
>>> on Spark users.
>>
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>> On Thu, Aug 27, 2020 at 1:13 AM Ryan Blue wrote:
>>
>>> Hi Marton,
>>>
>>> 0.9.0 was released about 6 weeks ago, so I don't think we've planned
>>> when the next release will be yet. I think it's a good idea to release
>>> soon, though. The Flink sink is close to being ready as well and I'd like
>>> to get both of those released so that the contributors can start using them.
>>>
>>> Seems like a good question for the broader community: how about a
>>> release in the next month or so for Hive reads and the Flink sink?
>>>
>>> rb
>>>
>>> On Wed, Aug 26, 2020 at 8:58 AM Marton Bod wrote:
>>>
>>>> Hi Team,
>>>>
>>>> I was wondering whether there is a release cadence already in place for
>>>> Iceberg, e.g. how often releases will take place approximately? Which
>>>> commits/features as release candidates in the near term?
>>>>
>>>> We're looking to integrate Iceberg into Hive, however, the current
>>>> 0.9.1 release does not yet contain the StorageHandler code in iceberg-mr.
>>>> Knowing the approximate release timelines would help greatly with our
>>>> integration planning.
>>>>
>>>> Of course, happy to get involved with ongoing dev/stability efforts to
>>>> help achieve a new release of this module.
>>>>
>>>> Thanks a lot,
>>>> Marton
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>


Re: Question about Iceberg release cadence

2020-08-27 Thread Mass Dosage
I'm all for a release. The only thing still required for basic Hive read
support (other than documentation of course!) is producing a *single* jar
that can be added to Hive's classpath, the PR for that is at
https://github.com/apache/iceberg/pull/1267.

Thanks,

Adrian

On Thu, 27 Aug 2020 at 01:26, Anton Okolnychyi wrote:

> +1 on releasing structured streaming source. I should be able to do one
> more review round tomorrow.
>
> - Anton
>
> On 26 Aug 2020, at 17:12, Jungtaek Lim wrote:
>
> I hope we include Spark structured streaming read as well in the next
> release; that was proposed in Feb this year and still around. Quoting my
> comment on benefit of the streaming read on Spark;
>
>> This would be the major feature to cover the gap on use case for
>> structured streaming between Delta Lake and Iceberg. There's a technical
>> limitation on Spark structured streaming itself (global watermark), which
>> requires workaround via splitting query into multiple queries &
>> intermediate storage supporting end-to-end exactly once. Delta Lake covers
>> the case, and I really would like to see the case also covered by Iceberg.
>> I see there're lots of works in progress on the milestone (and these are
>> great features which should be done), but after this we cover both batch
>> and streaming workloads being done with Spark, which is a huge step forward
>> on Spark users.
>
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
> On Thu, Aug 27, 2020 at 1:13 AM Ryan Blue wrote:
>
>> Hi Marton,
>>
>> 0.9.0 was released about 6 weeks ago, so I don't think we've planned when
>> the next release will be yet. I think it's a good idea to release soon,
>> though. The Flink sink is close to being ready as well and I'd like to get
>> both of those released so that the contributors can start using them.
>>
>> Seems like a good question for the broader community: how about a release
>> in the next month or so for Hive reads and the Flink sink?
>>
>> rb
>>
>> On Wed, Aug 26, 2020 at 8:58 AM Marton Bod wrote:
>>
>>> Hi Team,
>>>
>>> I was wondering whether there is a release cadence already in place for
>>> Iceberg, e.g. how often releases will take place approximately? Which
>>> commits/features as release candidates in the near term?
>>>
>>> We're looking to integrate Iceberg into Hive, however, the current 0.9.1
>>> release does not yet contain the StorageHandler code in iceberg-mr. Knowing
>>> the approximate release timelines would help greatly with our integration
>>> planning.
>>>
>>> Of course, happy to get involved with ongoing dev/stability efforts to
>>> help achieve a new release of this module.
>>>
>>> Thanks a lot,
>>> Marton
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>


Re: Question about Iceberg release cadence

2020-08-26 Thread Anton Okolnychyi
+1 on releasing structured streaming source. I should be able to do one more 
review round tomorrow.

- Anton

> On 26 Aug 2020, at 17:12, Jungtaek Lim wrote:
> 
> I hope we include Spark structured streaming read as well in the next 
> release; that was proposed in Feb this year and still around. Quoting my 
> comment on benefit of the streaming read on Spark;
> 
> This would be the major feature to cover the gap on use case for structured 
> streaming between Delta Lake and Iceberg. There's a technical limitation on 
> Spark structured streaming itself (global watermark), which requires 
> workaround via splitting query into multiple queries & intermediate storage 
> supporting end-to-end exactly once. Delta Lake covers the case, and I really 
> would like to see the case also covered by Iceberg.
> I see there're lots of works in progress on the milestone (and these are 
> great features which should be done), but after this we cover both batch and 
> streaming workloads being done with Spark, which is a huge step forward on 
> Spark users.
> 
> Thanks,
> Jungtaek Lim (HeartSaVioR) 
> 
> On Thu, Aug 27, 2020 at 1:13 AM Ryan Blue wrote:
> Hi Marton,
> 
> 0.9.0 was released about 6 weeks ago, so I don't think we've planned when the 
> next release will be yet. I think it's a good idea to release soon, though. 
> The Flink sink is close to being ready as well and I'd like to get both of 
> those released so that the contributors can start using them.
> 
> Seems like a good question for the broader community: how about a release in 
> the next month or so for Hive reads and the Flink sink?
> 
> rb
> 
> On Wed, Aug 26, 2020 at 8:58 AM Marton Bod wrote:
> Hi Team,
> 
> I was wondering whether there is a release cadence already in place for 
> Iceberg, e.g. how often releases will take place approximately? Which 
> commits/features as release candidates in the near term?
> 
> We're looking to integrate Iceberg into Hive, however, the current 0.9.1 
> release does not yet contain the StorageHandler code in iceberg-mr. Knowing 
> the approximate release timelines would help greatly with our integration 
> planning.
> 
> Of course, happy to get involved with ongoing dev/stability efforts to help 
> achieve a new release of this module.
> 
> Thanks a lot,
> Marton
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix



Re: Question about Iceberg release cadence

2020-08-26 Thread Jungtaek Lim
I hope we include Spark structured streaming read as well in the next
release; that was proposed in Feb this year and still around. Quoting my
comment on benefit of the streaming read on Spark;

> This would be the major feature to cover the gap on use case for structured
> streaming between Delta Lake and Iceberg. There's a technical limitation on
> Spark structured streaming itself (global watermark), which requires
> workaround via splitting query into multiple queries & intermediate storage
> supporting end-to-end exactly once. Delta Lake covers the case, and I
> really would like to see the case also covered by Iceberg.
> I see there're lots of works in progress on the milestone (and these are
> great features which should be done), but after this we cover both batch
> and streaming workloads being done with Spark, which is a huge step forward
> on Spark users.


Thanks,
Jungtaek Lim (HeartSaVioR)

On Thu, Aug 27, 2020 at 1:13 AM Ryan Blue wrote:

> Hi Marton,
>
> 0.9.0 was released about 6 weeks ago, so I don't think we've planned when
> the next release will be yet. I think it's a good idea to release soon,
> though. The Flink sink is close to being ready as well and I'd like to get
> both of those released so that the contributors can start using them.
>
> Seems like a good question for the broader community: how about a release
> in the next month or so for Hive reads and the Flink sink?
>
> rb
>
> On Wed, Aug 26, 2020 at 8:58 AM Marton Bod wrote:
>
>> Hi Team,
>>
>> I was wondering whether there is a release cadence already in place for
>> Iceberg, e.g. how often releases will take place approximately? Which
>> commits/features as release candidates in the near term?
>>
>> We're looking to integrate Iceberg into Hive, however, the current 0.9.1
>> release does not yet contain the StorageHandler code in iceberg-mr. Knowing
>> the approximate release timelines would help greatly with our integration
>> planning.
>>
>> Of course, happy to get involved with ongoing dev/stability efforts to
>> help achieve a new release of this module.
>>
>> Thanks a lot,
>> Marton
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: Question about Iceberg release cadence

2020-08-26 Thread Ryan Blue
Hi Marton,

0.9.0 was released about 6 weeks ago, so I don't think we've planned when
the next release will be yet. I think it's a good idea to release soon,
though. The Flink sink is close to being ready as well and I'd like to get
both of those released so that the contributors can start using them.

Seems like a good question for the broader community: how about a release
in the next month or so for Hive reads and the Flink sink?

rb

On Wed, Aug 26, 2020 at 8:58 AM Marton Bod wrote:

> Hi Team,
>
> I was wondering whether there is a release cadence already in place for
> Iceberg, e.g. how often releases will take place approximately? Which
> commits/features as release candidates in the near term?
>
> We're looking to integrate Iceberg into Hive, however, the current 0.9.1
> release does not yet contain the StorageHandler code in iceberg-mr. Knowing
> the approximate release timelines would help greatly with our integration
> planning.
>
> Of course, happy to get involved with ongoing dev/stability efforts to
> help achieve a new release of this module.
>
> Thanks a lot,
> Marton
>


-- 
Ryan Blue
Software Engineer
Netflix