> Does anyone know if we can recover existing data affected by it?

PR #1271 fixes correctness bugs in two data types: decimal18 and
timestampZone.

For decimal18, we actually write the correct decimal value but read it
back incorrectly. Take decimal(10,3) with value = 10.100: the ORC writer
stores it in the file as 101*10^(-1), while before this patch we read it
as 101*10^(-3). If we construct the BigDecimal with the stored exponent
of -1 (that is, BigDecimal scale = 1) and then adjust it to scale = 3,
then in theory we can still get the correct decimal 10100*10^(-3).
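
To make the decimal case concrete, here is a rough Java sketch of the
arithmetic only. The class and method names are made up for illustration,
and it assumes the stored scale (1 in the example) can actually be obtained
from the file; it is not the code from the PR.

```java
import java.math.BigDecimal;

// Hypothetical helper, for illustration only (not Iceberg or ORC code).
class DecimalRecoverySketch {

  // What the old read path effectively did: apply the declared scale
  // directly to the stored unscaled value.
  static BigDecimal misread(long storedUnscaled, int declaredScale) {
    return BigDecimal.valueOf(storedUnscaled, declaredScale);
  }

  // Possible recovery: build the value with the scale it was stored with,
  // then widen it to the declared scale. Increasing the scale only appends
  // zeros, so no rounding is involved.
  static BigDecimal recover(long storedUnscaled, int storedScale, int declaredScale) {
    return BigDecimal.valueOf(storedUnscaled, storedScale).setScale(declaredScale);
  }

  public static void main(String[] args) {
    // decimal(10,3), value 10.100, stored as 101 * 10^(-1)
    System.out.println(misread(101L, 3));    // 0.101
    System.out.println(recover(101L, 1, 3)); // 10.100
  }
}
```

Note that BigDecimal's scale is the negated exponent, so 101*10^(-1) is
BigDecimal.valueOf(101, 1), not a scale of -1.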

For timestampZone, I'd say that we've stored the wrong value in the file,
but the error between the written timestamp and the correct timestamp
should be less than a few seconds. The reason is the division used for
negative values here [1]: -5 / 2 = -2, while floorDiv(-5, 2) = -3, so the
quotient is off by less than 1, and the nanoseconds part of a timestamp
is always less than one second. However, I have not found a way to
recover the existing data.
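
The difference between the two divisions can be sketched in Java as below.
This only illustrates the rounding behaviour; the microsecond unit and the
helper names are my assumptions for the example, not the writer code at
[1].

```java
// Hypothetical illustration of truncating vs. floor division on a
// negative epoch offset (assumed here to be in microseconds).
class FloorDivSketch {
  static final long MICROS_PER_SECOND = 1_000_000L;

  // Java's / and % round toward zero, so the remainder of a negative
  // value is negative (or zero).
  static long[] truncSplit(long micros) {
    return new long[] {micros / MICROS_PER_SECOND, micros % MICROS_PER_SECOND};
  }

  // floorDiv rounds toward negative infinity and floorMod is never
  // negative for a positive divisor.
  static long[] floorSplit(long micros) {
    return new long[] {
      Math.floorDiv(micros, MICROS_PER_SECOND),
      Math.floorMod(micros, MICROS_PER_SECOND)
    };
  }

  public static void main(String[] args) {
    System.out.println(-5 / 2);               // -2
    System.out.println(Math.floorDiv(-5, 2)); // -3

    long micros = -1_500_000L; // 1.5 seconds before the epoch
    long[] t = truncSplit(micros);
    long[] f = floorSplit(micros);
    System.out.println(t[0] + " s, " + t[1] + " us"); // -1 s, -500000 us
    System.out.println(f[0] + " s, " + f[1] + " us"); // -2 s, 500000 us
  }
}
```

For non-negative inputs both splits agree, so only timestamps before the
epoch would show the discrepancy in this sketch.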

1.
https://github.com/apache/iceberg/pull/1271/files#diff-5aa4840155ec70fdf7f725e122cde7b7L218



On Tue, Aug 4, 2020 at 3:08 AM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Yes, we should get #1269 into a patch release as well since it is a
> correctness bug.
>
> Does anyone know if we can recover existing data affected by it?
>
> On Mon, Aug 3, 2020 at 11:08 AM Anton Okolnychyi <aokolnyc...@apple.com>
> wrote:
>
>> I see a few open issues for ORC. Some of them seem critical (like issue
>> #1269). Do we want to fix those before the release? Or is ORC support still
>> experimental?
>>
>> - Anton
>>
>> On 1 Aug 2020, at 20:04, Jungtaek Lim <kabhwan.opensou...@gmail.com>
>> wrote:
>>
>> Sure! I just submitted #1285
>> <https://github.com/apache/iceberg/pull/1285> to exclude the refactor.
>> Once #1285 is merged I'll rebase the existing PR to do the refactor. Thanks
>> for the input!
>>
>> On Sun, Aug 2, 2020 at 4:41 AM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> Thanks, Jungtaek! I agree it would be great to fix that problem. I took
>>> a quick look at the PR and it is a little big to go into a patch release
>>> since it refactors quite a few places to consolidate the list copy. What do
>>> you think about making a PR that just fixes the problem with
>>> BaseCombinedScanTask and Kryo, then doing the remainder of the refactor in
>>> master?
>>>
>>> On Fri, Jul 31, 2020 at 5:29 PM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
>>>> If we still have some more days I think #1280
>>>> <https://github.com/apache/iceberg/pull/1280>: "fix serialization
>>>> issue in BaseCombinedScanTask with Kyro" is a good candidate to be
>>>> included. The bug affects both Spark and Flink (according to #1279
>>>> <https://github.com/apache/iceberg/pull/1279>).
>>>>
>>>> On Sat, Aug 1, 2020 at 8:04 AM Ryan Blue <b...@apache.org> wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> We’ve accumulated a few bug fixes in the last couple of weeks and I
>>>>> think it might make sense to get some of them out in an 0.9.1 release
>>>>> since they make it harder to work with Iceberg. Here are the ones I
>>>>> know about:
>>>>>
>>>>>    - #1282 <https://github.com/apache/iceberg/pull/1282>: rewriteNot
>>>>>    fails for binary and unary predicates
>>>>>    - #1278 <https://github.com/apache/iceberg/pull/1278>: Bad import
>>>>>    from commons-compress causes query failures
>>>>>    - #1251 <https://github.com/apache/iceberg/pull/1251>: Fixes more
>>>>>    imports from non-Iceberg Guava
>>>>>    - #1283 <https://github.com/apache/iceberg/pull/1283>: Query
>>>>>    descriptions fail when IN predicates are pushed
>>>>>    - #1228 <https://github.com/apache/iceberg/pull/1228>: Data
>>>>>    imports fail when paths include whitespace
>>>>>    - #1194 <https://github.com/apache/iceberg/pull/1194>: USING
>>>>>    should set format when used in a CTAS command
>>>>>    - #1203 <https://github.com/apache/iceberg/pull/1203>: Table cache
>>>>>    should not expire
>>>>>
>>>>> If there are no objections, I’ll get started and create a release
>>>>> branch. And please reply if there are other issues you’ve seen that should
>>>>> also be included in a patch release.
>>>>>
>>>>> rb
>>>>> --
>>>>> Ryan Blue
>>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
