Vitalli, Thank you for looking into this, sorry I missed it in the review. When you open up a request to fix this issue could you update the check for correctness in the metadata to check for the is.date.correct flag, or a version greater than or equal to 1.9.0 (no snapshot)? This will allow us to stop writing the flag into the metadata at the release or shortly thereafter.
It might be worth looking at how we can catch issues like this related to plan serialization. There were pretty thorough tests with the patch, but we still have code paths that only come up with remote fragment usage that we could test in other ways to avoid bugs like this. - Jason Jason Altekruse Software Engineer at Dremio Apache Drill Committer On Thu, Oct 27, 2016 at 4:03 PM, Zelaine Fong <[email protected]> wrote: > Vitalii -- are you still planning to open a ticket and pull request for the > fix you've noted below? > > -- Zelaine > > On Wed, Oct 26, 2016 at 8:28 AM, Vitalii Diravka < > [email protected]> > wrote: > > > @Paul Rogers > > It may be the undefined case when the file is generated with > drill.version > > = 1.9-SNAPSHOT. > > It is more easy to determine corrupted date with this flag and there is > no > > need to wait the end of release to merge these changes. > > > > @Jinfeng NI > > It looks like you are right. > > With consistent mode (isDateCorrect = true) all tests are passed. So I am > > going to open a jira ticket for it with next changes > > https://github.com/vdiravka/drill/commit/ff8d5c7d601915f760d1b0e9618730 > > 3410cac5d3 > > Thanks. > > > > Kind regards > > Vitalii > > > > 2016-10-25 18:36 GMT+00:00 Jinfeng Ni <[email protected]>: > > > > > I'm not sure if I fully understand your answers. The bottom line is > > > quite simple: given a set of parquet files, the ParquetTableMeta > > > instance constructed in Drill should have identical value for > > > "isDateCorrect", whether it comes from parquet footer, or parquet > > > metadata cache, or whether there is partition pruning or not. However, > > > the code shows that this flag is not in consistent mode across > > > different cases. > > > > > > > > > > > > On Tue, Oct 25, 2016 at 11:24 AM, Vitalii Diravka > > > <[email protected]> wrote: > > > > Hi Jinfeng, > > > > > > > > 1.If the parquet files are generated with Drill after Drill-4203 > these > > > > files have "isDateCorrect = true" property. > > > > Drill serializes this property from metadata now. When we set this > > > property > > > > in the first constructor we will hide the value from metadata. > > > > IsDateCorrect will be false only if this value equals to the false > (no > > > case > > > > for it now) or absent in parquet metadata footer. > > > > > > > > > > > > 2. I'm not sure the reason to change isDateCorrect metadata property > > when > > > > the user disable dates correction. > > > > If you have some use case it would be great if you provide it. > > > > > > > > 3. Maybe you are right regarding to when Parquet metadata is cloned. > > > > Here I added the property in the same manner as Jason's new property > > > > "drillVersion. So need it a separate unit test? > > > > > > > > > > > > Kind regards > > > > Vitalii > > > > > > > > 2016-10-25 16:23 GMT+00:00 Jinfeng Ni <[email protected]>: > > > > > > > >> Forgot to copy the link to the code. > > > >> > > > >> [1] https://github.com/apache/drill/blob/master/exec/java- > > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/ > > > >> Metadata.java#L950-L955 > > > >> > > > >> On Tue, Oct 25, 2016 at 9:16 AM, Jinfeng Ni <[email protected]> wrote: > > > >> > @Jason, @Vitalli, > > > >> > > > > >> > Any thoughts on this question, since both you worked on fix of > > > >> DRILL-4203? > > > >> > > > > >> > Looking through the code, there is a third case [1], where this > flag > > > >> > is set to false when Parquet metadata is cloned (after partition > > > >> > pruning, etc). That means, for the 2nd case where the flag is set > > to > > > >> > true, if there is pruning happening, the new parquet metadata will > > see > > > >> > the flag is flipped to false. This does not make sense to me. > > > >> > > > > >> > > > > >> > > > > >> > On Mon, Oct 24, 2016 at 3:10 PM, Jinfeng Ni <[email protected]> > wrote: > > > >> >> Hello All, > > > >> >> > > > >> >> DRILL-4203 addressed the date field issue. In the fix, it > > introduced > > > >> >> a new field in ParquetTableMetadata_v2 : isDateCorrect. I have > > some > > > >> >> difficulty in understanding the meaning of this field. > > > >> >> > > > >> >> According to [1], this field is set to false, when Drill gets > > parquet > > > >> >> metadata from parquet footer. This field is set to true in code > > > flow > > > >> >> of [2] and [3], when Drill gets parquet metadata from meta data > > > cache. > > > >> >> > > > >> >> Questions I have: > > > >> >> 1. If the parquet files are generated with Drill after > DRILL-4203, > > > >> >> Drill still thinks date field is NOT correct (isDateCorrect = > > false)? > > > >> >> 2. Why does this filed have nothing to do with "autoCorrection" > > flag > > > >> >> [4]? If someone turns off autoCorrection, will it have impact on > > > this > > > >> >> "isDateCorrect" flag ? > > > >> >> > > > >> >> Thanks in advance for any input, > > > >> >> > > > >> >> Jinfeng > > > >> >> > > > >> >> > > > >> >> [1] https://github.com/apache/drill/blob/master/exec/java- > > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/ > > > Metadata.java#L932 > > > >> >> [2] https://github.com/apache/drill/blob/master/exec/java- > > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/ > > > Metadata.java#L936 > > > >> >> [3] https://github.com/apache/drill/blob/master/exec/java- > > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/ > > > Metadata.java#L187 > > > >> >> [4] https://github.com/apache/drill/blob/master/exec/java- > > > >> exec/src/main/java/org/apache/drill/exec/store/parquet/ > > > >> Metadata.java#L354-L355 > > > >> > > > > > >
