I'm not sure when I can have a more thorough look at this.

To be honest, I'm personally struggling with the burden of supporting
the existing feature set in the Parquet C++ library. The integration
testing strategy for this, as well as for other features that are being
added to both the Java and C++ libraries (e.g. encryption), makes me
uncomfortable due to the lack of automation. As an example, DataPageV2
support in parquet-cpp has been broken since the beginning of the
project (see PARQUET-458), but it has only recently become an issue now
that people are trying to read such files produced by Spark. More
comprehensive integration testing would help ensure that the libraries
remain compatible.

On Tue, Jul 30, 2019 at 9:17 PM 俊杰陈 <[email protected]> wrote:
>
> Dear Parquet developers
>
> We still need your vote!
>
>
> On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 <[email protected]> wrote:
> >
> > Hi @Ryan Blue  @Wes McKinney
> >
> > We need your valuable vote, any feedback is welcome as well.
> >
> > On Tue, Jul 23, 2019 at 1:24 PM 俊杰陈 <[email protected]> wrote:
> > >
> > > Call for voting again.
> > >
> > > On Fri, Jul 19, 2019 at 1:17 PM 俊杰陈 <[email protected]> wrote:
> > > >
> > > > Dear Parquet developers
> > > >
> > > > We need more votes, please help to vote on this.
> > > >
> > > > On Wed, Jul 17, 2019 at 3:42 PM Gabor Szadovszky
> > > > <[email protected]> wrote:
> > > > >
> > > > > > After getting in PARQUET-1625 I vote again for having bloom filter
> > > > > > spec and the thrift file update as is in parquet-format master.
> > > > > +1 (binding)
> > > > >
> > > > > On Mon, Jul 15, 2019 at 3:23 PM 俊杰陈 <[email protected]> wrote:
> > > > >
> > > > > > > Thanks Gabor, it's never too late to make it better. We don't have to
> > > > > > > rush it; it has already been in development for a long time. :)
> > > > > >
> > > > > > > The thrift file does indeed lag a bit behind the spec. As the spec
> > > > > > > defines, the bloom filter data is stored near the footer, which means
> > > > > > > we don't have to handle it like a page. Therefore, I just opened a
> > > > > > > jira to remove bloom_filter_page_header from the PageHeader structure,
> > > > > > > while the BloomFilterHeader is kept intentionally for convenience.
> > > > > > > Since the spec and the thrift file should eventually be aligned with
> > > > > > > each other, the vote target is both of them.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jul 15, 2019 at 7:48 PM Gabor Szadovszky
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Junjie,
> > > > > > >
> > > > > > > > Sorry for bringing this up a bit late, but I have some problems with
> > > > > > > > the format update. The parquet.thrift file is updated to have the
> > > > > > > > bloom filters as a page (just as dictionaries and data pages).
> > > > > > > > Meanwhile, the spec (BloomFilter.md) says that the bloom filter is
> > > > > > > > stored near the footer. So, if the bloom filter is not part of the
> > > > > > > > row-groups (like column indexes) I would not add it as a page. See
> > > > > > > > the struct ColumnIndex in the thrift file. This struct is not
> > > > > > > > referenced anywhere in the file; it is only declared. It was done
> > > > > > > > this way because we don't parse it in the same way as we parse the
> > > > > > > > pages.
> > > > > > > >
> > > > > > > > Currently, I am not 100% sure about the target of this vote. If it is
> > > > > > > > a vote about adding bloom filters in general, then it is a +1
> > > > > > > > (binding). If it is about adding the bloom filters to parquet-format
> > > > > > > > as is, then it is a -1 (binding) until we fix the issue above.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Gabor
> > > > > > >
> > > > > > > > On Mon, Jul 15, 2019 at 11:45 AM Gidon Gershinsky
> > > > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > > On Mon, Jul 15, 2019 at 12:08 PM Zoltan Ivanfi
> > > > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > On Mon, Jul 15, 2019 at 9:57 AM 俊杰陈 <[email protected]> 
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Dear Parquet developers
> > > > > > > > > >
> > > > > > > > > > > I'd like to resume this vote, you can start to vote now. Thanks for
> > > > > > > > > > > your time.
> > > > > > > > > >
> > > > > > > > > > On Wed, Jul 10, 2019 at 9:29 PM 俊杰陈 <[email protected]> 
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > I see, will resume this next week.  Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jul 10, 2019 at 5:26 PM Zoltan Ivanfi
> > > > > > > > > > > > <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Junjie,
> > > > > > > > > > > >
> > > > > > > > > > > > > Since there are ongoing improvements addressing review comments, I
> > > > > > > > > > > > > would hold off with the vote for a few more days until the
> > > > > > > > > > > > > specification settles.
> > > > > > > > > > > >
> > > > > > > > > > > > Br,
> > > > > > > > > > > >
> > > > > > > > > > > > Zoltan
> > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Jul 10, 2019 at 9:32 AM 俊杰陈 <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Hi Parquet committers and developers
> > > > > > > > > > > > >
> > > > > > > > > > > > > We are waiting for your important ballot:)
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Tue, Jul 9, 2019 at 10:21 AM 俊杰陈 <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, there are some public benchmark results, such as the official
> > > > > > > > > > > > > > > benchmark from xxhash site (http://www.xxhash.com/) and published
> > > > > > > > > > > > > > > comparison from smhasher project
> > > > > > > > > > > > > > > (https://github.com/rurban/smhasher/).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jul 9, 2019 at 5:25 AM Wes McKinney <[email protected]> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Do you have any benchmark data to support the choice of hash
> > > > > > > > > > > > > > > > function?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Jul 3, 2019 at 8:41 AM 俊杰陈 <[email protected]> wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > To simplify the voting, I'd like to update the voting content to the
> > > > > > > > > > > > > > > > > spec with the xxHash hash strategy. Now you can reply with +1 or -1.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for your participation.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jul 2, 2019 at 10:23 AM 俊杰陈 <[email protected]> wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Dear Parquet developers
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > The Parquet Bloom filter has been in development for a while. Per
> > > > > > > > > > > > > > > > > > the discussion on the mailing list, it's time to call a vote for
> > > > > > > > > > > > > > > > > > the spec to move forward. The current spec can be found at
> > > > > > > > > > > > > > > > > > https://github.com/apache/parquet-format/blob/master/BloomFilter.md.
> > > > > > > > > > > > > > > > > > There are some different options for the internal hash choice of
> > > > > > > > > > > > > > > > > > the Bloom filter, and the PR is for that concern.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > So I'd like to propose voting on the spec + hash option, for
> > > > > > > > > > > > > > > > > > example:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 to spec and xxHash
> > > > > > > > > > > > > > > > > +1 to spec and murmur3
> > > > > > > > > > > > > > > > > ...
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Please help to vote, any feedback is also welcome in the
> > > > > > > > > > > > > > > > > > discussion thread.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks & Best Regards
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > Thanks & Best Regards
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > Thanks & Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > Thanks & Best Regards
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > Thanks & Best Regards
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > Thanks & Best Regards
> > > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Thanks & Best Regards
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks & Best Regards
> > >
> > >
> > >
> > > --
> > > Thanks & Best Regards
> >
> >
> >
> > --
> > Thanks & Best Regards
>
>
>
> --
> Thanks & Best Regards
