Hi Dev

Thanks Wes for these comments.

As Informed in other threads, I have completed most of it. Will try to 
structure it according to the comments.

I had one question reading a (un)related matter. whenever we make calls to

ReadBatch(int64_t batch_size, int16_t* def_levels,
                                                int16_t* rep_levels, T* values,
                                                int64_t* values_read)

Is there are possibility to keep track of which page we are at to retrieve 
values?

Regards
Arun Balajiee
________________________________
From: Wes McKinney <wesmck...@gmail.com>
Sent: 02 April 2020 13:16
To: Parquet Dev <dev@parquet.apache.org>
Cc: Deepak Majeti <deepak.maj...@microfocus.com>; Anatoli Shein 
<sh...@microfocus.com>
Subject: Re: Arrow 1404: Adding index for Page-level Skipping

I just left comments on the PR. The new APIs (their semantics and what
should be passed as arguments) are still not adequately documented (in
other words, I wouldn't know how to use them just from reading the
header file), so I think we should focus on that for the moment. In
fairness documentation for other functions in these headers in poor,
but they also have the semantics of "read all data in the file from
start to finish". These new APIs appear to do something different, so
we need to write that down in detail in Doxygen-style comments

On Thu, Apr 2, 2020 at 2:23 AM Lekshmi Narayanan, Arun Balajiee
<arl...@pitt.edu> wrote:
>
> Hi
> Would my pull request be useful for the discussion from here?
> https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fpull%2F6807&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509861845&amp;sdata=PQAIxpTPm87qRb%2FmZoHXfLCsdcCiyC%2Biqui40tqEd9U%3D&amp;reserved=0
>
> Regards,
> Arun Balajiee
>
> From: Wes McKinney<mailto:wesmck...@gmail.com>
> Sent: Tuesday, February 18, 2020 3:34 AM
> To: Parquet Dev<mailto:dev@parquet.apache.org>
> Cc: Deepak Majeti<mailto:deepak.maj...@microfocus.com>; Anatoli 
> Shein<mailto:sh...@microfocus.com>
> Subject: Re: Arrow 1404: Adding index for Page-level Skipping
>
> That's helpful, but I think it would be a good idea to have enough
> information in the header files to determine what the new APIs do
> without reading example code.
>
> On Mon, Feb 17, 2020 at 10:59 AM Lekshmi Narayanan, Arun Balajiee
> <arl...@pitt.edu> wrote:
> >
> > I also made changes in the low-level-api folder, couldn’t capture in that 
> > link I think
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509861845&amp;sdata=vxuK%2BvZRtwhLcGepda6T5i3r6HDk0JLS3vh9leIcBlo%3D&amp;reserved=0
> >
> > Regards,
> > Arun Balajiee
> >
> > ________________________________
> > From: Wes McKinney <wesmck...@gmail.com>
> > Sent: Monday, February 17, 2020 8:11:09 AM
> > To: Parquet Dev <dev@parquet.apache.org>
> > Cc: Deepak Majeti <deepak.maj...@microfocus.com>; Anatoli Shein 
> > <sh...@microfocus.com>
> > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> >
> > hi Arun,
> >
> > By "public APIs" I was referring to changes in the public header
> > files. I see there are some changes to parquet/file_reader.h and
> > metadata.h
> >
> > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509871841&amp;sdata=rBl3pY6bRFuSzWg2QT2Ca6aui2HZJjSoh1mbzDq%2F93M%3D&amp;reserved=0
> >
> > Can you add some Doxygen comments to the new APIs that explain how
> > these APIs are to be used (and what the parameters mean)? The hope
> > would be that a user could make use of the column index functionality
> > by reading the .h files only.
> >
> > Thanks
> > Wes
> >
> > On Fri, Feb 14, 2020 at 2:57 PM Lekshmi Narayanan, Arun Balajiee
> > <arl...@pitt.edu> wrote:
> > >
> > > Hi
> > > I have made my changes for api here, does it look good and is this what 
> > > you were seeking from me? The writer- api is still in the works and I 
> > > need to make the reader more generic to support all class data types.
> > >
> > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Fblob%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp%2Fexamples%2Fparquet%2Flow-level-api%2Freader-writer-with-index.cc&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509871841&amp;sdata=TB29CbqF3MlD0v9d%2BOTH%2FD4NAF%2BqGJvpMpJZIeWd2P4%3D&amp;reserved=0
> > >
> > >
> > > Regards,
> > > Arun Balajiee
> > >
> > > From: Wes McKinney<mailto:wesmck...@gmail.com>
> > > Sent: Tuesday, February 4, 2020 11:24 PM
> > > To: Parquet Dev<mailto:dev@parquet.apache.org>
> > > Cc: Deepak Majeti<mailto:deepak.maj...@microfocus.com>; Anatoli 
> > > Shein<mailto:sh...@microfocus.com>
> > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > >
> > > hi Arun,
> > >
> > > We can keep the discussion going on here and on GitHub when you have a
> > > pull request to discuss. There are a number of different people who
> > > can give advice.
> > >
> > > Thanks
> > >
> > > On Tue, Feb 4, 2020 at 10:11 PM Lekshmi Narayanan, Arun Balajiee
> > > <arl...@pitt.edu> wrote:
> > > >
> > > > Actually I made some changes after the date on the pull request ( even 
> > > > in this year), which are not getting reflected on this compare link
> > > >
> > > > Regards,
> > > > Arun Balajiee
> > > >
> > > > From: Wes McKinney<mailto:wesmck...@gmail.com>
> > > > Sent: Tuesday, February 4, 2020 6:43 PM
> > > > To: Parquet Dev<mailto:dev@parquet.apache.org>
> > > > Cc: Deepak Majeti<mailto:deepak.maj...@microfocus.com>; Anatoli 
> > > > Shein<mailto:sh...@microfocus.com>
> > > > Subject: Re: Arrow 1404: Adding index for Page-level Skipping
> > > >
> > > > Here's a compare link in case others want to have a look
> > > >
> > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Farrow%2Fcompare%2Fmaster...a2un%3APARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509871841&amp;sdata=rBl3pY6bRFuSzWg2QT2Ca6aui2HZJjSoh1mbzDq%2F93M%3D&amp;reserved=0
> > > >
> > > > On Tue, Feb 4, 2020 at 5:41 PM Wes McKinney <wesmck...@gmail.com> wrote:
> > > > >
> > > > > hi Arun,
> > > > >
> > > > > I took a brief look at your branch. One thing that is missing is the
> > > > > proposed public APIs that use the index pages -- that would be very
> > > > > helpful for this discussion.
> > > > >
> > > > > I don't think we have any code for doing random access of a particular
> > > > > data page in a column chunk, so having as an initial matter would also
> > > > > be helpful.
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Tue, Feb 4, 2020 at 2:28 PM Lekshmi Narayanan, Arun Balajiee
> > > > > <arl...@pitt.edu> wrote:
> > > > > >
> > > > > > Hi Parquet dev
> > > > > >
> > > > > > Deepak Majeti was my dev lead during my summer internship, from 
> > > > > > when I am trying to add a few changes in the Arrow Parquet Project 
> > > > > > for the ticket below
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FPARQUET-1404&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509871841&amp;sdata=IXX%2FwrAwPOFIAHl1WH4n6nNkq9JZ2asOf99dzIUxBN8%3D&amp;reserved=0
> > > > > >  (Assigned to Deepak)
> > > > > >
> > > > > > With this regard, I am making a few changes to 
> > > > > > src/parquet/file_reader.cc ( in a fork on my repository)
> > > > > >
> > > > > > https://nam05.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fa2un%2Farrow%2Ftree%2FPARQUET-1404-Add-index-pages-to-the-format-to-support-efficient-page-skipping-to-parquet-cpp%2Fcpp&amp;data=02%7C01%7CARL122%40pitt.edu%7Cd36ddd6e18fb44808ef308d7d729b8c8%7C9ef9f489e0a04eeb87cc3a526112fd0d%7C1%7C1%7C637214446509871841&amp;sdata=ps%2FRPqvGv%2F04f49yF0vPXBQv2Eu6mS8gZEW83Qg9Cv0%3D&amp;reserved=0
> > > > > >
> > > > > > I am stuck at trying to read a particular row using the index that 
> > > > > > I get in the page_location array struct of offset index. Could you 
> > > > > > help me with this ? and if there have been discussions on the 
> > > > > > forums for this as well, could you direct me to that link?
> > > > > >
> > > > > > Regards,
> > > > > > Arun Balajiee
> > > > > >
> > > >
> > >
>

Reply via email to