[Announce] new Parquet committer Benoit Hanotte

2018-05-29 Thread Julien Le Dem
We are happy to announce that Benoit has agreed to become a Parquet
committer.
Welcome Benoit!


[jira] [Commented] (PARQUET-1292) Add constructors to ProtoParquetWriter to write specs compliant Parquet

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493896#comment-16493896
 ] 

ASF GitHub Bot commented on PARQUET-1292:
-

chawlakunal commented on issue #473: PARQUET-1292 Adding constructors to 
ProtoParquetWriter with writeSpecsCompliant flag
URL: https://github.com/apache/parquet-mr/pull/473#issuecomment-392865677
 
 
   @BenoitHanotte I'll work on that sometime this week.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add constructors to ProtoParquetWriter to write specs compliant Parquet
> ---
>
> Key: PARQUET-1292
> URL: https://issues.apache.org/jira/browse/PARQUET-1292
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Kunal Chawla
>Assignee: Kunal Chawla
>Priority: Minor
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1292) Add constructors to ProtoParquetWriter to write specs compliant Parquet

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493879#comment-16493879
 ] 

ASF GitHub Bot commented on PARQUET-1292:
-

BenoitHanotte commented on issue #473: PARQUET-1292 Adding constructors to 
ProtoParquetWriter with writeSpecsCompliant flag
URL: https://github.com/apache/parquet-mr/pull/473#issuecomment-392860267
 
 
   Hello @costimuraru @chawlakunal,
   As we might be adding yet another flag if 
https://github.com/apache/parquet-mr/pull/410 passes, I think we should 
follow @lukasnalezenec's suggestion and add constructors that accept the 
conf object. That would let us avoid duplicating all the existing 
constructors for each new flag.
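The suggestion above concerns Java's ProtoParquetWriter and a conf object. As a language-neutral illustration only, here is a minimal C++ sketch of the same idea: a single options object replaces one new constructor overload per flag. `WriterOptions`, `ProtoWriterSketch` and their members are hypothetical names, not the parquet-mr API.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical options bag: each new writer flag becomes a field with a
// default value instead of another constructor overload.
struct WriterOptions {
  bool write_specs_compliant = false;  // the flag proposed in PARQUET-1292
  // A later flag (e.g. one from PR #410) would be added here as one more field.
};

// Hypothetical writer with a single constructor that takes the options object.
class ProtoWriterSketch {
 public:
  explicit ProtoWriterSketch(std::string path, WriterOptions options = WriterOptions())
      : path_(std::move(path)), options_(options) {
    assert(!path_.empty());  // a writer needs a destination
  }

  const std::string& path() const { return path_; }
  bool write_specs_compliant() const { return options_.write_specs_compliant; }

 private:
  std::string path_;
  WriterOptions options_;
};
```

Callers that don't care about the flags keep writing `ProtoWriterSketch w("out.parquet");`, and adding a flag only touches `WriterOptions`, not the constructor list.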




> Add constructors to ProtoParquetWriter to write specs compliant Parquet
> ---
>
> Key: PARQUET-1292
> URL: https://issues.apache.org/jira/browse/PARQUET-1292
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Kunal Chawla
>Assignee: Kunal Chawla
>Priority: Minor
> Fix For: 1.11.0
>
>






[jira] [Commented] (PARQUET-951) Missing field id support in parquet metadata

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493865#comment-16493865
 ] 

ASF GitHub Bot commented on PARQUET-951:


BenoitHanotte commented on issue #410: [PARQUET-951] Pull request for handling 
protobuf field id
URL: https://github.com/apache/parquet-mr/pull/410#issuecomment-392855680
 
 
   Hello @costimuraru @qinghui-xu @julienledem 
   As the protobuf descriptor is already serialized in the file metadata 
(https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java#L132)
 and contains all the information required to map the protobuf field id to its 
name, can't we leverage this instead of changing the way we set the field id in 
the parquet schema?
   Not only would this isolate the change to the protobuf part of the logic, it 
would also provide backward compatibility, since existing files already contain 
the serialized descriptor. In that case we would only need to set a flag at 
read time, instead of also having to add a flag when writing.
   If we set the parquet field ids according to the protobuf ids, I don't think 
we would be able to support schema compatibility for files written with a 
previous version of parquet, as the parquet schema of those files would be 
missing the required information.
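A minimal C++ sketch of the read-time approach described above, assuming the serialized protobuf descriptor can be reduced to a list of (field number, field name) pairs. `FieldDesc`, `BuildIdToName`, `ResolveFileColumn` and the example field names are hypothetical; real code would parse the descriptor stored in the file metadata. Matching by field number lets a renamed field still find its column in an older file.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one protobuf field from a descriptor.
struct FieldDesc {
  int number;        // protobuf field id (stable across renames)
  std::string name;  // field name (may change between schema versions)
};

// Build a field-id -> column-name map from the descriptor stored in the file.
std::map<int, std::string> BuildIdToName(const std::vector<FieldDesc>& file_fields) {
  std::map<int, std::string> id_to_name;
  for (const FieldDesc& f : file_fields) {
    bool inserted = id_to_name.emplace(f.number, f.name).second;
    assert(inserted);  // field numbers are unique within a message
    (void)inserted;    // silence unused-variable warnings when NDEBUG is set
  }
  return id_to_name;
}

// Find the file column holding the reader's field, matching by id, not name.
// Returns std::nullopt when the file predates the field.
std::optional<std::string> ResolveFileColumn(const std::map<int, std::string>& id_to_name,
                                             const FieldDesc& reader_field) {
  auto it = id_to_name.find(reader_field.number);
  if (it == id_to_name.end()) return std::nullopt;
  return it->second;
}
```

With a file written as `{1, "user_id"}`, a reader whose schema renamed field 1 to `uid` still resolves it to the `user_id` column.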




> Missing field id support in parquet metadata
> 
>
> Key: PARQUET-951
> URL: https://issues.apache.org/jira/browse/PARQUET-951
> Project: Parquet
>  Issue Type: Bug
>Reporter: Qinghui Xu
>Priority: Major
>
> Field ids are essential for some serialization frameworks such as protobuf, 
> where they are used to keep forward/backward schema compatibility, which 
> cannot be achieved using field names. Currently, field ids are not persisted 
> in the file metadata.





[jira] [Commented] (PARQUET-1315) [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not int64_t

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493860#comment-16493860
 ] 

ASF GitHub Bot commented on PARQUET-1315:
-

majetideepak opened a new pull request #469: PARQUET-1315: 
ColumnChunkMetaData.has_dictionary_page() should return…
URL: https://github.com/apache/parquet-cpp/pull/469
 
 
   … bool, not int64_t




> [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not 
> int64_t
> ---
>
> Key: PARQUET-1315
> URL: https://issues.apache.org/jira/browse/PARQUET-1315
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Antoine Pitrou
>Assignee: Deepak Majeti
>Priority: Major
>
> It's semantically a boolean.





[jira] [Assigned] (PARQUET-41) Add bloom filters to parquet statistics

2018-05-29 Thread Junjie Chen (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen reassigned PARQUET-41:
--

Assignee: Junjie Chen  (was: Ferdinand Xu)

> Add bloom filters to parquet statistics
> ---
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-format, parquet-mr
>Reporter: Alex Levenson
>Assignee: Junjie Chen
>Priority: Major
>  Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215
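A minimal C++ sketch of the idea in this issue: build a Bloom filter over a row group's values while writing, then skip the row group when an equality predicate's value is definitely absent. This is a classic Bloom filter with double hashing over `std::hash`, chosen for brevity; it is not necessarily the design of the linked pull request, and `BloomFilterSketch` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Classic Bloom filter over string values (illustrative only).
class BloomFilterSketch {
 public:
  BloomFilterSketch(std::size_t num_bits, std::size_t num_hashes)
      : bits_(num_bits, false), num_hashes_(num_hashes) {
    assert(num_bits > 0 && num_hashes > 0);
  }

  // Called once per value while writing the row group.
  void Insert(const std::string& value) {
    for (std::size_t i = 0; i < num_hashes_; ++i) bits_[Index(value, i)] = true;
  }

  // false: the value is definitely not in the row group, so it can be skipped.
  // true: the value may be present (false positives are possible).
  bool MightContain(const std::string& value) const {
    for (std::size_t i = 0; i < num_hashes_; ++i) {
      if (!bits_[Index(value, i)]) return false;
    }
    return true;
  }

 private:
  // Double hashing: probe i uses h1 + i * h2 (mod the number of bits).
  std::size_t Index(const std::string& value, std::size_t i) const {
    uint64_t h1 = std::hash<std::string>{}(value);
    uint64_t h2 = (h1 * 0x9E3779B97F4A7C15ULL) | 1;  // odd second hash derived from h1
    return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
  }

  std::vector<bool> bits_;
  std::size_t num_hashes_;
};
```

A reader evaluating `column == "carol"` could skip the whole row group when `MightContain("carol")` returns false, which works even for row groups that have no dictionary.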





[jira] [Assigned] (PARQUET-1315) [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not int64_t

2018-05-29 Thread Deepak Majeti (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Majeti reassigned PARQUET-1315:
--

Assignee: Deepak Majeti

> [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not 
> int64_t
> ---
>
> Key: PARQUET-1315
> URL: https://issues.apache.org/jira/browse/PARQUET-1315
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Antoine Pitrou
>Assignee: Deepak Majeti
>Priority: Major
>
> It's semantically a boolean.





[jira] [Created] (PARQUET-1315) [C++] ColumnChunkMetaData.has_dictionary_page() should return bool, not int64_t

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1315:
---

 Summary: [C++] ColumnChunkMetaData.has_dictionary_page() should 
return bool, not int64_t
 Key: PARQUET-1315
 URL: https://issues.apache.org/jira/browse/PARQUET-1315
 Project: Parquet
  Issue Type: Bug
  Components: parquet-cpp
Reporter: Antoine Pitrou


It's semantically a boolean.
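To illustrate the point of the issue, a minimal C++ sketch, assuming the accessor is backed by the optional `dictionary_page_offset` field of the column metadata. `ColumnChunkMetaSketch` is a hypothetical simplified type, not the parquet-cpp class. "Does this chunk have a dictionary page?" is a yes/no question, so returning `bool` keeps callers from mistaking the result for an offset or a count.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <type_traits>

// Hypothetical, simplified column-chunk metadata.
struct ColumnChunkMetaSketch {
  int64_t data_page_offset = 0;                   // a real offset: int64_t is right here
  std::optional<int64_t> dictionary_page_offset;  // set only if a dictionary page was written

  // Semantically a yes/no question, so the return type is bool.
  bool has_dictionary_page() const { return dictionary_page_offset.has_value(); }
};

// Compile-time check that the accessor really returns bool.
static_assert(std::is_same<decltype(ColumnChunkMetaSketch{}.has_dictionary_page()), bool>::value,
              "has_dictionary_page() should return bool");
```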





[jira] [Commented] (PARQUET-1244) Documentation link to logical types broken

2018-05-29 Thread Gabor Szadovszky (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493438#comment-16493438
 ] 

Gabor Szadovszky commented on PARQUET-1244:
---

Thanks a lot, [~nkollar].

+1

> Documentation link to logical types broken
> --
>
> Key: PARQUET-1244
> URL: https://issues.apache.org/jira/browse/PARQUET-1244
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Nandor Kollar
>Priority: Minor
>  Labels: beginner
> Attachments: PARQUET-1244_1.patch, PARQUET-1244_2.patch
>
>
> The link to {{LogicalTypes.md}} here is broken:
> https://parquet.apache.org/documentation/latest/





Re: Move Dremel paper to parquet-format

2018-05-29 Thread Nandor Kollar
Moving the detailed documentation (implementation, types, etc.) to Github is
a good idea. In my opinion, the website on parquet.apache.org should provide
just a very high-level overview of Parquet, with links to the Github pages and
contact information.

Nandor



Re: Move Dremel paper to parquet-format

2018-05-29 Thread Uwe L. Korn
Hello,

In Arrow we keep the website as Markdown files in the main repository [1]. 
Review is done with patches, like in any other project. The deployment to the 
actual Apache servers is done using a separate (magic) git repo. This works 
really well for us (and some other Apache projects are also happy with this 
approach), and I would recommend it for Parquet as well. The only downside I 
see is that deployment is still manual: a committer needs to run a script 
locally that builds the site and pushes the rendered version to the magic git 
repo so that the site is updated.

Uwe

[1]



Re: Move Dremel paper to parquet-format

2018-05-29 Thread Zoltan Ivanfi
Hi,

Taking a step back, are we satisfied with the current web page mechanism? I
find its dependence on Subversion a real pain (checking it out, making
patches for review, and the reviews themselves are a lot more complicated
than with github). I think that's one of the main reasons it's so neglected
(it describes Parquet as of 2013). Can't we use a wiki for the same
purpose? Or .md files in the github repo? Or could we migrate the web page to
its own github repo?

Best,

Zoltan



Re: Move Dremel paper to parquet-format

2018-05-29 Thread Uwe L. Korn
Hello Nandor,

It seems that the wiki contents were written by Julien, and since they are on a 
github wiki, they are Markdown in the backend.

The easiest thing from an IP side would be if Julien could contribute them as 
plain Markdown files to the parquet-format repo. I don't think we want to (or 
can) enable the wiki for the parquet-format repo.

Uwe



Move Dremel paper to parquet-format

2018-05-29 Thread Nandor Kollar
Hi All,

I'm wondering if we can move the Dremel paper to the parquet-format wiki. Right
now, every reference to this paper on Github (both the parquet-mr and
parquet-format READMEs) and on the website points to Julien's Github
<https://github.com/julienledem/redelm/wiki/The-striping-and-assembly-algorithms-from-the-Dremel-paper>.
It would be nice to make this consistent and move that page to a Github wiki
page inside apache/parquet-format.

Regards,
Nandor


[jira] [Commented] (PARQUET-1244) Documentation link to logical types broken

2018-05-29 Thread Nandor Kollar (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493346#comment-16493346
 ] 

Nandor Kollar commented on PARQUET-1244:


Fixed several other broken links and style glitches. I also added references 
to the C++ and Rust implementations.

> Documentation link to logical types broken
> --
>
> Key: PARQUET-1244
> URL: https://issues.apache.org/jira/browse/PARQUET-1244
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Nandor Kollar
>Priority: Minor
>  Labels: beginner
> Attachments: PARQUET-1244_1.patch, PARQUET-1244_2.patch
>
>
> The link to {{LogicalTypes.md}} here is broken:
> https://parquet.apache.org/documentation/latest/





[jira] [Updated] (PARQUET-1244) Documentation link to logical types broken

2018-05-29 Thread Nandor Kollar (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar updated PARQUET-1244:
---
Attachment: PARQUET-1244_2.patch

> Documentation link to logical types broken
> --
>
> Key: PARQUET-1244
> URL: https://issues.apache.org/jira/browse/PARQUET-1244
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Nandor Kollar
>Priority: Minor
>  Labels: beginner
> Attachments: PARQUET-1244_1.patch, PARQUET-1244_2.patch
>
>
> The link to {{LogicalTypes.md}} here is broken:
> https://parquet.apache.org/documentation/latest/





[jira] [Updated] (PARQUET-1244) Documentation link to logical types broken

2018-05-29 Thread Nandor Kollar (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar updated PARQUET-1244:
---
Attachment: PARQUET-1244_1.patch

> Documentation link to logical types broken
> --
>
> Key: PARQUET-1244
> URL: https://issues.apache.org/jira/browse/PARQUET-1244
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Nandor Kollar
>Priority: Minor
>  Labels: beginner
> Attachments: PARQUET-1244_1.patch
>
>
> The link to {{LogicalTypes.md}} here is broken:
> https://parquet.apache.org/documentation/latest/





[jira] [Assigned] (PARQUET-1244) Documentation link to logical types broken

2018-05-29 Thread Nandor Kollar (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar reassigned PARQUET-1244:
--

Assignee: Nandor Kollar

> Documentation link to logical types broken
> --
>
> Key: PARQUET-1244
> URL: https://issues.apache.org/jira/browse/PARQUET-1244
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Antoine Pitrou
>Assignee: Nandor Kollar
>Priority: Minor
>  Labels: beginner
>
> The link to {{LogicalTypes.md}} here is broken:
> https://parquet.apache.org/documentation/latest/





[jira] [Commented] (PARQUET-1313) [C++] Compilation failure with VS2017

2018-05-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16493286#comment-16493286
 ] 

ASF GitHub Bot commented on PARQUET-1313:
-

pitrou opened a new pull request #468: PARQUET-1313: [C++] Fix gtest build 
failure on Windows
URL: https://github.com/apache/parquet-cpp/pull/468
 
 
   Also add an option to enable clcache if found.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Compilation failure with VS2017
> -
>
> Key: PARQUET-1313
> URL: https://issues.apache.org/jira/browse/PARQUET-1313
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Antoine Pitrou
>Priority: Major
>
> I get hit by the following issue:
> https://github.com/google/googletest/issues/
> Not sure why I don't get the same problem with Arrow C++.





[jira] [Created] (PARQUET-1314) [C++] arrow-reader-writer-test::TestInt96ParquetIO fails on Windows (VS2017)

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1314:
---

 Summary: [C++] arrow-reader-writer-test::TestInt96ParquetIO fails 
on Windows (VS2017)
 Key: PARQUET-1314
 URL: https://issues.apache.org/jira/browse/PARQUET-1314
 Project: Parquet
  Issue Type: Bug
  Components: parquet-cpp
Reporter: Antoine Pitrou


{code}
[----------] 1 test from TestInt96ParquetIO
[ RUN      ] TestInt96ParquetIO.ReadIntoTimestamp
..\src\parquet\arrow\arrow-reader-writer-test.cc(344): error: Failed
Got:
[128100145738543]
Expected:
[145738543]
..\src\parquet\arrow\arrow-reader-writer-test.cc(893): error: Expected: this->ReadAndCheckSingleColumnFile(*values) doesn't generate new fatal failures in the current thread.
  Actual: it does.
[  FAILED  ] TestInt96ParquetIO.ReadIntoTimestamp (0 ms)
[----------] 1 test from TestInt96ParquetIO (15 ms total)
{code}
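For context on what this test reads: an INT96 timestamp is 12 bytes, conventionally 8 little-endian bytes of nanoseconds within the day followed by a 4-byte little-endian Julian day number. The minimal C++ sketch below converts such a value to nanoseconds since the Unix epoch under that layout assumption. It is not parquet-cpp's code and does not diagnose the failure above; `Int96Parts`, `DecodeInt96` and `Int96ToEpochNanos` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Julian day number of 1970-01-01, the Unix epoch.
constexpr int64_t kJulianDayOfEpoch = 2440588;
constexpr int64_t kNanosPerDay = 86400LL * 1000 * 1000 * 1000;

// The two fields packed into a 12-byte INT96 timestamp.
struct Int96Parts {
  uint64_t nanos_of_day;
  uint32_t julian_day;
};

// Decode 12 little-endian bytes: bytes 0-7 nanos of day, bytes 8-11 Julian day.
Int96Parts DecodeInt96(const uint8_t bytes[12]) {
  Int96Parts parts{0, 0};
  for (int i = 7; i >= 0; --i) parts.nanos_of_day = (parts.nanos_of_day << 8) | bytes[i];
  for (int i = 11; i >= 8; --i) parts.julian_day = (parts.julian_day << 8) | bytes[i];
  return parts;
}

// Convert to nanoseconds since 1970-01-01T00:00:00.
int64_t Int96ToEpochNanos(const Int96Parts& parts) {
  assert(parts.nanos_of_day < static_cast<uint64_t>(kNanosPerDay));
  return (static_cast<int64_t>(parts.julian_day) - kJulianDayOfEpoch) * kNanosPerDay +
         static_cast<int64_t>(parts.nanos_of_day);
}
```

A value stored on the epoch's Julian day (2440588) therefore decodes to just its nanoseconds-of-day count, and each additional day adds 86400000000000 ns.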






[jira] [Created] (PARQUET-1313) [C++] Compilation failure with VS2017

2018-05-29 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created PARQUET-1313:
---

 Summary: [C++] Compilation failure with VS2017
 Key: PARQUET-1313
 URL: https://issues.apache.org/jira/browse/PARQUET-1313
 Project: Parquet
  Issue Type: Bug
  Components: parquet-cpp
Reporter: Antoine Pitrou


I get hit by the following issue:
https://github.com/google/googletest/issues/

Not sure why I don't get the same problem with Arrow C++.


