Re: Date and time for next Parquet Sync

2017-09-12 Thread Julien Le Dem
+1

On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker  wrote:

> There were no objections so I sent out a meeting invite to everyone who was
> on the last invite. If you'd like to participate, too, please reply to this
> email.
>
> Cheers, Lars
>
> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue wrote:
>
> > That works for me.
> >
> > On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker  wrote:
> >
> > > Hi All,
> > >
> > > I'd like to propose holding the next Parquet Sync on Wednesday, Sep 13th,
> > > at 9am PST. Possible topics would be the pull request to add a page index
> > > to the format and ongoing work on bloom filters.
> > >
> > > If Wednesday does not work for you, please propose another date and time.
> > > Otherwise I'll send out a meeting request later today.
> > >
> > > Cheers, Lars
> > >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>


[jira] [Assigned] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Cheng Lian reassigned PARQUET-1102:
---

Assignee: Cheng Lian

> Travis CI builds are failing for parquet-format PRs
> ---
>
> Key: PARQUET-1102
> URL: https://issues.apache.org/jira/browse/PARQUET-1102
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: format-2.3.2
>
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration of the default build environment from Ubuntu Precise to Trusty on
> Sep 1, according to [this official Travis blog post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (PARQUET-1091) Wrong and broken links in README

2017-09-12 Thread Cheng Lian (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Cheng Lian resolved PARQUET-1091.
-
   Resolution: Fixed
Fix Version/s: format-2.3.2

Issue resolved by pull request 65
[https://github.com/apache/parquet-format/pull/65]

> Wrong and broken links in README
> 
>
> Key: PARQUET-1091
> URL: https://issues.apache.org/jira/browse/PARQUET-1091
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Cheng Lian
>Assignee: Cheng Lian
>Priority: Minor
> Fix For: format-2.3.2
>
>
> Multiple links in README.md still point to the old {{Parquet/parquet-format}}
> repository, which has since been removed.
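The report does not say how the stale links were located. One way to check a README for them is a simple regular-expression scan. The sketch below assumes the stale links have the form `https://github.com/Parquet/parquet-format...` and that their paths map one-to-one onto `github.com/apache/parquet-format` (where the pull request above lives); the helper names and the sample text are hypothetical, and this is not a description of the change made in pull request 65.

```python
import re

# Assumption: stale links look like https://github.com/Parquet/parquet-format...
# The match stops at whitespace, ')', ']' or '>' so Markdown link syntax is not captured.
OLD_REPO_LINK = re.compile(r"https?://github\.com/Parquet/parquet-format[^\s)\]>]*")


def find_stale_links(markdown_text):
    """Return (line_number, url) pairs for links that still point at the old repository."""
    hits = []
    for lineno, line in enumerate(markdown_text.splitlines(), start=1):
        for match in OLD_REPO_LINK.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def rewrite_stale_links(markdown_text):
    """Point old-repository links at github.com/apache/parquet-format (assumes a 1:1 path mapping)."""
    return re.sub(
        r"(https?://github\.com/)Parquet/parquet-format",
        r"\1apache/parquet-format",
        markdown_text,
    )


# Made-up sample text, not the actual README.md contents.
sample = (
    "See the [issue tracker](https://github.com/Parquet/parquet-format/issues).\n"
    "Current repository: https://github.com/apache/parquet-format\n"
)
print(find_stale_links(sample))
# [(1, 'https://github.com/Parquet/parquet-format/issues')]
print(rewrite_stale_links(sample))
```

The match is case-sensitive on purpose, so links that already use the `apache` organization are left alone; a real cleanup would still need a manual check that each rewritten path exists in the new repository.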





[jira] [Resolved] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Cheng Lian resolved PARQUET-1102.
-
   Resolution: Fixed
Fix Version/s: format-2.3.2

Issue resolved by pull request 66
[https://github.com/apache/parquet-format/pull/66]

> Travis CI builds are failing for parquet-format PRs
> ---
>
> Key: PARQUET-1102
> URL: https://issues.apache.org/jira/browse/PARQUET-1102
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Cheng Lian
>Priority: Blocker
> Fix For: format-2.3.2
>
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration of the default build environment from Ubuntu Precise to Trusty on
> Sep 1, according to [this official Travis blog post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].





[jira] [Updated] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Cheng Lian updated PARQUET-1102:

Priority: Blocker  (was: Major)

> Travis CI builds are failing for parquet-format PRs
> ---
>
> Key: PARQUET-1102
> URL: https://issues.apache.org/jira/browse/PARQUET-1102
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Cheng Lian
>Priority: Blocker
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration of the default build environment from Ubuntu Precise to Trusty on
> Sep 1, according to [this official Travis blog post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].





[jira] [Created] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs

2017-09-12 Thread Cheng Lian (JIRA)
Cheng Lian created PARQUET-1102:
---

 Summary: Travis CI builds are failing for parquet-format PRs
 Key: PARQUET-1102
 URL: https://issues.apache.org/jira/browse/PARQUET-1102
 Project: Parquet
  Issue Type: Bug
  Components: parquet-format
Reporter: Cheng Lian


Travis CI builds are failing for parquet-format PRs, probably due to the
migration of the default build environment from Ubuntu Precise to Trusty on
Sep 1, according to [this official Travis blog post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].





[jira] [Commented] (PARQUET-1015) Object categoricals are not serialized when only None is present

2017-09-12 Thread Uwe L. Korn (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163709#comment-16163709]

Uwe L. Korn commented on PARQUET-1015:
--

PR: https://github.com/apache/parquet-cpp/pull/393

> Object categoricals are not serialized when only None is present
> 
>
> Key: PARQUET-1015
> URL: https://issues.apache.org/jira/browse/PARQUET-1015
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Affects Versions: cpp-1.1.0
>Reporter: Marco Neumann
>Assignee: Uwe L. Korn
>Priority: Minor
> Fix For: cpp-1.3.0
>
>
> The following code sample fails with {{pyarrow.lib.ArrowNotImplementedError: 
> NotImplemented: unhandled type}} but should not:
> {noformat}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({'x': [None]})
> df['x'] = df['x'].astype('category')
> table = pa.Table.from_pandas(df)
> buf = pa.InMemoryOutputStream()
> pq.write_table(table, buf)
> {noformat}
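What makes this input an edge case is that a pandas categorical holding only None has no categories at all (None becomes a missing value, not a category), so a dictionary-encoded version of the column has an empty dictionary and there are no dictionary values from which to infer a value type. The plain-Python sketch below illustrates only that property of dictionary encoding; the helper names are hypothetical, it is not how pyarrow or parquet-cpp implement dictionary encoding, and it makes no claim about the actual root cause fixed in the linked PR.

```python
def dictionary_encode(values):
    """Dictionary-encode a list, mapping None to a null index (hypothetical helper)."""
    dictionary = []  # distinct non-null values in first-seen order
    positions = {}   # value -> index into dictionary
    indices = []     # one entry per input value; None marks a null
    for value in values:
        if value is None:
            indices.append(None)  # nulls get no dictionary entry
            continue
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indices.append(positions[value])
    return dictionary, indices


def infer_value_type(dictionary):
    """Infer a single value type name from the dictionary entries (hypothetical helper)."""
    type_names = {type(v).__name__ for v in dictionary}
    if not type_names:
        return None  # empty dictionary: nothing to infer a type from
    if len(type_names) > 1:
        raise TypeError("mixed value types: " + ", ".join(sorted(type_names)))
    return type_names.pop()


print(dictionary_encode(["a", None, "a", "b"]))  # (['a', 'b'], [0, None, 0, 1])
print(dictionary_encode([None]))                 # ([], [None])
print(infer_value_type([]))                      # None
```

A writer that needs a concrete value type for the dictionary therefore has to handle the empty case explicitly rather than relying on inference from the data.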





[jira] [Commented] (PARQUET-1100) [C++] Reading repeated types should decode number of records rather than number of values

2017-09-12 Thread Wes McKinney (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162896#comment-16162896]

Wes McKinney commented on PARQUET-1100:
---

I'm going to spend some time on this today.

> [C++] Reading repeated types should decode number of records rather than 
> number of values
> -
>
> Key: PARQUET-1100
> URL: https://issues.apache.org/jira/browse/PARQUET-1100
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Affects Versions: cpp-1.2.0
>Reporter: Jarno Seppanen
>Assignee: Wes McKinney
> Fix For: cpp-1.3.0
>
> Attachments: 
> part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet
>
>
> Reading the attached Parquet file into a pandas DataFrame and then using the
> DataFrame segfaults.
> {noformat}
> Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 11:58:13) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> 
> >>> import pyarrow
> >>> import pyarrow.parquet as pq
> >>> pyarrow.__version__
> '0.6.0'
> >>> import pandas as pd
> >>> pd.__version__
> '0.19.0'
> >>> df = pq.read_table('part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
> ... .to_pandas()
> >>> len(df)
> 69
> >>> df.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 69 entries, 0 to 68
> Data columns (total 6 columns):
> label               69 non-null int32
> account_meta        69 non-null object
> features_type       69 non-null int32
> features_size       69 non-null int32
> features_indices    1 non-null object
> features_values     1 non-null object
> dtypes: int32(3), object(3)
> memory usage: 2.5+ KB
> >>> 
> >>> pd.concat([df, df])
> Segmentation fault (core dumped)
> {noformat}
> Actually, just {{print(df)}} is enough to trigger the segfault.
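The issue title turns on the difference between records and values in a repeated column. In the Parquet format a repeated column is stored as a flat stream of values plus repetition levels, and a repetition level of 0 marks the start of a new record, so one record can span several values. A reader asked for N rows that stops after N values can therefore cut a record in half, which is presumably why the title asks for decoding by records. The pure-Python sketch below illustrates this under simplifying assumptions: it ignores definition levels (so no nulls or empty lists), flattens each record's values into one list, and its function names are hypothetical, not parquet-cpp APIs.

```python
def count_records(rep_levels):
    """Each repetition level of 0 starts a new record."""
    return sum(1 for level in rep_levels if level == 0)


def read_records(rep_levels, values, num_records):
    """Take up to num_records whole records from the front of a level/value stream.

    Simplifying assumptions (hypothetical helper): one value per level entry,
    i.e. no definition levels, and each record's values flattened into one list.
    Returns (records, entries_consumed).
    """
    if rep_levels and rep_levels[0] != 0:
        raise ValueError("stream must start at a record boundary (repetition level 0)")
    records = []
    i = 0
    while i < len(rep_levels):
        if rep_levels[i] == 0:
            if len(records) == num_records:
                break  # the next record would exceed the request
            records.append([])
        records[-1].append(values[i])
        i += 1
    return records, i


rep_levels = [0, 1, 1, 0, 0, 1]
values = [1, 2, 3, 4, 5, 6]
print(len(values), count_records(rep_levels))  # 6 3
print(read_records(rep_levels, values, 2))     # ([[1, 2, 3], [4]], 4)
# Reading "2 values" instead would yield [1, 2]: only part of the first record.
```

Note that the number of level entries consumed (4) differs from the number of records returned (2); a reader has to track both to keep its position in the column consistent across batches.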





[jira] [Assigned] (PARQUET-1100) [C++] Reading repeated types should decode number of records rather than number of values

2017-09-12 Thread Wes McKinney (JIRA)

[https://issues.apache.org/jira/browse/PARQUET-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Wes McKinney reassigned PARQUET-1100:
-

Assignee: Wes McKinney



