Re: Date and time for next Parquet Sync
+1

On Mon, Sep 11, 2017 at 8:36 PM, Lars Volker wrote:

> There were no objections so I sent out a meeting invite to everyone who was
> on the last invite. If you'd like to participate, too, please reply to this
> email.
>
> Cheers, Lars
>
> On Mon, Sep 11, 2017 at 11:06 AM, Ryan Blue wrote:
>
> > That works for me.
> >
> > On Mon, Sep 11, 2017 at 7:55 AM, Lars Volker wrote:
> >
> > > Hi All,
> > >
> > > I'd like to propose to have the next Parquet Sync on Wednesday, Sep 13th,
> > > at 9am PST. Possible topics would be the pull request to add a page index
> > > to the format, ongoing work on bloom filters.
> > >
> > > If Wednesday does not work for you, please propose another date and time.
> > > Otherwise I'll send out a MR later today.
> > >
> > > Cheers, Lars
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
[jira] [Assigned] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian reassigned PARQUET-1102:
---

    Assignee: Cheng Lian

> Travis CI builds are failing for parquet-format PRs
> ---
>
>          Key: PARQUET-1102
>          URL: https://issues.apache.org/jira/browse/PARQUET-1102
>      Project: Parquet
>   Issue Type: Bug
>   Components: parquet-format
>     Reporter: Cheng Lian
>     Assignee: Cheng Lian
>     Priority: Blocker
>      Fix For: format-2.3.2
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration from Ubuntu precise to trusty on Sep 1 according to [this Travis
> official blog
> post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (PARQUET-1091) Wrong and broken links in README
[ https://issues.apache.org/jira/browse/PARQUET-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved PARQUET-1091.
---

       Resolution: Fixed
    Fix Version/s: format-2.3.2

Issue resolved by pull request 65
[https://github.com/apache/parquet-format/pull/65]

> Wrong and broken links in README
> ---
>
>          Key: PARQUET-1091
>          URL: https://issues.apache.org/jira/browse/PARQUET-1091
>      Project: Parquet
>   Issue Type: Bug
>   Components: parquet-format
>     Reporter: Cheng Lian
>     Assignee: Cheng Lian
>     Priority: Minor
>      Fix For: format-2.3.2
>
> Multiple links in README.md still point to the old {{Parquet/parquet-format}}
> repository, which is now removed.
[jira] [Resolved] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian resolved PARQUET-1102.
---

       Resolution: Fixed
    Fix Version/s: format-2.3.2

Issue resolved by pull request 66
[https://github.com/apache/parquet-format/pull/66]

> Travis CI builds are failing for parquet-format PRs
> ---
>
>          Key: PARQUET-1102
>          URL: https://issues.apache.org/jira/browse/PARQUET-1102
>      Project: Parquet
>   Issue Type: Bug
>   Components: parquet-format
>     Reporter: Cheng Lian
>     Priority: Blocker
>      Fix For: format-2.3.2
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration from Ubuntu precise to trusty on Sep 1 according to [this Travis
> official blog
> post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].
[jira] [Updated] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs
[ https://issues.apache.org/jira/browse/PARQUET-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Lian updated PARQUET-1102:
---

    Priority: Blocker  (was: Major)

> Travis CI builds are failing for parquet-format PRs
> ---
>
>          Key: PARQUET-1102
>          URL: https://issues.apache.org/jira/browse/PARQUET-1102
>      Project: Parquet
>   Issue Type: Bug
>   Components: parquet-format
>     Reporter: Cheng Lian
>     Priority: Blocker
>
> Travis CI builds are failing for parquet-format PRs, probably due to the
> migration from Ubuntu precise to trusty on Sep 1 according to [this Travis
> official blog
> post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].
[jira] [Created] (PARQUET-1102) Travis CI builds are failing for parquet-format PRs
Cheng Lian created PARQUET-1102:
---

       Summary: Travis CI builds are failing for parquet-format PRs
           Key: PARQUET-1102
           URL: https://issues.apache.org/jira/browse/PARQUET-1102
       Project: Parquet
    Issue Type: Bug
    Components: parquet-format
      Reporter: Cheng Lian

Travis CI builds are failing for parquet-format PRs, probably due to the
migration from Ubuntu precise to trusty on Sep 1 according to [this Travis
official blog
post|https://blog.travis-ci.com/2017-08-31-trusty-as-default-status].
[jira] [Commented] (PARQUET-1015) Object categoricals are not serialized when only None is present
[ https://issues.apache.org/jira/browse/PARQUET-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16163709#comment-16163709 ]

Uwe L. Korn commented on PARQUET-1015:
---

PR: https://github.com/apache/parquet-cpp/pull/393

> Object categoricals are not serialized when only None is present
> ---
>
>               Key: PARQUET-1015
>               URL: https://issues.apache.org/jira/browse/PARQUET-1015
>           Project: Parquet
>        Issue Type: Bug
>        Components: parquet-cpp
>  Affects Versions: cpp-1.1.0
>          Reporter: Marco Neumann
>          Assignee: Uwe L. Korn
>          Priority: Minor
>           Fix For: cpp-1.3.0
>
> The following code sample fails with {{pyarrow.lib.ArrowNotImplementedError:
> NotImplemented: unhandled type}} but should not:
> {noformat}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({'x': [None]})
> df['x'] = df['x'].astype('category')
> table = pa.Table.from_pandas(df)
> buf = pa.InMemoryOutputStream()
> pq.write_table(table, buf)
> {noformat}
[jira] [Commented] (PARQUET-1100) [C++] Reading repeated types should decode number of records rather than number of values
[ https://issues.apache.org/jira/browse/PARQUET-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16162896#comment-16162896 ]

Wes McKinney commented on PARQUET-1100:
---

I'm going to spend some time on this today

> [C++] Reading repeated types should decode number of records rather than
> number of values
> ---
>
>               Key: PARQUET-1100
>               URL: https://issues.apache.org/jira/browse/PARQUET-1100
>           Project: Parquet
>        Issue Type: Bug
>        Components: parquet-cpp
>  Affects Versions: cpp-1.2.0
>          Reporter: Jarno Seppanen
>          Assignee: Wes McKinney
>           Fix For: cpp-1.3.0
>
>       Attachments:
> part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet
>
> Reading the attached parquet file into pandas dataframe and then using the
> dataframe segfaults.
> {noformat}
> Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>>
> >>> import pyarrow
> >>> import pyarrow.parquet as pq
> >>> pyarrow.__version__
> '0.6.0'
> >>> import pandas as pd
> >>> pd.__version__
> '0.19.0'
> >>> df = pq.read_table('part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
> ...     .to_pandas()
> >>> len(df)
> 69
> >>> df.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 69 entries, 0 to 68
> Data columns (total 6 columns):
> label               69 non-null int32
> account_meta        69 non-null object
> features_type       69 non-null int32
> features_size       69 non-null int32
> features_indices    1 non-null object
> features_values     1 non-null object
> dtypes: int32(3), object(3)
> memory usage: 2.5+ KB
> >>>
> >>> pd.concat([df, df])
> Segmentation fault (core dumped)
> {noformat}
> Actually just print(df) is enough to trigger the segfault
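For readers of the archive, the distinction in the PARQUET-1100 title between "number of records" and "number of values" can be illustrated with a small Python sketch. It is not the parquet-cpp code and says nothing about how the fix was implemented. It only relies on the Parquet/Dremel convention that a repetition level of 0 marks the start of a new record. The function names and the sample levels are hypothetical, and definition levels (nulls) are ignored.

```python
from typing import List


def count_records(repetition_levels: List[int]) -> int:
    """Count records: every repetition level of 0 starts a new record."""
    return sum(1 for level in repetition_levels if level == 0)


def levels_for_records(repetition_levels: List[int], num_records: int) -> int:
    """Return how many level entries make up the first `num_records` records.

    A reader asked for N records of a repeated column has to keep consuming
    entries until the (N+1)-th record boundary, not stop after N entries.
    """
    seen = 0
    for i, level in enumerate(repetition_levels):
        if level == 0:
            if seen == num_records:
                return i  # entry i begins the next record, so stop here
            seen += 1
    return len(repetition_levels)


# Hypothetical repeated column holding three records: [a, b], [c], [d, e, f]
rep_levels = [0, 1, 0, 0, 1, 1]

num_values = len(rep_levels)                    # 6 entries in the column
num_records = count_records(rep_levels)         # 3 records
first_two = levels_for_records(rep_levels, 2)   # 3 entries cover [a, b] and [c]
```

In this sample a batch of two records spans three entries, so treating a requested record count as a value count would cut the batch short.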
[jira] [Assigned] (PARQUET-1100) [C++] Reading repeated types should decode number of records rather than number of values
[ https://issues.apache.org/jira/browse/PARQUET-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned PARQUET-1100:
---

    Assignee: Wes McKinney

> [C++] Reading repeated types should decode number of records rather than
> number of values
> ---
>
>               Key: PARQUET-1100
>               URL: https://issues.apache.org/jira/browse/PARQUET-1100
>           Project: Parquet
>        Issue Type: Bug
>        Components: parquet-cpp
>  Affects Versions: cpp-1.2.0
>          Reporter: Jarno Seppanen
>          Assignee: Wes McKinney
>           Fix For: cpp-1.3.0
>
>       Attachments:
> part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet
>
> Reading the attached parquet file into pandas dataframe and then using the
> dataframe segfaults.
> {noformat}
> Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar 6 2017, 11:58:13)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>>
> >>> import pyarrow
> >>> import pyarrow.parquet as pq
> >>> pyarrow.__version__
> '0.6.0'
> >>> import pandas as pd
> >>> pd.__version__
> '0.19.0'
> >>> df = pq.read_table('part-0-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
> ...     .to_pandas()
> >>> len(df)
> 69
> >>> df.info()
> <class 'pandas.core.frame.DataFrame'>
> RangeIndex: 69 entries, 0 to 68
> Data columns (total 6 columns):
> label               69 non-null int32
> account_meta        69 non-null object
> features_type       69 non-null int32
> features_size       69 non-null int32
> features_indices    1 non-null object
> features_values     1 non-null object
> dtypes: int32(3), object(3)
> memory usage: 2.5+ KB
> >>>
> >>> pd.concat([df, df])
> Segmentation fault (core dumped)
> {noformat}
> Actually just print(df) is enough to trigger the segfault