[jira] [Commented] (ARROW-1538) [C++] Support Ubuntu 14.04 in .deb packaging automation
[ https://issues.apache.org/jira/browse/ARROW-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174290#comment-16174290 ] Rares Vernica commented on ARROW-1538: -- Thanks for the pointer on glib. I applied the {{bfe65790}} commit on top of the {{0.7.0}} tag and I am able to move past that error, but I get some errors related to missing files, even while building for Ubuntu {{16.04}}:
{code}
# docker run --rm --tty --volume /arrow-dist/cpp-linux/apt:/host:rw --env DEBUG=yes apache-arrow-ubuntu-16.04 /host/build.sh
...
-- Installing: /build/apache-arrow-0.7.0/debian/tmp/usr/include/arrow/python/type_traits.h
-- Installing: /build/apache-arrow-0.7.0/debian/tmp/usr/lib/x86_64-linux-gnu/pkgconfig/arrow-python.pc
make[2]: Leaving directory '/build/apache-arrow-0.7.0/cpp_build'
dh_auto_install \
    --sourcedirectory=c_glib \
    --builddirectory=c_glib_build
make[1]: Leaving directory '/build/apache-arrow-0.7.0'
dh_install
dh_install: libarrow-glib0 missing files: usr/lib/*/libarrow-glib.so.*
dh_install: gir1.2-arrow-1.0 missing files: usr/lib/*/girepository-1.0/
dh_install: libarrow-glib-dev missing files: usr/include/arrow-glib/
dh_install: libarrow-glib-dev missing files: usr/lib/*/libarrow-glib.a
dh_install: libarrow-glib-dev missing files: usr/lib/*/libarrow-glib.so
dh_install: libarrow-glib-dev missing files: usr/lib/*/pkgconfig/arrow-glib.pc
dh_install: libarrow-glib-dev missing files: usr/share/gir-1.0/
dh_install: libarrow-glib-dev missing files: usr/share/arrow-glib/example/
dh_install: libarrow-glib-doc missing files: usr/share/doc/libarrow-glib-doc/arrow-glib/
dh_install: missing files, aborting
debian/rules:12: recipe for target 'binary' failed
make: *** [binary] Error 2
dpkg-buildpackage: error: fakeroot debian/rules binary gave error exit status 2
debuild: fatal error at line 1376:
dpkg-buildpackage -rfakeroot -D -us -uc failed
Failed debuild -us -uc
{code}
It seems like nothing is happening for {{c_glib}} during {{dh_auto_install}}.
> [C++] Support Ubuntu 14.04 in .deb packaging automation > --- > > Key: ARROW-1538 > URL: https://issues.apache.org/jira/browse/ARROW-1538 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Wes McKinney > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible
[ https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174169#comment-16174169 ] ASF GitHub Bot commented on ARROW-1578: --- Github user asfgit closed the pull request at: https://github.com/apache/arrow/pull/1118 > [C++/Python] Run lint checks in Travis CI to fail for linting issues as early > as possible > - > > Key: ARROW-1578 > URL: https://issues.apache.org/jira/browse/ARROW-1578 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > The lint checks are run relatively late in the CI process, and a build may > fail after holding a worker for ~20 minutes or more. These could fail much > sooner and free up build slaves
[jira] [Comment Edited] (ARROW-1585) serialize_pandas round trip fails on integer columns
[ https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174086#comment-16174086 ] Tom Augspurger edited comment on ARROW-1585 at 9/21/17 1:11 AM: Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 2]}) }} (an int, not a string). Agreed that restricting field names to strings is best. Being able to reconstruct the original from the metadata is sufficient. was (Author: tomaugspurger): Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 2]}))).columns }} (an int, not a string). Agreed that restricting field names to strings is best. Being able to reconstruct the original from the metadata is sufficient. > serialize_pandas round trip fails on integer columns > > > Key: ARROW-1585 > URL: https://issues.apache.org/jira/browse/ARROW-1585 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > > This roundtrip fails, since the integer column name isn't converted back from a string > after deserializing: > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, > 2]}))).columns > Out[3]: Index(['0'], dtype='object') > {code} > That should be an {{ Int64Index([0]) }} for the columns.
[jira] [Commented] (ARROW-1585) serialize_pandas round trip fails on integer columns
[ https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174086#comment-16174086 ] Tom Augspurger commented on ARROW-1585: --- Sorry, yes, I meant for the original data to be {{ pd.DataFrame({0: [1, 2]}))).columns }} (an int, not a string). Agreed that restricting field names to strings is best. Being able to reconstruct the original from the metadata is sufficient. > serialize_pandas round trip fails on integer columns > > > Key: ARROW-1585 > URL: https://issues.apache.org/jira/browse/ARROW-1585 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > > This roundtrip fails, since the integer column name isn't converted back from a string > after deserializing: > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, > 2]}))).columns > Out[3]: Index(['0'], dtype='object') > {code} > That should be an {{ Int64Index([0]) }} for the columns.
[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174080#comment-16174080 ] Wes McKinney commented on ARROW-1581: - I don't see why we shouldn't put them on PyPI, as long as the Apache project does not advertise them. It might be a little work to munge the package metadata to do this. I have already found the conda nightlies incredibly useful. > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >
[jira] [Resolved] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1500. - Resolution: Fixed Issue resolved by pull request 1116 [https://github.com/apache/arrow/pull/1116] > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Labels: pull-request-available > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build
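The issue title says the return value of {{ftruncate}} was ignored in {{MemoryMappedFile::Create}}. Below is a minimal Python sketch of the same size-then-map pattern, with the error handled rather than dropped. In Python, {{os.ftruncate}} raises {{OSError}} on failure instead of returning -1. The helper {{create_mapped_file}} is hypothetical and is not Arrow's API.

{code:python}
import mmap
import os


def create_mapped_file(path, size):
    """Create (or truncate) `path`, size it to `size` bytes and map it.

    Hypothetical helper. Returns (fd, mapping); the caller closes both.
    Any failure propagates to the caller instead of being ignored.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # The C ftruncate() returns -1 on error and that result must be
        # checked; Python's os.ftruncate raises OSError instead.
        os.ftruncate(fd, size)
        mapping = mmap.mmap(fd, size)
    except Exception:
        # Do not leak the descriptor if sizing or mapping fails.
        os.close(fd)
        raise
    return fd, mapping
{code}

The descriptor is closed on the error path so that a failed resize does not leave a half-created file handle behind.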
[jira] [Commented] (ARROW-1585) serialize_pandas round trip fails on integer columns
[ https://issues.apache.org/jira/browse/ARROW-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174073#comment-16174073 ] Wes McKinney commented on ARROW-1585: - You mean integer 0 instead of {{"0"}} for the column name, though, right?
{code}
In [7]: df = pd.DataFrame({"0": [1, 2]})

In [8]: df.columns
Out[8]: Index(['0'], dtype='object')
{code}
We made the decision to coerce non-string column names to strings, but we could add metadata to http://pandas-docs.github.io/pandas-docs-travis/developer.html that allows the original dtype to be recovered for the simple cases (e.g. {{Int64Index}}). cc [~cpcloud] > serialize_pandas round trip fails on integer columns > > > Key: ARROW-1585 > URL: https://issues.apache.org/jira/browse/ARROW-1585 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > > This roundtrip fails, since the integer column name isn't converted back from a string > after deserializing: > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, > 2]}))).columns > Out[3]: Index(['0'], dtype='object') > {code} > That should be an {{ Int64Index([0]) }} for the columns.
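The suggestion in this comment is to keep coercing column names to strings, while recording enough metadata to restore the original type in simple cases such as {{Int64Index}}. Here is a minimal pure-Python sketch of that idea. It assumes a hypothetical metadata key, {{column_name_kinds}}, which is not part of the pandas metadata spec linked above. Names of any other type are still coerced to strings and are not restored.

{code:python}
import json


def encode_column_names(names):
    """Coerce column names to str, remembering which ones were plain ints.

    Returns (string_names, metadata). The metadata key is hypothetical.
    """
    # bool is a subclass of int, so test the exact type.
    kinds = ["int" if type(name) is int else "str" for name in names]
    string_names = [str(name) for name in names]
    metadata = {"column_name_kinds": json.dumps(kinds)}
    return string_names, metadata


def decode_column_names(string_names, metadata):
    """Undo encode_column_names for the simple (int) case."""
    kinds = json.loads(metadata.get("column_name_kinds", "null"))
    if kinds is None:
        # No metadata: leave the names as strings.
        return list(string_names)
    return [int(name) if kind == "int" else name
            for name, kind in zip(string_names, kinds)]
{code}

Keeping the metadata as a JSON string means it fits a string-to-string key/value store, and data written without the key still decodes to plain string names.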
[jira] [Assigned] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name
[ https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillip Cloud reassigned ARROW-1586: Assignee: Phillip Cloud > [PYTHON] serialize_pandas roundtrip loses columns name > -- > > Key: ARROW-1586 > URL: https://issues.apache.org/jira/browse/ARROW-1586 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Assignee: Phillip Cloud >Priority: Minor > Fix For: 0.8.0 > > > The serialize / deserialize roundtrip loses {{ df.columns.name }} > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], > name='col_name')) > In [4]: df.columns.name > Out[4]: 'col_name' > In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name > {code} > Is this in scope for pyarrow? I suspect it would require an update to the > pandas section of the Schema metadata.
[jira] [Commented] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name
[ https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174068#comment-16174068 ] Wes McKinney commented on ARROW-1586: - cc [~cpcloud] > [PYTHON] serialize_pandas roundtrip loses columns name > -- > > Key: ARROW-1586 > URL: https://issues.apache.org/jira/browse/ARROW-1586 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > > The serialize / deserialize roundtrip loses {{ df.columns.name }} > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], > name='col_name')) > In [4]: df.columns.name > Out[4]: 'col_name' > In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name > {code} > Is this in scope for pyarrow? I suspect it would require an update to the > pandas section of the Schema metadata.
[jira] [Commented] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name
[ https://issues.apache.org/jira/browse/ARROW-1586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174067#comment-16174067 ] Wes McKinney commented on ARROW-1586: - Yes, we should preserve this metadata and add this to http://pandas-docs.github.io/pandas-docs-travis/developer.html#storing-pandas-dataframe-objects-in-apache-parquet-format. Though perhaps we can constrain the name to be a string, or coercible to a string? > [PYTHON] serialize_pandas roundtrip loses columns name > -- > > Key: ARROW-1586 > URL: https://issues.apache.org/jira/browse/ARROW-1586 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Priority: Minor > Fix For: 0.8.0 > > > The serialize / deserialize roundtrip loses {{ df.columns.name }} > {code:python} > In [1]: import pandas as pd > In [2]: import pyarrow as pa > In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], > name='col_name')) > In [4]: df.columns.name > Out[4]: 'col_name' > In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name > {code} > Is this in scope for pyarrow? I suspect it would require an update to the > pandas section of the Schema metadata.
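This comment suggests constraining the columns name to a string, or to something coercible to one. Here is a minimal sketch of storing and restoring {{df.columns.name}} through string-valued metadata under that constraint. The key {{columns_name}} and both function names are hypothetical. {{None}} is kept as "no name", since an unnamed columns index is the common case.

{code:python}
import json


def columns_name_to_metadata(name, coerce=False):
    """Serialize df.columns.name into a string-valued metadata dict.

    Hypothetical helper. None means "no name". Other values must be str,
    unless coerce=True, in which case they are converted with str().
    """
    if name is not None and not isinstance(name, str):
        if not coerce:
            raise TypeError(
                "columns name must be a string, got %s" % type(name).__name__)
        name = str(name)
    return {"columns_name": json.dumps(name)}


def columns_name_from_metadata(metadata):
    """Recover the name; data written without the key yields None."""
    return json.loads(metadata.get("columns_name", "null"))
{code}

JSON-encoding the value keeps {{None}} distinct from the string {{"None"}} while still fitting a string-only key/value store.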
[jira] [Updated] (ARROW-1588) [C++/Format] Harden Decimal Format
[ https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillip Cloud updated ARROW-1588: - Component/s: Format > [C++/Format] Harden Decimal Format > -- > > Key: ARROW-1588 > URL: https://issues.apache.org/jira/browse/ARROW-1588 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Format >Affects Versions: 0.7.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud > Fix For: 0.8.0 > > > We should finalize and harden the decimal format. The remaining issues are > officially documenting the choice to make every decimal value 16 bytes, and the > byte order. > For byte order we'll need to run some benchmarks to compare little endian vs > big endian. I plan to work on this over the next week or two. > [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd > like to see addressed here, please chime in.
[jira] [Updated] (ARROW-1588) [C++/Format] Harden Decimal Format
[ https://issues.apache.org/jira/browse/ARROW-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phillip Cloud updated ARROW-1588: - Issue Type: Improvement (was: Bug) > [C++/Format] Harden Decimal Format > -- > > Key: ARROW-1588 > URL: https://issues.apache.org/jira/browse/ARROW-1588 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Affects Versions: 0.7.0 >Reporter: Phillip Cloud >Assignee: Phillip Cloud > Fix For: 0.8.0 > > > We should finalize and harden the decimal format. The remaining issues are > officially documenting the choice to make every decimal value 16 bytes, and the > byte order. > For byte order we'll need to run some benchmarks to compare little endian vs > big endian. I plan to work on this over the next week or two. > [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd > like to see addressed here, please chime in.
[jira] [Created] (ARROW-1588) [C++/Format] Harden Decimal Format
Phillip Cloud created ARROW-1588: Summary: [C++/Format] Harden Decimal Format Key: ARROW-1588 URL: https://issues.apache.org/jira/browse/ARROW-1588 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 0.7.0 Reporter: Phillip Cloud Assignee: Phillip Cloud Fix For: 0.8.0 We should finalize and harden the decimal format. The remaining issues are officially documenting the choice to make every decimal value 16 bytes, and the byte order. For byte order we'll need to run some benchmarks to compare little endian vs big endian. I plan to work on this over the next week or two. [~jacq...@dremio.com] [~wesmckinn] If there are any additional items you'd like to see addressed here, please chime in.
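The byte-order question can be made concrete with a small sketch. Assume a decimal is stored as its unscaled integer value in 16 bytes of two's complement (for example, 123.45 with scale 2 stored as 12345). Under that assumption, the two candidate layouts differ only in the order of the bytes. The sketch uses Python's {{int.to_bytes}} and {{int.from_bytes}}. The function names are hypothetical, and this is not the Arrow implementation.

{code:python}
DECIMAL_WIDTH = 16  # bytes per value, as proposed in this issue


def encode_decimal128(unscaled, byteorder="little"):
    """Encode an unscaled decimal as a 16-byte two's complement integer.

    Raises OverflowError if the value does not fit in 128 bits.
    """
    return unscaled.to_bytes(DECIMAL_WIDTH, byteorder=byteorder, signed=True)


def decode_decimal128(raw, byteorder="little"):
    """Decode 16 bytes back into the unscaled integer."""
    if len(raw) != DECIMAL_WIDTH:
        raise ValueError("expected %d bytes, got %d" % (DECIMAL_WIDTH, len(raw)))
    return int.from_bytes(raw, byteorder=byteorder, signed=True)
{code}

For 12345 (0x3039), the little-endian encoding starts with the bytes 0x39 0x30. The big-endian encoding is the same 16 bytes in reverse order.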
[jira] [Created] (ARROW-1587) [Format] Add metadata for user-defined logical types
Wes McKinney created ARROW-1587: --- Summary: [Format] Add metadata for user-defined logical types Key: ARROW-1587 URL: https://issues.apache.org/jira/browse/ARROW-1587 Project: Apache Arrow Issue Type: Improvement Components: Format Reporter: Wes McKinney Fix For: 0.8.0 While we have the {{custom_metadata}} field at the Field level, it may be useful to have proper user-defined type metadata in the {{Type}} union. This would allow us to describe a user-defined type in terms of a physical representation built from the other, non-user-defined types (e.g. "latitude/longitude is represented by a struct whose children are two doubles"). This is more flexible than {{custom_metadata}} because we can leverage the existing structure in the Flatbuffers for describing the user type: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L285
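As a rough illustration of the idea, and not of the Flatbuffers schema itself, a user-defined logical type can be thought of as a name plus a storage type built from existing types. This pure-Python sketch stores the latitude/longitude example as a struct with two float64 children. Each child is kept as its own contiguous array, as in Arrow's columnar struct layout. {{UserDefinedType}}, {{to_storage}} and {{from_storage}} are all hypothetical names.

{code:python}
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class UserDefinedType:
    """Hypothetical logical type: a name plus a struct-of-doubles storage."""
    name: str
    child_names: tuple  # each child is stored as float64 ("d") here


LAT_LON = UserDefinedType("latitude_longitude", ("lat", "lon"))


def to_storage(udt, values):
    """Split logical values into one float64 child array per struct field."""
    children = {name: array("d") for name in udt.child_names}
    for value in values:
        if len(value) != len(udt.child_names):
            raise ValueError("expected %d components, got %d"
                             % (len(udt.child_names), len(value)))
        for name, component in zip(udt.child_names, value):
            children[name].append(float(component))
    return children


def from_storage(udt, children):
    """Reassemble logical values (tuples) from the child arrays."""
    return list(zip(*(children[name] for name in udt.child_names)))
{code}

A reader that does not know {{latitude_longitude}} could still fall back to treating the data as a plain struct of two doubles. That fallback is the main appeal of describing user types in terms of existing ones.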
[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174008#comment-16174008 ] ASF GitHub Bot commented on ARROW-1347: --- Github user BryanCutler commented on the issue: https://github.com/apache/arrow/pull/1119 Continuation of #959 to use `instanceof` and add a test. cc @jacques-n @wesm @StevenMPhillips > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases.
[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16174007#comment-16174007 ] ASF GitHub Bot commented on ARROW-1347: --- GitHub user BryanCutler opened a pull request: https://github.com/apache/arrow/pull/1119 ARROW-1347: [JAVA] Return consistent child field name for List Vectors This makes the child fields of ListVector have consistent names of `ListVector.DATA_VECTOR_NAME`. Previously, an empty ListVector would have a child name of `ZeroVector.name` which is "[DEFAULT]". You can merge this pull request into a Git repository by running: $ git pull https://github.com/BryanCutler/arrow java-ListVector-child-name-ARROW-1347 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/arrow/pull/1119.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1119 commit 2923a453ef005278db834e85d6fa084dbc3453a3 Author: Steven Phillips Date: 2017-08-10T22:15:28Z ARROW-1347: [JAVA] return consistent child field name for List vectors commit c240378b3122d95351ad97db78bfb45d34097d61 Author: Bryan Cutler Date: 2017-09-20T23:25:28Z changed to use instanceof and added test > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases.
[jira] [Created] (ARROW-1586) [PYTHON] serialize_pandas roundtrip loses columns name
Tom Augspurger created ARROW-1586: - Summary: [PYTHON] serialize_pandas roundtrip loses columns name Key: ARROW-1586 URL: https://issues.apache.org/jira/browse/ARROW-1586 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.7.0 Reporter: Tom Augspurger Priority: Minor Fix For: 0.8.0 The serialize / deserialize roundtrip loses {{ df.columns.name }}
{code:python}
In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: df = pd.DataFrame([[1, 2]], columns=pd.Index(['a', 'b'], name='col_name'))

In [4]: df.columns.name
Out[4]: 'col_name'

In [5]: pa.deserialize_pandas(pa.serialize_pandas(df)).columns.name
{code}
Is this in scope for pyarrow? I suspect it would require an update to the pandas section of the Schema metadata.
[jira] [Created] (ARROW-1585) serialize_pandas round trip fails on integer columns
Tom Augspurger created ARROW-1585: - Summary: serialize_pandas round trip fails on integer columns Key: ARROW-1585 URL: https://issues.apache.org/jira/browse/ARROW-1585 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.7.0 Reporter: Tom Augspurger Priority: Minor Fix For: 0.8.0 This roundtrip fails, since the integer column name isn't converted back from a string after deserializing:
{code:python}
In [1]: import pandas as pd

In [2]: import pyarrow as pa

In [3]: pa.deserialize_pandas(pa.serialize_pandas(pd.DataFrame({"0": [1, 2]}))).columns
Out[3]: Index(['0'], dtype='object')
{code}
That should be an {{ Int64Index([0]) }} for the columns.
[jira] [Created] (ARROW-1584) [PYTHON] serialize_pandas on empty dataframe
Tom Augspurger created ARROW-1584: - Summary: [PYTHON] serialize_pandas on empty dataframe Key: ARROW-1584 URL: https://issues.apache.org/jira/browse/ARROW-1584 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.7.0 Reporter: Tom Augspurger Priority: Minor Fix For: 0.8.0 This code
{code:python}
import pandas as pd
import pyarrow as pa

pa.serialize_pandas(pd.DataFrame())
{code}
raises
{code}
---
ArrowNotImplementedError Traceback (most recent call last)
in ()
> 1 pa.serialize_pandas(pd.DataFrame())

~/Envs/dask-dev/lib/python3.6/site-packages/pyarrow/ipc.py in serialize_pandas(df)
    158     sink = pa.BufferOutputStream()
    159     writer = pa.RecordBatchStreamWriter(sink, batch.schema)
--> 160     writer.write_batch(batch)
    161     writer.close()
    162     return sink.get_result()

pyarrow/ipc.pxi in pyarrow.lib._RecordBatchWriter.write_batch (/Users/travis/build/apache/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:59238)()

pyarrow/error.pxi in pyarrow.lib.check_status (/Users/travis/build/apache/arrow-dist/arrow/python/build/temp.macosx-10.6-intel-3.6/lib.cxx:8113)()

ArrowNotImplementedError: Unable to convert type: null
{code}
Presumably {{pa.deserialize_pandas}} will need a fix as well.
[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173882#comment-16173882 ] ASF GitHub Bot commented on ARROW-1500: --- Github user amirma commented on the issue: https://github.com/apache/arrow/pull/1116 Fixed lint errors. > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Labels: pull-request-available > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build
[jira] [Commented] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
[ https://issues.apache.org/jira/browse/ARROW-1581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173840#comment-16173840 ] Robert Nishihara commented on ARROW-1581: - Are you planning on putting these on PyPI? I'd like to do something similar with Ray; ideally, people would be able to pip install the project from any commit. Sort of like https://pypi.python.org/pypi/tf-nightly, except for every commit, not just the most recent one. > [Python] Set up nightly wheel builds for Linux, macOS > - > > Key: ARROW-1581 > URL: https://issues.apache.org/jira/browse/ARROW-1581 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Reporter: Wes McKinney >
[jira] [Commented] (ARROW-1579) Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173609#comment-16173609 ] Wes McKinney commented on ARROW-1579: - I think just the Docker images, and we could run nightly builds at some point or simply run all our "ad hoc" integration tests prior to cutting release candidates. Basically, I don't want to be surprised by an issue when an RC is out for a vote. cc [~heimir] > Add dockerized test setup to validate Spark integration > --- > > Key: ARROW-1579 > URL: https://issues.apache.org/jira/browse/ARROW-1579 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Wes McKinney > > cc [~bryanc] -- the goal of this will be to validate master-to-master to > catch any regressions in the Spark integration
[jira] [Created] (ARROW-1583) Use "travis_retry" function in some key places to reduce CI flakiness
Wes McKinney created ARROW-1583: --- Summary: Use "travis_retry" function in some key places to reduce CI flakiness Key: ARROW-1583 URL: https://issues.apache.org/jira/browse/ARROW-1583 Project: Apache Arrow Issue Type: Improvement Reporter: Wes McKinney Enough things can go wrong in our CI because of external package registries that we often end up with spurious failures not caused by code changes. For example, here is an NPM registry failure: https://travis-ci.org/apache/arrow/jobs/277798491#L941 I have seen Maven Central and anaconda.org fail in the past, too. Some of these package commands that hit external resources could be wrapped in {{travis_retry}} to give them another shot at success.
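{{travis_retry}} is a shell function provided by the Travis CI build environment that reruns a failing command. An analogous wrapper for CI helper scripts written in Python could look like the sketch below. {{run_with_retry}} is hypothetical. The default of three attempts with a one-second pause is an assumption and does not describe how {{travis_retry}} is implemented.

{code:python}
import subprocess
import time


def run_with_retry(cmd, attempts=3, delay_seconds=1.0):
    """Run `cmd` (a list of arguments), retrying on a non-zero exit status.

    Returns the CompletedProcess of the last attempt, so the caller can
    still inspect the final return code if every attempt failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            # Give a flaky registry (npm, Maven Central, anaconda.org) a moment.
            time.sleep(delay_seconds)
    return result
{code}

As the issue says, the wrapper belongs only around commands that hit external resources. Retrying compilation or test steps would hide real failures.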
[jira] [Commented] (ARROW-1579) Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173592#comment-16173592 ] Bryan Cutler commented on ARROW-1579: - This would be awesome to have! I'm glad to help out. Is this just to create the Docker images, or will it also be run as part of CI? > Add dockerized test setup to validate Spark integration > --- > > Key: ARROW-1579 > URL: https://issues.apache.org/jira/browse/ARROW-1579 > Project: Apache Arrow > Issue Type: Improvement > Components: Java - Vectors >Reporter: Wes McKinney > > cc [~bryanc] -- the goal of this will be to validate master-to-master to > catch any regressions in the Spark integration
[jira] [Created] (ARROW-1582) [Python] Set up + document nightly conda builds for macOS
Wes McKinney created ARROW-1582: --- Summary: [Python] Set up + document nightly conda builds for macOS Key: ARROW-1582 URL: https://issues.apache.org/jira/browse/ARROW-1582 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney It's already been great to be able to test the nightlies on Linux in conda; it would be great to be able to do the same on macOS.
[jira] [Created] (ARROW-1580) [Python] Instructions for setting up nightly builds on Linux
Wes McKinney created ARROW-1580: --- Summary: [Python] Instructions for setting up nightly builds on Linux Key: ARROW-1580 URL: https://issues.apache.org/jira/browse/ARROW-1580 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney cc [~cpcloud]
[jira] [Created] (ARROW-1581) [Python] Set up nightly wheel builds for Linux, macOS
Wes McKinney created ARROW-1581: --- Summary: [Python] Set up nightly wheel builds for Linux, macOS Key: ARROW-1581 URL: https://issues.apache.org/jira/browse/ARROW-1581 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Wes McKinney
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173558#comment-16173558 ] ASF GitHub Bot commented on ARROW-1557: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1117 In case it's useful, we have nightly dev builds. > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Assignee: Tom Augspurger >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the length of {{arrays}} and > {{names}} matches. I think this should raise with a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA, hopefully I didn't mess up too badly)
[jira] [Assigned] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-1497: --- Assignee: Li Jin > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin >Assignee: Li Jin > Labels: pull-request-available > Fix For: 0.8.0 > > > Currently, among complex types, JsonFileReader only sets the value count for > NullableMapType, via an instance check; this is error-prone and causes issues > when reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173555#comment-16173555 ] ASF GitHub Bot commented on ARROW-1497: --- Github user asfgit closed the pull request at: https://github.com/apache/arrow/pull/1067 > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin >Assignee: Li Jin > Labels: pull-request-available > Fix For: 0.8.0 > > > Currently, among complex types, JsonFileReader only sets the value count for > NullableMapType, via an instance check; this is error-prone and causes issues > when reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Updated] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-1497: Fix Version/s: 0.8.0 > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin >Assignee: Li Jin > Labels: pull-request-available > Fix For: 0.8.0 > > > Currently, among complex types, JsonFileReader only sets the value count for > NullableMapType, via an instance check; this is error-prone and causes issues > when reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Resolved] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1497. - Resolution: Fixed Issue resolved by pull request 1067 [https://github.com/apache/arrow/pull/1067] > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin > Labels: pull-request-available > > Currently, in complex types, JsonFileReader only sets value count for > NullableMapType by an instance check; this is error-prone and causes issues > with reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Resolved] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-1557. - Resolution: Fixed Issue resolved by pull request 1117 [https://github.com/apache/arrow/pull/1117] > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Assignee: Tom Augspurger >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and > {{names}} match. I think this should raise a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA; hopefully I didn't mess up too badly)
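The check requested in ARROW-1557 amounts to comparing two lengths before pairing names with arrays. The minimal sketch below is a plain-Python illustration and does not call pyarrow. `validate_names` is a hypothetical helper, not the code merged in PR 1117, and its error message is illustrative only. A naive pairing with `zip()` would silently drop the extra name, much like the missing column `c` in the output above.

```python
def validate_names(arrays, names):
    """Hypothetical pre-check: require exactly one name per array."""
    if len(names) != len(arrays):
        raise ValueError(
            "Length of names ({}) does not match length of arrays ({})".format(
                len(names), len(arrays)
            )
        )
    # Only pair them up once the lengths are known to agree.
    return list(zip(names, arrays))


# Usage mirroring the report: two arrays, three names.
try:
    validate_names([[1, 2], [3, 4]], ["a", "b", "c"])
except ValueError as exc:
    print(exc)  # Length of names (3) does not match length of arrays (2)
```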
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173550#comment-16173550 ] ASF GitHub Bot commented on ARROW-1557: --- Github user asfgit closed the pull request at: https://github.com/apache/arrow/pull/1117 > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Assignee: Tom Augspurger >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and > {{names}} match. I think this should raise a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA; hopefully I didn't mess up too badly)
[jira] [Created] (ARROW-1579) Add dockerized test setup to validate Spark integration
Wes McKinney created ARROW-1579: --- Summary: Add dockerized test setup to validate Spark integration Key: ARROW-1579 URL: https://issues.apache.org/jira/browse/ARROW-1579 Project: Apache Arrow Issue Type: Improvement Components: Java - Vectors Reporter: Wes McKinney cc [~bryanc] -- the goal of this will be to validate master-to-master to catch any regressions in the Spark integration
[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173492#comment-16173492 ] ASF GitHub Bot commented on ARROW-1497: --- Github user siddharthteotia commented on the issue: https://github.com/apache/arrow/pull/1067 +1 > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin > Labels: pull-request-available > > Currently, in complex types, JsonFileReader only sets value count for > NullableMapType by an instance check; this is error-prone and causes issues > with reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Resolved] (ARROW-1478) [JAVA] clear should release the buffer only if the buffer is not NULL
[ https://issues.apache.org/jira/browse/ARROW-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Teotia resolved ARROW-1478. - Resolution: Won't Fix Not needed. > [JAVA] clear should release the buffer only if the buffer is not NULL > - > > Key: ARROW-1478 > URL: https://issues.apache.org/jira/browse/ARROW-1478 > Project: Apache Arrow > Issue Type: Bug > Components: Java - Vectors >Reporter: Siddharth Teotia >Assignee: Siddharth Teotia > > In some cases we use a fake allocator in Dremio for the purpose of field > materialization only. The buffers of the underlying vectors are not > allocated. The fake allocator is a simple implementation of the BufferAllocator > interface where almost every method throws UnsupportedOperationException and > methods like getEmpty() return NULL. > It is more like a pass-through mechanism that allows us to > instantiate a vector using a non-functional allocator, since the constructors > in vector code don't allow the allocator itself to be NULL. > Portions of code where we have this scenario are generic in nature and so > have typical methods like close() / clear() which underneath invoke the > corresponding methods on vectors. > The clear() method in BaseDataValueVector releases the data buffer without > checking if the buffer is NULL, and that's where callers hit an NPE. > We don't see such problems in Arrow unit tests. My guess is that when a > vector is instantiated, the buffer is still probably a valid reference > returned by the allocator.getEmpty() call in the constructor of > BaseDataValueVector.
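ARROW-1478 was resolved as Won't Fix, so the guard it proposed was never merged. The sketch below only illustrates what the proposal describes. A fake allocator hands back no buffer from its "get empty" call, and `clear()` releases the data buffer only when one exists. It is written in Python, where calling a method on `None` raises `AttributeError` in place of Java's NPE. `Buffer`, `RealAllocator`, `FakeAllocator` and `BaseDataVector` are hypothetical stand-ins, not Arrow or Dremio classes.

```python
class Buffer:
    """Hypothetical reference-counted buffer."""

    def __init__(self):
        self.ref_count = 1

    def release(self):
        self.ref_count -= 1


class RealAllocator:
    def get_empty(self):
        return Buffer()


class FakeAllocator:
    """Like the fake allocator described above: get_empty() returns None."""

    def get_empty(self):
        return None


class BaseDataVector:
    def __init__(self, allocator):
        self.allocator = allocator
        # With FakeAllocator this is None rather than an empty buffer.
        self.data = allocator.get_empty()

    def clear(self):
        # The proposed guard: only release a buffer that actually exists.
        # Calling release() on None is the Python analogue of the NPE.
        if self.data is not None:
            self.data.release()
        self.data = self.allocator.get_empty()


# Usage: clearing a vector built with the fake allocator does not fail.
BaseDataVector(FakeAllocator()).clear()
```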
[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173347#comment-16173347 ] ASF GitHub Bot commented on ARROW-1347: --- Github user BryanCutler commented on the issue: https://github.com/apache/arrow/pull/959 Wouldn't it be better to use `instanceof`? I could change that and add a test for this if @StevenMPhillips is busy > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases.
[jira] [Commented] (ARROW-1347) [JAVA] List null type should use consistent name for inner field
[ https://issues.apache.org/jira/browse/ARROW-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173320#comment-16173320 ] ASF GitHub Bot commented on ARROW-1347: --- Github user jacques-n commented on the issue: https://github.com/apache/arrow/pull/959 LGTM +1. > [JAVA] List null type should use consistent name for inner field > > > Key: ARROW-1347 > URL: https://issues.apache.org/jira/browse/ARROW-1347 > Project: Apache Arrow > Issue Type: Bug >Reporter: Steven Phillips >Assignee: Steven Phillips > Labels: pull-request-available > > The child field for List type has the field name "$data$" in most cases. In > the case that there is not a known type for the List, currently the > getField() method will return a subfield with name "DEFAULT". We should make > this consistent with the rest of the cases.
[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible
[ https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173222#comment-16173222 ] ASF GitHub Bot commented on ARROW-1578: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1118 Pretty unsure why https://travis-ci.org/apache/arrow/jobs/277763849 failed > [C++/Python] Run lint checks in Travis CI to fail for linting issues as early > as possible > - > > Key: ARROW-1578 > URL: https://issues.apache.org/jira/browse/ARROW-1578 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > The lint checks are run relatively late in the CI process, and a build may > fail after holding a worker for ~20 minutes or more. These could fail much > sooner and free up build slaves
[jira] [Commented] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible
[ https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173135#comment-16173135 ] ASF GitHub Bot commented on ARROW-1578: --- GitHub user wesm opened a pull request: https://github.com/apache/arrow/pull/1118 ARROW-1578: [C++] Run lint checks in Travis CI much earlier at before_script stage to fail faster You can merge this pull request into a Git repository by running: $ git pull https://github.com/wesm/arrow ARROW-1578 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/arrow/pull/1118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1118 commit 329f01790ad5a16965e744acf54440d7fb010c5e Author: Wes McKinney Date: 2017-09-20T12:58:34Z Run lint checks before compiling anything. Make cpplint warning Change-Id: Ib812f49e248540c7283a1e058f26925dbc36af00 commit 28fc3fb07589551959664db31997a9a0d8599b0c Author: Wes McKinney Date: 2017-09-20T13:02:00Z Typo Change-Id: Ifeae6a35fc35939bfdaf191b2639b3aee9f27274 > [C++/Python] Run lint checks in Travis CI to fail for linting issues as early > as possible > - > > Key: ARROW-1578 > URL: https://issues.apache.org/jira/browse/ARROW-1578 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > The lint checks are run relatively late in the CI process, and a build may > fail after holding a worker for ~20 minutes or more. These could fail much > sooner and free up build slaves
[jira] [Updated] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible
[ https://issues.apache.org/jira/browse/ARROW-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1578: -- Labels: pull-request-available (was: ) > [C++/Python] Run lint checks in Travis CI to fail for linting issues as early > as possible > - > > Key: ARROW-1578 > URL: https://issues.apache.org/jira/browse/ARROW-1578 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Wes McKinney > Labels: pull-request-available > Fix For: 0.8.0 > > > The lint checks are run relatively late in the CI process, and a build may > fail after holding a worker for ~20 minutes or more. These could fail much > sooner and free up build slaves
[jira] [Created] (ARROW-1578) [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible
Wes McKinney created ARROW-1578: --- Summary: [C++/Python] Run lint checks in Travis CI to fail for linting issues as early as possible Key: ARROW-1578 URL: https://issues.apache.org/jira/browse/ARROW-1578 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Wes McKinney Fix For: 0.8.0 The lint checks are run relatively late in the CI process, and a build may fail after holding a worker for ~20 minutes or more. These could fail much sooner and free up build slaves
[jira] [Commented] (ARROW-1500) [C++] Result of ftruncate ignored in MemoryMappedFile::Create
[ https://issues.apache.org/jira/browse/ARROW-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173082#comment-16173082 ] ASF GitHub Bot commented on ARROW-1500: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1116 This is failing with cpplint warnings ``` /home/travis/build/apache/arrow/cpp/src/arrow/io/file.cc:138: Line ends in whitespace. Consider deleting these extra spaces. [whitespace/end_of_line] [4] /home/travis/build/apache/arrow/cpp/src/arrow/io/file.cc:610: Line ends in whitespace. Consider deleting these extra spaces. [whitespace/end_of_line] [4] ``` you can use `make lint` to run the lint checks locally > [C++] Result of ftruncate ignored in MemoryMappedFile::Create > - > > Key: ARROW-1500 > URL: https://issues.apache.org/jira/browse/ARROW-1500 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Amir Malekpour > Labels: pull-request-available > Fix For: 0.8.0 > > > Observed in gcc 5.4.0 release build
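The two warnings above come from cpplint's `whitespace/end_of_line` check, and `make lint` remains the way to run the project's real lint checks locally. The sketch below shows roughly what that particular check flags. It is not cpplint and does not reproduce its other rules. `find_trailing_whitespace` is a hypothetical helper that reports lines ending in spaces or tabs.

```python
def find_trailing_whitespace(lines):
    """Return 1-based numbers of lines that end in spaces or tabs."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # Strip the line terminator first so only real trailing blanks count.
        body = line.rstrip("\r\n")
        if body != body.rstrip(" \t"):
            hits.append(lineno)
    return hits


# Usage on an in-memory snippet; for a real file, pass the open file object.
sample = ["int x = 1;  \n", "int y = 2;\n"]
print(find_trailing_whitespace(sample))  # [1]
```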
[jira] [Commented] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173065#comment-16173065 ] ASF GitHub Bot commented on ARROW-1497: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1067 @siddharthteotia can you take a look at this? @icexelloss can you change the PR title to start with "ARROW-1497:" (remove the brackets). thanks! > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin > Labels: pull-request-available > > Currently, in complex types, JsonFileReader only sets value count for > NullableMapType by an instance check; this is error-prone and causes issues > with reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Updated] (ARROW-1497) [Java] JsonFileReader doesn't set value count for some vectors
[ https://issues.apache.org/jira/browse/ARROW-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-1497: -- Labels: pull-request-available (was: ) > [Java] JsonFileReader doesn't set value count for some vectors > -- > > Key: ARROW-1497 > URL: https://issues.apache.org/jira/browse/ARROW-1497 > Project: Apache Arrow > Issue Type: Bug >Reporter: Li Jin > Labels: pull-request-available > > Currently, in complex types, JsonFileReader only sets value count for > NullableMapType by an instance check; this is error-prone and causes issues > with reading other complex types: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java#L269 > We should have a better way to do this.
[jira] [Commented] (ARROW-1557) [PYTHON] pyarrow.Table.from_arrays doesn't validate names length
[ https://issues.apache.org/jira/browse/ARROW-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173060#comment-16173060 ] ASF GitHub Bot commented on ARROW-1557: --- Github user wesm commented on the issue: https://github.com/apache/arrow/pull/1117 `if not K` is probably better, feel free to make that change too > [PYTHON] pyarrow.Table.from_arrays doesn't validate names length > > > Key: ARROW-1557 > URL: https://issues.apache.org/jira/browse/ARROW-1557 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.7.0 >Reporter: Tom Augspurger >Assignee: Tom Augspurger >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > pa.Table.from_arrays doesn't validate that the lengths of {{arrays}} and > {{names}} match. I think this should raise a {{ValueError}}: > {code} > In [1]: import pyarrow as pa > In [2]: pa.Table.from_arrays([pa.array([1, 2]), pa.array([3, 4])], > names=['a', 'b', 'c']) > Out[2]: > pyarrow.Table > a: int64 > b: int64 > In [3]: pa.__version__ > Out[3]: '0.7.0' > {code} > (This is my first time using JIRA; hopefully I didn't mess up too badly)
[jira] [Commented] (ARROW-1555) [Python] write_to_dataset on s3
[ https://issues.apache.org/jira/browse/ARROW-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172932#comment-16172932 ] Young-Jun Ko commented on ARROW-1555: - I think the simplest way to fix this would be to just expose the fs functions implemented by `s3fs`, `exists` being one of them. I suppose that's what Florian had in mind. Thanks guys for looking into this! > [Python] write_to_dataset on s3 > --- > > Key: ARROW-1555 > URL: https://issues.apache.org/jira/browse/ARROW-1555 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.7.0 >Reporter: Young-Jun Ko >Assignee: Florian Jetter >Priority: Trivial > Fix For: 0.8.0 > > > When writing an arrow table to s3, I get a NotImplemented exception. > The root cause is in _ensure_filesystem and can be reproduced as follows: > import pyarrow > import pyarrow.parquet as pqa > import s3fs > s3 = s3fs.S3FileSystem() > pqa._ensure_filesystem(s3).exists("anything") > It appears that the S3FSWrapper that is instantiated in _ensure_filesystem > does not expose the exists method of s3.
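Young-Jun's suggestion is to expose the methods that `s3fs` already implements. One way to do that is a thin wrapper that forwards `exists`, and any other attribute it does not define itself, to the wrapped filesystem. The sketch below makes assumptions throughout. `DelegatingFSWrapper` and `FakeS3FileSystem` are hypothetical, not pyarrow's `S3FSWrapper` or the fix that eventually landed. The fake filesystem lets the example run without S3 credentials.

```python
class DelegatingFSWrapper:
    """Hypothetical wrapper that forwards missing attributes to the wrapped fs."""

    def __init__(self, fs):
        self.fs = fs

    def exists(self, path):
        # Explicitly forward exists(), the method reported as missing.
        return self.fs.exists(path)

    def __getattr__(self, name):
        # Only invoked when normal lookup fails, so methods defined on the
        # wrapper win; anything else is looked up on the wrapped filesystem.
        return getattr(self.fs, name)


class FakeS3FileSystem:
    """Stand-in for s3fs.S3FileSystem so the sketch runs without S3."""

    def __init__(self, keys):
        self.keys = set(keys)

    def exists(self, path):
        return path in self.keys

    def ls(self, path):
        return sorted(k for k in self.keys if k.startswith(path))


fs = DelegatingFSWrapper(FakeS3FileSystem(["bucket/part-0.parquet"]))
print(fs.exists("anything"))  # False
print(fs.ls("bucket/"))  # ['bucket/part-0.parquet']
```

Forwarding through `__getattr__` keeps the wrapper small. The trade-off is that everything the underlying filesystem offers becomes reachable, not only a curated subset.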