Re: Symbol not found: _PyCObject_Type (MacOS El Capitan, Python 3.6)
Hi Wes,

Thank you for your suggestion. Clearing out the CMake temporary files and
rebuilding fixed the issue. I am so glad I have a working dev environment
now. Thanks again!

On Tue, May 15, 2018 at 9:45 PM Wes McKinney wrote:
> hi Quang -- I recommend clearing out your CMake temporary files after
> making any conda environment changes. If you activate a different
> conda environment, CMake will not know to recompute variables related
> to Python's header files and libraries. So it might have been that you
> invoked CMake with Python 2 activated and later activated Python 3.
>
> - Wes
>
> On Tue, May 15, 2018 at 5:15 AM, Quang Vu wrote:
> > Yes Antoine, that happens when compiling Arrow under an activated conda
> > environment.
> > Thank you for all the info you are helping me with!
> >
> > Quang.
> >
> > On Mon, May 14, 2018 at 3:34 PM Antoine Pitrou wrote:
> >>
> >> To give a bit more insight: you should compile Arrow with your conda
> >> environment activated, so that it picks the right Python version (3.6.5,
> >> in your case). If it's still picking the wrong Python version, that
> >> might be a bug.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >> Le 14/05/2018 à 20:50, Quang Vu a écrit :
> >> > Thanks Antoine,
> >> >
> >> > I will need to learn more about the compile process that happens on my
> >> > Mac, to see how it links to Python 2. I am not familiar with that
> >> > process, but this is a good pointer for my issue. Thank you for your
> >> > response to my issue!
> >> >
> >> > Quang.
> >> >
> >> > On Mon, May 14, 2018 at 12:50 PM Antoine Pitrou wrote:
> >> >
> >> >> Hi Quang,
> >> >>
> >> >> It sounds like you have compiled Arrow against a Python 2 install but
> >> >> are now trying to use it with Python 3. This won't work; the same
> >> >> Python version must be used when compiling and when using PyArrow.
> >> >>
> >> >> ("PyCObject" is a Python 2-specific API that no longer exists in
> >> >> Python 3.)
> >> >>
> >> >> Regards
> >> >>
> >> >> Antoine.
> >> >>
> >> >> Le 14/05/2018 à 18:34, Quang Vu a écrit :
> >> >>> Hi Arrow dev,
> >> >>>
> >> >>> I am having trouble installing and setting up my development
> >> >>> environment for Arrow. I wonder if anyone is familiar with the
> >> >>> issue. My system info:
> >> >>> - MacOS 10.11.6 (El Capitan)
> >> >>> - conda 4.5.1
> >> >>> - python 3.6.5
> >> >>> - arrow's current commit: 4b8511
> >> >>>
> >> >>> Installing the Arrow C++ libraries and Parquet both succeed, but
> >> >>> importing `pyarrow` fails:
> >> >>>
> >> >>> $ python -c 'import pyarrow'
> >> >>>
> >> >>> Traceback (most recent call last):
> >> >>>   File "<string>", line 1, in <module>
> >> >>>   File "/Users/myuser/code/arrow/python/pyarrow/__init__.py", line 47, in <module>
> >> >>>     from pyarrow.lib import cpu_count, set_cpu_count
> >> >>> ImportError: dlopen(/Users/myuser/code/arrow/python/pyarrow/lib.cpython-36m-darwin.so, 2): Symbol not found: _PyCObject_Type
> >> >>>   Referenced from: /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
> >> >>>   Expected in: flat namespace
> >> >>>   in /Users/myuser/miniconda3/envs/pyarrow-test/lib/libarrow_python.10.dylib
> >> >>>
> >> >>> If anyone has a suggestion about what the problem might be, please
> >> >>> let me know. Thanks!
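[Editor's note] Wes's advice in this thread -- clearing out CMake temporary files after a conda environment change so that CMake re-probes the active Python -- can be sketched roughly as follows. This is a hedged illustration, not a documented Arrow procedure: the build-directory path is an assumed example and uses a scratch directory here so the snippet is self-contained.

```shell
# Hypothetical sketch: remove CMake's cached configuration so a re-run of
# cmake re-detects the Python headers/libs of the *currently active* env.
# BUILD_DIR is a stand-in; in a real Arrow checkout it would be cpp/build.
BUILD_DIR="${TMPDIR:-/tmp}/arrow-demo-build"
mkdir -p "$BUILD_DIR/CMakeFiles"      # simulate an existing configured build
touch "$BUILD_DIR/CMakeCache.txt"

# The actual cache-clearing step:
rm -rf "$BUILD_DIR/CMakeCache.txt" "$BUILD_DIR/CMakeFiles"

# Then, with the desired conda environment activated, reconfigure, e.g.:
# (cd "$BUILD_DIR" && cmake -DCMAKE_BUILD_TYPE=debug ..)
echo "cleared CMake cache in $BUILD_DIR"
```

The point is that CMakeCache.txt pins the Python interpreter/library paths found at first configure time, which is why switching conda environments alone is not enough.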
[jira] [Created] (ARROW-2612) [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY
Philipp Moritz created ARROW-2612: - Summary: [Plasma] Fix deprecated PLASMA_DEFAULT_RELEASE_DELAY Key: ARROW-2612 URL: https://issues.apache.org/jira/browse/ARROW-2612 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz The deprecated PLASMA_DEFAULT_RELEASE_DELAY is currently broken, since it refers to kDeprecatedPlasmaDefaultReleaseDelay without the plasma:: namespace qualifier. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2611) [Python] Python 2 integer serialization
Philipp Moritz created ARROW-2611: - Summary: [Python] Python 2 integer serialization Key: ARROW-2611 URL: https://issues.apache.org/jira/browse/ARROW-2611 Project: Apache Arrow Issue Type: Improvement Components: Python Affects Versions: 0.9.0 Reporter: Philipp Moritz In Python 2, serializing a Python int with pyarrow.serialize and then deserializing it returns a {{long}} instead of an {{int}}. Note that this is not an issue in Python 3, where the long type does not exist. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Language-independent and cross-language docs
+1 on setting up a top-level documentation project. I think that establishing an information hierarchy to help people understand all the layers of the project is more important than the choice of documentation tool -- for example, if we started with Sphinx and decided to move later to something else, there are tools for converting between markup languages (though it would require some manual fixes).

I'm sort of neutral on combining the current language-specific documentation projects into a monolithic documentation project. My prior is that the top-level documentation should consist of:

* High-level overview of the Arrow project: components, languages, and vision
* Columnar specification documents (migrating the current Markdown documents in format/) and other specification documents
* High-level project roadmap and contributor guide
* Guides for maintainers / committers
* Getting started guide for each language

The top-level documentation could direct users to the language-specific API and usage docs (i.e. like the current Python Sphinx project).

I'm interested in what people think about how to integrate this statically-generated content with our current Jekyll-based website. One could argue that all this top-level documentation could be handled by Jekyll (or an equivalent static site generator).

- Wes

On Thu, May 17, 2018 at 3:44 PM, Uwe L. Korn wrote:
> Hello,
>
> I can second that we should move the documentation to a central one. As a C++
> and Python contributor at the same time, it is always hard to think of where you
> should document a specific piece. We have a very small C++ documentation and
> a somewhat larger Python one. For some features it would make sense to have
> them in both. IPC and in-process sharing is also a main part of the Arrow
> project. Documenting this separately for each language will be a lot of work
> and probably leave blind spots in each language.
>
> Not everything in each language ecosystem can be directly included in Sphinx,
> but as Sphinx is becoming a very broadly used documentation system, there are
> many nice converters like Breathe [1] (Doxygen to Sphinx) available.
>
> To directly answer the questions:
>
> - Should we do this at all (i.e. build up a central documentation system)?
>
> Yes.
>
> - Should we use Sphinx for it?
>
> Very much in favour. There is probably also a tendency that some people
> prefer Markdown (I do), but given the feature set of Sphinx, I would very much
> argue in favour of it.
>
> - To which extent should our current docs be migrated to Sphinx (apart
> from the Python docs, which already use Sphinx)? For example, should
> the specs (currently standalone pages written in Markdown) be migrated
> to Sphinx for better cross-referencing and navigation? What about the
> C++ tutorial pages? etc.
>
> I would definitely migrate the C++ documentation fully into that, but the C++ /
> Python relation is very tight. There are a lot of topics that either touch
> two languages or are general to the project; these should also go in there.
>
> - Should we preferably have a single Sphinx doctree, or several
> independent per-topic / per-language doctrees?
>
> I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will
> have many shared topics between the different implementations, I would
> expect that we should have a single documentation with well-organized
> sections.
>
> Also, we will probably face the issue that we have documentation on a specific
> topic where only a small part differs between two
> implementations/setups/... I really like the Scala/Python tabs in the Spark
> docs [2]. There is a Sphinx extension that seems to do something similar to
> this [3].
> This could either be used to have documentation on how to construct
> things where one switches between Ruby and Python, or for the main issue where
> I would need it: setting up the build with slightly different package managers
> (e.g. conda vs pip in Python).
>
> Uwe
>
> [1]: https://breathe.readthedocs.io/en/latest/
> [2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
> [3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html
>
> On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
>>
>> Hi,
>>
>> In the following PR discussion it was mentioned that we currently lack a
>> central documentation system for cross-language topics:
>> https://github.com/apache/arrow/pull/1575#issuecomment-364062240
>>
>> Sphinx looks like a reasonable contender for that purpose. For those who
>> don't know it, Sphinx is a documentation system initially developed for
>> the Python language, which quickly became widely used amongst Python
>> projects, and is now being used by non-Python projects as well. For
>> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
>> kernel online docs are now written using Sphinx
>> (https://www.kernel.org/doc/html/latest/index.html).
[jira] [Created] (ARROW-2610) [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm
Uwe L. Korn created ARROW-2610: -- Summary: [Java/Python] Add support for dictionary type to pyarrow.Field.from_jvm Key: ARROW-2610 URL: https://issues.apache.org/jira/browse/ARROW-2610 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Uwe L. Korn The DictionaryType is a bit more complex, as it also references the dictionary values themselves. This also needs to be integrated into {{pyarrow.Field.from_jvm}}, but the work to make DictionaryType work may also depend on {{pyarrow.Array.from_jvm}} first supporting non-primitive arrays. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2609) [Java/Python] Complex type conversion in pyarrow.Field.from_jvm
Uwe L. Korn created ARROW-2609: -- Summary: [Java/Python] Complex type conversion in pyarrow.Field.from_jvm Key: ARROW-2609 URL: https://issues.apache.org/jira/browse/ARROW-2609 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Uwe L. Korn Fix For: 0.10.0 The converter {{pyarrow.Field.from_jvm}} currently only works for primitive types. Types like List, Struct or Union that have children in their definition are not supported. We should add the needed recursion for these types and enable the respective tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2607) [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm
Uwe L. Korn created ARROW-2607: -- Summary: [Java/Python] Support VarCharVector / StringArray in pyarrow.Array.from_jvm Key: ARROW-2607 URL: https://issues.apache.org/jira/browse/ARROW-2607 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors, Python Reporter: Uwe L. Korn Fix For: 0.10.0 Follow-up after https://issues.apache.org/jira/browse/ARROW-2249: Currently only primitive arrays are supported in {{pyarrow.Array.from_jvm}} as it uses {{pyarrow.Array.from_buffers}} underneath. We should extend one of the two functions to be able to deal with string arrays. There is a currently failing unit test {{test_jvm_string_array}} in {{pyarrow/tests/test_jvm.py}} to verify the implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2606) [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm
Uwe L. Korn created ARROW-2606: -- Summary: [Java/Python] Add unit test for pyarrow.decimal128 in Array.from_jvm Key: ARROW-2606 URL: https://issues.apache.org/jira/browse/ARROW-2606 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors, Python Reporter: Uwe L. Korn Fix For: 0.10.0 Follow-up after https://issues.apache.org/jira/browse/ARROW-2249. We need to find the correct code to construct Java decimals and fill them into a {{DecimalVector}}. Afterwards, we should activate the decimal128 type on {{test_jvm_array}} and ensure that we load them correctly from Java into Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2605) [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm
Uwe L. Korn created ARROW-2605: -- Summary: [Java/Python] Add unit test for pyarrow.timeX types in Array.from_jvm Key: ARROW-2605 URL: https://issues.apache.org/jira/browse/ARROW-2605 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors, Python Reporter: Uwe L. Korn Fix For: 0.10.0 Follow-up after https://issues.apache.org/jira/browse/ARROW-2249 as we are missing the necessary methods to construct these arrays conveniently on the Python side. Once there is a path to construct {{pyarrow.Array}} instances from a Python list of {{datetime.time}} for the various time types, we should activate the time types on {{test_jvm_array}} and ensure that we load them correctly from Java into Python. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2604) [Java] Add method overload for VarCharVector.set(int,String)
Uwe L. Korn created ARROW-2604: -- Summary: [Java] Add method overload for VarCharVector.set(int,String) Key: ARROW-2604 URL: https://issues.apache.org/jira/browse/ARROW-2604 Project: Apache Arrow Issue Type: New Feature Components: Java - Vectors Reporter: Uwe L. Korn Fix For: 0.10.0 I would have expected that this is a very typical use case but at the moment I only see code that first fills a {{VarCharHolder}}. We could also provide this as a convenience overload. Correct me please if I missed a convenience feature. I'm still new to the Java side. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Language-independent and cross-language docs
Hello,

I can second that we should move the documentation to a central one. As a C++ and Python contributor at the same time, it is always hard to think of where you should document a specific piece. We have a very small C++ documentation and a somewhat larger Python one. For some features it would make sense to have them in both. IPC and in-process sharing is also a main part of the Arrow project. Documenting this separately for each language will be a lot of work and probably leave blind spots in each language.

Not everything in each language ecosystem can be directly included in Sphinx, but as Sphinx is becoming a very broadly used documentation system, there are many nice converters like Breathe [1] (Doxygen to Sphinx) available.

To directly answer the questions:

- Should we do this at all (i.e. build up a central documentation system)?

Yes.

- Should we use Sphinx for it?

Very much in favour. There is probably also a tendency that some people prefer Markdown (I do), but given the feature set of Sphinx, I would very much argue in favour of it.

- To which extent should our current docs be migrated to Sphinx (apart from the Python docs, which already use Sphinx)? For example, should the specs (currently standalone pages written in Markdown) be migrated to Sphinx for better cross-referencing and navigation? What about the C++ tutorial pages? etc.

I would definitely migrate the C++ documentation fully into that, but the C++ / Python relation is very tight. There are a lot of topics that either touch two languages or are general to the project; these should also go in there.

- Should we preferably have a single Sphinx doctree, or several independent per-topic / per-language doctrees?

I'm not 100% sure what the definition of a "Sphinx doctree" is, but as we will have many shared topics between the different implementations, I would expect that we should have a single documentation with well-organized sections.
Also, we will probably face the issue that we have documentation on a specific topic where only a small part differs between two implementations/setups/... I really like the Scala/Python tabs in the Spark docs [2]. There is a Sphinx extension that seems to do something similar to this [3]. This could either be used to have documentation on how to construct things where one switches between Ruby and Python, or for the main issue where I would need it: setting up the build with slightly different package managers (e.g. conda vs pip in Python).

Uwe

[1]: https://breathe.readthedocs.io/en/latest/
[2]: http://spark.apache.org/docs/latest/quick-start.html#more-on-dataset-operations
[3]: http://sphinxcontrib-contentui.readthedocs.io/en/latest/tabs.html

On Sat, May 12, 2018, at 6:03 PM, Antoine Pitrou wrote:
>
> Hi,
>
> In the following PR discussion it was mentioned that we currently lack a
> central documentation system for cross-language topics:
> https://github.com/apache/arrow/pull/1575#issuecomment-364062240
>
> Sphinx looks like a reasonable contender for that purpose. For those who
> don't know it, Sphinx is a documentation system initially developed for
> the Python language, which quickly became widely used amongst Python
> projects, and is now being used by non-Python projects as well. For
> example, the LLVM docs (https://llvm.org/docs/) and even the Linux
> kernel online docs are now written using Sphinx
> (https://www.kernel.org/doc/html/latest/index.html).
>
> Sphinx uses reStructuredText (a.k.a. "reST") as its basic markup
> language, but with many extensions. It allows for structured
> documentation with extensive cross-referencing (even between independent
> Sphinx sites, using the "intersphinx" extension).
>
> The questions here are:
>
> - Should we do this at all (i.e. build up a central documentation system)?
>
> - Should we use Sphinx for it?
>
> - To which extent should our current docs be migrated to Sphinx (apart
> from the Python docs, which already use Sphinx)? For example, should
> the specs (currently standalone pages written in Markdown) be migrated
> to Sphinx for better cross-referencing and navigation? What about the
> C++ tutorial pages? etc.
>
> - Should we preferably have a single Sphinx doctree, or several
> independent per-topic / per-language doctrees?
>
> Regards
>
> Antoine.
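[Editor's note] The "intersphinx" extension Antoine mentions is what would let independent per-language doctrees cross-reference each other. As a rough, hypothetical illustration only (the inventory URLs below are made-up examples, not an agreed-upon layout), a doctree's conf.py would wire it up like this:

```python
# Hypothetical conf.py fragment showing cross-doctree linking via
# sphinx.ext.intersphinx. Project names and URLs are illustrative
# assumptions, not the actual Arrow docs configuration.
extensions = ['sphinx.ext.intersphinx']

intersphinx_mapping = {
    # References such as :py:class:`pyarrow.Table` would then resolve
    # against the target site's objects.inv inventory file.
    'pyarrow': ('https://arrow.apache.org/docs/python/', None),
    'python': ('https://docs.python.org/3', None),
}
```

With a mapping like this, a top-level guide could link into the Python API docs without the two doctrees living in the same Sphinx project.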
New Arrow PMC Member: Siddharth Teotia
The Project Management Committee (PMC) for Apache Arrow has invited Siddharth Teotia to become a PMC member and we are pleased to announce that he has accepted. Congratulations and welcome, Sidd!
Re: Arrow 1319 [Python] Add additional HDFS filesystem methods
hi Alex,

Yes, please feel free to break this into multiple PRs, as each new filesystem method may require a number of unit tests, and these may be easier to review in smaller batches.

cheers
Wes

On Mon, May 14, 2018 at 8:44 AM, Alex Hagerman wrote:
> Hello,
>
> I was reviewing tickets to work on during the sprint days at PyCon and came
> across 1319.
>
> https://issues.apache.org/jira/browse/ARROW-1319
>
> I was going to pick this up and see what I could do with it. I read the
> history and wanted to check if there have been any changes that might impact
> the ticket since the last update in December 2017?
>
> Also, would it be ok to break this into multiple PRs? I would like to be able
> to get some feedback as I add the first few filesystem methods.
>
> Thanks,
> Alex
[CI] Rust/C++/Python/Cython coverage published
Hi, As a heads-up, Travis-CI runs now generate code coverage data for the aforementioned languages (after running the relevant test suite(s)), and upload it to CodeCov. You can find an example report here: https://codecov.io/gh/apache/arrow/list/455318556339ca492d3c02e6c6a297865f647bf7/ Regards Antoine.
[jira] [Created] (ARROW-2603) [Python] from pandas raises ArrowInvalid for date(time) subclasses
Florian Jetter created ARROW-2603: - Summary: [Python] from pandas raises ArrowInvalid for date(time) subclasses Key: ARROW-2603 URL: https://issues.apache.org/jira/browse/ARROW-2603 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.9.0 Reporter: Florian Jetter Assignee: Florian Jetter When converting a pandas DataFrame holding subclasses of date/datetime objects, Arrow raises an {{ArrowInvalid}} exception:
{code:java}
import datetime

import pandas as pd
import pyarrow as pa

class MyDate(datetime.date):
    pass

date_array = [MyDate(2000, 1, 1)]
df = pd.DataFrame({"date": pd.Series(date_array, dtype=object)})
table = pa.Table.from_pandas(df)
{code}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2602) [C++/Python] Automate build of development docker container
Uwe L. Korn created ARROW-2602: -- Summary: [C++/Python] Automate build of development docker container Key: ARROW-2602 URL: https://issues.apache.org/jira/browse/ARROW-2602 Project: Apache Arrow Issue Type: Wish Components: C++, Python Reporter: Uwe L. Korn Fix For: 0.10.0 With [https://github.com/apache/arrow/pull/2016|https://github.com/apache/arrow/pull/2016#pullrequestreview-121047089] we provide a convenience docker container so that one can develop Arrow without directly running into the hassles of setting up the development toolchain on one's machine. The current base image is not built automatically, as we are waiting for input from INFRA on https://issues.apache.org/jira/browse/INFRA-16533. Once we know how to upload continuously to Docker Hub, we should move the Dockerfile appropriately. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2601) [Python] MemoryPool bytes_allocated causes seg
Alex Hagerman created ARROW-2601: Summary: [Python] MemoryPool bytes_allocated causes seg Key: ARROW-2601 URL: https://issues.apache.org/jira/browse/ARROW-2601 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.9.0 Reporter: Alex Hagerman Fix For: 0.10.0
{noformat}
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 18:21:58) [GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> mp = pa.MemoryPool()
>>> arr = pa.array([1,2,3], memory_pool=mp)
>>> mp.bytes_allocated()
Segmentation fault (core dumped)
{noformat}
I'll dig into this further, but should bytes_allocated be returning anything when called like this? Or should it raise NotImplemented? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2600) [Python] Add additional LocalFileSystem filesystem methods
Alex Hagerman created ARROW-2600: Summary: [Python] Add additional LocalFileSystem filesystem methods Key: ARROW-2600 URL: https://issues.apache.org/jira/browse/ARROW-2600 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Alex Hagerman Assignee: Alex Hagerman Fix For: 0.10.0 Related to https://issues.apache.org/jira/browse/ARROW-1319 I noticed the methods Martin listed are also not part of the LocalFileSystem class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2599) pip install on ARM fails
Dominykas Mostauskis created ARROW-2599: --- Summary: pip install on ARM fails Key: ARROW-2599 URL: https://issues.apache.org/jira/browse/ARROW-2599 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 0.9.0 Environment: Arch ARM Linux pip 10.0.1 Python 3.6.5 Reporter: Dominykas Mostauskis Trying to install pyarrow with pip on ARM fails with {{Could not find the Arrow library. Looked for headers in , and for libs in}}:
{noformat}
$ pip install pyarrow --no-build-isolation --user
[omitted]
Thread model: posix
gcc version 8.1.0 (GCC)
INFO Compiler id: GNU
Selected compiler gcc 8.1.0
-- Performing Test CXX_SUPPORTS_SSE3
-- Performing Test CXX_SUPPORTS_SSE3 - Failed
-- Performing Test CXX_SUPPORTS_ALTIVEC
-- Performing Test CXX_SUPPORTS_ALTIVEC - Failed
Configured for DEBUG build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: DEBUG
-- Build output directory: /tmp/pip-install-auk894mc/pyarrow/build/temp.linux-armv7l-3.6/debug/
-- Found PythonInterp: /usr/bin/python (found version "3.6.5")
-- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf
-- Looking for python3.6m
-- Found Python lib /usr/lib/libpython3.6m.so
-- Found PythonLibs: /usr/lib/libpython3.6m.so
-- Found NumPy: version "1.14.3" /home/domas/.local/lib/python3.6/site-packages/numpy/core/include
-- Searching for Python libs in /usr/lib;/usr/lib/python3.6/config-3.6m-arm-linux-gnueabihf
-- Looking for python3.6m
-- Found Python lib /usr/lib/libpython3.6m.so
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.29.2")
-- Checking for module 'arrow'
-- No package 'arrow' found
CMake Error at cmake_modules/FindArrow.cmake:130 (message):
  Could not find the Arrow library. Looked for headers in , and for libs in
Call Stack (most recent call first):
  CMakeLists.txt:197 (find_package)
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
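[Editor's note] The empty search paths in "Looked for headers in , and for libs in" indicate that the build could not locate an Arrow C++ installation at all. As a hedged sketch (not a confirmed fix for this ticket): FindArrow.cmake consults the ARROW_HOME environment variable for its header/library search paths, so pointing it at a prebuilt Arrow C++ install before retrying pip may help. The /usr/local prefix below is an assumption; adjust to wherever Arrow C++ is actually installed.

```shell
# Hypothetical workaround sketch for the empty FindArrow.cmake search paths.
# ARROW_HOME tells pyarrow's CMake step where the Arrow C++ headers/libs live;
# /usr/local is only an assumed example install prefix.
export ARROW_HOME="${ARROW_HOME:-/usr/local}"
# Also expose Arrow's pkg-config file, since the log shows pkg-config
# reporting "No package 'arrow' found".
export PKG_CONFIG_PATH="$ARROW_HOME/lib/pkgconfig:${PKG_CONFIG_PATH:-}"
# Then retry the install:
# pip install pyarrow --no-build-isolation --user
echo "ARROW_HOME=$ARROW_HOME"
```

Of course, this presumes the Arrow C++ libraries have already been built for the ARM target; if they have not, that build would be the first step.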
[jira] [Created] (ARROW-2598) [Python] table.to_pandas segfault
jacques created ARROW-2598: -- Summary: [Python] table.to_pandas segfault Key: ARROW-2598 URL: https://issues.apache.org/jira/browse/ARROW-2598 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: jacques Here is a small snippet which produces a segfault:
{noformat}
In [1]: import pyarrow as pa

In [2]: import pyarrow.parquet as pq

In [3]: pa_ar = pa.array([[], []])

In [4]: pq.write_table(
   ...:     table=pa.Table.from_arrays([pa_ar], ["test"]),
   ...:     where="test5.parquet",
   ...:     compression="snappy",
   ...:     flavor="spark"
   ...: )

In [5]: pq.read_table("test5.parquet")
Out[5]:
pyarrow.Table
test: list
  child 0, item: null

In [6]: pq.read_table("test5.parquet").to_pydict()
Out[6]: OrderedDict([(u'test', [None, None])])

In [7]: pq.read_table("test5.parquet").to_pandas()
Segmentation fault
{noformat}
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2597) [Plasma] remove UniqueIDHasher
Zhijun Fu created ARROW-2597: Summary: [Plasma] remove UniqueIDHasher Key: ARROW-2597 URL: https://issues.apache.org/jira/browse/ARROW-2597 Project: Apache Arrow Issue Type: Improvement Components: Plasma (C++) Reporter: Zhijun Fu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-2596) [GLib] Use the default value of GTK-Doc
Kouhei Sutou created ARROW-2596: --- Summary: [GLib] Use the default value of GTK-Doc Key: ARROW-2596 URL: https://issues.apache.org/jira/browse/ARROW-2596 Project: Apache Arrow Issue Type: Improvement Components: GLib Affects Versions: 0.9.0 Reporter: Kouhei Sutou Assignee: Kouhei Sutou Fix For: 0.10.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)