[jira] [Updated] (ARROW-3591) [R] Support to collect decimal type
[ https://issues.apache.org/jira/browse/ARROW-3591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3591: -- Labels: pull-request-available (was: ) > [R] Support to collect decimal type > --- > > Key: ARROW-3591 > URL: https://issues.apache.org/jira/browse/ARROW-3591 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Javier Luraschi >Priority: Major > Labels: pull-request-available > > Collecting from `sparklyr` decimal types through: > > {code:java} > library(sparklyr) > sc <- spark_connect(master = "local") > sdf_len(sc, 3) %>% dplyr::mutate(new = 1) %>% dplyr::collect(){code} > causes, > > {code:java} > Error in RecordBatch__to_dataframe(x) : cannot handle Array of type decimal > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3591) [R] Support to collect decimal type
Javier Luraschi created ARROW-3591: -- Summary: [R] Support to collect decimal type Key: ARROW-3591 URL: https://issues.apache.org/jira/browse/ARROW-3591 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Javier Luraschi Collecting decimal types from `sparklyr` through: {code:java} library(sparklyr) sc <- spark_connect(master = "local") sdf_len(sc, 3) %>% dplyr::mutate(new = 1) %>% dplyr::collect(){code} causes: {code:java} Error in RecordBatch__to_dataframe(x) : cannot handle Array of type decimal {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-2712) [C#] Initial C# .NET library
[ https://issues.apache.org/jira/browse/ARROW-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659971#comment-16659971 ] Jamie Elliott commented on ARROW-2712: -- Hey! Sorry I let this slide. I had more or less clear in my head what I was planning but just got too busy with my day job. If someone is donating that is very exciting. Can you give any more details? BTW - the name I was going to suggest was SharpArrow. > [C#] Initial C# .NET library > > > Key: ARROW-2712 > URL: https://issues.apache.org/jira/browse/ARROW-2712 > Project: Apache Arrow > Issue Type: New Feature > Components: C# >Reporter: Jamie Elliott >Priority: Major > Labels: features, newbie, pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > A feature request. I've seen this pop up in a few places. Want to have a > record of discussion on this topic. > I may be open to contributing this, but first need some general guidance on > approach so I can understand effort level. > It looks like there is not a good tool available for GObject Introspection > binding to .NET so the easy pathway via Arrow Glib C API appears to be > closed. > The only GObject integration for .NET appears to be Mono GAPI > [http://www.mono-project.com/docs/gui/gtksharp/gapi/] > From what I can see this produces a GIR or similar XML, then generates C# > code directly from that. Likely involves many manual fix ups of the XML. > Worth a try? > > Alternatively I could look at generating some other direct binding from .NET > to C/C++. Where I work we use Swig [http://www.swig.org/]. Good for vanilla > cases, requires hand crafting of the .i files and specialized marshalling > strategies for optimizing performance critical cases. > Haven't tried CppSharp but it looks more appealing than Swig in some ways > [https://github.com/mono/CppSharp/wiki/Users-Manual] > In either case, not sure if better to use Glib C API or C++ API directly. > What would be pros/cons? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3574) Fix remaining bug with plasma static versus shared libraries.
[ https://issues.apache.org/jira/browse/ARROW-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz resolved ARROW-3574. --- Resolution: Fixed > Fix remaining bug with plasma static versus shared libraries. > - > > Key: ARROW-3574 > URL: https://issues.apache.org/jira/browse/ARROW-3574 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Robert Nishihara >Assignee: Robert Nishihara >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Address a few missing pieces in [https://github.com/apache/arrow/pull/2792.] > On Mac, moving the {{plasma_store_server}} executable around and then > executing it leads to > > {code:java} > dyld: Library not loaded: @rpath/libarrow.12.dylib > Referenced from: > /Users/rkn/Workspace/ray/./python/ray/core/src/plasma/plasma_store_server > Reason: image not found > Abort trap: 6{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3574) Fix remaining bug with plasma static versus shared libraries.
[ https://issues.apache.org/jira/browse/ARROW-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philipp Moritz updated ARROW-3574: -- Fix Version/s: 0.12.0 > Fix remaining bug with plasma static versus shared libraries. > - > > Key: ARROW-3574 > URL: https://issues.apache.org/jira/browse/ARROW-3574 > Project: Apache Arrow > Issue Type: Improvement > Components: Plasma (C++) >Reporter: Robert Nishihara >Assignee: Robert Nishihara >Priority: Major > Labels: pull-request-available > Fix For: 0.12.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Address a few missing pieces in [https://github.com/apache/arrow/pull/2792.] > On Mac, moving the {{plasma_store_server}} executable around and then > executing it leads to > > {code:java} > dyld: Library not loaded: @rpath/libarrow.12.dylib > Referenced from: > /Users/rkn/Workspace/ray/./python/ray/core/src/plasma/plasma_store_server > Reason: image not found > Abort trap: 6{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3590) Expose Python API for start and end offset of row group in parquet file
Heejong Lee created ARROW-3590: -- Summary: Expose Python API for start and end offset of row group in parquet file Key: ARROW-3590 URL: https://issues.apache.org/jira/browse/ARROW-3590 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Heejong Lee Is there a way to get more detailed metadata from a Parquet file in PyArrow? Specifically, I want to access the start and end offset information for each row group. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3589) [Gandiva] Make it possible to compile gandiva without JNI
[ https://issues.apache.org/jira/browse/ARROW-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3589: -- Labels: pull-request-available (was: ) > [Gandiva] Make it possible to compile gandiva without JNI > - > > Key: ARROW-3589 > URL: https://issues.apache.org/jira/browse/ARROW-3589 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > Labels: pull-request-available > > When trying to compile arrow with > {code:java} > cmake -DARROW_PYTHON=on -DARROW_GANDIVA=on -DARROW_PLASMA=on ..{code} > I'm seeing the following error right now: > {code:java} > CMake Error at > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:137 > (message): > Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY > JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH) > Call Stack (most recent call first): > > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:378 > (_FPHSA_FAILURE_MESSAGE) > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindJNI.cmake:356 > (FIND_PACKAGE_HANDLE_STANDARD_ARGS) > src/gandiva/jni/CMakeLists.txt:21 (find_package) > -- Configuring incomplete, errors occurred{code} > It should be possible to compile the C++ gandiva code without JNI bindings, > how about we introduce a new flag "-DARROW_GANDIVA_JAVA=off" (which could be > on by default if desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3589) [Gandiva] Make it possible to compile gandiva without JNI
[ https://issues.apache.org/jira/browse/ARROW-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659803#comment-16659803 ] Wes McKinney commented on ARROW-3589: - You could also do -DARROW_JNI=off to disable all JNI extensions as they exist > [Gandiva] Make it possible to compile gandiva without JNI > - > > Key: ARROW-3589 > URL: https://issues.apache.org/jira/browse/ARROW-3589 > Project: Apache Arrow > Issue Type: Improvement >Reporter: Philipp Moritz >Priority: Major > > When trying to compile arrow with > {code:java} > cmake -DARROW_PYTHON=on -DARROW_GANDIVA=on -DARROW_PLASMA=on ..{code} > I'm seeing the following error right now: > {code:java} > CMake Error at > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:137 > (message): > Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY > JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH) > Call Stack (most recent call first): > > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:378 > (_FPHSA_FAILURE_MESSAGE) > /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindJNI.cmake:356 > (FIND_PACKAGE_HANDLE_STANDARD_ARGS) > src/gandiva/jni/CMakeLists.txt:21 (find_package) > -- Configuring incomplete, errors occurred{code} > It should be possible to compile the C++ gandiva code without JNI bindings, > how about we introduce a new flag "-DARROW_GANDIVA_JAVA=off" (which could be > on by default if desired). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3589) [Gandiva] Make it possible to compile gandiva without JNI
Philipp Moritz created ARROW-3589: - Summary: [Gandiva] Make it possible to compile gandiva without JNI Key: ARROW-3589 URL: https://issues.apache.org/jira/browse/ARROW-3589 Project: Apache Arrow Issue Type: Improvement Reporter: Philipp Moritz When trying to compile arrow with {code:java} cmake -DARROW_PYTHON=on -DARROW_GANDIVA=on -DARROW_PLASMA=on ..{code} I'm seeing the following error right now: {code:java} CMake Error at /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:137 (message): Could NOT find JNI (missing: JAVA_AWT_LIBRARY JAVA_JVM_LIBRARY JAVA_INCLUDE_PATH JAVA_INCLUDE_PATH2 JAVA_AWT_INCLUDE_PATH) Call Stack (most recent call first): /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE) /home/ubuntu/anaconda3/share/cmake-3.12/Modules/FindJNI.cmake:356 (FIND_PACKAGE_HANDLE_STANDARD_ARGS) src/gandiva/jni/CMakeLists.txt:21 (find_package) -- Configuring incomplete, errors occurred{code} It should be possible to compile the C++ gandiva code without JNI bindings; how about introducing a new flag "-DARROW_GANDIVA_JAVA=off" (which could be on by default if desired)? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3588) [Java] checkstyle - fix license
[ https://issues.apache.org/jira/browse/ARROW-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3588: -- Labels: pull-request-available (was: ) > [Java] checkstyle - fix license > --- > > Key: ARROW-3588 > URL: https://issues.apache.org/jira/browse/ARROW-3588 > Project: Apache Arrow > Issue Type: Sub-task > Components: Java >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Major > Labels: pull-request-available > > Make header correspond to the defined Apache license in checkstyle.license -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3588) [Java] checkstyle - fix license
Bryan Cutler created ARROW-3588: --- Summary: [Java] checkstyle - fix license Key: ARROW-3588 URL: https://issues.apache.org/jira/browse/ARROW-3588 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Bryan Cutler Assignee: Bryan Cutler Make header correspond to the defined Apache license in checkstyle.license -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3585) [Python] Update the documentation about Schema & Metadata usage
[ https://issues.apache.org/jira/browse/ARROW-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3585: Summary: [Python] Update the documentation about Schema & Metadata usage (was: Update the documentation about Schema & Metadata usage) > [Python] Update the documentation about Schema & Metadata usage > --- > > Key: ARROW-3585 > URL: https://issues.apache.org/jira/browse/ARROW-3585 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Daniel Haviv >Assignee: Daniel Haviv >Priority: Trivial > Labels: beginner, documentation, easyfix, newbie > Original Estimate: 24h > Remaining Estimate: 24h > > Reusing the Schema object from a Parquet file written with Spark with Pandas > fails due to Schema mismatch. > The culprit is in the metadata part of the schema which each component fills > according to it's implementation. More details can be found here: > [https://github.com/apache/arrow/issues/2805] > The documentation should point that out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3586) [Python] Segmentation fault when converting empty table to pandas with categoricals
[ https://issues.apache.org/jira/browse/ARROW-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3586: Fix Version/s: 0.12.0 > [Python] Segmentation fault when converting empty table to pandas with > categoricals > --- > > Key: ARROW-3586 > URL: https://issues.apache.org/jira/browse/ARROW-3586 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.10.0, 0.11.0 > Environment: - Ubuntu 16.04, Python 2.7.12, pyarrow 0.11.0, pandas > 0.23.4 > - Debian9, Python 2.7.13, pyarrow 0.10.0, pandas 0.23.4 >Reporter: Andreas >Priority: Major > Fix For: 0.12.0 > > > {code:java} > import pyarrow as pa > table = pa.Table.from_arrays(arrays=[pa.array([], type=pa.int32())], > names=['col']) > table.to_pandas(categories=['col']){code} > This produces a segmentation fault for certain types (e.g, int\{32,64}) while > it works for others (e.g. string, binary). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3586) [Python] Segmentation fault when converting empty table to pandas with categoricals
[ https://issues.apache.org/jira/browse/ARROW-3586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3586: Summary: [Python] Segmentation fault when converting empty table to pandas with categoricals (was: Segmentation fault when converting empty table to pandas with categoricals) > [Python] Segmentation fault when converting empty table to pandas with > categoricals > --- > > Key: ARROW-3586 > URL: https://issues.apache.org/jira/browse/ARROW-3586 > Project: Apache Arrow > Issue Type: Bug >Affects Versions: 0.10.0, 0.11.0 > Environment: - Ubuntu 16.04, Python 2.7.12, pyarrow 0.11.0, pandas > 0.23.4 > - Debian9, Python 2.7.13, pyarrow 0.10.0, pandas 0.23.4 >Reporter: Andreas >Priority: Major > Fix For: 0.12.0 > > > {code:java} > import pyarrow as pa > table = pa.Table.from_arrays(arrays=[pa.array([], type=pa.int32())], > names=['col']) > table.to_pandas(categories=['col']){code} > This produces a segmentation fault for certain types (e.g, int\{32,64}) while > it works for others (e.g. string, binary). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3587) [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc)
[ https://issues.apache.org/jira/browse/ARROW-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated ARROW-3587: Summary: [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc) (was: Efficient serialization for Arrow Objects (array, table, tensor, etc)) > [Python] Efficient serialization for Arrow Objects (array, table, tensor, etc) > -- > > Key: ARROW-3587 > URL: https://issues.apache.org/jira/browse/ARROW-3587 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Plasma (C++), Python >Reporter: Siyuan Zhuang >Priority: Major > > Currently, Arrow seems to have poor serialization support for its own objects. > For example, > > {code} > import pyarrow > arr = pyarrow.array([1, 2, 3, 4]) > pyarrow.serialize(arr) > {code} > Traceback (most recent call last): > File "", line 1, in > File "pyarrow/serialization.pxi", line 337, in pyarrow.lib.serialize > File "pyarrow/serialization.pxi", line 136, in > pyarrow.lib.SerializationContext._serialize_callback > pyarrow.lib.SerializationCallbackError: pyarrow does not know how to > serialize objects of type . > I am working Ray & modin project, using plasma to store Arrow objects. Lack > of direct serialization support harms the performance, so I would like to > push a PR to fix this problem. > I wonder if it is welcome or is there someone else doing it? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3587) Efficient serialization for Arrow Objects (array, table, tensor, etc)
[ https://issues.apache.org/jira/browse/ARROW-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659749#comment-16659749 ] Wes McKinney commented on ARROW-3587: - No objections from me. The kinds of objects supported by {{pyarrow.serialize}} as you see are quite limited at the moment > Efficient serialization for Arrow Objects (array, table, tensor, etc) > - > > Key: ARROW-3587 > URL: https://issues.apache.org/jira/browse/ARROW-3587 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Plasma (C++), Python >Reporter: Siyuan Zhuang >Priority: Major > > Currently, Arrow seems to have poor serialization support for its own objects. > For example, > > {code} > import pyarrow > arr = pyarrow.array([1, 2, 3, 4]) > pyarrow.serialize(arr) > {code} > Traceback (most recent call last): > File "", line 1, in > File "pyarrow/serialization.pxi", line 337, in pyarrow.lib.serialize > File "pyarrow/serialization.pxi", line 136, in > pyarrow.lib.SerializationContext._serialize_callback > pyarrow.lib.SerializationCallbackError: pyarrow does not know how to > serialize objects of type . > I am working Ray & modin project, using plasma to store Arrow objects. Lack > of direct serialization support harms the performance, so I would like to > push a PR to fix this problem. > I wonder if it is welcome or is there someone else doing it? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3547) [R] Protect against Null crash when reading from RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Javier Luraschi resolved ARROW-3547. Resolution: Fixed Fixed by [https://github.com/apache/arrow/pull/2795] > [R] Protect against Null crash when reading from RecordBatch > > > Key: ARROW-3547 > URL: https://issues.apache.org/jira/browse/ARROW-3547 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Javier Luraschi >Priority: Minor > > Reprex: > > {code:java} > tbl <- tibble::tibble( > int = 1:10, dbl = as.numeric(1:10), > lgl = sample(c(TRUE, FALSE, NA), 10, replace = TRUE), > chr = letters[1:10] > ) > batch <- record_batch(tbl) > bytes <- write_record_batch(batch, raw()) > stream_reader <- record_batch_stream_reader(bytes) > batch1 <- read_record_batch(stream_reader) > batch2 <- read_record_batch(stream_reader) > > # Crash > as_tibble(batch2){code} > > While users should check for Null entries by running: > > {code:java} > if(!batch2$is_null()) as_tibble(batch2) > {code} > It's harsh to trigger a crash, we should consider protecting all functions > that use RecordBatch pointers to return NULL instead, for instance: > > {code:java} > List RecordBatch__to_dataframe(const std::shared_ptr& > batch) { >if (batch->get() == nullptr) Rcpp::stop("Can't read from NULL record > batch.") > }{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3547) [R] Protect against Null crash when reading from RecordBatch
[ https://issues.apache.org/jira/browse/ARROW-3547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16659731#comment-16659731 ] Javier Luraschi commented on ARROW-3547: This one got fixed by https://github.com/apache/arrow/pull/2795 > [R] Protect against Null crash when reading from RecordBatch > > > Key: ARROW-3547 > URL: https://issues.apache.org/jira/browse/ARROW-3547 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Javier Luraschi >Priority: Minor > > Reprex: > > {code:java} > tbl <- tibble::tibble( > int = 1:10, dbl = as.numeric(1:10), > lgl = sample(c(TRUE, FALSE, NA), 10, replace = TRUE), > chr = letters[1:10] > ) > batch <- record_batch(tbl) > bytes <- write_record_batch(batch, raw()) > stream_reader <- record_batch_stream_reader(bytes) > batch1 <- read_record_batch(stream_reader) > batch2 <- read_record_batch(stream_reader) > > # Crash > as_tibble(batch2){code} > > While users should check for Null entries by running: > > {code:java} > if(!batch2$is_null()) as_tibble(batch2) > {code} > It's harsh to trigger a crash, we should consider protecting all functions > that use RecordBatch pointers to return NULL instead, for instance: > > {code:java} > List RecordBatch__to_dataframe(const std::shared_ptr& > batch) { >if (batch->get() == nullptr) Rcpp::stop("Can't read from NULL record > batch.") > }{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-2712) [C#] Initial C# .NET library
[ https://issues.apache.org/jira/browse/ARROW-2712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-2712: -- Labels: features newbie pull-request-available (was: features newbie) > [C#] Initial C# .NET library > > > Key: ARROW-2712 > URL: https://issues.apache.org/jira/browse/ARROW-2712 > Project: Apache Arrow > Issue Type: New Feature > Components: C# >Reporter: Jamie Elliott >Priority: Major > Labels: features, newbie, pull-request-available > > A feature request. I've seen this pop up in a few places. Want to have a > record of discussion on this topic. > I may be open to contributing this, but first need some general guidance on > approach so I can understand effort level. > It looks like there is not a good tool available for GObject Introspection > binding to .NET so the easy pathway via Arrow Glib C API appears to be > closed. > The only GObject integration for .NET appears to be Mono GAPI > [http://www.mono-project.com/docs/gui/gtksharp/gapi/] > From what I can see this produces a GIR or similar XML, then generates C# > code directly from that. Likely involves many manual fix ups of the XML. > Worth a try? > > Alternatively I could look at generating some other direct binding from .NET > to C/C++. Where I work we use Swig [http://www.swig.org/]. Good for vanilla > cases, requires hand crafting of the .i files and specialized marshalling > strategies for optimizing performance critical cases. > Haven't tried CppSharp but it looks more appealing than Swig in some ways > [https://github.com/mono/CppSharp/wiki/Users-Manual] > In either case, not sure if better to use Glib C API or C++ API directly. > What would be pros/cons? > > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3557) [Python] Set language_level in Cython sources
[ https://issues.apache.org/jira/browse/ARROW-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3557: -- Labels: pull-request-available (was: ) > [Python] Set language_level in Cython sources > - > > Key: ARROW-3557 > URL: https://issues.apache.org/jira/browse/ARROW-3557 > Project: Apache Arrow > Issue Type: Improvement > Components: Python >Affects Versions: 0.11.0 >Reporter: Antoine Pitrou >Priority: Minor > Labels: pull-request-available > > Cython 0.29.0 emits the following warning: > {code} > C:\Miniconda36-x64\envs\arrow\lib\site-packages\Cython\Compiler\Main.py:367: > FutureWarning: Cython directive 'language_level' not set, using 2 for now > (Py2). This will change in a later release! File: > C:\projects\arrow\python\pyarrow\_parquet.pxd > {code} > We should probably try to switch it to Python 3. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3587) Efficient serialization for Arrow Objects (array, table, tensor, etc)
Siyuan Zhuang created ARROW-3587: Summary: Efficient serialization for Arrow Objects (array, table, tensor, etc) Key: ARROW-3587 URL: https://issues.apache.org/jira/browse/ARROW-3587 Project: Apache Arrow Issue Type: Improvement Components: C++, Plasma (C++), Python Reporter: Siyuan Zhuang Currently, Arrow seems to have poor serialization support for its own objects. For example, {code} import pyarrow arr = pyarrow.array([1, 2, 3, 4]) pyarrow.serialize(arr) {code} Traceback (most recent call last): File "", line 1, in File "pyarrow/serialization.pxi", line 337, in pyarrow.lib.serialize File "pyarrow/serialization.pxi", line 136, in pyarrow.lib.SerializationContext._serialize_callback pyarrow.lib.SerializationCallbackError: pyarrow does not know how to serialize objects of type . I am working on the Ray & modin projects, using plasma to store Arrow objects. Lack of direct serialization support harms performance, so I would like to push a PR to fix this problem. I wonder if this is welcome, or is someone else already doing it? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3381) [C++] Implement InputStream for bz2 files
[ https://issues.apache.org/jira/browse/ARROW-3381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-3381: -- Labels: csv pull-request-available (was: csv) > [C++] Implement InputStream for bz2 files > - > > Key: ARROW-3381 > URL: https://issues.apache.org/jira/browse/ARROW-3381 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: csv, pull-request-available > Fix For: 0.12.0 > > > For reading compressed CSV files -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3586) Segmentation fault when converting empty table to pandas with categoricals
Andreas created ARROW-3586: -- Summary: Segmentation fault when converting empty table to pandas with categoricals Key: ARROW-3586 URL: https://issues.apache.org/jira/browse/ARROW-3586 Project: Apache Arrow Issue Type: Bug Affects Versions: 0.11.0, 0.10.0 Environment: - Ubuntu 16.04, Python 2.7.12, pyarrow 0.11.0, pandas 0.23.4 - Debian9, Python 2.7.13, pyarrow 0.10.0, pandas 0.23.4 Reporter: Andreas {code:java} import pyarrow as pa table = pa.Table.from_arrays(arrays=[pa.array([], type=pa.int32())], names=['col']) table.to_pandas(categories=['col']){code} This produces a segmentation fault for certain types (e.g., int{32,64}) while it works for others (e.g., string, binary). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3585) Update the documentation about Schema & Metadata usage
[ https://issues.apache.org/jira/browse/ARROW-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658758#comment-16658758 ] Uwe L. Korn commented on ARROW-3585: [~danielil] Assigned to you and also gave you permission to self-assign JIRAs in the future. > Update the documentation about Schema & Metadata usage > -- > > Key: ARROW-3585 > URL: https://issues.apache.org/jira/browse/ARROW-3585 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Daniel Haviv >Assignee: Daniel Haviv >Priority: Trivial > Labels: beginner, documentation, easyfix, newbie > Original Estimate: 24h > Remaining Estimate: 24h > > Reusing the Schema object from a Parquet file written with Spark with Pandas > fails due to Schema mismatch. > The culprit is in the metadata part of the schema which each component fills > according to it's implementation. More details can be found here: > [https://github.com/apache/arrow/issues/2805] > The documentation should point that out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-3585) Update the documentation about Schema & Metadata usage
[ https://issues.apache.org/jira/browse/ARROW-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe L. Korn reassigned ARROW-3585: -- Assignee: Daniel Haviv > Update the documentation about Schema & Metadata usage > -- > > Key: ARROW-3585 > URL: https://issues.apache.org/jira/browse/ARROW-3585 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Daniel Haviv >Assignee: Daniel Haviv >Priority: Trivial > Labels: beginner, documentation, easyfix, newbie > Original Estimate: 24h > Remaining Estimate: 24h > > Reusing the Schema object from a Parquet file written with Spark with Pandas > fails due to Schema mismatch. > The culprit is in the metadata part of the schema which each component fills > according to it's implementation. More details can be found here: > [https://github.com/apache/arrow/issues/2805] > The documentation should point that out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3585) Update the documentation about Schema & Metadata usage
[ https://issues.apache.org/jira/browse/ARROW-3585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658754#comment-16658754 ] Daniel Haviv commented on ARROW-3585: - Please assign to me > Update the documentation about Schema & Metadata usage > -- > > Key: ARROW-3585 > URL: https://issues.apache.org/jira/browse/ARROW-3585 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Daniel Haviv >Priority: Trivial > Labels: beginner, documentation, easyfix, newbie > Original Estimate: 24h > Remaining Estimate: 24h > > Reusing the Schema object from a Parquet file written with Spark with Pandas > fails due to Schema mismatch. > The culprit is in the metadata part of the schema which each component fills > according to it's implementation. More details can be found here: > [https://github.com/apache/arrow/issues/2805] > The documentation should point that out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-3585) Update the documentation about Schema & Metadata usage
Daniel Haviv created ARROW-3585: --- Summary: Update the documentation about Schema & Metadata usage Key: ARROW-3585 URL: https://issues.apache.org/jira/browse/ARROW-3585 Project: Apache Arrow Issue Type: Task Components: Documentation Reporter: Daniel Haviv Reusing the Schema object from a Parquet file written by Spark when writing with Pandas fails due to a Schema mismatch. The culprit is the metadata part of the schema, which each component fills according to its implementation. More details can be found here: [https://github.com/apache/arrow/issues/2805] The documentation should point that out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)