[jira] [Commented] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()
[ https://issues.apache.org/jira/browse/ARROW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855271#comment-16855271 ]

Neal Richardson commented on ARROW-5500:

Perhaps it does. IMO the idea that we would want two R packages (one that just wraps the C++ library for developers, and a separate one that provides an interface for analysts to work with datasets) is YAGNI. There's no reason we can't have the lower-level C++ API wrappers and the analyst-centric interface in the same package, and there is no value at this point in splitting them.

Currently there is already a lower-level `csv_table_reader`, and all the `read_csv_arrow()` function does is invoke it: [https://github.com/apache/arrow/blob/master/r/R/csv.R#L179-L181]

I'm proposing adding R-flavored substance to `read_csv_arrow()` (and documenting it). I'm not proposing removing or making private the classes and methods that invoke the C++ library, so a "developer" could still choose to write something at that layer if it were useful.

> [R] read_csv_arrow() signature should match readr::read_csv()
>
> Key: ARROW-5500
> URL: https://issues.apache.org/jira/browse/ARROW-5500
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 0.14.0
>
> So that using it is natural for R users. Internally, handle all of the logic
> needed to map readr's arguments onto csv_convert_options, csv_read_options,
> and csv_parse_options, and give a useful error message if a user requests a
> setting that readr supports but arrow does not.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW
[ https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-5498.
Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4465
[https://github.com/apache/arrow/pull/4465]

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Sutou Kouhei
> Assignee: Sutou Kouhei
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Resolved] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
[ https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-5481.
Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4462
[https://github.com/apache/arrow/pull/4462]

> [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
>
> Key: ARROW-5481
> URL: https://issues.apache.org/jira/browse/ARROW-5481
> Project: Apache Arrow
> Issue Type: Improvement
> Components: GLib
> Reporter: Sutou Kouhei
> Assignee: Yosuke Shiro
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402
> This is follow-up work of
> https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0
[jira] [Issue Comment Deleted] (ARROW-5438) [JS] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Muehlhausen updated ARROW-5438:
Comment: was deleted

(was: Will add test case when I can)

> [JS] Utilize stream EOS in File format
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
> Issue Type: Improvement
> Components: JavaScript
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
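Why the EOS marker matters for a reader without seek() can be shown with a deliberately simplified framing. The sketch below is a hypothetical illustration in Python, not the actual Arrow IPC layout: it assumes each message is a little-endian int32 length followed by that many bytes, and that a length of 0 marks end-of-stream. Without the marker, a sequential reader of a File would run past the last message and try to interpret the trailing footer bytes as another message.

```python
import io
import struct
from typing import BinaryIO, Iterator, List

# Simplified framing for illustration only (NOT the exact Arrow IPC layout):
# each message is a little-endian int32 length followed by that many bytes,
# and a length of 0 marks end-of-stream (EOS).
EOS = struct.pack("<i", 0)


def write_messages(out: BinaryIO, messages: List[bytes]) -> None:
    """Write length-prefixed, non-empty messages, then the EOS marker."""
    for msg in messages:
        if not msg:
            raise ValueError("empty messages would be indistinguishable from EOS")
        out.write(struct.pack("<i", len(msg)))
        out.write(msg)
    out.write(EOS)


def read_messages(src: BinaryIO) -> Iterator[bytes]:
    """Yield messages in order using only read(); never calls seek()."""
    while True:
        prefix = src.read(4)
        if len(prefix) < 4:
            raise EOFError("stream ended without an EOS marker")
        (length,) = struct.unpack("<i", prefix)
        if length == 0:
            return  # EOS: anything after this (e.g. a file footer) is left unread
        body = src.read(length)
        if len(body) < length:
            raise EOFError("truncated message")
        yield body


if __name__ == "__main__":
    buf = io.BytesIO()
    write_messages(buf, [b"schema", b"batch-1"])
    buf.write(b"FOOTER")  # stands in for trailing data such as a file footer
    buf.seek(0)  # rewind the demo buffer; the reader itself never seeks
    print(list(read_messages(buf)))  # prints [b'schema', b'batch-1']
```

The reader stops cleanly at the marker and leaves the stream positioned just after it, so whatever follows stays untouched.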
[jira] [Issue Comment Deleted] (ARROW-5439) [Java] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Muehlhausen updated ARROW-5439:
Comment: was deleted

(was: Will add test case when I can)

> [Java] Utilize stream EOS in File format
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
[jira] [Commented] (ARROW-5439) [Java] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855189#comment-16855189 ]

John Muehlhausen commented on ARROW-5439:

Will add test case when I can

> [Java] Utilize stream EOS in File format
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
[jira] [Commented] (ARROW-5438) [JS] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855188#comment-16855188 ]

John Muehlhausen commented on ARROW-5438:

Will add test case when I can

> [JS] Utilize stream EOS in File format
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
> Issue Type: Improvement
> Components: JavaScript
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
[jira] [Updated] (ARROW-5438) [JS] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5438:
Labels: pull-request-available (was: )

> [JS] Utilize stream EOS in File format
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
> Issue Type: Improvement
> Components: JavaScript
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
[jira] [Updated] (ARROW-5439) [Java] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5439:
Labels: pull-request-available (was: )

> [Java] Utilize stream EOS in File format
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java
> Reporter: John Muehlhausen
> Priority: Minor
> Labels: pull-request-available
>
> We currently do not write EOS at the end of a Message stream inside the File
> format. As a result, the file cannot be parsed sequentially. This change
> prepares for other implementations or future reference features that parse a
> File sequentially, i.e. without access to seek().
[jira] [Commented] (ARROW-5501) [R] read/write_feather/arrow?
[ https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855181#comment-16855181 ]

Wes McKinney commented on ARROW-5501:

Can you open a JIRA issue about FeatherV2? I would like to retain the file format name as a "simple memory-mappable Arrow-based file format" and handle backwards compatibility for old files for some period of time.

> [R] read/write_feather/arrow?
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 0.14.0
>
> read_feather and write_feather exist, and there is also write_arrow, but no
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0",
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does).
> Are we going to continue to call the file format "Feather", and possibly
> continue supporting the "feather 1.0" format as a subset/special case? Or
> will "feather" mean this limited format and "arrow" be the name of the
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather, with
> an argument indicating which version to write? Or should the distinction be
> maintained, in which case we need to add a read_arrow() function?
[jira] [Comment Edited] (ARROW-5501) [R] read/write_feather/arrow?
[ https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855181#comment-16855181 ]

Wes McKinney edited comment on ARROW-5501 at 6/4/19 12:50 AM:

Can you open a JIRA issue about FeatherV2 (or maybe this is the issue)? I would like to retain the file format name as a "simple memory-mappable Arrow-based file format" and handle backwards compatibility for old files for some period of time.

was (Author: wesmckinn):
Can you open a JIRA issue about FeatherV2? I would like to retain the file format name as a "simple memory-mappable Arrow-based file format" and handle backwards compatibility for old files for some period of time.

> [R] read/write_feather/arrow?
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 0.14.0
>
> read_feather and write_feather exist, and there is also write_arrow, but no
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0",
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does).
> Are we going to continue to call the file format "Feather", and possibly
> continue supporting the "feather 1.0" format as a subset/special case? Or
> will "feather" mean this limited format and "arrow" be the name of the
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather, with
> an argument indicating which version to write? Or should the distinction be
> maintained, in which case we need to add a read_arrow() function?
[jira] [Commented] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()
[ https://issues.apache.org/jira/browse/ARROW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855180#comment-16855180 ]

Wes McKinney commented on ARROW-5500:

This brings up a bigger question of whether the `arrow` library as it is being developed now is the desired "front end" for end users.

> [R] read_csv_arrow() signature should match readr::read_csv()
>
> Key: ARROW-5500
> URL: https://issues.apache.org/jira/browse/ARROW-5500
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Major
> Fix For: 0.14.0
>
> So that using it is natural for R users. Internally, handle all of the logic
> needed to map readr's arguments onto csv_convert_options, csv_read_options,
> and csv_parse_options, and give a useful error message if a user requests a
> setting that readr supports but arrow does not.
[jira] [Created] (ARROW-5505) [R] Stop masking base R functions
Neal Richardson created ARROW-5505:

Summary: [R] Stop masking base R functions
Key: ARROW-5505
URL: https://issues.apache.org/jira/browse/ARROW-5505
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

The package startup message about masking base functions can be scary. We should avoid masking base functions without a compelling reason (e.g. let's use arrow_array() instead of array(), and arrow_table() instead of table()). The arrow versions do very different things than the base functions; plus, end users shouldn't be dealing directly with Tables and Arrays, so those don't need to figure so prominently in the public API of the package.
[jira] [Updated] (ARROW-5504) [R] move use_threads argument to global option
[ https://issues.apache.org/jira/browse/ARROW-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5504:
Priority: Minor (was: Major)

> [R] move use_threads argument to global option
>
> Key: ARROW-5504
> URL: https://issues.apache.org/jira/browse/ARROW-5504
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Priority: Minor
> Fix For: 0.14.0
>
> Why wouldn't you want to use the multithreaded API for reading data from
> arrow into R? We shouldn't clutter our function signatures with options that
> people won't use.
[jira] [Created] (ARROW-5504) [R] move use_threads argument to global option
Neal Richardson created ARROW-5504:

Summary: [R] move use_threads argument to global option
Key: ARROW-5504
URL: https://issues.apache.org/jira/browse/ARROW-5504
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

Why wouldn't you want to use the multithreaded API for reading data from arrow into R? We shouldn't clutter our function signatures with options that people won't use.
[jira] [Created] (ARROW-5503) [R] add read_json()
Neal Richardson created ARROW-5503:

Summary: [R] add read_json()
Key: ARROW-5503
URL: https://issues.apache.org/jira/browse/ARROW-5503
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

The C++ library gained a JSON file reader last month, and pyarrow already has bindings for it. R should have it too.
[jira] [Created] (ARROW-5502) [R] file readers should mmap
Neal Richardson created ARROW-5502:

Summary: [R] file readers should mmap
Key: ARROW-5502
URL: https://issues.apache.org/jira/browse/ARROW-5502
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

Arrow is supposed to let you work with datasets bigger than memory, and memory mapping is a big part of that. It should be the default way that files are read in the `read_*` functions. To disable memory mapping, we could use a global `option()` or a function argument, but that might clutter the interface. Alternatively, we could offer no choice at all and fall back to not memory mapping only if the platform or file system doesn't support it.
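The "map by default, fall back only when mapping isn't possible" policy can be sketched with Python's standard `mmap` module. This only illustrates the control flow; the R package would go through the Arrow C++ library instead, and `open_for_reading` and its `use_mmap` flag are hypothetical names standing in for the global option or function argument under discussion.

```python
import mmap
import os
import tempfile
from typing import Union

Buffer = Union[mmap.mmap, bytes]


def open_for_reading(path: str, use_mmap: bool = True) -> Buffer:
    """Memory-map `path` by default; fall back to a plain read if mapping fails."""
    with open(path, "rb") as f:
        if use_mmap:
            try:
                # length 0 maps the whole file; the mapping remains usable after f closes
                return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            except (ValueError, OSError):
                # ValueError: empty files cannot be mapped; OSError: the platform or
                # file system refused the mapping. Fall through to an ordinary read.
                pass
        return f.read()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello arrow")
    data = open_for_reading(tmp.name)
    print(type(data).__name__, data[:5])
    if isinstance(data, mmap.mmap):
        data.close()  # release the mapping before deleting (required on Windows)
    os.remove(tmp.name)
```

Both return types support slicing, so callers don't need to know which path was taken. That matches the option of giving users no choice at all.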
[jira] [Created] (ARROW-5501) [R] read/write_feather/arrow?
Neal Richardson created ARROW-5501:

Summary: [R] read/write_feather/arrow?
Key: ARROW-5501
URL: https://issues.apache.org/jira/browse/ARROW-5501
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

read_feather and write_feather exist, and there is also write_arrow, but no read_arrow.

Some questions (which go beyond just R): There's talk of a "feather 2.0", i.e. "just" serializing the IPC format (which IIUC is what write_arrow does). Are we going to continue to call the file format "Feather", and possibly continue supporting the "feather 1.0" format as a subset/special case? Or will "feather" mean this limited format and "arrow" be the name of the full-featured file?

In terms of this issue, should write_arrow be folded into write_feather, with an argument indicating which version to write? Or should the distinction be maintained, in which case we need to add a read_arrow() function?
[jira] [Created] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()
Neal Richardson created ARROW-5500:

Summary: [R] read_csv_arrow() signature should match readr::read_csv()
Key: ARROW-5500
URL: https://issues.apache.org/jira/browse/ARROW-5500
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 0.14.0

So that using it is natural for R users. Internally, handle all of the logic needed to map readr's arguments onto csv_convert_options, csv_read_options, and csv_parse_options, and give a useful error message if a user requests a setting that readr supports but arrow does not.
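A table-driven approach is one way to structure the argument mapping this issue describes. The Python sketch below is hypothetical throughout. The right-hand field names (`quote_char`, `skip_rows`, `column_names`, `null_values`) are illustrative assumptions, not a statement of the real Arrow CSV option names. The set of "unsupported" readr arguments is also invented purely for the example.

```python
from typing import Any, Dict, Tuple

# Hypothetical mapping from readr::read_csv() argument names to
# (option group, option field). The field names are illustrative assumptions.
READR_TO_ARROW: Dict[str, Tuple[str, str]] = {
    "quote": ("parse", "quote_char"),
    "skip": ("read", "skip_rows"),
    "col_names": ("read", "column_names"),
    "na": ("convert", "null_values"),
}

# readr arguments treated as unsupported purely for the sake of the example.
UNSUPPORTED = {"comment", "n_max", "locale"}


def map_readr_args(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Split readr-style keyword arguments into parse/read/convert option dicts."""
    groups: Dict[str, Dict[str, Any]] = {"parse": {}, "read": {}, "convert": {}}
    for name, value in kwargs.items():
        if name in UNSUPPORTED:
            # The "useful error message" for settings readr has but arrow lacks
            raise ValueError(
                f"read_csv_arrow(): readr argument '{name}' is not supported by arrow"
            )
        if name not in READR_TO_ARROW:
            raise TypeError(f"read_csv_arrow(): unknown argument '{name}'")
        group, field = READR_TO_ARROW[name]
        groups[group][field] = value
    return groups
```

For example, `map_readr_args(quote='"', skip=1, na=["", "NA"])` yields one entry in each of the three groups. Keeping the mapping in a single table means supporting another readr argument is a one-line change.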
[jira] [Updated] (ARROW-5492) [R] Add "col_select" argument to read_* functions to read subset of columns
[ https://issues.apache.org/jira/browse/ARROW-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5492:
Description:
read_feather, read_parquet, read_csv_arrow (and read_json, when it exists) should take a `col_select` argument, following the model of [vroom|http://vroom.r-lib.org/articles/vroom.html#column-selection] (readr and base R file readers also support this feature, just much more awkwardly). Currently, read_feather has a "columns" argument and none of the other readers expose it. Parquet can certainly support it; cf. {{pyarrow.parquet.read_table}}.

was: This is just like the same option in {{pyarrow.parquet.read_table}}

> [R] Add "col_select" argument to read_* functions to read subset of columns
>
> Key: ARROW-5492
> URL: https://issues.apache.org/jira/browse/ARROW-5492
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.14.0
>
> read_feather, read_parquet, read_csv_arrow (and read_json, when it exists)
> should take a `col_select` argument, following the model of
> [vroom|http://vroom.r-lib.org/articles/vroom.html#column-selection] (readr
> and base R file readers also support this feature, just much more awkwardly).
> Currently, read_feather has a "columns" argument and none of the other
> readers expose it. Parquet can certainly support it; cf.
> {{pyarrow.parquet.read_table}}.
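The core of a `col_select` argument is resolving the user's selection into column indices, which a reader can then pass down. For columnar formats such as Parquet or Feather, this could let unselected columns go unread entirely. The Python sketch below is a hypothetical helper, not part of any package. It covers only plain names and R-style 1-based positions, whereas vroom's `col_select` accepts full tidyselect syntax.

```python
from typing import Dict, List, Sequence, Union

Selector = Union[str, int]


def resolve_col_select(names: Sequence[str], col_select: Sequence[Selector]) -> List[int]:
    """Resolve column names or 1-based (R-style) positions to 0-based indices."""
    indices: List[int] = []
    for sel in col_select:
        if isinstance(sel, int):
            if not 1 <= sel <= len(names):
                raise IndexError(f"column position {sel} is out of range")
            indices.append(sel - 1)
        else:
            if sel not in names:
                raise KeyError(f"no column named {sel!r}")
            indices.append(list(names).index(sel))
    return indices


def select_columns(table: Dict[str, list], col_select: Sequence[Selector]) -> Dict[str, list]:
    """Keep only the selected columns, in the order requested."""
    names = list(table)
    return {names[i]: table[names[i]] for i in resolve_col_select(names, col_select)}
```

For instance, `resolve_col_select(["a", "b", "c"], ["c", 1])` gives `[2, 0]`, preserving the requested order.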
[jira] [Updated] (ARROW-5492) [R] Add "col_select" argument to read_* functions to read subset of columns
[ https://issues.apache.org/jira/browse/ARROW-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5492:
Summary: [R] Add "col_select" argument to read_* functions to read subset of columns (was: [R] Add "columns" option to read_parquet to read subset of columns)

> [R] Add "col_select" argument to read_* functions to read subset of columns
>
> Key: ARROW-5492
> URL: https://issues.apache.org/jira/browse/ARROW-5492
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Wes McKinney
> Priority: Major
> Fix For: 0.14.0
>
> This is just like the same option in {{pyarrow.parquet.read_table}}
[jira] [Updated] (ARROW-5499) [R] Alternate bindings for when libarrow is not found
[ https://issues.apache.org/jira/browse/ARROW-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5499:
Component/s: R

> [R] Alternate bindings for when libarrow is not found
>
> Key: ARROW-5499
> URL: https://issues.apache.org/jira/browse/ARROW-5499
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Romain François
> Priority: Major
> Fix For: 0.14.0
>
> This will also allow the package to build and install successfully on hosts
> where the arrow C++ library is not present, which will enable us, among other
> things, to provide an `install_arrow()` function similar to other R packages
> that have big external dependencies.
[jira] [Created] (ARROW-5499) [R] Alternate bindings for when libarrow is not found
Neal Richardson created ARROW-5499:

Summary: [R] Alternate bindings for when libarrow is not found
Key: ARROW-5499
URL: https://issues.apache.org/jira/browse/ARROW-5499
Project: Apache Arrow
Issue Type: Improvement
Reporter: Neal Richardson
Assignee: Romain François
Fix For: 0.14.0

This will also allow the package to build and install successfully on hosts where the arrow C++ library is not present, which will enable us, among other things, to provide an `install_arrow()` function similar to other R packages that have big external dependencies.
[jira] [Updated] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW
[ https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5498:
Labels: pull-request-available (was: )

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Sutou Kouhei
> Assignee: Sutou Kouhei
> Priority: Minor
> Labels: pull-request-available
[jira] [Updated] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW
[ https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sutou Kouhei updated ARROW-5498:
Summary: [C++] Build failure with Flatbuffers 1.11.0 and MinGW (was: [C++] Add support for Flatbuffers 1.11.0 with MinGW)

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Sutou Kouhei
> Assignee: Sutou Kouhei
> Priority: Minor
[jira] [Created] (ARROW-5498) [C++] Add support for Flatbuffers 1.11.0 with MinGW
Sutou Kouhei created ARROW-5498:

Summary: [C++] Add support for Flatbuffers 1.11.0 with MinGW
Key: ARROW-5498
URL: https://issues.apache.org/jira/browse/ARROW-5498
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei
[jira] [Resolved] (ARROW-5477) [C++] Check required RapidJSON version
[ https://issues.apache.org/jira/browse/ARROW-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney resolved ARROW-5477.
Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4447
[https://github.com/apache/arrow/pull/4447]

> [C++] Check required RapidJSON version
>
> Key: ARROW-5477
> URL: https://issues.apache.org/jira/browse/ARROW-5477
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Sutou Kouhei
> Assignee: Sutou Kouhei
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 1h
> Remaining Estimate: 0h
[jira] [Created] (ARROW-5497) [R][Release] Build and publish R package docs
Neal Richardson created ARROW-5497:

Summary: [R][Release] Build and publish R package docs
Key: ARROW-5497
URL: https://issues.apache.org/jira/browse/ARROW-5497
Project: Apache Arrow
Issue Type: Improvement
Components: Developer Tools, Documentation, R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 0.14.0

https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site config. Adding the wiring into the apidocs build scripts was deferred because there was some discussion about which workflow was supported and which was deprecated.
[jira] [Updated] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting
[ https://issues.apache.org/jira/browse/ARROW-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson updated ARROW-5496:
Fix Version/s: 0.14.0

> [R][CI] Fix relative paths in R codecov.io reporting
>
> Key: ARROW-5496
> URL: https://issues.apache.org/jira/browse/ARROW-5496
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Continuous Integration, R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Minor
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R,
> but due to an assumption in the coverage runner that the project would be at
> the top level of the GitHub repository, the `r/` subdirectory was not
> included, so R coverage stats were put in the wrong place, and detail files
> (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R])
> return 404.
[jira] [Updated] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting
[ https://issues.apache.org/jira/browse/ARROW-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5496:
Labels: pull-request-available (was: )

> [R][CI] Fix relative paths in R codecov.io reporting
>
> Key: ARROW-5496
> URL: https://issues.apache.org/jira/browse/ARROW-5496
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Continuous Integration, R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Minor
> Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R,
> but due to an assumption in the coverage runner that the project would be at
> the top level of the GitHub repository, the `r/` subdirectory was not
> included, so R coverage stats were put in the wrong place, and detail files
> (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R])
> return 404.
[jira] [Created] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting
Neal Richardson created ARROW-5496:

Summary: [R][CI] Fix relative paths in R codecov.io reporting
Key: ARROW-5496
URL: https://issues.apache.org/jira/browse/ARROW-5496
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson

https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R, but due to an assumption in the coverage runner that the project would be at the top level of the GitHub repository, the `r/` subdirectory was not included, so R coverage stats were put in the wrong place, and detail files (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R]) return 404.
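The path problem here amounts to a missing prefix. Coverage paths are recorded relative to the R package (for example `R/ArrayData.R`), while the repository layout needs them relative to the repository root (`r/R/ArrayData.R`). The Python helper below is a hypothetical sketch of that rewrite. It is not the fix that was actually applied, and the coverage runner and codecov may offer their own configuration for this.

```python
from pathlib import PurePosixPath


def to_repo_relative(path: str, subdir: str = "r") -> str:
    """Prefix a package-relative path with the package's subdirectory in the repo."""
    p = PurePosixPath(path)
    if p.parts and p.parts[0] == subdir:
        return str(p)  # already relative to the repository root
    return str(PurePosixPath(subdir) / p)
```

For example, `to_repo_relative("R/ArrayData.R")` returns `"r/R/ArrayData.R"`. The comparison is case-sensitive, so the package's `R/` source directory is not mistaken for the repository's `r/` subdirectory, and the function is idempotent.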
[jira] [Updated] (ARROW-5495) [C++] Use HTTPS consistently for downloading dependencies
[ https://issues.apache.org/jira/browse/ARROW-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-5495:
Labels: pull-request-available (was: )

> [C++] Use HTTPS consistently for downloading dependencies
>
> Key: ARROW-5495
> URL: https://issues.apache.org/jira/browse/ARROW-5495
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Wes McKinney
> Assignee: Wes McKinney
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
[jira] [Created] (ARROW-5495) [C++] Use HTTPS consistently for downloading dependencies
Wes McKinney created ARROW-5495:

Summary: [C++] Use HTTPS consistently for downloading dependencies
Key: ARROW-5495
URL: https://issues.apache.org/jira/browse/ARROW-5495
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
Fix For: 0.14.0
[jira] [Commented] (ARROW-5478) [Packaging] Drop Ubuntu 14.04 support
[ https://issues.apache.org/jira/browse/ARROW-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854906#comment-16854906 ]

Neal Richardson commented on ARROW-5478:

FWIW, Trusty is at "End of Standard Support", not EOL: https://wiki.ubuntu.com/Releases

> [Packaging] Drop Ubuntu 14.04 support
>
> Key: ARROW-5478
> URL: https://issues.apache.org/jira/browse/ARROW-5478
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Packaging
> Reporter: Sutou Kouhei
> Assignee: Sutou Kouhei
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Created] (ARROW-5494) [Python] Create FileSystem bindings
Antoine Pitrou created ARROW-5494:

Summary: [Python] Create FileSystem bindings
Key: ARROW-5494
URL: https://issues.apache.org/jira/browse/ARROW-5494
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Antoine Pitrou

Now that we have a C++ filesystem API, it should be usable from Python as well.
[jira] [Updated] (ARROW-5494) [Python] Create FileSystem bindings
[ https://issues.apache.org/jira/browse/ARROW-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoine Pitrou updated ARROW-5494:
Labels: filesystem (was: )

> [Python] Create FileSystem bindings
>
> Key: ARROW-5494
> URL: https://issues.apache.org/jira/browse/ARROW-5494
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Python
> Reporter: Antoine Pitrou
> Priority: Major
> Labels: filesystem
>
> Now that we have a C++ filesystem API, it should be usable from Python as
> well.
[jira] [Commented] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader
[ https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854896#comment-16854896 ]

Antoine Pitrou commented on ARROW-4912:

If there's no way to change column names post-hoc, then perhaps we should just add one? That sounds more universal than adding ad hoc options to the CSV reader.

As for header_rows=0, can you open a separate issue?

> [C++, Python] Allow specifying column names to CSV reader
>
> Key: ARROW-4912
> URL: https://issues.apache.org/jira/browse/ARROW-4912
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python
> Affects Versions: 0.12.1
> Reporter: Philipp Moritz
> Priority: Major
> Labels: csv
>
> Currently I think there is no way to specify custom column names for CSV
> files. It's possible to specify the full schema of the file, but not just
> column names.
> See the related discussion here: ARROW-3722
> The goal of this is to re-use the CSV type-inference but still allow people
> to specify custom names for the columns. As far as I know, there is currently
> no way to set column names post-hoc, so we should provide a way to specify
> them before reading the file.
> Related to this, ParseOptions(header_rows=0) is not currently implemented.
> Is there any current way to do this or does this need to be implemented?
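A generic post-hoc rename, as suggested in the comment, only needs to swap the names while keeping the column data and order untouched. The Python sketch below models a table as a plain dict of name to column values. Both `rename_columns` and this table representation are hypothetical illustrations, not an Arrow API.

```python
from typing import Any, Dict, List, Sequence


def rename_columns(table: Dict[str, List[Any]], names: Sequence[str]) -> Dict[str, List[Any]]:
    """Return a new table with the same columns, in order, under new names."""
    if len(names) != len(table):
        raise ValueError(f"expected {len(table)} names, got {len(names)}")
    if len(set(names)) != len(names):
        # A dict cannot hold duplicate keys; real Arrow schemas may allow them.
        raise ValueError("names must be unique in this sketch")
    return dict(zip(names, table.values()))
```

Since type inference has already happened by the time this runs, renaming after the read keeps the inferred types, which is the goal stated in the issue.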
[jira] [Updated] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
[ https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5481: -- Labels: pull-request-available (was: ) > [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document > > > Key: ARROW-5481 > URL: https://issues.apache.org/jira/browse/ARROW-5481 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Sutou Kouhei >Assignee: Yosuke Shiro >Priority: Minor > Labels: pull-request-available > > https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402 > This is follow-up work of > https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5365) [C++][CI] Add UBSan and ASAN into CI
[ https://issues.apache.org/jira/browse/ARROW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5365. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4347 [https://github.com/apache/arrow/pull/4347] > [C++][CI] Add UBSan and ASAN into CI > > > Key: ARROW-5365 > URL: https://issues.apache.org/jira/browse/ARROW-5365 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 10h > Remaining Estimate: 0h > > We should be running UBSan and ASAN in CI to detect issues with the C++ > build. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?
[ https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854820#comment-16854820 ] Neal Richardson commented on ARROW-5474: Boost 1.58 seemed to be sufficient in the build I was fighting last week. Fine by me if we declare that the minimum, so that still leaves two tasks: (1) fail with a useful message in CMake if boost is < 1.58, and (2) resolve why it later reported that boost 1.67 was present. > [C++] What version of Boost do we require now? > -- > > Key: ARROW-5474 > URL: https://issues.apache.org/jira/browse/ARROW-5474 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One > possible cause for that error is that the local filesystem patch increased > the version of boost that we actually require. The boost version (1.54 vs > 1.58) was one difference between failure and success. > Another point of confusion was that CMake reported two different versions of > boost at different times. > If we require a minimum version of boost, can we document that better, check > for it more accurately in the build scripts, and fail with a useful message > if that minimum isn't met? Or something else helpful. > If the actual cause of the failure was something else (e.g. compiler > version), we should figure that out too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
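Task (1) boils down to a numeric version comparison plus an error message that names both versions. In the CMake build itself this would normally be a version argument to `find_package(Boost ...)`; the Python below only sketches the logic, with 1.58 taken from this thread rather than from any confirmed Arrow setting.

```python
from typing import Tuple

# 1.58 is the minimum floated in this thread (Ubuntu 16.04's Boost), not a confirmed setting.
MIN_BOOST: Tuple[int, ...] = (1, 58)


def parse_version(text: str) -> Tuple[int, ...]:
    """'1.58.0' -> (1, 58, 0), so versions compare numerically component by component."""
    return tuple(int(part) for part in text.split("."))


def check_boost(found: str, minimum: Tuple[int, ...] = MIN_BOOST) -> None:
    """Fail early with a message that names both the found and the required version."""
    if parse_version(found) < minimum:
        required = ".".join(str(part) for part in minimum)
        raise RuntimeError(
            f"Found Boost {found}, but at least Boost {required} is required; "
            "upgrade Boost or point the build at a newer installation."
        )


check_boost("1.58.0")  # passes silently
try:
    check_boost("1.54.0")
except RuntimeError as exc:
    print(exc)
```

Comparing integer tuples rather than raw strings matters: as strings, "1.100" sorts before "1.58".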
[jira] [Commented] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
[ https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854818#comment-16854818 ] Yosuke Shiro commented on ARROW-5481: - Yes. > [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document > > > Key: ARROW-5481 > URL: https://issues.apache.org/jira/browse/ARROW-5481 > Project: Apache Arrow > Issue Type: Improvement > Components: GLib >Reporter: Sutou Kouhei >Assignee: Yosuke Shiro >Priority: Minor > > https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402 > This is follow-up work of > https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5077) [Rust] Release process should change Cargo.toml to use release versions
[ https://issues.apache.org/jira/browse/ARROW-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5077: -- Labels: pull-request-available (was: ) > [Rust] Release process should change Cargo.toml to use release versions > --- > > Key: ARROW-5077 > URL: https://issues.apache.org/jira/browse/ARROW-5077 > Project: Apache Arrow > Issue Type: Improvement > Components: Rust >Affects Versions: 0.13.0 >Reporter: Andy Grove >Assignee: Yosuke Shiro >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > In the dev tree we use relative path dependencies between arrow, parquet, and > datafusion, which means we can't just run cargo publish for each crate from > the release source tarball. > It would be good to have the release packaging change the Cargo.toml for > parquet and datafusion to have dependencies on a versioned release instead of > a relative path to remove this manual step when publishing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
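The packaging step described above could be automated by rewriting each crate's Cargo.toml before publishing. The sketch below assumes, hypothetically, that intra-repo dependencies are written as single-key inline tables such as `parquet = { path = "../parquet" }`; entries with extra keys are left untouched, and a production script would use a real TOML parser rather than a regular expression.

```python
import re

# Hypothetical simplified pattern: `name = { path = "..." }` with no other keys.
PATH_DEP = re.compile(r'^(?P<name>[A-Za-z0-9_-]+)\s*=\s*\{\s*path\s*=\s*"[^"]*"\s*\}\s*$')


def pin_path_dependencies(cargo_toml: str, version: str) -> str:
    """Replace relative path dependencies with a dependency on a released version."""
    out = []
    for line in cargo_toml.splitlines():
        match = PATH_DEP.match(line)
        if match:
            line = f'{match.group("name")} = "{version}"'
        out.append(line)
    return "\n".join(out) + "\n"


manifest = '[dependencies]\narrow = { path = "../arrow" }\nserde = "1.0"\n'
print(pin_path_dependencies(manifest, "0.14.0"))
```

The crate names and version string here are illustrative only.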
[jira] [Updated] (ARROW-5020) [C++][Gandiva] Split Gandiva-related conda packages for builds into separate .yml conda env file
[ https://issues.apache.org/jira/browse/ARROW-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5020: -- Labels: pull-request-available (was: ) > [C++][Gandiva] Split Gandiva-related conda packages for builds into separate > .yml conda env file > > > Key: ARROW-5020 > URL: https://issues.apache.org/jira/browse/ARROW-5020 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > These installs are large and should not be required unconditionally in CI and > elsewhere -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5256) [Packaging][deb] Failed to build with LLVM 7.1.0
[ https://issues.apache.org/jira/browse/ARROW-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5256. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4453 [https://github.com/apache/arrow/pull/4453] > [Packaging][deb] Failed to build with LLVM 7.1.0 > > > Key: ARROW-5256 > URL: https://issues.apache.org/jira/browse/ARROW-5256 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Gandiva, Packaging >Reporter: Sutou Kouhei >Assignee: Sutou Kouhei >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > https://travis-ci.org/ursa-labs/crossbow/builds/527710714#L6144-L6157 > {noformat} > CMake Error at cmake_modules/FindLLVM.cmake:33 (find_package): > Could not find a configuration file for package "LLVM" that is compatible > with requested version "7.0". > The following configuration files were considered but not accepted: > /usr/lib/llvm-7/cmake/LLVMConfig.cmake, version: 7.1.0 > /usr/lib/llvm-7/lib/cmake/llvm/LLVMConfig.cmake, version: 7.1.0 > /usr/lib/llvm-7/share/llvm/cmake/LLVMConfig.cmake, version: 7.1.0 > /usr/lib/llvm-3.8/share/llvm/cmake/LLVMConfig.cmake, version: 3.8.1 > /usr/share/llvm-3.8/cmake/LLVMConfig.cmake, version: 3.8.1 > Call Stack (most recent call first): > src/gandiva/CMakeLists.txt:31 (find_package) > {noformat} > Can we use "7" instead of "7.0" for {{ARROW_LLVM_VERSION}}? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
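CMake hands the compatibility decision to each package's own version file (for `LLVMConfig.cmake`, a companion `LLVMConfigVersion.cmake`), so the rule below is an assumption, not a description of what LLVM's file does. It models the behaviour the error suggests: a request pins every component it spells out, so "7.0" rejects 7.1.0 while "7" would accept any 7.x.y.

```python
from typing import Tuple


def parse(version: str) -> Tuple[int, ...]:
    """'7.1.0' -> (7, 1, 0)"""
    return tuple(int(part) for part in version.split("."))


def matches_requested_components(requested: str, available: str) -> bool:
    """Hypothetical rule: each component the caller writes must match exactly.

    Components the caller omits are unconstrained, so "7" accepts 7.0.1 and 7.1.0.
    """
    wanted = parse(requested)
    return parse(available)[: len(wanted)] == wanted


candidates = ["7.1.0", "3.8.1"]
print([v for v in candidates if matches_requested_components("7.0", v)])  # []
print([v for v in candidates if matches_requested_components("7", v)])    # ['7.1.0']
```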
[jira] [Assigned] (ARROW-3814) [R] RecordBatch$from_arrays()
[ https://issues.apache.org/jira/browse/ARROW-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-3814: --- Assignee: Romain François > [R] RecordBatch$from_arrays() > - > > Key: ARROW-3814 > URL: https://issues.apache.org/jira/browse/ARROW-3814 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Romain François >Assignee: Romain François >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-3814) [R] RecordBatch$from_arrays()
[ https://issues.apache.org/jira/browse/ARROW-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-3814. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 3565 [https://github.com/apache/arrow/pull/3565] > [R] RecordBatch$from_arrays() > - > > Key: ARROW-3814 > URL: https://issues.apache.org/jira/browse/ARROW-3814 > Project: Apache Arrow > Issue Type: New Feature > Components: R >Reporter: Romain François >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 4h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5493) [Integration/Go] add Go support for IPC integration tests
Sebastien Binet created ARROW-5493: -- Summary: [Integration/Go] add Go support for IPC integration tests Key: ARROW-5493 URL: https://issues.apache.org/jira/browse/ARROW-5493 Project: Apache Arrow Issue Type: Test Components: Go, Integration Reporter: Sebastien Binet it would be great to add support for the cross-language integration tests of the IPC file/stream format: - [https://github.com/apache/arrow/tree/master/integration] - [https://github.com/apache/arrow/blob/master/integration/integration_test.py] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions
[ https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Francois Saint-Jacques reassigned ARROW-4787: - Assignee: Francois Saint-Jacques > [C++] Include "null" values (perhaps with an option to toggle on/off) in hash > kernel actions > > > Key: ARROW-4787 > URL: https://issues.apache.org/jira/browse/ARROW-4787 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Francois Saint-Jacques >Priority: Major > Fix For: 0.15.0 > > > Null is a meaningful value in the context of analytics. We should have the > option of considering it distinctly in e.g. {{ValueCounts}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
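The requested option can be pictured with a toy value-counts function in which `None` plays the role of an Arrow null. The `count_nulls` flag name is hypothetical; whatever the C++ kernel option ends up being called, the choice is the same: skip nulls, or give them their own bucket.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional


def value_counts(
    values: Iterable[Optional[Hashable]], count_nulls: bool = False
) -> Dict[Optional[Hashable], int]:
    """Count distinct values, with None standing in for an Arrow null.

    count_nulls=False skips nulls entirely; count_nulls=True gives them
    their own bucket keyed by None (a hypothetical option name).
    """
    return dict(Counter(v for v in values if count_nulls or v is not None))


data = ["a", None, "b", "a", None]
print(value_counts(data))                    # {'a': 2, 'b': 1}
print(value_counts(data, count_nulls=True))  # {'a': 2, None: 2, 'b': 1}
```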
[jira] [Created] (ARROW-5492) [R] Add "columns" option to read_parquet to read subset of columns
Wes McKinney created ARROW-5492: --- Summary: [R] Add "columns" option to read_parquet to read subset of columns Key: ARROW-5492 URL: https://issues.apache.org/jira/browse/ARROW-5492 Project: Apache Arrow Issue Type: Improvement Components: R Reporter: Wes McKinney Fix For: 0.14.0 This is just like the same option in {{pyarrow.parquet.read_table}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors
[ https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-263. --- Resolution: Won't Fix > Design an initial IPC mechanism for Arrow Vectors > - > > Key: ARROW-263 > URL: https://issues.apache.org/jira/browse/ARROW-263 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Micah Kornfield >Priority: Major > > Prior discussion on this topic [1]. > Use-cases: > 1. User defined function (UDF) execution: One process wants to execute a > user defined function written in another language (e.g. Java executing a > function defined in Python; this involves creating Arrow Arrays in Java, > sending them to Python and receiving a new set of Arrow Arrays produced in > Python back in the Java process). > 2. If a storage system and a query engine are running on the same host, we > might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu). > Assumptions: > 1. The IPC mechanism should be usable from the core set of supported languages > (Java, Python, C) on POSIX and ideally Windows systems. Ideally, we would > not need to add dependencies on additional libraries for any of these > languages. > We want to leverage shared memory for Arrays to avoid doubling RAM requirements > by duplicating the same Array in different memory locations. > 2. Under some circumstances shared memory might be more efficient than FIFOs > or sockets (in other scenarios it won’t; see the thread below). > 3. Security is not a concern for V1; we assume all processes running are > “trusted”. > Requirements: > 1. Resource management: > a. Both processes need a way of allocating memory for Arrow Arrays so > that data can be passed from one process to another. > b. There must be a mechanism to clean up unused Arrow Arrays to limit > resource usage but avoid race conditions when processing arrays. > 2. 
Schema negotiation - before sending data, both processes need to agree on > schema each one will produce. > Out of scope requirements: > 1. IPC channel metadata discovery is out of scope of this document. > Discovery can be provided by passing appropriate command line arguments, > configuration files or other mechanisms like RPC (in which case RPC channel > discovery is still an issue). > [1] > http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
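Use case 2 and requirement 1 (allocation plus cleanup) can be made concrete with Python's `multiprocessing.shared_memory` module (Python 3.8+). This is a single-machine sketch, not Arrow's IPC format: the 8-byte count header is a hypothetical layout, and passing the segment name between processes stands in for the channel discovery the issue leaves out of scope. The header is there because some platforms round the segment size up to a page, so a reader cannot rely on the segment size alone.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical layout: an 8-byte element count followed by native-order int32 values.
HEADER = struct.Struct("=q")


def write_int32_array(values):
    """Producer: allocate a named shared-memory segment and fill it in place."""
    count = len(values)
    shm = shared_memory.SharedMemory(create=True, size=HEADER.size + 4 * count)
    HEADER.pack_into(shm.buf, 0, count)
    struct.pack_into(f"={count}i", shm.buf, HEADER.size, *values)
    return shm  # the producer owns the segment and must close() and unlink() it


def read_int32_array(name):
    """Consumer: attach by name and view the values without copying the segment."""
    shm = shared_memory.SharedMemory(name=name)
    (count,) = HEADER.unpack_from(shm.buf, 0)
    raw = shm.buf[HEADER.size : HEADER.size + 4 * count]
    ints = raw.cast("i")  # zero-copy view over the shared bytes
    try:
        return ints.tolist()  # copied out here only so the demo returns a list
    finally:
        # Every view must be released before close(), or close() raises BufferError.
        ints.release()
        raw.release()
        shm.close()


producer = write_int32_array([10, 20, 30])
try:
    print(read_int32_array(producer.name))  # [10, 20, 30]
finally:
    producer.close()
    producer.unlink()  # free the segment once no consumer needs it
```

The producer calls `unlink()` only after every consumer has called `close()`; in a real two-process setup that ordering is exactly the coordination requirement 1b asks for.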
[jira] [Commented] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors
[ https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854722#comment-16854722 ] Micah Kornfield commented on ARROW-263: --- I think it can be closed. > Design an initial IPC mechanism for Arrow Vectors > - > > Key: ARROW-263 > URL: https://issues.apache.org/jira/browse/ARROW-263 > Project: Apache Arrow > Issue Type: New Feature >Reporter: Micah Kornfield >Priority: Major > > Prior discussion on this topic [1]. > Use-cases: > 1. User defined function (UDF) execution: One process wants to execute a > user defined function written in another language (e.g. Java executing a > function defined in Python; this involves creating Arrow Arrays in Java, > sending them to Python and receiving a new set of Arrow Arrays produced in > Python back in the Java process). > 2. If a storage system and a query engine are running on the same host, we > might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu). > Assumptions: > 1. The IPC mechanism should be usable from the core set of supported languages > (Java, Python, C) on POSIX and ideally Windows systems. Ideally, we would > not need to add dependencies on additional libraries for any of these > languages. > We want to leverage shared memory for Arrays to avoid doubling RAM requirements > by duplicating the same Array in different memory locations. > 2. Under some circumstances shared memory might be more efficient than FIFOs > or sockets (in other scenarios it won’t; see the thread below). > 3. Security is not a concern for V1; we assume all processes running are > “trusted”. > Requirements: > 1. Resource management: > a. Both processes need a way of allocating memory for Arrow Arrays so > that data can be passed from one process to another. > b. There must be a mechanism to clean up unused Arrow Arrays to limit > resource usage but avoid race conditions when processing arrays. > 2. 
Schema negotiation - before sending data, both processes need to agree on > schema each one will produce. > Out of scope requirements: > 1. IPC channel metadata discovery is out of scope of this document. > Discovery can be provided by passing appropriate command line arguments, > configuration files or other mechanisms like RPC (in which case RPC channel > discovery is still an issue). > [1] > http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5395) [C++] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5395: --- Assignee: John Muehlhausen > [C++] Utilize stream EOS in File format > --- > > Key: ARROW-5395 > URL: https://issues.apache.org/jira/browse/ARROW-5395 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: John Muehlhausen >Assignee: John Muehlhausen >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Original Estimate: 0.25h > Time Spent: 2h 10m > Remaining Estimate: 0h > > We currently do not write EOS at the end of a Message stream inside the File > format. As a result, the file cannot be parsed sequentially. This change > prepares for other implementations or future reference features that parse a > File sequentially... i.e. without access to seek(). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5395) [C++] Utilize stream EOS in File format
[ https://issues.apache.org/jira/browse/ARROW-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5395. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4372 [https://github.com/apache/arrow/pull/4372] > [C++] Utilize stream EOS in File format > --- > > Key: ARROW-5395 > URL: https://issues.apache.org/jira/browse/ARROW-5395 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: John Muehlhausen >Priority: Minor > Labels: pull-request-available > Fix For: 0.14.0 > > Original Estimate: 0.25h > Time Spent: 2h > Remaining Estimate: 0h > > We currently do not write EOS at the end of a Message stream inside the File > format. As a result, the file cannot be parsed sequentially. This change > prepares for other implementations or future reference features that parse a > File sequentially... i.e. without access to seek(). > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
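Why an explicit EOS helps can be shown with a simplified length-prefixed framing. This is not the Arrow IPC byte layout (real messages carry Flatbuffer metadata, and the File format adds `ARROW1` magic bytes and a footer); it only shows why a reader that cannot seek() needs an in-band marker to know where the message stream ends and trailing file data begins.

```python
import io
import struct
from typing import BinaryIO, Iterator, List

# Simplified framing: a little-endian int32 length, then that many body bytes.
# A length of 0 is reserved as the end-of-stream (EOS) marker.
LENGTH = struct.Struct("<i")


def write_stream(messages: List[bytes], sink: BinaryIO) -> None:
    for body in messages:
        if not body:
            raise ValueError("empty bodies would be mistaken for EOS")
        sink.write(LENGTH.pack(len(body)))
        sink.write(body)
    sink.write(LENGTH.pack(0))  # EOS: tells a forward-only reader to stop


def read_stream(source: BinaryIO) -> Iterator[bytes]:
    """Yield message bodies front to back using only read(), never seek()."""
    while True:
        prefix = source.read(LENGTH.size)
        if len(prefix) < LENGTH.size:
            raise EOFError("stream ended without an EOS marker")
        (length,) = LENGTH.unpack(prefix)
        if length == 0:
            return
        yield source.read(length)


buf = io.BytesIO()
write_stream([b"schema", b"batch-1"], buf)
buf.write(b"FOOTER")  # stands in for trailing file data the reader never parses
buf.seek(0)  # rewind only to hand the bytes to the reader
print(list(read_stream(buf)))  # [b'schema', b'batch-1']
```

Without the EOS marker, the reader above would try to interpret the first four footer bytes as a message length.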
[jira] [Resolved] (ARROW-4504) [C++] Reduce the number of unit test executables
[ https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-4504. - Resolution: Fixed Issue resolved by pull request 4442 [https://github.com/apache/arrow/pull/4442] > [C++] Reduce the number of unit test executables > > > Key: ARROW-4504 > URL: https://issues.apache.org/jira/browse/ARROW-4504 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Link times are a significant drag in MSVC builds. They don't affect Linux > nearly as much when building with Ninja. I suggest we combine some of the > fast-running tests within logical units to see if we can cut down from 106 > test executables to 70 or so > {code} > 100% tests passed, 0 tests failed out of 107 > Label Time Summary: > arrow-tests = 21.19 sec*proc (48 tests) > arrow_python-tests= 0.26 sec*proc (1 test) > example = 0.05 sec*proc (1 test) > gandiva-tests = 11.65 sec*proc (39 tests) > parquet-tests = 35.81 sec*proc (18 tests) > unittest = 68.92 sec*proc (106 tests) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5390) [CI] Job time limit exceeded on Travis
[ https://issues.apache.org/jira/browse/ARROW-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney reassigned ARROW-5390: --- Assignee: Antoine Pitrou > [CI] Job time limit exceeded on Travis > -- > > Key: ARROW-5390 > URL: https://issues.apache.org/jira/browse/ARROW-5390 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We now frequently hit the 50-minute job time limit on Travis-CI on the > "Python 2.7 and 3.6 unit tests w/ Valgrind, conda-forge toolchain, coverage" > job. > e.g. https://travis-ci.org/pitrou/arrow/jobs/535373888 > Hopefully we can soon ditch Python 2.7, which would allow saving a bit of > time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5390) [CI] Job time limit exceeded on Travis
[ https://issues.apache.org/jira/browse/ARROW-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney resolved ARROW-5390. - Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4457 [https://github.com/apache/arrow/pull/4457] > [CI] Job time limit exceeded on Travis > -- > > Key: ARROW-5390 > URL: https://issues.apache.org/jira/browse/ARROW-5390 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration, Python >Reporter: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > Time Spent: 20m > Remaining Estimate: 0h > > We now frequently hit the 50-minute job time limit on Travis-CI on the > "Python 2.7 and 3.6 unit tests w/ Valgrind, conda-forge toolchain, coverage" > job. > e.g. https://travis-ci.org/pitrou/arrow/jobs/535373888 > Hopefully we can soon ditch Python 2.7, which would allow saving a bit of > time. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?
[ https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854692#comment-16854692 ] Wes McKinney commented on ARROW-5474: - That's also fine with me. We have https://github.com/apache/arrow/blob/master/cpp/Dockerfile.ubuntu-xenial to help maintain this support, is running that sufficient to check? > [C++] What version of Boost do we require now? > -- > > Key: ARROW-5474 > URL: https://issues.apache.org/jira/browse/ARROW-5474 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One > possible cause for that error is that the local filesystem patch increased > the version of boost that we actually require. The boost version (1.54 vs > 1.58) was one difference between failure and success. > Another point of confusion was that CMake reported two different versions of > boost at different times. > If we require a minimum version of boost, can we document that better, check > for it more accurately in the build scripts, and fail with a useful message > if that minimum isn't met? Or something else helpful. > If the actual cause of the failure was something else (e.g. compiler > version), we should figure that out too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available
[ https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854686#comment-16854686 ] Wes McKinney commented on ARROW-5488: - One possibility is to bundle the Arrow header files with the CRAN package and build against them, but do not include {{libarrow}} and {{libparquet}} when linking. When the library is loaded, the libraries must be loaded in-process via {{dlopen}} before loading the Rcpp extensions. > [R] Workaround when C++ lib not available > - > > Key: ARROW-5488 > URL: https://issues.apache.org/jira/browse/ARROW-5488 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Romain François >Priority: Major > > As a way to get to CRAN, we need some way for the package to still compile, > install, and test (although it would do nothing useful) even when the C++ lib is not > available. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?
[ https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854687#comment-16854687 ] Uwe L. Korn commented on ARROW-5474: For adoption reasons, it would be nice to use Ubuntu 16.04 as a baseline. This has Boost 1.58. > [C++] What version of Boost do we require now? > -- > > Key: ARROW-5474 > URL: https://issues.apache.org/jira/browse/ARROW-5474 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Neal Richardson >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One > possible cause for that error is that the local filesystem patch increased > the version of boost that we actually require. The boost version (1.54 vs > 1.58) was one difference between failure and success. > Another point of confusion was that CMake reported two different versions of > boost at different times. > If we require a minimum version of boost, can we document that better, check > for it more accurately in the build scripts, and fail with a useful message > if that minimum isn't met? Or something else helpful. > If the actual cause of the failure was something else (e.g. compiler > version), we should figure that out too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ARROW-5488) [R] Workaround when C++ lib not available
[ https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854686#comment-16854686 ] Wes McKinney edited comment on ARROW-5488 at 6/3/19 3:06 PM: - One possibility is to bundle the Arrow header files with the CRAN package and build against them, but do not include {{libarrow}} and {{libparquet}} when linking. When the library is loaded, the libraries must be loaded in-process via {{dlopen}} before loading the Rcpp extensions. The C++ libraries can then be installed after the fact. was (Author: wesmckinn): One possibility is to bundle the Arrow header files with the CRAN package and build against them, but do not include {{libarrow}} and {{libparquet}} when linking. When the library is loaded, the libraries must be loaded in-process via {{dlopen}} before loading the Rcpp extensions > [R] Workaround when C++ lib not available > - > > Key: ARROW-5488 > URL: https://issues.apache.org/jira/browse/ARROW-5488 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Romain François >Priority: Major > > As a way to get to CRAN, we need some way for the package to still compile, > install, and test (although it would do nothing useful) even when the C++ lib is not > available. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
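The load-order idea in Wes's comment can be sketched with Python's `ctypes`, which uses `dlopen` on POSIX systems. Loading with `RTLD_GLOBAL` makes a library's symbols available to extension modules loaded afterwards; as far as I know, the closest R equivalent is `dyn.load(..., local = FALSE)`. The library names you would pass (for example "arrow" and "parquet") are assumptions, and the function reports missing libraries instead of raising, matching the goal of installing and testing even when the C++ library is absent.

```python
import ctypes
import ctypes.util
from typing import Dict, Iterable


def preload(libraries: Iterable[str]) -> Dict[str, bool]:
    """Try to dlopen each library with RTLD_GLOBAL; report which ones loaded.

    Missing libraries are recorded as False instead of raising, so the caller
    can fall back to a degraded mode.
    """
    loaded = {}
    for name in libraries:
        path = ctypes.util.find_library(name)
        if path is None:
            loaded[name] = False
            continue
        try:
            # RTLD_GLOBAL exposes the library's symbols to extension modules
            # loaded later, even though those modules were not linked against it.
            ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            loaded[name] = True
        except OSError:
            loaded[name] = False
    return loaded


status = preload(["arrow", "parquet"])  # hypothetical names; likely all False here
print(status)
```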
[jira] [Updated] (ARROW-5407) [C++] Integration test Travis CI entry builds many unnecessary targets
[ https://issues.apache.org/jira/browse/ARROW-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5407: -- Labels: pull-request-available (was: ) > [C++] Integration test Travis CI entry builds many unnecessary targets > -- > > Key: ARROW-5407 > URL: https://issues.apache.org/jira/browse/ARROW-5407 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > > Only the IPC and Flight integration test targets are needed to run the tests. > It appears that all targets including all unit tests are being built in Travis -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available
[ https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854674#comment-16854674 ] Uwe L. Korn commented on ARROW-5488: Would this involve compiling the C++ lib from source in that case? > [R] Workaround when C++ lib not available > - > > Key: ARROW-5488 > URL: https://issues.apache.org/jira/browse/ARROW-5488 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Romain François >Priority: Major > > As a way to get to CRAN, we need some way for the package to still compile, > install, and test (although it would do nothing useful) even when the C++ lib is not > available. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported
[ https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854672#comment-16854672 ] Antoine Pitrou commented on ARROW-1774: --- What is meant here by "same physical memory layout"? For example, should we allow a view of int32 as float32? If so, it's not the same thing as casting. > [C++] Add "view" function to create zero-copy views for compatible types, if > supported > -- > > Key: ARROW-1774 > URL: https://issues.apache.org/jira/browse/ARROW-1774 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > Similar to NumPy's {{ndarray.view}}, but with the restriction that the input > and output types have the same physical Arrow memory layout. This might be as > simple as adding a "zero copy only" option to the existing {{Cast}} kernel -- This message was sent by Atlassian JIRA (v7.6.3#76005)
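Antoine's distinction can be shown with the standard library alone, in the spirit of NumPy's `ndarray.view`. Reinterpreting int32 storage as float32 keeps the same bytes and shares memory, while a cast converts each value and produces new data. The sketch assumes a 4-byte C `int`, which holds on common platforms.

```python
from array import array

# The int32 bit pattern 0x3F800000 is the IEEE 754 float32 encoding of 1.0.
ints = array("i", [0x3F800000, 0])
assert ints.itemsize == 4  # the sketch assumes a 4-byte C int

# "View": reinterpret the same bytes as float32 without copying.
as_float_view = memoryview(ints).cast("B").cast("f")

# "Cast": convert each value numerically, producing new storage.
as_float_cast = [float(v) for v in ints]

print(as_float_view.tolist())  # [1.0, 0.0]
print(as_float_cast)           # [1065353216.0, 0.0]

# The view shares memory: writing through the original array is visible in it.
ints[1] = 0x40000000  # float32 encoding of 2.0
print(as_float_view[1])        # 2.0
```

So "same physical layout" (same width, no offsets or validity changes) is what makes a zero-copy view possible, but the resulting values are generally not what `Cast` would produce.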
[jira] [Resolved] (ARROW-5040) [C++] ArrayFromJSON can't parse Timestamp from strings
[ https://issues.apache.org/jira/browse/ARROW-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5040. --- Resolution: Duplicate Looks like this was fixed as part of ARROW-4708 > [C++] ArrayFromJSON can't parse Timestamp from strings > -- > > Key: ARROW-5040 > URL: https://issues.apache.org/jira/browse/ARROW-5040 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > Fix For: 0.14.0 > > > Currently, ArrayFromJSON can only produce timestamps from numbers. > This is an impediment to writing tests for JSON and CSV, since those formats > parse timestamps from strings and it's not immediately obvious that > "2000-02-29" corresponds to 951782400 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
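The number quoted in the description can be checked directly: 951782400 is the count of seconds from the Unix epoch to midnight UTC on 2000-02-29. A minimal stdlib sketch, which for brevity assumes a date-only string interpreted as UTC:

```python
import calendar
import time


def iso_date_to_epoch_seconds(text: str) -> int:
    """Parse 'YYYY-MM-DD' as midnight UTC and return seconds since 1970-01-01."""
    # timegm() treats the struct_time as UTC, unlike time.mktime() (local time).
    return calendar.timegm(time.strptime(text, "%Y-%m-%d"))


print(iso_date_to_epoch_seconds("2000-02-29"))  # 951782400
```

Impossible dates make `strptime` raise `ValueError`, which is the behaviour a test helper would want.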
[jira] [Updated] (ARROW-5491) [C++] Remove unnecessary semicolons following MACRO definitions
[ https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-5491: -- Labels: pull-request-available (was: ) > [C++] Remove unnecessary semicolons following MACRO definitions > -- > > Key: ARROW-5491 > URL: https://issues.apache.org/jira/browse/ARROW-5491 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Affects Versions: 0.13.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Labels: pull-request-available > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5491) [C++] Remove unnecessary semicolons following MACRO definitions
[ https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Hulette updated ARROW-5491: - Summary: [C++] Remove unnecessary semicolons following MACRO definitions (was: Remove unnecessary semicolons following MACRO definitions) > [C++] Remove unnecessary semicolons following MACRO definitions > -- > > Key: ARROW-5491 > URL: https://issues.apache.org/jira/browse/ARROW-5491 > Project: Apache Arrow > Issue Type: Task > Components: C++ >Affects Versions: 0.13.0 >Reporter: Brian Hulette >Assignee: Brian Hulette >Priority: Major > Fix For: 0.14.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ARROW-5491) Remove unnecessary semicolons following MACRO definitions
Brian Hulette created ARROW-5491: Summary: Remove unnecessary semicolons following MACRO definitions Key: ARROW-5491 URL: https://issues.apache.org/jira/browse/ARROW-5491 Project: Apache Arrow Issue Type: Task Components: C++ Affects Versions: 0.13.0 Reporter: Brian Hulette Assignee: Brian Hulette Fix For: 0.14.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints
[ https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5430. --- Resolution: Fixed Fix Version/s: 0.14.0 Issue resolved by pull request 4440 [https://github.com/apache/arrow/pull/4440] > [Python] Can read but not write parquet partitioned on large ints > - > > Key: ARROW-5430 > URL: https://issues.apache.org/jira/browse/ARROW-5430 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.13.0 > Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64. >Reporter: Robin Kåveland >Priority: Minor > Labels: parquet, pull-request-available > Fix For: 0.14.0 > > Time Spent: 40m > Remaining Estimate: 0h > > Here's a contrived example that reproduces this issue using pandas: > {code:java} > import numpy as np > import pandas as pd > real_usernames = np.array(['anonymize', 'me']) > usernames = pd.util.hash_array(real_usernames) > login_count = [13, 9] > df = pd.DataFrame({'user': usernames, 'logins': login_count}) > df.to_parquet('can_write.parq', partition_cols=['user']) > # But not read > pd.read_parquet('can_write.parq'){code} > Expected behaviour: > * Either the write fails > * Or the read succeeds > Actual behaviour: The read fails with the following error: > {code:java} > Traceback (most recent call last): > File "", line 2, in > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py", > line 282, in read_parquet > return impl.read(path, columns=columns, **kwargs) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py", > line 129, in read > **kwargs).to_pandas() > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 1152, in read_table > use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py", > line 181, in read_parquet > 
use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 1014, in read > use_pandas_metadata=use_pandas_metadata) > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 587, in read > dictionary = partitions.levels[i].dictionary > File > "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py", > line 642, in dictionary > dictionary = lib.array(integer_keys) > File "pyarrow/array.pxi", line 173, in pyarrow.lib.array > File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array > File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status > pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to > C long{code} > I set the priority to minor here because it's easy enough to work around this > in user code unless you really need the 64 bit hash (and you probably > shouldn't be partitioning on that anyway). > I could take a stab at writing a patch for this if there's interest? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
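The traceback in ARROW-5430 ends in `lib.array(integer_keys)` failing with "Python int too large to convert to C long". `pd.util.hash_array` returns unsigned 64-bit hashes, and any hash at or above 2**63 does not fit in a signed 64-bit C long. The Python sketch below shows one possible user-side workaround, under the assumption that a signed int64 partition key is acceptable: reinterpret the 64 bits as two's complement before writing. The helper names are hypothetical and are not pandas or pyarrow APIs.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
UINT64_MOD = 2**64


def fits_in_int64(value: int) -> bool:
    """True if value fits in a signed 64-bit integer (a C long on 64-bit macOS/Linux)."""
    return INT64_MIN <= value <= INT64_MAX


def uint64_to_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed two's complement (keeps all 64 bits)."""
    if not 0 <= value < UINT64_MOD:
        raise ValueError(f"{value} is not a uint64")
    return value - UINT64_MOD if value > INT64_MAX else value


def int64_to_uint64(value: int) -> int:
    """Inverse of uint64_to_int64, recovering the original hash."""
    if not fits_in_int64(value):
        raise ValueError(f"{value} is not an int64")
    return value % UINT64_MOD


big_hash = 2**63 + 12345  # illustrative uint64 hash above INT64_MAX
print(fits_in_int64(big_hash))                             # False
print(uint64_to_int64(big_hash))                           # -9223372036854763463
print(int64_to_uint64(uint64_to_int64(big_hash)) == big_hash)  # True
```

Because the mapping is a bijection, no hash information is lost. With NumPy the same reinterpretation of a whole array is `usernames.view(np.int64)`; whether the resulting negative partition values round-trip through this pyarrow version was not verified here.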
[jira] [Created] (ARROW-5490) [C++] Remove ARROW_BOOST_HEADER_ONLY
Antoine Pitrou created ARROW-5490: - Summary: [C++] Remove ARROW_BOOST_HEADER_ONLY Key: ARROW-5490 URL: https://issues.apache.org/jira/browse/ARROW-5490 Project: Apache Arrow Issue Type: Task Components: C++ Affects Versions: 0.13.0 Reporter: Antoine Pitrou That CMake variable isn't exposed as an option and probably doesn't work anymore. All code paths depending on that variable should probably be simplified. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ARROW-5407) [C++] Integration test Travis CI entry builds many unnecessary targets
[ https://issues.apache.org/jira/browse/ARROW-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou reassigned ARROW-5407: - Assignee: Antoine Pitrou > [C++] Integration test Travis CI entry builds many unnecessary targets > -- > > Key: ARROW-5407 > URL: https://issues.apache.org/jira/browse/ARROW-5407 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Wes McKinney >Assignee: Antoine Pitrou >Priority: Major > Fix For: 0.14.0 > > > Only the IPC and Flight integration test targets are needed to run the tests. > It appears that all targets including all unit tests are being built in Travis -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5190) [R] Discussion: tibble dependency in R package
[ https://issues.apache.org/jira/browse/ARROW-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854615#comment-16854615 ] James Lamb commented on ARROW-5190: --- Thanks [~romainfrancois]!!! > [R] Discussion: tibble dependency in R package > -- > > Key: ARROW-5190 > URL: https://issues.apache.org/jira/browse/ARROW-5190 > Project: Apache Arrow > Issue Type: Wish > Components: R >Reporter: James Lamb >Assignee: Romain François >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > Hello, > > I would like to have a discussion on the use of *tibble* in the Apache Arrow > R package. I looked at [the project contributor > guidelines|https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst] > and could not tell where the best place might be to start a public > discussion on this topic, so I decided on JIRA. I apologize if this is not > the right place. > > *TL;DR* > I would like to propose moving the *tibble* dependency in the *arrow* R > package to "Suggests", removing the _as_tibble()_ in _read_arrow()_, and > having the core R code implementing the Arrow API only return data.frames or > other base-R data structures wherever possible. 
> > *Reasoning* > [As far as I can > tell|https://github.com/apache/arrow/search?p=1=tibble_q=tibble], > outside of tests and examples *tibble* is only used in three places in the > package: > * S3 methods to convert Arrow objects to tibbles > (_as_tibble.arrow::RecordBatch()_, _as.tibble.arrow::Table()_) > * optional "convert to tibble on the way out" behavior controlled by a flag > in interfaces to file types (parquet and feather) > * > [_read_arrow()_|https://github.com/apache/arrow/blob/0536ef8174982a7a13a251174cc38701e8663b68/r/R/read_table.R#L88] > > In my opinion, all three of these uses of *tibble* are valuable for > developers who use that package (or other packages in its ecosystem), but I > am not convinced that the Arrow R package should be tightly coupled to them. > In the Python community, *pandas* is a broadly agreed-upon standard for > representing data frames. Even with that ubiquity, *pyarrow* does not depend > on *pandas* (it is not necessary to work with it) and all "compatibility with > *pandas*" code is isolated in a place explicitly intended for that purpose: > [https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py] > I think that is the ideal handling for integration of Arrow extensions with > other software it might be used with. This allows users who care about only > one of the integrations (e.g. feather, parquet, HDFS, Apache Spark, tibble, > data.table, etc.) to only have to build things they're already using. > > *Other background information* > I took the time to write this tonight after talking a colleague through the > issues *feather* (R package) users experienced after the *tibble 2.0* > release. See for example > [wesm/feather#374|https://github.com/wesm/feather/issues/374] and > [wesm/feather#372|https://github.com/wesm/feather/issues/372]. 
> When *tibble 2.0* came out it broke *feather 0.3.1* and the maintainers > there promptly released to CRAN a *feather 0.3.2* which was compatible with > *tibble 2.0+*. Unfortunately, this still caused disruptions for many people > using *feather* (who inadvertently had *tibble* upgraded as part of > installing other packages which depended on it). Nothing about *tibble* was > necessary to the implementation of _read_feather()_, as far as I can tell, > but this design choice made installing and upgrading *tibble* non-optional > for developers who just wanted to use the feather file format and all its > awesome features. > > If the proposal here is accepted, I hope it will mean we can prevent > repeating the same experience with the R *arrow* package and set a strong > precedent for developers who want to add compatibility in this package for > other members of the ecosystem like parquet or Apache Spark. > > > Thank you for hearing me out! > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
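The proposal in ARROW-5190 is to make *tibble* an optional ("Suggests") dependency, mirroring how *pyarrow* keeps *pandas* optional. In R this is usually done by listing the package under Suggests and checking it with `requireNamespace()` at call time. The Python sketch below illustrates the same lazy, optional-import pattern; `import_optional` and `to_dataframe` are hypothetical names, not functions in pyarrow or the arrow R package.

```python
import importlib
from types import ModuleType


def import_optional(name: str, purpose: str) -> ModuleType:
    """Import an optional dependency only when a feature that needs it is used."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        # Fail with an actionable message instead of breaking the whole package.
        raise ImportError(
            f"{name!r} is required for {purpose} but is not installed; "
            "the rest of the package works without it"
        ) from exc


def to_dataframe(rows):
    # Hypothetical compatibility function: pandas is imported lazily here, so
    # users who never call it never need pandas installed or upgraded.
    pd = import_optional("pandas", "conversion to a pandas DataFrame")
    return pd.DataFrame(rows)
```

The design choice is that the core package imports cleanly without the optional dependency, and an upgrade of that dependency can only affect the code paths that explicitly opt into it.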
[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja
[ https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854593#comment-16854593 ] Antoine Pitrou commented on ARROW-5473: --- I think that line is necessary to work around a CMake bug when non-existent directories are referenced. > [C++] Build failure on googletest_ep on Windows when using Ninja > > > Key: ARROW-5473 > URL: https://issues.apache.org/jira/browse/ARROW-5473 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I consistently get this error when trying to use Ninja locally: > {code} > -- extracting... > > src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz' > > dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep' > -- extracting... [tar xfz] > -- extracting... [analysis] > -- extracting... [rename] > CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file): > file RENAME failed to rename > > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1 > to > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep > because: Directory not empty > [179/623] Building CXX object > src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj > ninja: build stopped: subcommand failed. > {code} > I'm running within cmdr terminal emulator so it's conceivable there's some > path modifications that are causing issues. > The CMake invocation is > {code} > cmake -G "Ninja" ^ -DCMAKE_BUILD_TYPE=Release ^ > -DARROW_BUILD_TESTS=on ^ -DARROW_CXXFLAGS="/WX /MP" ^ > -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON > -DARROW_VERBOSE_THIRDPARTY_BUILD=on .. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja
[ https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854594#comment-16854594 ] Antoine Pitrou commented on ARROW-5473: --- See e.g. https://gitlab.kitware.com/cmake/cmake/issues/15052 > [C++] Build failure on googletest_ep on Windows when using Ninja > > > Key: ARROW-5473 > URL: https://issues.apache.org/jira/browse/ARROW-5473 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 0.14.0 > > > I consistently get this error when trying to use Ninja locally: > {code} > -- extracting... > > src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz' > > dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep' > -- extracting... [tar xfz] > -- extracting... [analysis] > -- extracting... [rename] > CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file): > file RENAME failed to rename > > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1 > to > C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep > because: Directory not empty > [179/623] Building CXX object > src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj > ninja: build stopped: subcommand failed. > {code} > I'm running within cmdr terminal emulator so it's conceivable there's some > path modifications that are causing issues. > The CMake invocation is > {code} > cmake -G "Ninja" ^ -DCMAKE_BUILD_TYPE=Release ^ > -DARROW_BUILD_TESTS=on ^ -DARROW_CXXFLAGS="/WX /MP" ^ > -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON > -DARROW_VERBOSE_THIRDPARTY_BUILD=on .. > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
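The log in ARROW-5473 shows the external-project extract step unpacking into a temporary `ex-googletest_ep...` directory and then renaming it onto `googletest_ep`, which fails with "Directory not empty". The Python sketch below illustrates that failure mode and one defensive pattern: remove a stale destination before the final rename. It is a hypothetical illustration (the function `replace_directory` and the "stale leftover" scenario are assumptions), not the CMake workaround Antoine refers to.

```python
import os
import shutil
from pathlib import Path


def replace_directory(src: Path, dst: Path) -> None:
    """Move a freshly extracted directory to its final location.

    A rename onto an existing non-empty directory fails (ENOTEMPTY on POSIX,
    FileExistsError on Windows), much like the "Directory not empty" error in
    the log, so any stale destination is removed first.
    """
    if dst.exists():
        shutil.rmtree(dst)
    os.rename(src, dst)
```

Deleting the destination is only safe when nothing else in the build owns its contents; a build system might instead choose to fail loudly, as CMake does here.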
[jira] [Updated] (ARROW-2) Post Simple Website
[ https://issues.apache.org/jira/browse/ARROW-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2: --- Component/s: Website > Post Simple Website > --- > > Key: ARROW-2 > URL: https://issues.apache.org/jira/browse/ARROW-2 > Project: Apache Arrow > Issue Type: New Feature > Components: Website >Reporter: Jacques Nadeau >Assignee: Jason Altekruse >Priority: Major > Fix For: 0.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-35) Add a short call-to-action / how-to-get-involved to the main README.md
[ https://issues.apache.org/jira/browse/ARROW-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-35: Component/s: Documentation > Add a short call-to-action / how-to-get-involved to the main README.md > -- > > Key: ARROW-35 > URL: https://issues.apache.org/jira/browse/ARROW-35 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > > * Engage on the mailing list > * Read the format documentation > * Contribute code and design ideas to the reference implementations -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-13) Add PR merge tool similar to that used in Parquet
[ https://issues.apache.org/jira/browse/ARROW-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-13: Component/s: Developer Tools > Add PR merge tool similar to that used in Parquet > - > > Key: ARROW-13 > URL: https://issues.apache.org/jira/browse/ARROW-13 > Project: Apache Arrow > Issue Type: New Feature > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Minor > Fix For: 0.1.0 > > > See https://github.com/apache/parquet-mr/tree/master/dev -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-6) Hope to add development document
[ https://issues.apache.org/jira/browse/ARROW-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6: --- Component/s: Documentation > Hope to add development document > > > Key: ARROW-6 > URL: https://issues.apache.org/jira/browse/ARROW-6 > Project: Apache Arrow > Issue Type: Wish > Components: Documentation >Reporter: AllenFang >Priority: Major > Labels: documentation > Fix For: 0.3.0 > > > Awesome project, great job :) > Anyway, is it possible to add some useful documents for development? > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-5) Error when run maven install
[ https://issues.apache.org/jira/browse/ARROW-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5: --- Component/s: Java > Error when run maven install > > > Key: ARROW-5 > URL: https://issues.apache.org/jira/browse/ARROW-5 > Project: Apache Arrow > Issue Type: Bug > Components: Java > Environment: Ubuntu Maven 3.2 >Reporter: AllenFang >Assignee: Liwei Lin(Inactive) >Priority: Major > Labels: maven > Fix For: 0.1.0 > > > When I ran maven install, I got the following problem: > Failed to execute goal > org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0:generate > (generate-fmpp) on project vector: Execution generate-fmpp of goal > org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0:generate failed: > Plugin org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0 or one of its > dependencies could not be resolved: Failure to find > org.freemarker:freemarker:jar:2.3.24-SNAPSHOT in > http://repository.apache.org/snapshots was cached in the local repository > btw, I just cloned the repo and ran mvn clean install. > dev mailing list link > http://mail-archives.apache.org/mod_mbox/arrow-dev/201602.mbox/%3CCAABsKVCSEULDTL2hoANL8-wrWMDO8%3Dgv0RFmSQMXt3MdiqUcPw%40mail.gmail.com%3E -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-4) Initial Arrow CPP Implementation
[ https://issues.apache.org/jira/browse/ARROW-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4: --- Component/s: C++ > Initial Arrow CPP Implementation > > > Key: ARROW-4 > URL: https://issues.apache.org/jira/browse/ARROW-4 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Jacques Nadeau >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > Attachments: 0001-arrow-initial-cpp.patch.gz > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-8) Set up Travis CI
[ https://issues.apache.org/jira/browse/ARROW-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-8: --- Component/s: Continuous Integration > Set up Travis CI > > > Key: ARROW-8 > URL: https://issues.apache.org/jira/browse/ARROW-8 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > > I will ask INFRA to enable Travis CI for the repo, and then will propose a > patch that runs the C++ test suite to start (unless some kind soul beats me > to it with a Java patch). We can use a build matrix with one build per > language SDK (so gcc and clang for arrow-cpp) to start. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-3) Post Initial Arrow Format Spec
[ https://issues.apache.org/jira/browse/ARROW-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3: --- Component/s: Format > Post Initial Arrow Format Spec > -- > > Key: ARROW-3 > URL: https://issues.apache.org/jira/browse/ARROW-3 > Project: Apache Arrow > Issue Type: New Feature > Components: Format >Reporter: Jacques Nadeau >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > Attachments: 0001-arrow-format-draft.patch.gz > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-64) Add zsh support to C++ build scripts
[ https://issues.apache.org/jira/browse/ARROW-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-64: Component/s: Developer Tools C++ > Add zsh support to C++ build scripts > > > Key: ARROW-64 > URL: https://issues.apache.org/jira/browse/ARROW-64 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Developer Tools >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Fix For: 0.1.0 > > > All scripts that have to be sourced during development currently only support > bash. This patch adds zsh support. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-52) Set up project blog
[ https://issues.apache.org/jira/browse/ARROW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-52: Component/s: Website > Set up project blog > --- > > Key: ARROW-52 > URL: https://issues.apache.org/jira/browse/ARROW-52 > Project: Apache Arrow > Issue Type: Task > Components: Website >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.3.0 > > > I would like to be able to publish blog posts under arrow.apache.org (see, > for example, posts I've written recently like > http://blog.ibis-project.org/kudu-impala-ibis/). > I have a bias towards using Pelican as the publishing toolchain as posts can > be written in Markdown and include IPython notebooks. GitHub Pages is the > easiest way to publish but this may not be compatible with apache.org, so > using rsync or some other static content publishing tool would be fine too. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-51) Move ValueVector test from Drill project
[ https://issues.apache.org/jira/browse/ARROW-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-51: Component/s: Java > Move ValueVector test from Drill project > > > Key: ARROW-51 > URL: https://issues.apache.org/jira/browse/ARROW-51 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > Fix For: 0.1.0 > > > There are some simple tests that should be moved from the Drill project. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-36) Remove fixVersions from patch tool (until we have them)
[ https://issues.apache.org/jira/browse/ARROW-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-36: Component/s: Developer Tools > Remove fixVersions from patch tool (until we have them) > --- > > Key: ARROW-36 > URL: https://issues.apache.org/jira/browse/ARROW-36 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-46) Port DRILL-4410 to Arrow
[ https://issues.apache.org/jira/browse/ARROW-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-46: Component/s: Java > Port DRILL-4410 to Arrow > > > Key: ARROW-46 > URL: https://issues.apache.org/jira/browse/ARROW-46 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > Fix For: 0.1.0 > > > This fixes a bug in ListVector which causes OversizeAllocation -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-102) travis-ci support for java project
[ https://issues.apache.org/jira/browse/ARROW-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-102: - Component/s: Java Continuous Integration > travis-ci support for java project > -- > > Key: ARROW-102 > URL: https://issues.apache.org/jira/browse/ARROW-102 > Project: Apache Arrow > Issue Type: Task > Components: Continuous Integration, Java >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Fix For: 0.1.0 > > > The java part of the Arrow project has no automated build using travis-ci, > unlike c++. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-101) Fix java warnings emitted by java compiler
[ https://issues.apache.org/jira/browse/ARROW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-101: - Component/s: Java > Fix java warnings emitted by java compiler > -- > > Key: ARROW-101 > URL: https://issues.apache.org/jira/browse/ARROW-101 > Project: Apache Arrow > Issue Type: Task > Components: Java >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Trivial > Fix For: 0.1.0 > > > Java compiler emits several warnings regarding the use of rawtypes and > unclosed resources on a few classes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-85) C++: memcmp can be avoided in Equal when comparing with the same Buffer
[ https://issues.apache.org/jira/browse/ARROW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-85: Component/s: C++ > C++: memcmp can be avoided in Equal when comparing with the same Buffer > --- > > Key: ARROW-85 > URL: https://issues.apache.org/jira/browse/ARROW-85 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kai Zheng >Assignee: Kai Zheng >Priority: Major > Fix For: 0.1.0 > > > It looks too expensive to use memcmp to compare two buffers. Instead, the > starting address and length/capacity would be good enough to use. Higher-level > code that relies on memcmp behaviour can do that comparison at a higher level. > Update: memcmp should be avoided in Equal when comparing with the same > Buffer. In other cases, it is still needed to know whether the contents are the same or > not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
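The updated ARROW-85 description amounts to a fast path: if both sides refer to the same memory at the same start address with the same length, they are trivially equal and the byte-by-byte comparison can be skipped; otherwise the contents must still be compared. The toy Python sketch below models a buffer as a hypothetical `(memory, offset, size)` window; the `Buffer` class and `buffers_equal` are illustrative assumptions, not Arrow's actual C++ implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Toy stand-in for a buffer: the window [offset, offset + size) over shared memory."""
    memory: bytes
    offset: int
    size: int

    def __post_init__(self) -> None:
        if self.offset < 0 or self.size < 0 or self.offset + self.size > len(self.memory):
            raise ValueError("buffer window is out of bounds")

    def view(self) -> memoryview:
        return memoryview(self.memory)[self.offset:self.offset + self.size]


def buffers_equal(a: Buffer, b: Buffer) -> bool:
    if a.size != b.size:
        return False  # different lengths can never hold equal contents
    # Fast path: same memory and same start address means the same bytes.
    if a.memory is b.memory and a.offset == b.offset:
        return True
    # Slow path: compare contents, the role memcmp plays in C++.
    return a.view() == b.view()
```

The fast path only proves equality; it never proves inequality, which is why distinct buffers still fall through to the content comparison.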
[jira] [Updated] (ARROW-84) C++: separate test codes
[ https://issues.apache.org/jira/browse/ARROW-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-84: Component/s: C++ > C++: separate test codes > > > Key: ARROW-84 > URL: https://issues.apache.org/jira/browse/ARROW-84 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Kai Zheng >Priority: Major > Fix For: 0.1.0 > > > Currently, test code resides together with normal code. I'm not sure whether that is > good practice in C++, but I guess it would be much cleaner to separate the test > code out into a {{test}} folder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-103) Missing patterns from .gitignore
[ https://issues.apache.org/jira/browse/ARROW-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-103: - Component/s: Developer Tools > Missing patterns from .gitignore > > > Key: ARROW-103 > URL: https://issues.apache.org/jira/browse/ARROW-103 > Project: Apache Arrow > Issue Type: Bug > Components: Developer Tools >Reporter: Dan Robinson >Assignee: Dan Robinson >Priority: Minor > Fix For: 0.1.0 > > > There are some build files created on at least my platform (such as > libpyarrow.dylib) that aren't covered by any of the patterns in the > .gitignore files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-95) Scaffold Main Documentation using asciidoc
[ https://issues.apache.org/jira/browse/ARROW-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-95: Component/s: Documentation > Scaffold Main Documentation using asciidoc > -- > > Key: ARROW-95 > URL: https://issues.apache.org/jira/browse/ARROW-95 > Project: Apache Arrow > Issue Type: Task > Components: Documentation >Reporter: Uwe L. Korn >Assignee: Wes McKinney >Priority: Major > Fix For: 0.3.0 > > > For the general documentation of Arrow, we want to use asciidoc. The "general > documentation" includes: > * The Arrow spec / memory layout > * Howtos for building arrow on different platforms > * Getting Started snippets for each language and a link to the (to-be-built) > API documentation > It would be nice to have a build system and the main file/folder structure in > place so we can split the work up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-213) Exposing static arrow build
[ https://issues.apache.org/jira/browse/ARROW-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-213: - Component/s: C++ > Exposing static arrow build > --- > > Key: ARROW-213 > URL: https://issues.apache.org/jira/browse/ARROW-213 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Philipp Moritz >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > Original Estimate: 10m > Remaining Estimate: 10m > > I'd like to be able to link arrow statically into my application. > At the moment, arrow can be built as a static library using the > 'LIBARROW_LINKAGE' variable in CMakeLists.txt. I'd like to configure this > behavior from the command line. Are there any objections to exposing the > variable as a cached variable? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-259) Use flatbuffer fields in java implementation
[ https://issues.apache.org/jira/browse/ARROW-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-259: - Component/s: Java > Use flatbuffer fields in java implementation > > > Key: ARROW-259 > URL: https://issues.apache.org/jira/browse/ARROW-259 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > Fix For: 0.1.0 > > > The value vectors in the java implementation should switch to using the Field > and types as defined in the flatbuffer spec. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-217) Fix Travis w.r.t conda 4.1.0 changes
[ https://issues.apache.org/jira/browse/ARROW-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-217: - Component/s: Continuous Integration > Fix Travis w.r.t conda 4.1.0 changes > > > Key: ARROW-217 > URL: https://issues.apache.org/jira/browse/ARROW-217 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Fix For: 0.1.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ARROW-218) Add option to use GitHub API token via environment variable when merging PRs
[ https://issues.apache.org/jira/browse/ARROW-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-218: - Component/s: Developer Tools > Add option to use GitHub API token via environment variable when merging PRs > > > Key: ARROW-218 > URL: https://issues.apache.org/jira/browse/ARROW-218 > Project: Apache Arrow > Issue Type: Improvement > Components: Developer Tools >Reporter: Wes McKinney >Assignee: Wes McKinney >Priority: Major > Fix For: 0.1.0 > > > While the patch tool only requires public repo access, on shared networks, > the GitHub API rate limit may be exceeded for unauthenticated requests. This > patch will add an option to use a GitHub personal access token to authenticate -- This message was sent by Atlassian JIRA (v7.6.3#76005)
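ARROW-218 describes authenticating the merge tool's GitHub API calls with a personal access token read from an environment variable, so that shared networks do not exhaust the unauthenticated rate limit. The standard-library Python sketch below shows the idea. The variable name `ARROW_GITHUB_API_TOKEN` and the helper `github_request` are assumptions for illustration, not necessarily what the merge tool actually uses; the request is only built, not sent.

```python
import os
import urllib.request
from typing import Mapping, Optional

TOKEN_ENV_VAR = "ARROW_GITHUB_API_TOKEN"  # assumed variable name, for illustration only


def github_request(url: str, env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build a GitHub API request, adding a token header only if one is configured."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github.v3+json"}
    token = env.get(TOKEN_ENV_VAR)
    if token:
        # Authenticated requests are subject to a much higher rate limit.
        headers["Authorization"] = f"token {token}"
    return urllib.request.Request(url, headers=headers)
```

Falling back to an anonymous request when the variable is unset keeps the tool usable for read-only public access, which matches the issue's note that only public repo access is required.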
[jira] [Updated] (ARROW-205) builds failing on master branch with apt-get error
[ https://issues.apache.org/jira/browse/ARROW-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-205: - Component/s: Continuous Integration > builds failing on master branch with apt-get error > -- > > Key: ARROW-205 > URL: https://issues.apache.org/jira/browse/ARROW-205 > Project: Apache Arrow > Issue Type: Bug > Components: Continuous Integration >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Blocker > Labels: ci-failure > Fix For: 0.1.0 > > > Logs from: https://travis-ci.org/apache/arrow/jobs/131207432 > 0.50s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends > --force-yes install clang-format-3.7 clang-tidy-3.7 gcc-4.9 g++-4.9 gdb > ccache cmake valgrind > Reading package lists... > Building dependency tree... > Reading state information... > E: Unable to locate package g++-4.9 > E: Couldn't find any package by regex 'g++-4.9' > apt-get.diagnostics > apt-get install failed > $ cat ~/apt-get-update.log > Get:1 http://downloads-distro.mongodb.org dist Release.gpg [490 B] > Hit http://us.archive.ubuntu.com precise Release.gpg > Get:2 http://us.archive.ubuntu.com precise-updates Release.gpg [198 B] > Get:3 http://downloads-distro.mongodb.org dist Release [2,040 B] > Get:4 http://us.archive.ubuntu.com precise-backports Release.gpg [198 B] > Hit http://us.archive.ubuntu.com precise Release > Get:5 http://downloads-distro.mongodb.org dist/10gen amd64 Packages [30.9 kB] > Get:6 http://us.archive.ubuntu.com precise-updates Release [55.4 kB] > Hit http://ppa.launchpad.net precise Release.gpg > Get:7 http://security.ubuntu.com precise-security Release.gpg [198 B] > Get:8 http://downloads-distro.mongodb.org dist/10gen i386 Packages [30.5 kB] > Get:9 http://us.archive.ubuntu.com precise-backports Release [55.5 kB] > Hit http://ppa.launchpad.net precise Release.gpg > Get:10 http://security.ubuntu.com precise-security Release [55.5 kB] > Hit http://us.archive.ubuntu.com precise/main Sources > 
Ign http://downloads-distro.mongodb.org dist/10gen TranslationIndex > Hit http://us.archive.ubuntu.com precise/universe Sources > Get:11 http://ppa.launchpad.net precise Release.gpg [316 B] > Hit http://us.archive.ubuntu.com precise/multiverse Sources > Hit http://us.archive.ubuntu.com precise/main amd64 Packages > Hit http://us.archive.ubuntu.com precise/universe amd64 Packages > Get:12 http://ppa.launchpad.net precise Release.gpg [316 B] > Hit http://us.archive.ubuntu.com precise/multiverse amd64 Packages > Hit http://us.archive.ubuntu.com precise/main i386 Packages > Hit http://us.archive.ubuntu.com precise/universe i386 Packages > Hit http://ppa.launchpad.net precise Release.gpg > Hit http://us.archive.ubuntu.com precise/multiverse i386 Packages > Get:13 http://security.ubuntu.com precise-security/main Sources [142 kB] > Hit http://us.archive.ubuntu.com precise/main TranslationIndex > Get:14 http://ppa.launchpad.net precise Release.gpg [316 B] > Hit http://us.archive.ubuntu.com precise/multiverse TranslationIndex > Hit http://us.archive.ubuntu.com precise/universe TranslationIndex > Hit http://ppa.launchpad.net precise Release > Get:15 http://us.archive.ubuntu.com precise-updates/main Sources [496 kB] > Get:16 http://security.ubuntu.com precise-security/universe Sources [48.5 kB] > Hit http://ppa.launchpad.net precise Release > Get:17 http://us.archive.ubuntu.com precise-updates/universe Sources [127 kB] > Get:18 http://security.ubuntu.com precise-security/multiverse Sources [2,721 > B] > Get:19 http://us.archive.ubuntu.com precise-updates/multiverse Sources [10.2 > kB] > Get:20 http://us.archive.ubuntu.com precise-updates/main amd64 Packages [989 > kB] > Get:21 http://ppa.launchpad.net precise Release [12.9 kB] > Get:22 http://security.ubuntu.com precise-security/main amd64 Packages [607 > kB] > Get:23 http://us.archive.ubuntu.com precise-updates/universe amd64 Packages > [276 kB] > Get:24 http://us.archive.ubuntu.com precise-updates/multiverse amd64 Packages 
> [16.9 kB] > Get:25 http://ppa.launchpad.net precise Release [12.9 kB] > Get:26 http://us.archive.ubuntu.com precise-updates/main i386 Packages [1,051 > kB] > Get:27 http://us.archive.ubuntu.com precise-updates/universe i386 Packages > [286 kB] > Hit http://ppa.launchpad.net precise Release > Get:28 http://us.archive.ubuntu.com precise-updates/multiverse i386 Packages > [17.1 kB] > Get:29 http://us.archive.ubuntu.com precise-updates/main TranslationIndex > [208 B] > Get:30 http://us.archive.ubuntu.com precise-updates/multiverse > TranslationIndex [202 B] > Get:31 http://ppa.launchpad.net precise Release [13.0 kB] > Get:32 http://us.archive.ubuntu.com precise-updates/universe TranslationIndex > [205 B] > Get:33 http://security.ubuntu.com
[jira] [Updated] (ARROW-265) Negative decimal values have wrong padding
[ https://issues.apache.org/jira/browse/ARROW-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-265: - Component/s: Java > Negative decimal values have wrong padding > -- > > Key: ARROW-265 > URL: https://issues.apache.org/jira/browse/ARROW-265 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > Fix For: 0.1.0 > > > Pad negative values with 1 bits, not 0 bits (sign extension). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
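Why the pad bit matters: in two's complement, widening a negative number must copy the sign bit into the new high-order bytes, otherwise the value changes. The sketch below illustrates this under stated assumptions. It assumes the unscaled decimal value arrives as a big-endian two's-complement byte string (like Java's `BigInteger.toByteArray()`) and is widened to a fixed 16-byte slot. The function name `pad_twos_complement` is hypothetical and is not the Arrow Java code.

```python
def pad_twos_complement(value_bytes: bytes, width: int = 16) -> bytes:
    """Sign-extend a big-endian two's-complement byte string to `width` bytes.

    Hypothetical helper for illustration; not part of Arrow.
    """
    if len(value_bytes) > width:
        raise ValueError("value does not fit in the target width")
    # The sign bit is the most significant bit of the first (most significant) byte.
    negative = bool(value_bytes) and (value_bytes[0] & 0x80) != 0
    # Negative values are padded with 0xFF bytes (all 1 bits), others with 0x00.
    pad_byte = b"\xff" if negative else b"\x00"
    return pad_byte * (width - len(value_bytes)) + value_bytes


if __name__ == "__main__":
    minus_one = (-1).to_bytes(1, "big", signed=True)  # b'\xff'
    good = pad_twos_complement(minus_one)
    bad = b"\x00" * 15 + minus_one  # the buggy zero padding
    print(int.from_bytes(good, "big", signed=True))  # still -1
    print(int.from_bytes(bad, "big", signed=True))   # silently becomes 255
```

Zero padding turns the one-byte value -1 into 255, which is the kind of corruption the issue describes.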
[jira] [Updated] (ARROW-269) UnionVector getBuffers method does not include typevector
[ https://issues.apache.org/jira/browse/ARROW-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-269: - Component/s: Java > UnionVector getBuffers method does not include typevector > - > > Key: ARROW-269 > URL: https://issues.apache.org/jira/browse/ARROW-269 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Steven Phillips >Assignee: Steven Phillips >Priority: Major > Fix For: 0.7.0 > > > Currently only the internalMap vector's buffers are returned; the type vector's buffers are omitted. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
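A minimal sketch of the bug and the fix, using hypothetical stand-in classes rather than the real Arrow Java `UnionVector`. The sketch assumes a union vector owns one type-id buffer plus a child container whose buffers are collected separately. It also assumes the type buffer comes first in the returned list, an ordering chosen here for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Buffer:
    name: str


@dataclass
class ChildContainer:
    """Hypothetical stand-in for the union's internal child vector."""
    buffers: List[Buffer]

    def get_buffers(self) -> List[Buffer]:
        return list(self.buffers)


@dataclass
class UnionVectorSketch:
    """Hypothetical union vector: a type-id buffer plus child buffers."""
    type_buffer: Buffer
    internal: ChildContainer

    def get_buffers_before_fix(self) -> List[Buffer]:
        # Behaviour described in the issue: only the child buffers come back.
        return self.internal.get_buffers()

    def get_buffers(self) -> List[Buffer]:
        # Fixed behaviour: include the type buffer as well (ordering assumed).
        return [self.type_buffer] + self.internal.get_buffers()
```

Without the type buffer, anything that serializes or transfers the vector from `getBuffers` output would lose the per-slot type ids needed to interpret the children.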
[jira] [Updated] (ARROW-264) Create an Arrow File format
[ https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-264: - Component/s: Format > Create an Arrow File format > --- > > Key: ARROW-264 > URL: https://issues.apache.org/jira/browse/ARROW-264 > Project: Apache Arrow > Issue Type: Improvement > Components: Format >Reporter: Julien Le Dem >Assignee: Julien Le Dem >Priority: Major > Fix For: 0.1.0 > > > File layout: > (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs) > {noformat} > MAGIC: ARROW1 > ( > DictionaryBatch: DictionaryBatch Header (FlatBuffer) > DictionaryBatch: DictionaryBatch Body (buffers concatenated) > )* > ( > RecordBatch: RecordBatch Header (FlatBuffer) > RecordBatch: RecordBatch Body (buffers concatenated) > )+ > Footer: Flatbuffer > Footer length: int (4 bytes unsigned LE) > MAGIC: ARROW1 > {noformat} > Footer definition: > {noformat} > table Footer { > schema: org.apache.arrow.flatbuf.Schema; > dictionaries: [ Block ]; > recordBatches: [ Block ]; > } > struct Block { > offset: long; > metaDataLength: int; > bodyLength: long; > } > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
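Because the footer length sits at a fixed position just before the trailing magic, a reader can locate the footer by seeking from the end of the file without scanning the batches. The sketch below follows the layout exactly as written in this issue: a 6-byte `ARROW1` magic at both ends, with no padding, and a 4-byte unsigned little-endian footer length. It does not attempt to match any later revision of the format. `read_footer` is a hypothetical name. Decoding the returned bytes into `Footer`/`Block` objects would additionally require FlatBuffers code generated from the schema, which is out of scope here.

```python
import struct

MAGIC = b"ARROW1"


def read_footer(data: bytes) -> bytes:
    """Return the raw footer FlatBuffer bytes from a file laid out as above.

    Hypothetical helper; follows the ARROW-264 layout as written.
    """
    if len(data) < 2 * len(MAGIC) + 4:
        raise ValueError("file too short")
    if not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("missing ARROW1 magic")
    # The footer length occupies the 4 bytes immediately before the trailing magic.
    len_end = len(data) - len(MAGIC)
    (footer_len,) = struct.unpack("<I", data[len_end - 4:len_end])
    footer_start = len_end - 4 - footer_len
    # The footer must begin after the leading magic.
    if footer_start < len(MAGIC):
        raise ValueError("footer length out of range")
    return data[footer_start:len_end - 4]
```

From the footer, each `Block` gives an `offset` plus `metaDataLength` and `bodyLength`, so a reader can jump straight to any dictionary or record batch.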