[jira] [Commented] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()

2019-06-03 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855271#comment-16855271
 ] 

Neal Richardson commented on ARROW-5500:


Perhaps it does. IMO the idea that we would want two R packages (one that just 
wraps the C++ library for developers, and a separate one that provides an 
interface for analysts to work with datasets) is YAGNI. There's no reason we 
can't have the lower-level C++ API wrappers and the analyst-centric interface 
in the same package, and no value at this point in splitting them. 

Currently there already is a lower-level `csv_table_reader`, and all the 
`read_csv_arrow()` function does is invoke it: 
[https://github.com/apache/arrow/blob/master/r/R/csv.R#L179-L181]

I'm proposing adding R-flavored substance to `read_csv_arrow()` (and 
documenting it). I'm not proposing removing or making private the classes and 
methods that invoke the C++ library, so a "developer" could choose to write 
something at that layer if it were useful. 

> [R] read_csv_arrow() signature should match readr::read_csv()
> -
>
> Key: ARROW-5500
> URL: https://issues.apache.org/jira/browse/ARROW-5500
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> So that using it is natural for R users. Internally handle all of the logic 
> needed to map those onto csv_convert_options, csv_read_options, and 
> csv_parse_options. And give a useful error message if a user requests a 
> setting that readr supports but arrow does not.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5498.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4465
[https://github.com/apache/arrow/pull/4465]

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
> -
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5481.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4462
[https://github.com/apache/arrow/pull/4462]

> [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
> 
>
> Key: ARROW-5481
> URL: https://issues.apache.org/jira/browse/ARROW-5481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Sutou Kouhei
>Assignee: Yosuke Shiro
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402
> This is follow-up work of 
> https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0





[jira] [Issue Comment Deleted] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-06-03 Thread John Muehlhausen (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Muehlhausen updated ARROW-5438:

Comment: was deleted

(was: Will add test case when I can)

> [JS] Utilize stream EOS in File format
> --
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().
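
To illustrate why an end-of-stream marker matters for sequential readers, here is a minimal sketch. It uses a toy length-prefixed framing, not the actual Arrow IPC encoding; the `FOOTER` bytes, the zero-length marker, and both function names are assumptions made only for this example.

```python
import io
import struct

# Toy end-of-stream marker: a length prefix of zero (an assumption of this sketch).
EOS = struct.pack("<i", 0)


def write_messages(out, messages, write_eos=True):
    """Write non-empty length-prefixed messages, optionally followed by EOS."""
    for body in messages:
        out.write(struct.pack("<i", len(body)))
        out.write(body)
    if write_eos:
        out.write(EOS)
    # Stand-in for the file footer that follows the message stream.
    out.write(b"FOOTER")


def read_sequentially(src):
    """Read messages front to back using only read(), never seek()."""
    messages = []
    while True:
        prefix = src.read(4)
        if len(prefix) < 4:
            raise EOFError("stream ended without an EOS marker")
        (length,) = struct.unpack("<i", prefix)
        if length == 0:
            # EOS reached: stop before the footer bytes.
            return messages
        messages.append(src.read(length))
```

With the marker present, the reader stops cleanly before the footer. Without it, the reader has no way to tell where the messages end and runs into the footer bytes.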





[jira] [Issue Comment Deleted] (ARROW-5439) [Java] Utilize stream EOS in File format

2019-06-03 Thread John Muehlhausen (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Muehlhausen updated ARROW-5439:

Comment: was deleted

(was: Will add test case when I can)

> [Java] Utilize stream EOS in File format
> 
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Commented] (ARROW-5439) [Java] Utilize stream EOS in File format

2019-06-03 Thread John Muehlhausen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855189#comment-16855189
 ] 

John Muehlhausen commented on ARROW-5439:
-

Will add test case when I can

> [Java] Utilize stream EOS in File format
> 
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Commented] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-06-03 Thread John Muehlhausen (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855188#comment-16855188
 ] 

John Muehlhausen commented on ARROW-5438:
-

Will add test case when I can

> [JS] Utilize stream EOS in File format
> --
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Updated] (ARROW-5438) [JS] Utilize stream EOS in File format

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5438:
--
Labels: pull-request-available  (was: )

> [JS] Utilize stream EOS in File format
> --
>
> Key: ARROW-5438
> URL: https://issues.apache.org/jira/browse/ARROW-5438
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: JavaScript
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Updated] (ARROW-5439) [Java] Utilize stream EOS in File format

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5439:
--
Labels: pull-request-available  (was: )

> [Java] Utilize stream EOS in File format
> 
>
> Key: ARROW-5439
> URL: https://issues.apache.org/jira/browse/ARROW-5439
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().





[jira] [Commented] (ARROW-5501) [R] read/write_feather/arrow?

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855181#comment-16855181
 ] 

Wes McKinney commented on ARROW-5501:
-

Can you open a JIRA issue about FeatherV2? I would like to retain the file 
format name as a "simple memory-mappable Arrow-based file format" and handle 
backwards compatibility for old files for some period of time

> [R] read/write_feather/arrow?
> -
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> read_feather and write_feather exist, and there is also write_arrow. But no 
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0", 
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does). 
> Are we going to continue to call the file format "Feather", and possibly 
> continue supporting the "feather 1.0" format as a subset/special case? Or 
> will "feather" mean this limited format and "arrow" be the name of the 
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather and 
> there be an argument for indicating which version to write? Or should the 
> distinction be maintained, and we need to add a read_arrow() function?





[jira] [Comment Edited] (ARROW-5501) [R] read/write_feather/arrow?

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855181#comment-16855181
 ] 

Wes McKinney edited comment on ARROW-5501 at 6/4/19 12:50 AM:
--

Can you open a JIRA issue about FeatherV2 (or maybe this is the issue)? I would 
like to retain the file format name as a "simple memory-mappable Arrow-based 
file format" and handle backwards compatibility for old files for some period 
of time


was (Author: wesmckinn):
Can you open a JIRA issue about FeatherV2? I would like to retain the file 
format name as a "simple memory-mappable Arrow-based file format" and handle 
backwards compatibility for old files for some period of time

> [R] read/write_feather/arrow?
> -
>
> Key: ARROW-5501
> URL: https://issues.apache.org/jira/browse/ARROW-5501
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> read_feather and write_feather exist, and there is also write_arrow. But no 
> read_arrow.
> Some questions (which go beyond just R): There's talk of a "feather 2.0", 
> i.e. "just" serializing the IPC format (which IIUC is what write_arrow does). 
> Are we going to continue to call the file format "Feather", and possibly 
> continue supporting the "feather 1.0" format as a subset/special case? Or 
> will "feather" mean this limited format and "arrow" be the name of the 
> full-featured file?
> In terms of this issue, should write_arrow be folded into write_feather and 
> there be an argument for indicating which version to write? Or should the 
> distinction be maintained, and we need to add a read_arrow() function?





[jira] [Commented] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16855180#comment-16855180
 ] 

Wes McKinney commented on ARROW-5500:
-

This brings up a bigger question of whether the `arrow` library as it is being 
developed now is the desired "front end" for end-users. 

> [R] read_csv_arrow() signature should match readr::read_csv()
> -
>
> Key: ARROW-5500
> URL: https://issues.apache.org/jira/browse/ARROW-5500
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Major
> Fix For: 0.14.0
>
>
> So that using it is natural for R users. Internally handle all of the logic 
> needed to map those onto csv_convert_options, csv_read_options, and 
> csv_parse_options. And give a useful error message if a user requests a 
> setting that readr supports but arrow does not.





[jira] [Created] (ARROW-5505) [R] Stop masking base R functions

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5505:
--

 Summary: [R] Stop masking base R functions
 Key: ARROW-5505
 URL: https://issues.apache.org/jira/browse/ARROW-5505
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


The package startup message about masking base functions can be scary. We 
should avoid masking base functions without a compelling reason (i.e. let's do 
arrow_array() instead of array(), and arrow_table() instead of table()). The arrow versions do very 
different things than the base functions; plus, end users shouldn’t be dealing 
directly with Tables and Arrays, so they don’t need to figure so prominently in 
the public API of the package.





[jira] [Updated] (ARROW-5504) [R] move use_threads argument to global option

2019-06-03 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5504:
---
Priority: Minor  (was: Major)

> [R] move use_threads argument to global option
> --
>
> Key: ARROW-5504
> URL: https://issues.apache.org/jira/browse/ARROW-5504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Priority: Minor
> Fix For: 0.14.0
>
>
> Why wouldn't you want to use the multithreaded API for reading data from 
> arrow into R? We shouldn't clutter our function signatures with options that 
> people won't use.





[jira] [Created] (ARROW-5504) [R] move use_threads argument to global option

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5504:
--

 Summary: [R] move use_threads argument to global option
 Key: ARROW-5504
 URL: https://issues.apache.org/jira/browse/ARROW-5504
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


Why wouldn't you want to use the multithreaded API for reading data from arrow 
into R? We shouldn't clutter our function signatures with options that people 
won't use.
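
One way to picture the proposal is a package-level option that readers consult instead of taking a per-call argument. The sketch below is hypothetical: `set_option`, `get_option`, and `read_table` are illustrative names, not functions from the arrow R package.

```python
# Hypothetical package-level options; threading is on by default.
_OPTIONS = {"use_threads": True}


def set_option(name, value):
    """Set a global option and return its previous value."""
    if name not in _OPTIONS:
        raise KeyError(f"unknown option '{name}'")
    previous = _OPTIONS[name]
    _OPTIONS[name] = value
    return previous


def get_option(name):
    """Look up the current value of a global option."""
    return _OPTIONS[name]


def read_table(path):
    """Reader with no use_threads parameter; it consults the global option."""
    return {"path": path, "use_threads": get_option("use_threads")}
```

Returning the previous value from `set_option` lets a caller restore the setting after temporarily changing it.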





[jira] [Created] (ARROW-5503) [R] add read_json()

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5503:
--

 Summary: [R] add read_json()
 Key: ARROW-5503
 URL: https://issues.apache.org/jira/browse/ARROW-5503
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


The C++ library gained a JSON file reader last month, and pyarrow already has 
bindings for it. R should have it too.





[jira] [Created] (ARROW-5502) [R] file readers should mmap

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5502:
--

 Summary: [R] file readers should mmap
 Key: ARROW-5502
 URL: https://issues.apache.org/jira/browse/ARROW-5502
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


Arrow is supposed to let you work with datasets bigger than memory. Memory 
mapping is a big part of that. It should be the default way that files are read 
in the `read_*` functions. To disable memory mapping, we could use a global 
`option()`, or a function argument, but that might clutter the interface. Or we 
could not give a choice and only fall back to not memory mapping if the 
platform/file system doesn't support it.
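
The "memory-map by default, fall back if unsupported" behaviour could look roughly like the following sketch. It uses Python's standard `mmap` module; `open_bytes` and its `use_mmap` parameter are hypothetical names chosen for illustration.

```python
import mmap


def open_bytes(path, use_mmap=True):
    """Return (buffer, was_mapped); fall back to a plain read if mapping fails."""
    with open(path, "rb") as f:
        if use_mmap:
            try:
                # Length 0 maps the whole file; the map outlives the file object.
                return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ), True
            except (OSError, ValueError):
                # For example an unsupported file system, or an empty file.
                f.seek(0)
        return f.read(), False
```

Callers get the same bytes either way; the second element of the tuple only reports which path was taken.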





[jira] [Created] (ARROW-5501) [R] read/write_feather/arrow?

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5501:
--

 Summary: [R] read/write_feather/arrow?
 Key: ARROW-5501
 URL: https://issues.apache.org/jira/browse/ARROW-5501
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


read_feather and write_feather exist, and there is also write_arrow. But no 
read_arrow.

Some questions (which go beyond just R): There's talk of a "feather 2.0", i.e. 
"just" serializing the IPC format (which IIUC is what write_arrow does). Are we 
going to continue to call the file format "Feather", and possibly continue 
supporting the "feather 1.0" format as a subset/special case? Or will "feather" 
mean this limited format and "arrow" be the name of the full-featured file?

In terms of this issue, should write_arrow be folded into write_feather and 
there be an argument for indicating which version to write? Or should the 
distinction be maintained, and we need to add a read_arrow() function?





[jira] [Created] (ARROW-5500) [R] read_csv_arrow() signature should match readr::read_csv()

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5500:
--

 Summary: [R] read_csv_arrow() signature should match 
readr::read_csv()
 Key: ARROW-5500
 URL: https://issues.apache.org/jira/browse/ARROW-5500
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 0.14.0


So that using it is natural for R users. Internally handle all of the logic 
needed to map those onto csv_convert_options, csv_read_options, and 
csv_parse_options. And give a useful error message if a user requests a setting 
that readr supports but arrow does not.
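
The mapping described above can be sketched as follows. This is a language-neutral illustration in Python, not the R implementation: the argument names on the left (`quote`, `skip`, `na`, `locale`, `comment`) are real `readr::read_csv()` arguments, but the option names they are routed to and the choice of which arguments count as unsupported are assumptions made for the example.

```python
from typing import Any, Dict

# Assumed routing of readr-style arguments to (option group, option name).
ARG_TO_GROUP = {
    "quote": ("parse", "quote_char"),
    "skip": ("read", "skip_rows"),
    "na": ("convert", "null_values"),
}

# Arguments readr accepts that this sketch pretends the backend cannot honor.
UNSUPPORTED = {"locale", "comment"}


def split_reader_args(**kwargs: Any) -> Dict[str, Dict[str, Any]]:
    """Route user-facing arguments into parse/read/convert option groups."""
    groups: Dict[str, Dict[str, Any]] = {"parse": {}, "read": {}, "convert": {}}
    for name, value in kwargs.items():
        if name in UNSUPPORTED:
            raise NotImplementedError(
                f"'{name}' is accepted by readr::read_csv() but is not supported here"
            )
        if name not in ARG_TO_GROUP:
            raise TypeError(f"unknown argument '{name}'")
        group, option = ARG_TO_GROUP[name]
        groups[group][option] = value
    return groups
```

Distinguishing "known to readr but unsupported" from "unknown" is what makes the error message useful to someone porting readr code.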





[jira] [Updated] (ARROW-5492) [R] Add "col_select" argument to read_* functions to read subset of columns

2019-06-03 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5492:
---
Description: 
read_feather, read_parquet, read_csv_arrow (and read_json, when it exists) 
should take a `col_select` argument, following the model of 
[vroom|http://vroom.r-lib.org/articles/vroom.html#column-selection] (readr and 
base R file readers also support this feature, just much more awkwardly).

Currently, read_feather has a "columns" argument and none of the other readers 
expose it. Parquet can certainly support it; cf. {{pyarrow.parquet.read_table}}.

 

  was:This is just like the same option in {{pyarrow.parquet.read_table}}


> [R] Add "col_select" argument to read_* functions to read subset of columns 
> 
>
> Key: ARROW-5492
> URL: https://issues.apache.org/jira/browse/ARROW-5492
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> read_feather, read_parquet, read_csv_arrow (and read_json, when it exists) 
> should take a `col_select` argument, following the model of 
> [vroom|http://vroom.r-lib.org/articles/vroom.html#column-selection] (readr 
> and base R file readers also support this feature, just much more awkwardly).
> Currently, read_feather has a "columns" argument and none of the other 
> readers expose it. Parquet can certainly support it; cf. 
> {{pyarrow.parquet.read_table}}.
>  
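
As a rough illustration of what a `col_select`-style argument does, the sketch below reads CSV text and keeps only the requested columns. `read_csv_columns` is a hypothetical helper built on Python's standard `csv` module; it is not the arrow or vroom API.

```python
import csv
import io


def read_csv_columns(text, col_select=None):
    """Parse CSV text into {column: [values]}, keeping only col_select if given."""
    reader = csv.DictReader(io.StringIO(text))
    names = reader.fieldnames or []
    keep = list(names) if col_select is None else list(col_select)
    missing = [c for c in keep if c not in names]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    columns = {name: [] for name in keep}
    for row in reader:
        for name in keep:
            columns[name].append(row[name])
    return columns
```

This toy version still parses every field of every row. A columnar format such as Parquet can skip unselected columns entirely, which is where the feature pays off most.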





[jira] [Updated] (ARROW-5492) [R] Add "col_select" argument to read_* functions to read subset of columns

2019-06-03 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5492:
---
Summary: [R] Add "col_select" argument to read_* functions to read subset 
of columns   (was: [R] Add "columns" option to read_parquet to read subset of 
columns )

> [R] Add "col_select" argument to read_* functions to read subset of columns 
> 
>
> Key: ARROW-5492
> URL: https://issues.apache.org/jira/browse/ARROW-5492
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> This is just like the same option in {{pyarrow.parquet.read_table}}





[jira] [Updated] (ARROW-5499) [R] Alternate bindings for when libarrow is not found

2019-06-03 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5499:
---
Component/s: R

> [R] Alternate bindings for when libarrow is not found
> -
>
> Key: ARROW-5499
> URL: https://issues.apache.org/jira/browse/ARROW-5499
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Romain François
>Priority: Major
> Fix For: 0.14.0
>
>
> This will also allow the package to build and install successfully on hosts 
> where the arrow C++ library is not present, which will enable us, among other 
> things, to provide an `install_arrow()` function similar to other R packages 
> that have big external dependencies.  





[jira] [Created] (ARROW-5499) [R] Alternate bindings for when libarrow is not found

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5499:
--

 Summary: [R] Alternate bindings for when libarrow is not found
 Key: ARROW-5499
 URL: https://issues.apache.org/jira/browse/ARROW-5499
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Neal Richardson
Assignee: Romain François
 Fix For: 0.14.0


This will also allow the package to build and install successfully on hosts 
where the arrow C++ library is not present, which will enable us, among other 
things, to provide an `install_arrow()` function similar to other R packages 
that have big external dependencies.  
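
The general pattern (install succeeds without the native library, and calls fail with an actionable message) might be sketched like this. The module name `hypothetical_native_backend` and all function names are placeholders for illustration; the real R package would do this at the C++ binding layer.

```python
import importlib
import importlib.util


def load_backend(module_name="hypothetical_native_backend"):
    """Return the native module if it can be found, else None."""
    if importlib.util.find_spec(module_name) is None:
        return None
    return importlib.import_module(module_name)


# Resolved once at import time; None means the native library is absent.
_BACKEND = load_backend()


def backend_available():
    """Report whether the native backend was found."""
    return _BACKEND is not None


def read_table(path):
    """Delegate to the backend, or explain how to get it if it is missing."""
    if _BACKEND is None:
        raise RuntimeError(
            "native library not found; install it (for example via a helper "
            "such as install_arrow()) and reload the package"
        )
    return _BACKEND.read_table(path)
```

Because the package itself always loads, a helper like `install_arrow()` has somewhere to live even on hosts that lack the library.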





[jira] [Updated] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5498:
--
Labels: pull-request-available  (was: )

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
> -
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Minor
>  Labels: pull-request-available
>






[jira] [Updated] (ARROW-5498) [C++] Build failure with Flatbuffers 1.11.0 and MinGW

2019-06-03 Thread Sutou Kouhei (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sutou Kouhei updated ARROW-5498:

Summary: [C++] Build failure with Flatbuffers 1.11.0 and MinGW  (was: [C++] 
Add support for Flatbuffers 1.11.0 with MinGW)

> [C++] Build failure with Flatbuffers 1.11.0 and MinGW
> -
>
> Key: ARROW-5498
> URL: https://issues.apache.org/jira/browse/ARROW-5498
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Minor
>






[jira] [Created] (ARROW-5498) [C++] Add support for Flatbuffers 1.11.0 with MinGW

2019-06-03 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-5498:
---

 Summary: [C++] Add support for Flatbuffers 1.11.0 with MinGW
 Key: ARROW-5498
 URL: https://issues.apache.org/jira/browse/ARROW-5498
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Sutou Kouhei
Assignee: Sutou Kouhei








[jira] [Resolved] (ARROW-5477) [C++] Check required RapidJSON version

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5477.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4447
[https://github.com/apache/arrow/pull/4447]

> [C++] Check required RapidJSON version
> --
>
> Key: ARROW-5477
> URL: https://issues.apache.org/jira/browse/ARROW-5477
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5497) [R][Release] Build and publish R package docs

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5497:
--

 Summary: [R][Release] Build and publish R package docs
 Key: ARROW-5497
 URL: https://issues.apache.org/jira/browse/ARROW-5497
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Documentation, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 0.14.0


https://issues.apache.org/jira/browse/ARROW-5452 added the R pkgdown site 
config. Adding the wiring into the apidocs build scripts was deferred because 
there was some discussion about which workflow was supported and which was 
deprecated.  





[jira] [Updated] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting

2019-06-03 Thread Neal Richardson (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-5496:
---
Fix Version/s: 0.14.0

> [R][CI] Fix relative paths in R codecov.io reporting
> 
>
> Key: ARROW-5496
> URL: https://issues.apache.org/jira/browse/ARROW-5496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R, 
> but due to an assumption in the coverage runner that the project would be at 
> the top level of the GitHub repository, the `r/` subdirectory was not 
> included, so R coverage stats were put in the wrong place, and detail files 
> (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R]) 
> return 404. 





[jira] [Updated] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5496:
--
Labels: pull-request-available  (was: )

> [R][CI] Fix relative paths in R codecov.io reporting
> 
>
> Key: ARROW-5496
> URL: https://issues.apache.org/jira/browse/ARROW-5496
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Minor
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R, 
> but due to an assumption in the coverage runner that the project would be at 
> the top level of the GitHub repository, the `r/` subdirectory was not 
> included, so R coverage stats were put in the wrong place, and detail files 
> (such as [https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R]) 
> return 404. 





[jira] [Created] (ARROW-5496) [R][CI] Fix relative paths in R codecov.io reporting

2019-06-03 Thread Neal Richardson (JIRA)
Neal Richardson created ARROW-5496:
--

 Summary: [R][CI] Fix relative paths in R codecov.io reporting
 Key: ARROW-5496
 URL: https://issues.apache.org/jira/browse/ARROW-5496
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson


https://issues.apache.org/jira/browse/ARROW-5418 added coverage stats for R, 
but due to an assumption in the coverage runner that the project would be at 
the top level of the GitHub repository, the `r/` subdirectory was not included, 
so R coverage stats were put in the wrong place, and detail files (such as 
[https://codecov.io/gh/apache/arrow/src/master/R/ArrayData.R]) return 404. 
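
The kind of fix involved is a path rewrite so that reported files are relative to the repository root rather than the package directory. The sketch below shows the idea only; `prefix_report_paths` is a hypothetical helper, and the actual change to the coverage runner may work differently.

```python
import posixpath


def prefix_report_paths(paths, subdir="r"):
    """Prepend the package subdirectory so paths are relative to the repo root."""
    fixed = []
    for p in paths:
        p = p.lstrip("/")
        if p == subdir or p.startswith(subdir + "/"):
            # Already rooted at the repository; leave it alone.
            fixed.append(p)
        else:
            fixed.append(posixpath.join(subdir, p))
    return fixed
```

Under this sketch, `R/ArrayData.R` becomes `r/R/ArrayData.R`, which matches the file's location inside the repository.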





[jira] [Updated] (ARROW-5495) [C++] Use HTTPS consistently for downloading dependencies

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5495:
--
Labels: pull-request-available  (was: )

> [C++] Use HTTPS consistently for downloading dependencies
> -
>
> Key: ARROW-5495
> URL: https://issues.apache.org/jira/browse/ARROW-5495
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[jira] [Created] (ARROW-5495) [C++] Use HTTPS consistently for downloading dependencies

2019-06-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5495:
---

 Summary: [C++] Use HTTPS consistently for downloading dependencies
 Key: ARROW-5495
 URL: https://issues.apache.org/jira/browse/ARROW-5495
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
Assignee: Wes McKinney
 Fix For: 0.14.0








[jira] [Commented] (ARROW-5478) [Packaging] Drop Ubuntu 14.04 support

2019-06-03 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854906#comment-16854906
 ] 

Neal Richardson commented on ARROW-5478:


FWIW Trusty is at "End of Standard Support", not EOL: 
https://wiki.ubuntu.com/Releases

> [Packaging] Drop Ubuntu 14.04 support
> -
>
> Key: ARROW-5478
> URL: https://issues.apache.org/jira/browse/ARROW-5478
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5494) [Python] Create FileSystem bindings

2019-06-03 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5494:
-

 Summary: [Python] Create FileSystem bindings
 Key: ARROW-5494
 URL: https://issues.apache.org/jira/browse/ARROW-5494
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Antoine Pitrou


Now that we have a C++ filesystem API, it should be usable from Python as well.





[jira] [Updated] (ARROW-5494) [Python] Create FileSystem bindings

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5494:
--
Labels: filesystem  (was: )

> [Python] Create FileSystem bindings
> ---
>
> Key: ARROW-5494
> URL: https://issues.apache.org/jira/browse/ARROW-5494
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: filesystem
>
> Now that we have a C++ filesystem API, it should be usable from Python as 
> well.





[jira] [Commented] (ARROW-4912) [C++, Python] Allow specifying column names to CSV reader

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854896#comment-16854896
 ] 

Antoine Pitrou commented on ARROW-4912:
---

If there's no way to change column names post-hoc, then perhaps we should just 
add one? That sounds more universal than adding ad hoc options to the CSV 
reader.

As for the header_rows=0, can you open a separate issue?

> [C++, Python] Allow specifying column names to CSV reader
> -
>
> Key: ARROW-4912
> URL: https://issues.apache.org/jira/browse/ARROW-4912
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.12.1
>Reporter: Philipp Moritz
>Priority: Major
>  Labels: csv
>
> Currently I think there is no way to specify custom column names for CSV 
> files. It's possible to specify the full schema of the file, but not just 
> column names.
> See the related discussion here: ARROW-3722
> The goal of this is to re-use the CSV type-inference but still allow people 
> to specify custom names for the columns. As far as I know, there is currently 
> no way to set column names post-hoc, so we should provide a way to specify 
> them before reading the file.
> Related to this, ParseOptions(header_rows=0) is not currently implemented.
> Is there any current way to do this or does this need to be implemented?
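
The two options discussed here (names supplied before reading, or renamed after the fact) can be sketched with the Python standard library alone. This is not the Arrow CSV reader: `infer`, `read_csv_with_names`, and `rename_columns` are hypothetical stand-ins, and the type inference is done per value rather than per column.

```python
import csv
import io


def infer(value):
    """Tiny stand-in for type inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def read_csv_with_names(text, column_names):
    """Read header-less CSV text into {name: [values]} using caller-supplied names."""
    columns = {name: [] for name in column_names}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(column_names):
            raise ValueError(f"expected {len(column_names)} fields, got {len(row)}")
        for name, value in zip(column_names, row):
            columns[name].append(infer(value))
    return columns


def rename_columns(columns, new_names):
    """Post-hoc alternative: keep the data, replace the names in order."""
    if len(new_names) != len(columns):
        raise ValueError("one new name is required per column")
    return dict(zip(new_names, columns.values()))


table = read_csv_with_names("1,2.5,x\n2,3.5,y\n", ["id", "score", "tag"])
renamed = rename_columns(table, ["key", "value", "label"])
print(table)
print(renamed)
```

Either route leaves type inference untouched, which is the goal stated in the issue. The post-hoc rename is the more general of the two because it does not depend on the file format.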





[jira] [Updated] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5481:
--
Labels: pull-request-available  (was: )

> [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
> 
>
> Key: ARROW-5481
> URL: https://issues.apache.org/jira/browse/ARROW-5481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Sutou Kouhei
>Assignee: Yosuke Shiro
>Priority: Minor
>  Labels: pull-request-available
>
> https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402
> This is follow-up work of 
> https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0





[jira] [Resolved] (ARROW-5365) [C++][CI] Add UBSan and ASAN into CI

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5365.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4347
[https://github.com/apache/arrow/pull/4347]

> [C++][CI] Add UBSan and ASAN into CI
> 
>
> Key: ARROW-5365
> URL: https://issues.apache.org/jira/browse/ARROW-5365
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> We should be running UBSan and ASAN in CI to detect issues with the C++ 
> build.  





[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?

2019-06-03 Thread Neal Richardson (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854820#comment-16854820
 ] 

Neal Richardson commented on ARROW-5474:


Boost 1.58 seemed to be sufficient in the build I was fighting last week. Fine 
by me if we declare that the minimum, so that still leaves two tasks: (1) fail 
with a useful message in CMake if boost is < 1.58, and (2) resolve why it later 
reported that boost 1.67 was present.
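
Task (1) amounts to a numeric version comparison followed by a clear error. A minimal sketch of that logic, written in Python rather than CMake, is below. `parse_version` and `check_boost` are hypothetical names, and 1.58 is the minimum floated in this thread, not a confirmed project requirement.

```python
MINIMUM_BOOST = "1.58"  # assumption: the minimum proposed in this discussion


def parse_version(text):
    """Turn '1.58' or '1.58.0' into a tuple of ints so comparison is numeric."""
    return tuple(int(part) for part in text.split("."))


def check_boost(found, minimum=MINIMUM_BOOST):
    """Return `found` if it satisfies the minimum, otherwise fail with a useful message."""
    if parse_version(found) < parse_version(minimum):
        raise RuntimeError(
            f"Boost {found} was found, but at least Boost {minimum} is required"
        )
    return found


print(check_boost("1.67"))
try:
    check_boost("1.54")
except RuntimeError as exc:
    print(exc)
```

Comparing integer tuples rather than strings matters here: as plain strings, "1.9" would sort after "1.58".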

> [C++] What version of Boost do we require now?
> --
>
> Key: ARROW-5474
> URL: https://issues.apache.org/jira/browse/ARROW-5474
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One 
> possible cause for that error is that the local filesystem patch increased 
> the version of boost that we actually require. The boost version (1.54 vs 
> 1.58) was one difference between failure and success. 
> Another point of confusion was that CMake reported two different versions of 
> boost at different times. 
> If we require a minimum version of boost, can we document that better, check 
> for it more accurately in the build scripts, and fail with a useful message 
> if that minimum isn't met? Or something else helpful.
> If the actual cause of the failure was something else (e.g. compiler 
> version), we should figure that out too.





[jira] [Commented] (ARROW-5481) [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document

2019-06-03 Thread Yosuke Shiro (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854818#comment-16854818
 ] 

Yosuke Shiro commented on ARROW-5481:
-

Yes.

> [GLib] garrow_seekable_input_stream_peek() misses "error" parameter document
> 
>
> Key: ARROW-5481
> URL: https://issues.apache.org/jira/browse/ARROW-5481
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: GLib
>Reporter: Sutou Kouhei
>Assignee: Yosuke Shiro
>Priority: Minor
>
> https://github.com/apache/arrow/blob/master/c_glib/arrow-glib/input-stream.cpp#L402
> This is follow-up work of 
> https://github.com/apache/arrow/commit/ff2ee42092c09d13e38205fedd3acbdf375199f0





[jira] [Updated] (ARROW-5077) [Rust] Release process should change Cargo.toml to use release versions

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5077:
--
Labels: pull-request-available  (was: )

> [Rust] Release process should change Cargo.toml to use release versions
> ---
>
> Key: ARROW-5077
> URL: https://issues.apache.org/jira/browse/ARROW-5077
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Rust
>Affects Versions: 0.13.0
>Reporter: Andy Grove
>Assignee: Yosuke Shiro
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> In the dev tree we use relative path dependencies between arrow, parquet, and 
> datafusion, which means we can't just run cargo publish for each crate from 
> the release source tarball.
> It would be good to have the release packaging change the Cargo.toml for 
> parquet and datafusion to have dependencies on a versioned release instead of 
> a relative path to remove this manual step when publishing.
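
One way the release script could do this rewrite is sketched below. It is a rough illustration only: `pin_path_dependencies` is a hypothetical helper, the sample manifest is invented for the example, and the regular expression handles only the simple one-line form `name = { path = "..." }`.

```python
import re

# Matches only the simple inline form:  name = { path = "..." }
PATH_DEP = re.compile(
    r'^(?P<name>[A-Za-z0-9_-]+)\s*=\s*\{\s*path\s*=\s*"[^"]*"\s*\}\s*$'
)


def pin_path_dependencies(cargo_toml, version):
    """Replace relative-path dependencies with a dependency on a released version."""
    out = []
    for line in cargo_toml.splitlines():
        match = PATH_DEP.match(line)
        if match:
            out.append(f'{match.group("name")} = "{version}"')
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical manifest fragment, not copied from the Arrow repository.
manifest = '[dependencies]\narrow = { path = "../arrow" }\nserde = "1.0"\n'
print(pin_path_dependencies(manifest, "0.14.0"))
```

A real implementation would probably use a TOML parser, since dependencies can also be written as multi-key tables that this pattern deliberately ignores.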





[jira] [Updated] (ARROW-5020) [C++][Gandiva] Split Gandiva-related conda packages for builds into separate .yml conda env file

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5020:
--
Labels: pull-request-available  (was: )

> [C++][Gandiva] Split Gandiva-related conda packages for builds into separate 
> .yml conda env file
> 
>
> Key: ARROW-5020
> URL: https://issues.apache.org/jira/browse/ARROW-5020
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> These installs are large and should not be required unconditionally in CI and 
> elsewhere





[jira] [Resolved] (ARROW-5256) [Packaging][deb] Failed to build with LLVM 7.1.0

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5256.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4453
[https://github.com/apache/arrow/pull/4453]

> [Packaging][deb] Failed to build with LLVM 7.1.0
> 
>
> Key: ARROW-5256
> URL: https://issues.apache.org/jira/browse/ARROW-5256
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Gandiva, Packaging
>Reporter: Sutou Kouhei
>Assignee: Sutou Kouhei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> https://travis-ci.org/ursa-labs/crossbow/builds/527710714#L6144-L6157
> {noformat}
> CMake Error at cmake_modules/FindLLVM.cmake:33 (find_package):
>   Could not find a configuration file for package "LLVM" that is compatible
>   with requested version "7.0".
>   The following configuration files were considered but not accepted:
> /usr/lib/llvm-7/cmake/LLVMConfig.cmake, version: 7.1.0
> /usr/lib/llvm-7/lib/cmake/llvm/LLVMConfig.cmake, version: 7.1.0
> /usr/lib/llvm-7/share/llvm/cmake/LLVMConfig.cmake, version: 7.1.0
> /usr/lib/llvm-3.8/share/llvm/cmake/LLVMConfig.cmake, version: 3.8.1
> /usr/share/llvm-3.8/cmake/LLVMConfig.cmake, version: 3.8.1
> Call Stack (most recent call first):
>   src/gandiva/CMakeLists.txt:31 (find_package)
> {noformat}
> Can we use "7" instead of "7.0" for {{ARROW_LLVM_VERSION}}?
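
Why requesting "7" might succeed where "7.0" fails can be sketched with a simple matching rule. The rule below is an assumption chosen to be consistent with the error output above (every requested component must equal the corresponding found component). The actual rule is whatever LLVM's CMake version file implements, which this thread does not show.

```python
def is_compatible(requested, found):
    """Assumed policy: each component given in `requested` must match `found` exactly."""
    wanted = requested.split(".")
    return found.split(".")[: len(wanted)] == wanted


for requested in ("7.0", "7"):
    for found in ("7.1.0", "3.8.1"):
        print(requested, found, is_compatible(requested, found))
```

Under this assumed policy, "7.0" rejects 7.1.0 because the minor component differs, while "7" constrains only the major component and still rejects 3.8.1.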





[jira] [Assigned] (ARROW-3814) [R] RecordBatch$from_arrays()

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-3814:
---

Assignee: Romain François

> [R] RecordBatch$from_arrays()
> -
>
> Key: ARROW-3814
> URL: https://issues.apache.org/jira/browse/ARROW-3814
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Assignee: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ARROW-3814) [R] RecordBatch$from_arrays()

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-3814.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 3565
[https://github.com/apache/arrow/pull/3565]

> [R] RecordBatch$from_arrays()
> -
>
> Key: ARROW-3814
> URL: https://issues.apache.org/jira/browse/ARROW-3814
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: R
>Reporter: Romain François
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>






[jira] [Created] (ARROW-5493) [Integration/Go] add Go support for IPC integration tests

2019-06-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5493:
--

 Summary: [Integration/Go] add Go support for IPC integration tests
 Key: ARROW-5493
 URL: https://issues.apache.org/jira/browse/ARROW-5493
 Project: Apache Arrow
  Issue Type: Test
  Components: Go, Integration
Reporter: Sebastien Binet


It would be great to add support for the cross-language integration tests of 
the IPC file/stream format:

- [https://github.com/apache/arrow/tree/master/integration]

- [https://github.com/apache/arrow/blob/master/integration/integration_test.py]

 





[jira] [Assigned] (ARROW-4787) [C++] Include "null" values (perhaps with an option to toggle on/off) in hash kernel actions

2019-06-03 Thread Francois Saint-Jacques (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francois Saint-Jacques reassigned ARROW-4787:
-

Assignee: Francois Saint-Jacques

> [C++] Include "null" values (perhaps with an option to toggle on/off) in hash 
> kernel actions
> 
>
> Key: ARROW-4787
> URL: https://issues.apache.org/jira/browse/ARROW-4787
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Francois Saint-Jacques
>Priority: Major
> Fix For: 0.15.0
>
>
> Null is a meaningful value in the context of analytics. We should have the 
> option of considering it distinctly in e.g. {{ValueCounts}} 
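
The requested behaviour can be sketched in plain Python, with `None` standing in for null. This is not the C++ hash kernel: `value_counts` and its `include_nulls` flag are hypothetical names used only to show the toggle.

```python
from collections import Counter


def value_counts(values, include_nulls=True):
    """Count distinct values; None is counted as its own value unless disabled."""
    if include_nulls:
        return Counter(values)
    return Counter(v for v in values if v is not None)


data = ["a", None, "a", None, "b"]
print(value_counts(data))
print(value_counts(data, include_nulls=False))
```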





[jira] [Created] (ARROW-5492) [R] Add "columns" option to read_parquet to read subset of columns

2019-06-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5492:
---

 Summary: [R] Add "columns" option to read_parquet to read subset 
of columns 
 Key: ARROW-5492
 URL: https://issues.apache.org/jira/browse/ARROW-5492
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Wes McKinney
 Fix For: 0.14.0


This is just like the same option in {{pyarrow.parquet.read_table}}





[jira] [Resolved] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors

2019-06-03 Thread Micah Kornfield (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-263.
---
Resolution: Won't Fix

> Design an initial IPC mechanism for Arrow Vectors
> -
>
> Key: ARROW-263
> URL: https://issues.apache.org/jira/browse/ARROW-263
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Micah Kornfield
>Priority: Major
>
> Prior discussion on this topic [1].
> Use-cases:
> 1.  User defined function (UDF) execution:  One process wants to execute a 
> user defined function written in another language (e.g. Java executing a 
> function defined in python, this involves creating Arrow Arrays in java, 
> sending them to python and receiving a new set of Arrow Arrays produced in 
> python back in the java process).
> 2.  If a storage system and a query engine are running on the same host we 
> might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu)
> Assumptions:
> 1.  IPC mechanism should be useable from the core set of supported languages 
> (Java, Python, C) on POSIX and ideally windows systems.  Ideally, we would 
> not need to add dependencies on additional libraries outside of each 
> languages outside of this document.
> We want to leverage shared memory for Arrays to avoid doubling RAM requirements 
> by duplicating the same Array in different memory locations.  
> 2. Under some circumstances shared memory might be more efficient than FIFOs 
> or sockets (in other scenarios it won’t be; see thread below).
> 3. Security is not a concern for V1, we assume all processes running are 
> “trusted”.
> Requirements:
> 1.  Resource management: 
> a.  Both processes need a way of allocating memory for Arrow Arrays so 
> that data can be passed from one process to another.
> b.  There must be a mechanism to clean up unused Arrow Arrays to limit 
> resource usage but avoid race conditions when processing arrays.
> 2.  Schema negotiation - before sending data, both processes need to agree on 
> the schema each one will produce.
> Out of scope requirements:
> 1.  IPC channel metadata discovery is out of scope of this document.  
> Discovery can be provided by passing appropriate command line arguments, 
> configuration files or other mechanisms like RPC (in which case RPC channel 
> discovery is still an issue).
> [1] 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E





[jira] [Commented] (ARROW-263) Design an initial IPC mechanism for Arrow Vectors

2019-06-03 Thread Micah Kornfield (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854722#comment-16854722
 ] 

Micah Kornfield commented on ARROW-263:
---

I think it can be closed.

> Design an initial IPC mechanism for Arrow Vectors
> -
>
> Key: ARROW-263
> URL: https://issues.apache.org/jira/browse/ARROW-263
> Project: Apache Arrow
>  Issue Type: New Feature
>Reporter: Micah Kornfield
>Priority: Major
>
> Prior discussion on this topic [1].
> Use-cases:
> 1.  User defined function (UDF) execution:  One process wants to execute a 
> user defined function written in another language (e.g. Java executing a 
> function defined in python, this involves creating Arrow Arrays in java, 
> sending them to python and receiving a new set of Arrow Arrays produced in 
> python back in the java process).
> 2.  If a storage system and a query engine are running on the same host we 
> might want to use IPC instead of RPC (e.g. Apache Drill querying Apache Kudu)
> Assumptions:
> 1.  IPC mechanism should be useable from the core set of supported languages 
> (Java, Python, C) on POSIX and ideally windows systems.  Ideally, we would 
> not need to add dependencies on additional libraries outside of each 
> languages outside of this document.
> We want to leverage shared memory for Arrays to avoid doubling RAM requirements 
> by duplicating the same Array in different memory locations.  
> 2. Under some circumstances shared memory might be more efficient than FIFOs 
> or sockets (in other scenarios it won’t be; see thread below).
> 3. Security is not a concern for V1, we assume all processes running are 
> “trusted”.
> Requirements:
> 1.  Resource management: 
> a.  Both processes need a way of allocating memory for Arrow Arrays so 
> that data can be passed from one process to another.
> b.  There must be a mechanism to clean up unused Arrow Arrays to limit 
> resource usage but avoid race conditions when processing arrays.
> 2.  Schema negotiation - before sending data, both processes need to agree on 
> the schema each one will produce.
> Out of scope requirements:
> 1.  IPC channel metadata discovery is out of scope of this document.  
> Discovery can be provided by passing appropriate command line arguments, 
> configuration files or other mechanisms like RPC (in which case RPC channel 
> discovery is still an issue).
> [1] 
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201603.mbox/%3c8d5f7e3237b3ed47b84cf187bb17b666148e7...@shsmsx103.ccr.corp.intel.com%3E





[jira] [Assigned] (ARROW-5395) [C++] Utilize stream EOS in File format

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5395:
---

Assignee: John Muehlhausen

> [C++] Utilize stream EOS in File format
> ---
>
> Key: ARROW-5395
> URL: https://issues.apache.org/jira/browse/ARROW-5395
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: John Muehlhausen
>Assignee: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 0.25h
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().
>  
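
The value of an end-of-stream marker for sequential readers can be sketched with a deliberately simplified framing. Everything here is an assumption for illustration: each message is a 4-byte little-endian length followed by the payload, and a zero length acts as EOS. The real Arrow message layout has more structure than this, and `write_messages` and `read_messages` are hypothetical helpers.

```python
import io
import struct

EOS = struct.pack("<i", 0)  # assumption: a zero length prefix marks end-of-stream


def write_messages(stream, payloads):
    """Write length-prefixed payloads, then the EOS marker."""
    for payload in payloads:
        if not payload:
            raise ValueError("empty payloads would be indistinguishable from EOS")
        stream.write(struct.pack("<i", len(payload)))
        stream.write(payload)
    stream.write(EOS)


def read_messages(stream):
    """Yield payloads using only read(); stops at EOS and never calls seek()."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            raise EOFError("stream ended without an EOS marker")
        (length,) = struct.unpack("<i", prefix)
        if length == 0:
            return
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated message")
        yield payload


buffer = io.BytesIO()
write_messages(buffer, [b"abc", b"de"])
buffer.write(b"FOOTER")  # stands in for whatever the file format appends afterwards

reader = io.BytesIO(buffer.getvalue())
print(list(read_messages(reader)))
print(reader.read())
```

Without the marker, a reader that cannot seek has no way to tell where the messages end and the trailing file data begins.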





[jira] [Resolved] (ARROW-5395) [C++] Utilize stream EOS in File format

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5395.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4372
[https://github.com/apache/arrow/pull/4372]

> [C++] Utilize stream EOS in File format
> ---
>
> Key: ARROW-5395
> URL: https://issues.apache.org/jira/browse/ARROW-5395
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: John Muehlhausen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>   Original Estimate: 0.25h
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> We currently do not write EOS at the end of a Message stream inside the File 
> format.  As a result, the file cannot be parsed sequentially.  This change 
> prepares for other implementations or future reference features that parse a 
> File sequentially... i.e. without access to seek().
>  





[jira] [Resolved] (ARROW-4504) [C++] Reduce the number of unit test executables

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-4504.
-
Resolution: Fixed

Issue resolved by pull request 4442
[https://github.com/apache/arrow/pull/4442]

> [C++] Reduce the number of unit test executables
> 
>
> Key: ARROW-4504
> URL: https://issues.apache.org/jira/browse/ARROW-4504
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Link times are a significant drag in MSVC builds. They don't affect Linux 
> nearly as much when building with Ninja. I suggest we combine some of the 
> fast-running tests within logical units to see if we can cut down from 106 
> test executables to 70 or so
> {code}
> 100% tests passed, 0 tests failed out of 107
> Label Time Summary:
> arrow-tests   =  21.19 sec*proc (48 tests)
> arrow_python-tests=   0.26 sec*proc (1 test)
> example   =   0.05 sec*proc (1 test)
> gandiva-tests =  11.65 sec*proc (39 tests)
> parquet-tests =  35.81 sec*proc (18 tests)
> unittest  =  68.92 sec*proc (106 tests)
> {code}





[jira] [Assigned] (ARROW-5390) [CI] Job time limit exceeded on Travis

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-5390:
---

Assignee: Antoine Pitrou

> [CI] Job time limit exceeded on Travis
> --
>
> Key: ARROW-5390
> URL: https://issues.apache.org/jira/browse/ARROW-5390
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We now frequently hit the 50 minutes job time limit on Travis-CI on the 
> "Python 2.7 and 3.6 unit tests w/ Valgrind, conda-forge toolchain, coverage" 
> job.
> e.g. https://travis-ci.org/pitrou/arrow/jobs/535373888
> Hopefully we can soon ditch Python 2.7, which would allow saving a bit of 
> time.





[jira] [Resolved] (ARROW-5390) [CI] Job time limit exceeded on Travis

2019-06-03 Thread Wes McKinney (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-5390.
-
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4457
[https://github.com/apache/arrow/pull/4457]

> [CI] Job time limit exceeded on Travis
> --
>
> Key: ARROW-5390
> URL: https://issues.apache.org/jira/browse/ARROW-5390
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Python
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We now frequently hit the 50 minutes job time limit on Travis-CI on the 
> "Python 2.7 and 3.6 unit tests w/ Valgrind, conda-forge toolchain, coverage" 
> job.
> e.g. https://travis-ci.org/pitrou/arrow/jobs/535373888
> Hopefully we can soon ditch Python 2.7, which would allow saving a bit of 
> time.





[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854692#comment-16854692
 ] 

Wes McKinney commented on ARROW-5474:
-

That's also fine with me. We have 
https://github.com/apache/arrow/blob/master/cpp/Dockerfile.ubuntu-xenial to 
help maintain this support; is running that sufficient to check?

> [C++] What version of Boost do we require now?
> --
>
> Key: ARROW-5474
> URL: https://issues.apache.org/jira/browse/ARROW-5474
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One 
> possible cause for that error is that the local filesystem patch increased 
> the version of boost that we actually require. The boost version (1.54 vs 
> 1.58) was one difference between failure and success. 
> Another point of confusion was that CMake reported two different versions of 
> boost at different times. 
> If we require a minimum version of boost, can we document that better, check 
> for it more accurately in the build scripts, and fail with a useful message 
> if that minimum isn't met? Or something else helpful.
> If the actual cause of the failure was something else (e.g. compiler 
> version), we should figure that out too.





[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854686#comment-16854686
 ] 

Wes McKinney commented on ARROW-5488:
-

One possibility is to bundle the Arrow header files with the CRAN package and 
build against them, but do not include {{libarrow}} and {{libparquet}} when 
linking. When the library is loaded, the libraries must be loaded in-process 
via {{dlopen}} before loading the Rcpp extensions
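
The load-order idea can be illustrated outside R. The sketch below uses Python's `ctypes` as a stand-in for calling {{dlopen}} directly: it tries to load a shared library into the process with globally visible symbols and reports failure instead of crashing. `try_load` and the library file names are hypothetical, and how the R package would actually do this is not settled in this thread.

```python
import ctypes


def try_load(library_name):
    """Return a handle if the shared library loads in-process, else None.

    RTLD_GLOBAL makes the library's symbols visible to extensions loaded later.
    """
    try:
        return ctypes.CDLL(library_name, mode=ctypes.RTLD_GLOBAL)
    except OSError:
        return None


# Hypothetical file names; a real package would resolve platform-specific names.
handles = {name: try_load(name) for name in ("libarrow.so", "libparquet.so")}
if all(handle is not None for handle in handles.values()):
    print("C++ libraries loaded; the extension module can be loaded next")
else:
    print("C++ libraries not available; fall back to a package that does nothing useful")
```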

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>
> As a way to get to CRAN, we need some way for the package to still compile and 
> install and test (although do nothing useful) even when the c++ lib is not 
> available. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ARROW-5474) [C++] What version of Boost do we require now?

2019-06-03 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854687#comment-16854687
 ] 

Uwe L. Korn commented on ARROW-5474:


For adoption reasons, it would be nice to use Ubuntu 16.04 as a baseline. This 
has Boost 1.58.

> [C++] What version of Boost do we require now?
> --
>
> Key: ARROW-5474
> URL: https://issues.apache.org/jira/browse/ARROW-5474
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> See debugging on https://issues.apache.org/jira/browse/ARROW-5470. One 
> possible cause for that error is that the local filesystem patch increased 
> the version of boost that we actually require. The boost version (1.54 vs 
> 1.58) was one difference between failure and success. 
> Another point of confusion was that CMake reported two different versions of 
> boost at different times. 
> If we require a minimum version of boost, can we document that better, check 
> for it more accurately in the build scripts, and fail with a useful message 
> if that minimum isn't met? Or something else helpful.
> If the actual cause of the failure was something else (e.g. compiler 
> version), we should figure that out too.





[jira] [Comment Edited] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-03 Thread Wes McKinney (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854686#comment-16854686
 ] 

Wes McKinney edited comment on ARROW-5488 at 6/3/19 3:06 PM:
-

One possibility is to bundle the Arrow header files with the CRAN package and 
build against them, but do not include {{libarrow}} and {{libparquet}} when 
linking. When the library is loaded, the libraries must be loaded in-process 
via {{dlopen}} before loading the Rcpp extensions. The C++ libraries can then 
be installed after the fact.


was (Author: wesmckinn):
One possibility is to bundle the Arrow header files with the CRAN package and 
build against them, but do not include {{libarrow}} and {{libparquet}} when 
linking. When the library is loaded, the libraries must be loaded in-process 
via {{dlopen}} before loading the Rcpp extensions

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although do nothing useful) even when the C++ lib is not 
> available. 





[jira] [Updated] (ARROW-5407) [C++] Integration test Travis CI entry builds many unnecessary targets

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5407:
--
Labels: pull-request-available  (was: )

> [C++] Integration test Travis CI entry builds many unnecessary targets
> --
>
> Key: ARROW-5407
> URL: https://issues.apache.org/jira/browse/ARROW-5407
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>
> Only the IPC and Flight integration test targets are needed to run the tests. 
> It appears that all targets including all unit tests are being built in Travis





[jira] [Commented] (ARROW-5488) [R] Workaround when C++ lib not available

2019-06-03 Thread Uwe L. Korn (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854674#comment-16854674
 ] 

Uwe L. Korn commented on ARROW-5488:


Would this involve compiling the C++ lib from source in that case?

> [R] Workaround when C++ lib not available
> -
>
> Key: ARROW-5488
> URL: https://issues.apache.org/jira/browse/ARROW-5488
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Romain François
>Priority: Major
>
> As a way to get to CRAN, we need some way for the package to still compile, 
> install, and test (although do nothing useful) even when the C++ lib is not 
> available. 





[jira] [Commented] (ARROW-1774) [C++] Add "view" function to create zero-copy views for compatible types, if supported

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854672#comment-16854672
 ] 

Antoine Pitrou commented on ARROW-1774:
---

What is meant here by "same physical memory layout"? For example, should we 
allow a view of int32 as float32? If so, it's not the same thing as casting.
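
To make the distinction concrete, here is a minimal sketch in plain Python (standard library only, not the Arrow API). It assumes a 4-byte C `int` and a 4-byte `float`, which holds on common platforms. A view reinterprets the same bytes without copying, whereas a cast converts the numeric value.

```python
import struct

# 0x3F800000 is the IEEE-754 bit pattern of float32 1.0.
buf = bytearray(struct.pack("=2i", 0x3F800000, 0))

ints = memoryview(buf).cast("i")    # int32 view of the buffer, no copy
floats = memoryview(buf).cast("f")  # float32 view of the same bytes, no copy

viewed = floats[0]       # reinterprets the bits: 1.0
casted = float(ints[0])  # converts the value: 1065353216.0

# Writing through one view is visible through the other, since both share buf.
ints[1] = 0x40000000     # bit pattern of float32 2.0
```

Here `viewed` and `casted` differ even though they come from the same element, which is why a view of int32 as float32 is not the same operation as a cast.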

> [C++] Add "view" function to create zero-copy views for compatible types, if 
> supported
> --
>
> Key: ARROW-1774
> URL: https://issues.apache.org/jira/browse/ARROW-1774
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> Similar to NumPy's {{ndarray.view}}, but with the restriction that the input 
> and output types have the same physical Arrow memory layout. This might be as 
> simple as adding a "zero copy only" option to the existing {{Cast}} kernel





[jira] [Resolved] (ARROW-5040) [C++] ArrayFromJSON can't parse Timestamp from strings

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5040.
---
Resolution: Duplicate

Looks like this was fixed as part of ARROW-4708

> [C++] ArrayFromJSON can't parse Timestamp from strings
> --
>
> Key: ARROW-5040
> URL: https://issues.apache.org/jira/browse/ARROW-5040
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
> Fix For: 0.14.0
>
>
> Currently, ArrayFromJSON can only produce timestamps from numbers.
> This is an impediment for writing tests for JSON and CSV, since those formats 
> parse timestamps from strings and it's not immediately obvious that 
> "2000-02-29" corresponds to 951782400.
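
As a quick check of that correspondence, a minimal sketch using only the Python standard library, assuming the timestamp unit is seconds since the Unix epoch in UTC:

```python
from datetime import datetime, timezone

# Parse the date string as midnight UTC and convert to seconds since the epoch.
parsed = datetime.strptime("2000-02-29", "%Y-%m-%d").replace(tzinfo=timezone.utc)
seconds = int(parsed.timestamp())

# Convert the integer back to an ISO date string.
round_trip = datetime.fromtimestamp(951782400, tz=timezone.utc).date().isoformat()
```

The round trip shows why string input is easier to read in tests than the raw integer.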





[jira] [Updated] (ARROW-5491) [C++] Remove unnecessary semicolons following MACRO definitions

2019-06-03 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-5491:
--
Labels: pull-request-available  (was: )

> [C++] Remove unnecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.14.0
>
>






[jira] [Updated] (ARROW-5491) [C++] Remove unnecessary semicolons following MACRO definitions

2019-06-03 Thread Brian Hulette (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Hulette updated ARROW-5491:
-
Summary: [C++] Remove unnecessary semicolons following MACRO definitions  
(was: Remove unnecessary semicolons following MACRO definitions)

> [C++] Remove unnecessary semicolons following MACRO definitions
> --
>
> Key: ARROW-5491
> URL: https://issues.apache.org/jira/browse/ARROW-5491
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Brian Hulette
>Assignee: Brian Hulette
>Priority: Major
> Fix For: 0.14.0
>
>






[jira] [Created] (ARROW-5491) Remove unnecessary semicolons following MACRO definitions

2019-06-03 Thread Brian Hulette (JIRA)
Brian Hulette created ARROW-5491:


 Summary: Remove unnecessary semicolons following MACRO definitions
 Key: ARROW-5491
 URL: https://issues.apache.org/jira/browse/ARROW-5491
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.13.0
Reporter: Brian Hulette
Assignee: Brian Hulette
 Fix For: 0.14.0








[jira] [Resolved] (ARROW-5430) [Python] Can read but not write parquet partitioned on large ints

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5430.
---
   Resolution: Fixed
Fix Version/s: 0.14.0

Issue resolved by pull request 4440
[https://github.com/apache/arrow/pull/4440]

> [Python] Can read but not write parquet partitioned on large ints
> -
>
> Key: ARROW-5430
> URL: https://issues.apache.org/jira/browse/ARROW-5430
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.13.0
> Environment: Mac OSX 10.14.4, Python 3.7.1, x86_64.
>Reporter: Robin Kåveland
>Priority: Minor
>  Labels: parquet, pull-request-available
> Fix For: 0.14.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Here's a contrived example that reproduces this issue using pandas:
> {code:java}
> import numpy as np
> import pandas as pd
> real_usernames = np.array(['anonymize', 'me'])
> usernames = pd.util.hash_array(real_usernames)
> login_count = [13, 9]
> df = pd.DataFrame({'user': usernames, 'logins': login_count})
> df.to_parquet('can_write.parq', partition_cols=['user'])
> # But not read
> pd.read_parquet('can_write.parq'){code}
> Expected behaviour:
>  * Either the write fails
>  * Or the read succeeds
> Actual behaviour: The read fails with the following error:
> {code:java}
> Traceback (most recent call last):
>   File "", line 2, in 
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 282, in read_parquet
>     return impl.read(path, columns=columns, **kwargs)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pandas/io/parquet.py",
>  line 129, in read
>     **kwargs).to_pandas()
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1152, in read_table
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/filesystem.py",
>  line 181, in read_parquet
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 1014, in read
>     use_pandas_metadata=use_pandas_metadata)
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 587, in read
>     dictionary = partitions.levels[i].dictionary
>   File 
> "/Users/robinkh/code/venvs/datamunge/lib/python3.7/site-packages/pyarrow/parquet.py",
>  line 642, in dictionary
>     dictionary = lib.array(integer_keys)
>   File "pyarrow/array.pxi", line 173, in pyarrow.lib.array
>   File "pyarrow/array.pxi", line 36, in pyarrow.lib._sequence_to_array
>   File "pyarrow/error.pxi", line 104, in pyarrow.lib.check_status
> pyarrow.lib.ArrowException: Unknown error: Python int too large to convert to 
> C long{code}
> I set the priority to minor here because it's easy enough to work around this 
> in user code unless you really need the 64-bit hash (and you probably 
> shouldn't be partitioning on that anyway).
> I could take a stab at writing a patch for this if there's interest?
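
A minimal sketch of why the read presumably fails, and of the kind of user-code workaround hinted at above, using only the Python standard library. It assumes the partition keys are unsigned 64-bit hashes that are later converted to a signed 64-bit C integer; the helper names `fits_int64` and `to_int63` are hypothetical and are not part of pandas or pyarrow.

```python
import struct

INT64_MAX = 2**63 - 1


def fits_int64(value: int) -> bool:
    """Return True if value can be packed as a signed 64-bit integer."""
    try:
        struct.pack("=q", value)
        return True
    except struct.error:
        return False


def to_int63(value: int) -> int:
    """Hypothetical workaround: drop the top bit so the key fits in int64."""
    return value & INT64_MAX


big_hash = 2**64 - 1  # largest possible unsigned 64-bit hash value
```

Masking the hash this way gives up one bit, so it slightly raises the chance of collisions; casting the partition column to a string before writing would be another option under the same assumption.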





[jira] [Created] (ARROW-5490) [C++] Remove ARROW_BOOST_HEADER_ONLY

2019-06-03 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5490:
-

 Summary: [C++] Remove ARROW_BOOST_HEADER_ONLY
 Key: ARROW-5490
 URL: https://issues.apache.org/jira/browse/ARROW-5490
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Affects Versions: 0.13.0
Reporter: Antoine Pitrou


That CMake variable isn't exposed as an option and probably doesn't work 
anymore. All code paths depending on that variable should probably be 
simplified.





[jira] [Assigned] (ARROW-5407) [C++] Integration test Travis CI entry builds many unnecessary targets

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou reassigned ARROW-5407:
-

Assignee: Antoine Pitrou

> [C++] Integration test Travis CI entry builds many unnecessary targets
> --
>
> Key: ARROW-5407
> URL: https://issues.apache.org/jira/browse/ARROW-5407
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.14.0
>
>
> Only the IPC and Flight integration test targets are needed to run the tests. 
> It appears that all targets including all unit tests are being built in Travis





[jira] [Commented] (ARROW-5190) [R] Discussion: tibble dependency in R package

2019-06-03 Thread James Lamb (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854615#comment-16854615
 ] 

James Lamb commented on ARROW-5190:
---

Thanks [~romainfrancois]!!!

> [R] Discussion: tibble dependency in R package
> --
>
> Key: ARROW-5190
> URL: https://issues.apache.org/jira/browse/ARROW-5190
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: R
>Reporter: James Lamb
>Assignee: Romain François
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Hello,
>  
> I would like to have a discussion on the use of *tibble* in the Apache Arrow 
> R package. I looked at the [project contributor 
> guidelines|[https://github.com/apache/arrow/blob/master/docs/source/developers/contributing.rst]]
>  and could not tell where the best place might be to start a public 
> discussion on this topic, so I decided on JIRA. I apologize if this is not 
> the right place.
>  
> *TL;DR*
> I would like to propose moving the *tibble* dependency in the *arrow* R 
> package to "Suggests", removing the _as_tibble()_ in _read_arrow()_, and 
> having the core R code implementing the Arrow API only return data.frames or 
> other base-R data structures wherever possible.
>  
> *Reasoning*
> [As far as I can 
> tell|[https://github.com/apache/arrow/search?p=1=tibble_q=tibble]],
>  outside of tests and examples *tibble* is only used in three places in the 
> package:
>  * S3 methods to convert Arrow objects to tibbles 
> (_as_tibble.arrow::RecordBatch()_, _as.tibble.arrow::Table()_)
>  * optional "convert to tibble on the way out" behavior controlled by a flag 
> in interfaces to file types (parquet and feather)
>  * 
> [_read_arrow()_|[https://github.com/apache/arrow/blob/0536ef8174982a7a13a251174cc38701e8663b68/r/R/read_table.R#L88]]
>  
> In my opinion, all three of these uses of *tibble* are valuable for 
> developers who use that package (or other packages in its ecosystem), but I 
> am not convinced that the Arrow R package should be tightly coupled to them.
> In the Python community, *pandas* is a broadly agreed-upon standard for 
> representing data frames. Even with that ubiquity, *pyarrow* does not depend 
> on *pandas* (it is not necessary to work with it) and all "compatibility with 
> *pandas*" code is isolated in a place explicitly intended for that purpose: 
> [https://github.com/apache/arrow/blob/master/python/pyarrow/pandas_compat.py]
> I think that is the ideal handling for integration of Arrow extensions with 
> other software it might be used with. This allows users who care about only 
> one of the integrations (e.g. feather, parquet, HDFS, Apache Spark, tibble, 
> data.table, etc.) to only have to build things they're already using. 
>  
> *Other background information*
> I took the time to write this tonight after talking a colleague through the 
> issues *feather* (R package) users experienced after the *tibble 2.0* 
> release. See for example 
> [wesm/feather#374|[https://github.com/wesm/feather/issues/374]] and 
> [wesm/feather#372|[https://github.com/wesm/feather/issues/372]].
>  When *tibble 2.0* came out it broke *feather 0.3.1* and the maintainers 
> there promptly released to CRAN a *feather 0.3.2* which was compatible with 
> *tibble 2.0+*. Unfortunately, this still caused disruptions for many people 
> using *feather* (who inadvertently had *tibble* upgraded as part of 
> installing other packages which depended on it). Nothing about *tibble* was 
> necessary to the implementation of _read_feather()_, as far as I can tell, 
> but this design choice made installing and upgrading *tibble* non-optional 
> for developers who just wanted to use the feather file format and all its 
> awesome features.
>  
> If the proposal here is accepted, I hope it will mean we can prevent 
> repeating the same experience with the R *arrow* package and set a strong 
> precedent for developers who want to add compatibility in this package for 
> other members of the ecosystem like parquet or Apache Spark.
>  
>  
> Thank you for hearing me out!
>  
>  
>  





[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854593#comment-16854593
 ] 

Antoine Pitrou commented on ARROW-5473:
---

I think that line is necessary to work around a CMake bug when non-existent 
directories are referenced.

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within cmdr terminal emulator so it's conceivable there are some 
> path modifications that are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Commented] (ARROW-5473) [C++] Build failure on googletest_ep on Windows when using Ninja

2019-06-03 Thread Antoine Pitrou (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16854594#comment-16854594
 ] 

Antoine Pitrou commented on ARROW-5473:
---

See e.g. https://gitlab.kitware.com/cmake/cmake/issues/15052

> [C++] Build failure on googletest_ep on Windows when using Ninja
> 
>
> Key: ARROW-5473
> URL: https://issues.apache.org/jira/browse/ARROW-5473
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 0.14.0
>
>
> I consistently get this error when trying to use Ninja locally:
> {code}
> -- extracting...
>  
> src='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/release-1.8.1.tar.gz'
>  
> dst='C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep'
> -- extracting... [tar xfz]
> -- extracting... [analysis]
> -- extracting... [rename]
> CMake Error at googletest_ep-stamp/extract-googletest_ep.cmake:51 (file):
>   file RENAME failed to rename
> 
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/ex-googletest_ep1234/googletest-release-1.8.1
>   to
> C:/Users/wesmc/code/arrow/cpp/build/googletest_ep-prefix/src/googletest_ep
>   because: Directory not empty
> [179/623] Building CXX object 
> src\arrow\CMakeFiles\arrow_static.dir\array\builder_dict.cc.obj
> ninja: build stopped: subcommand failed.
> {code}
> I'm running within cmdr terminal emulator so it's conceivable there are some 
> path modifications that are causing issues.
> The CMake invocation is
> {code}
> cmake -G "Ninja" ^  -DCMAKE_BUILD_TYPE=Release ^  
> -DARROW_BUILD_TESTS=on ^  -DARROW_CXXFLAGS="/WX /MP" ^
>  -DARROW_FLIGHT=off -DARROW_PARQUET=on -DARROW_GANDIVA=ON 
> -DARROW_VERBOSE_THIRDPARTY_BUILD=on ..
> {code}





[jira] [Updated] (ARROW-2) Post Simple Website

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-2?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2:
---
Component/s: Website

> Post Simple Website
> ---
>
> Key: ARROW-2
> URL: https://issues.apache.org/jira/browse/ARROW-2
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Website
>Reporter: Jacques Nadeau
>Assignee: Jason Altekruse
>Priority: Major
> Fix For: 0.1.0
>
>






[jira] [Updated] (ARROW-35) Add a short call-to-action / how-to-get-involved to the main README.md

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-35?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-35:

Component/s: Documentation

> Add a short call-to-action / how-to-get-involved to the main README.md
> --
>
> Key: ARROW-35
> URL: https://issues.apache.org/jira/browse/ARROW-35
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
>
> * Engage on the mailing list
> * Read the format documentation
> * Contribute code and design ideas to the reference implementations





[jira] [Updated] (ARROW-13) Add PR merge tool similar to that used in Parquet

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-13:

Component/s: Developer Tools

> Add PR merge tool similar to that used in Parquet
> -
>
> Key: ARROW-13
> URL: https://issues.apache.org/jira/browse/ARROW-13
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Minor
> Fix For: 0.1.0
>
>
> See https://github.com/apache/parquet-mr/tree/master/dev





[jira] [Updated] (ARROW-6) Hope to add development document

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-6?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6:
---
Component/s: Documentation

> Hope to add development document
> 
>
> Key: ARROW-6
> URL: https://issues.apache.org/jira/browse/ARROW-6
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: Documentation
>Reporter: AllenFang
>Priority: Major
>  Labels: documentation
> Fix For: 0.3.0
>
>
> Awesome project, great job :)
> Anyway, is it possible to add some useful documents for development?
> Thanks





[jira] [Updated] (ARROW-5) Error when run maven install

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-5?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5:
---
Component/s: Java

> Error when run maven install
> 
>
> Key: ARROW-5
> URL: https://issues.apache.org/jira/browse/ARROW-5
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
> Environment: Ubuntu Maven 3.2
>Reporter: AllenFang
>Assignee: Liwei Lin(Inactive)
>Priority: Major
>  Labels: maven
> Fix For: 0.1.0
>
>
> When I run Maven to install, I get the following problem:
> Failed to execute goal
> org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0:generate
> (generate-fmpp) on project vector: Execution generate-fmpp of goal
> org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0:generate failed:
> Plugin org.apache.drill.tools:drill-fmpp-maven-plugin:1.4.0 or one of its
> dependencies could not be resolved: Failure to find
> org.freemarker:freemarker:jar:2.3.24-SNAPSHOT in
> http://repository.apache.org/snapshots was cached in the local repository
> By the way, I just cloned the repo and ran mvn clean install.
> Dev mailing list link:
> http://mail-archives.apache.org/mod_mbox/arrow-dev/201602.mbox/%3CCAABsKVCSEULDTL2hoANL8-wrWMDO8%3Dgv0RFmSQMXt3MdiqUcPw%40mail.gmail.com%3E





[jira] [Updated] (ARROW-4) Initial Arrow CPP Implementation

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-4?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4:
---
Component/s: C++

> Initial Arrow CPP Implementation
> 
>
> Key: ARROW-4
> URL: https://issues.apache.org/jira/browse/ARROW-4
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Jacques Nadeau
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
> Attachments: 0001-arrow-initial-cpp.patch.gz
>
>






[jira] [Updated] (ARROW-8) Set up Travis CI

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-8?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-8:
---
Component/s: Continuous Integration

> Set up Travis CI
> 
>
> Key: ARROW-8
> URL: https://issues.apache.org/jira/browse/ARROW-8
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
>
> I will ask INFRA to enable Travis CI for the repo, and then will propose a 
> patch that runs the C++ test suite to start (unless some kind soul beats me 
> to it with a Java patch). We can use a build matrix with one build per 
> language SDK (so gcc and clang for arrow-cpp) to start. 





[jira] [Updated] (ARROW-3) Post Initial Arrow Format Spec

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3:
---
Component/s: Format

> Post Initial Arrow Format Spec
> --
>
> Key: ARROW-3
> URL: https://issues.apache.org/jira/browse/ARROW-3
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Jacques Nadeau
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
> Attachments: 0001-arrow-format-draft.patch.gz
>
>






[jira] [Updated] (ARROW-64) Add zsh support to C++ build scripts

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-64?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-64:

Component/s: Developer Tools
 C++

> Add zsh support to C++ build scripts
> 
>
> Key: ARROW-64
> URL: https://issues.apache.org/jira/browse/ARROW-64
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Developer Tools
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.1.0
>
>
> All scripts that have to be sourced during development currently only support 
> bash. This patch adds zsh support.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-52) Set up project blog

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-52?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-52:

Component/s: Website

> Set up project blog
> ---
>
> Key: ARROW-52
> URL: https://issues.apache.org/jira/browse/ARROW-52
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Website
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.3.0
>
>
> I would like to be able to publish blog posts under arrow.apache.org (see, 
> for example, posts I've written recently like 
> http://blog.ibis-project.org/kudu-impala-ibis/). 
> I have a bias towards using Pelican as the publishing toolchain as posts can 
> be written in Markdown and include IPython notebooks. GitHub pages is the 
> easiest way to publish but this may not be compatible with apache.org, so 
> using rsync or some other static content publishing tool would be fine too. 





[jira] [Updated] (ARROW-51) Move ValueVector test from Drill project

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-51:

Component/s: Java

> Move ValueVector test from Drill project
> 
>
> Key: ARROW-51
> URL: https://issues.apache.org/jira/browse/ARROW-51
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 0.1.0
>
>
> There are some simple tests that should be moved from the Drill project.





[jira] [Updated] (ARROW-36) Remove fixVersions from patch tool (until we have them)

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-36:

Component/s: Developer Tools

> Remove fixVersions from patch tool (until we have them)
> ---
>
> Key: ARROW-36
> URL: https://issues.apache.org/jira/browse/ARROW-36
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
>






[jira] [Updated] (ARROW-46) Port DRILL-4410 to Arrow

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-46?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-46:

Component/s: Java

> Port DRILL-4410 to Arrow
> 
>
> Key: ARROW-46
> URL: https://issues.apache.org/jira/browse/ARROW-46
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 0.1.0
>
>
> This fixes a bug in ListVector which causes OversizeAllocation



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-102) travis-ci support for java project

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-102:
-
Component/s: Java
 Continuous Integration

> travis-ci support for java project
> --
>
> Key: ARROW-102
> URL: https://issues.apache.org/jira/browse/ARROW-102
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Continuous Integration, Java
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
> Fix For: 0.1.0
>
>
> The Java part of the Arrow project has no automated build using travis-ci, 
> unlike C++.





[jira] [Updated] (ARROW-101) Fix java warnings emitted by java compiler

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-101:
-
Component/s: Java

> Fix java warnings emitted by java compiler
> --
>
> Key: ARROW-101
> URL: https://issues.apache.org/jira/browse/ARROW-101
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Java
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Trivial
> Fix For: 0.1.0
>
>
> Java compiler emits several warnings regarding the use of rawtypes and 
> unclosed resources on a few classes.





[jira] [Updated] (ARROW-85) C++: memcmp can be avoided in Equal when comparing with the same Buffer

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-85:

Component/s: C++

> C++: memcmp can be avoided in Equal when comparing with the same Buffer
> ---
>
> Key: ARROW-85
> URL: https://issues.apache.org/jira/browse/ARROW-85
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kai Zheng
>Assignee: Kai Zheng
>Priority: Major
> Fix For: 0.1.0
>
>
> Using memcmp to compare two buffers looks too expensive. Instead, comparing 
> the starting address and length/capacity would be good enough. Higher-level 
> code that relies on memcmp behaviour can do that comparison at a higher level.
> Update: memcmp should be avoided in Equal when comparing with the same 
> Buffer. In other cases, it is still needed to determine whether the contents 
> are the same.
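
The short-circuit described in the update can be sketched as follows. This is a minimal illustration in Python rather than the project's C++ code: the `Buffer` class and its `equals` method are hypothetical stand-ins, and object identity plays the role of comparing the starting address and length.

```python
class Buffer:
    """Hypothetical stand-in for a fixed-size memory buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.size = len(data)

    def equals(self, other: "Buffer") -> bool:
        # Same object: the contents are trivially equal, so skip the
        # byte-wise comparison (the memcmp in the C++ case).
        if self is other:
            return True
        # Buffers of different sizes can never be equal.
        if self.size != other.size:
            return False
        # Distinct buffers: a byte-wise comparison is still required.
        return self.data == other.data
```

Only the first branch avoids the comparison; two distinct buffers with identical contents still go through the full check.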





[jira] [Updated] (ARROW-84) C++: separate test codes

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-84:

Component/s: C++

> C++: separate test codes
> 
>
> Key: ARROW-84
> URL: https://issues.apache.org/jira/browse/ARROW-84
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Kai Zheng
>Priority: Major
> Fix For: 0.1.0
>
>
> Currently the test code lives alongside the regular source code. I am not 
> sure whether this is good practice in C++, but it would probably be cleaner 
> to move the test code into a {{test}} folder.





[jira] [Updated] (ARROW-103) Missing patterns from .gitignore

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-103:
-
Component/s: Developer Tools

> Missing patterns from .gitignore
> 
>
> Key: ARROW-103
> URL: https://issues.apache.org/jira/browse/ARROW-103
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Developer Tools
>Reporter: Dan Robinson
>Assignee: Dan Robinson
>Priority: Minor
> Fix For: 0.1.0
>
>
> There are some build files created on at least my platform (such as 
> libpyarrow.dylib) that aren't covered by any of the patterns in the 
> .gitignore files.





[jira] [Updated] (ARROW-95) Scaffold Main Documentation using asciidoc

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-95:

Component/s: Documentation

> Scaffold Main Documentation using asciidoc
> --
>
> Key: ARROW-95
> URL: https://issues.apache.org/jira/browse/ARROW-95
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Documentation
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.3.0
>
>
> For the general documentation of Arrow, we want to use asciidoc. The "general 
> documentation" includes:
>  * The Arrow spec / memory layout
>  * How-tos for building Arrow on different platforms
>  * Getting Started snippets for each language and a link to the (to-be-built) 
> API documentation
> It would be nice to have a build system and the main file/folder structure in 
> place so we can split the work up.





[jira] [Updated] (ARROW-213) Exposing static arrow build

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-213:
-
Component/s: C++

> Exposing static arrow build
> ---
>
> Key: ARROW-213
> URL: https://issues.apache.org/jira/browse/ARROW-213
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Philipp Moritz
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> I'd like to be able to link arrow statically into my application.
> At the moment, arrow can be built as a static library using the 
> 'LIBARROW_LINKAGE' variable in CMakeLists.txt. I'd like to configure this 
> behavior from the command line. Are there any objections to exposing the 
> variable as a cached variable?





[jira] [Updated] (ARROW-259) Use flatbuffer fields in java implementation

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-259:
-
Component/s: Java

> Use flatbuffer fields in java implementation
> 
>
> Key: ARROW-259
> URL: https://issues.apache.org/jira/browse/ARROW-259
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 0.1.0
>
>
> The value vectors in the java implementation should switch to using the Field 
> and types as defined in the flatbuffer spec.





[jira] [Updated] (ARROW-217) Fix Travis w.r.t conda 4.1.0 changes

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-217:
-
Component/s: Continuous Integration

> Fix Travis w.r.t conda 4.1.0 changes
> 
>
> Key: ARROW-217
> URL: https://issues.apache.org/jira/browse/ARROW-217
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
> Fix For: 0.1.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-218) Add option to use GitHub API token via environment variable when merging PRs

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-218:
-
Component/s: Developer Tools

> Add option to use GitHub API token via environment variable when merging PRs
> 
>
> Key: ARROW-218
> URL: https://issues.apache.org/jira/browse/ARROW-218
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Developer Tools
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>Priority: Major
> Fix For: 0.1.0
>
>
> While the patch tool only requires public repo access, on shared networks, 
> the GitHub API rate limit may be exceeded for unauthenticated requests. This 
> patch will add an option to use a GitHub personal access token to authenticate.
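
A minimal sketch of the idea, assuming a hypothetical environment variable name `GITHUB_API_TOKEN` and a hypothetical helper `build_github_request` (neither is taken from the issue). It attaches the token to a request only when the variable is set, so unauthenticated use keeps working.

```python
import os
import urllib.request


def build_github_request(url, env=os.environ, var_name="GITHUB_API_TOKEN"):
    """Build a request, authenticating only if a token is available.

    `var_name` is an assumed variable name used for illustration.
    """
    request = urllib.request.Request(url)
    token = env.get(var_name)
    if token:
        # GitHub accepts personal access tokens in the Authorization header.
        request.add_header("Authorization", "token " + token)
    return request
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.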





[jira] [Updated] (ARROW-205) builds failing on master branch with apt-get error

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-205:
-
Component/s: Continuous Integration

> builds failing on master branch with apt-get error
> --
>
> Key: ARROW-205
> URL: https://issues.apache.org/jira/browse/ARROW-205
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Blocker
>  Labels: ci-failure
> Fix For: 0.1.0
>
>
> Logs from: https://travis-ci.org/apache/arrow/jobs/131207432
> 0.50s$ sudo -E apt-get -yq --no-install-suggests --no-install-recommends 
> --force-yes install clang-format-3.7 clang-tidy-3.7 gcc-4.9 g++-4.9 gdb 
> ccache cmake valgrind
> Reading package lists...
> Building dependency tree...
> Reading state information...
> E: Unable to locate package g++-4.9
> E: Couldn't find any package by regex 'g++-4.9'
> apt-get.diagnostics
> apt-get install failed
> $ cat ~/apt-get-update.log
> Get:1 http://downloads-distro.mongodb.org dist Release.gpg [490 B]
> Hit http://us.archive.ubuntu.com precise Release.gpg
> Get:2 http://us.archive.ubuntu.com precise-updates Release.gpg [198 B]
> Get:3 http://downloads-distro.mongodb.org dist Release [2,040 B]
> Get:4 http://us.archive.ubuntu.com precise-backports Release.gpg [198 B]
> Hit http://us.archive.ubuntu.com precise Release
> Get:5 http://downloads-distro.mongodb.org dist/10gen amd64 Packages [30.9 kB]
> Get:6 http://us.archive.ubuntu.com precise-updates Release [55.4 kB]
> Hit http://ppa.launchpad.net precise Release.gpg
> Get:7 http://security.ubuntu.com precise-security Release.gpg [198 B]
> Get:8 http://downloads-distro.mongodb.org dist/10gen i386 Packages [30.5 kB]
> Get:9 http://us.archive.ubuntu.com precise-backports Release [55.5 kB]
> Hit http://ppa.launchpad.net precise Release.gpg
> Get:10 http://security.ubuntu.com precise-security Release [55.5 kB]
> Hit http://us.archive.ubuntu.com precise/main Sources
> Ign http://downloads-distro.mongodb.org dist/10gen TranslationIndex
> Hit http://us.archive.ubuntu.com precise/universe Sources
> Get:11 http://ppa.launchpad.net precise Release.gpg [316 B]
> Hit http://us.archive.ubuntu.com precise/multiverse Sources
> Hit http://us.archive.ubuntu.com precise/main amd64 Packages
> Hit http://us.archive.ubuntu.com precise/universe amd64 Packages
> Get:12 http://ppa.launchpad.net precise Release.gpg [316 B]
> Hit http://us.archive.ubuntu.com precise/multiverse amd64 Packages
> Hit http://us.archive.ubuntu.com precise/main i386 Packages
> Hit http://us.archive.ubuntu.com precise/universe i386 Packages
> Hit http://ppa.launchpad.net precise Release.gpg
> Hit http://us.archive.ubuntu.com precise/multiverse i386 Packages
> Get:13 http://security.ubuntu.com precise-security/main Sources [142 kB]
> Hit http://us.archive.ubuntu.com precise/main TranslationIndex
> Get:14 http://ppa.launchpad.net precise Release.gpg [316 B]
> Hit http://us.archive.ubuntu.com precise/multiverse TranslationIndex
> Hit http://us.archive.ubuntu.com precise/universe TranslationIndex
> Hit http://ppa.launchpad.net precise Release
> Get:15 http://us.archive.ubuntu.com precise-updates/main Sources [496 kB]
> Get:16 http://security.ubuntu.com precise-security/universe Sources [48.5 kB]
> Hit http://ppa.launchpad.net precise Release
> Get:17 http://us.archive.ubuntu.com precise-updates/universe Sources [127 kB]
> Get:18 http://security.ubuntu.com precise-security/multiverse Sources [2,721 
> B]
> Get:19 http://us.archive.ubuntu.com precise-updates/multiverse Sources [10.2 
> kB]
> Get:20 http://us.archive.ubuntu.com precise-updates/main amd64 Packages [989 
> kB]
> Get:21 http://ppa.launchpad.net precise Release [12.9 kB]
> Get:22 http://security.ubuntu.com precise-security/main amd64 Packages [607 
> kB]
> Get:23 http://us.archive.ubuntu.com precise-updates/universe amd64 Packages 
> [276 kB]
> Get:24 http://us.archive.ubuntu.com precise-updates/multiverse amd64 Packages 
> [16.9 kB]
> Get:25 http://ppa.launchpad.net precise Release [12.9 kB]
> Get:26 http://us.archive.ubuntu.com precise-updates/main i386 Packages [1,051 
> kB]
> Get:27 http://us.archive.ubuntu.com precise-updates/universe i386 Packages 
> [286 kB]
> Hit http://ppa.launchpad.net precise Release
> Get:28 http://us.archive.ubuntu.com precise-updates/multiverse i386 Packages 
> [17.1 kB]
> Get:29 http://us.archive.ubuntu.com precise-updates/main TranslationIndex 
> [208 B]
> Get:30 http://us.archive.ubuntu.com precise-updates/multiverse 
> TranslationIndex [202 B]
> Get:31 http://ppa.launchpad.net precise Release [13.0 kB]
> Get:32 http://us.archive.ubuntu.com precise-updates/universe TranslationIndex 
> [205 B]
> Get:33 http://security.ubuntu.com 

[jira] [Updated] (ARROW-265) Negative decimal values have wrong padding

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-265:
-
Component/s: Java

> Negative decimal values have wrong padding
> --
>
> Key: ARROW-265
> URL: https://issues.apache.org/jira/browse/ARROW-265
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 0.1.0
>
>
> Pad negative values with 1 and not 0.
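
One way to read this is as sign extension of a two's-complement value. The sketch below is in Python, not the project's Java code, and rests on assumptions not stated in the issue: the value is a big-endian two's-complement byte string, "1" means 1 bits (0xFF bytes), and `pad_twos_complement` is a hypothetical helper.

```python
def pad_twos_complement(raw: bytes, width: int) -> bytes:
    """Left-pad a big-endian two's-complement value to `width` bytes."""
    if len(raw) > width:
        raise ValueError("value does not fit in the requested width")
    # The top bit of the first byte carries the sign; empty input is zero.
    negative = bool(raw) and bool(raw[0] & 0x80)
    # Negative values are padded with 1 bits, non-negative values with 0 bits.
    fill = b"\xff" if negative else b"\x00"
    return fill * (width - len(raw)) + raw
```

Padding a negative value with zero bytes would turn it into a large positive number, which is the kind of error the issue title describes.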





[jira] [Updated] (ARROW-269) UnionVector getBuffers method does not include typevector

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-269:
-
Component/s: Java

> UnionVector getBuffers method does not include typevector
> -
>
> Key: ARROW-269
> URL: https://issues.apache.org/jira/browse/ARROW-269
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Steven Phillips
>Assignee: Steven Phillips
>Priority: Major
> Fix For: 0.7.0
>
>
> Only the internal MapVector's buffers are currently returned.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ARROW-264) Create an Arrow File format

2019-06-03 Thread Antoine Pitrou (JIRA)


 [ 
https://issues.apache.org/jira/browse/ARROW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-264:
-
Component/s: Format

> Create an Arrow File format
> ---
>
> Key: ARROW-264
> URL: https://issues.apache.org/jira/browse/ARROW-264
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Format
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
>Priority: Major
> Fix For: 0.1.0
>
>
> File layout:
> (DictionaryBatch, RecordBatch, Schema as defined in Message.fbs)
> {noformat}
> MAGIC:   ARROW1
> (
> DictionaryBatch:  DictionaryBatch Header (FlatBuffer)
> DictionaryBatch: DictionaryBatch Body (buffers concatenated)
> )*
> (
> RecordBatch: RecordBatch Header (FlatBuffer)
> RecordBatch: RecordBatch Body (buffers concatenated)
> )+
> Footer: Flatbuffer
> Footer length: int (4 bytes unsigned LE)
> MAGIC: ARROW1
> {noformat}
> Footer definition:
> {noformat}
> table Footer {
>   schema: org.apache.arrow.flatbuf.Schema;
>   dictionaries: [ Block ];
>   recordBatches: [ Block ];
> }
> struct Block {
>   offset: long;
>   metaDataLength: int;
>   bodyLength: long;
> }
> {noformat}
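
Given the layout above, a reader can find the footer by working backwards from the end of the file. The following is a minimal sketch that follows the layout exactly as written in this issue (6-byte magic, no padding); `locate_footer` is a hypothetical helper, and decoding the returned bytes as a FlatBuffer is left out.

```python
import struct

MAGIC = b"ARROW1"


def locate_footer(data: bytes) -> bytes:
    """Return the raw footer bytes of a file laid out as described above."""
    if len(data) < 2 * len(MAGIC) + 4:
        raise ValueError("file too small")
    if not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("missing magic bytes")
    # The footer length sits just before the trailing magic:
    # 4 bytes, unsigned, little-endian.
    length_end = len(data) - len(MAGIC)
    length_start = length_end - 4
    (footer_length,) = struct.unpack("<I", data[length_start:length_end])
    footer_start = length_start - footer_length
    if footer_start < len(MAGIC):
        raise ValueError("footer length exceeds file size")
    return data[footer_start:length_start]
```

Once the footer is decoded, each `Block` entry gives the offset and lengths needed to seek directly to a dictionary or record batch without scanning the file.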




