Re: Arrow Flight connector for SQL Server
Hey Brendan,

Welcome to the community. At Dremio we've exposed Flight as an input and output for SQL result datasets. I'll have one of our guys share some details. A couple of questions we've been struggling with include: how to standardize additional metadata operations, what the prepare behavior should be, and whether there is a way to standardize exposure of a Flight path as an extension of both JDBC and ODBC.

Can you share more about whether you're initially more focused on input or output, and on parallel or single stream?

Thanks and welcome,
Jacques

On Tue, May 19, 2020, 3:17 PM Brendan Niebruegge wrote:
> Hi everyone,
>
> I wanted to informally introduce myself. My name is Brendan Niebruegge,
> I'm a Software Engineer in our SQL Server extensibility team here at
> Microsoft. I am leading an effort to explore how we could integrate Arrow
> Flight with SQL Server. We think this could be a very interesting
> integration that would both benefit SQL Server and the Arrow community. We
> are very early in our thoughts so I thought it best to reach out here and
> see if you had any thoughts or suggestions for me. What would be the best
> way to socialize my thoughts to date? I am keen to learn and deepen my
> knowledge of Arrow as well so please let me know how I can be of help to
> the community.
>
> Please feel free to reach out anytime (email: brn...@microsoft.com)
>
> Thanks,
> Brendan Niebruegge
[jira] [Created] (ARROW-8869) [Rust] [DataFusion] Type Coercion optimizer rule does not support new scan nodes
Andy Grove created ARROW-8869:
---
Summary: [Rust] [DataFusion] Type Coercion optimizer rule does not support new scan nodes
Key: ARROW-8869
URL: https://issues.apache.org/jira/browse/ARROW-8869
Project: Apache Arrow
Issue Type: Bug
Components: Rust, Rust - DataFusion
Affects Versions: 1.0.0
Reporter: Andy Grove
Fix For: 1.0.0

Type Coercion optimizer rule does not support new scan nodes

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[Rust] Vectorized traits for using arrays outside Arrow
Hi,

I wanted to discuss with the Rust lib maintainers how we can improve the current status of Rust's DictionaryArray, and of reading its encoded keys outside of Arrow. Today a simple predicate filter needs to collect indices over an iterator and flat-map over the optional values, or map over the None values and replace them with sentinel values. The iterator is written nearly frictionless and overhead-free by the implementor (congrats, it looks nice!), but there is still the overhead of the iterator itself and of yielding elements inside the iterator implementation.

So I propose a simple trait called "Vectorized" which would allow us to dispense arrays of a defined type with requested sentinels. This approach would work zero-copy and would use the underlying Buffer type.

I am eagerly waiting for your input and would be happy to clarify further if needed.

Best,
Mahmut
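As a hypothetical illustration of the access pattern the proposal targets (plain Python, not Arrow's Rust API — the function name and layout here are invented for the sketch): resolve dictionary keys to values in a single pass over the underlying buffers, substituting a caller-supplied sentinel for null slots instead of yielding Option-like values through an iterator.

```python
# Sketch of "vectorized" sentinel access to a dictionary-encoded array,
# modeled as three plain buffers: keys, validity, and the dictionary.
def materialize_with_sentinel(keys, validity, dictionary, sentinel):
    """Resolve dictionary keys to values in one pass, substituting
    `sentinel` for null slots instead of yielding optional values."""
    return [dictionary[k] if valid else sentinel
            for k, valid in zip(keys, validity)]

keys = [0, 2, 1, 0]
validity = [True, True, False, True]   # third slot is null
dictionary = ["a", "b", "c"]

print(materialize_with_sentinel(keys, validity, dictionary, "<NULL>"))
# ['a', 'c', '<NULL>', 'a']
```

In a real Rust implementation the comprehension would instead be a slice-level operation over the key buffer, which is what makes the zero-copy claim possible.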
[jira] [Created] (ARROW-8868) [Python] Feather format cannot store/retrieve lists correctly?
Farzad Abdolhosseini created ARROW-8868:
---
Summary: [Python] Feather format cannot store/retrieve lists correctly?
Key: ARROW-8868
URL: https://issues.apache.org/jira/browse/ARROW-8868
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.17.1
Environment: Python 3.8.2, PyArrow 0.17.1, Pandas 1.0.3, Linux (Manjaro)
Reporter: Farzad Abdolhosseini

I'm seeing very weird behavior when I try to store and retrieve a Pandas data frame using the Feather format. Simplified example:

{code:python}
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
   scalar array
0       1   [1]
1       2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
   scalar                  array
0       1                   [16]
1       2  [1045468844972122628]
{code}

As you can see, the retrieved data is incorrect. I was originally trying to use `feather-format` (not using Pandas directly) and that didn't work well either. By playing around with the data frame that is to be stored I can also get different but still incorrect behavior, e.g. a larger list, an error that says the file size is incorrect, or simply a segmentation fault. This is my first time using Feather/Arrow, BTW.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: [VOTE] Release Apache Arrow 0.17.1 - RC1
R submission to CRAN is done and accepted. I'm waiting to do Homebrew until after the website update, given their pushback last time.

Neal

On Tue, May 19, 2020 at 5:25 AM Uwe L. Korn wrote:
> Current status:
>
> 1. [done] rebase (not required for a patch release)
> 2. [done] upload source
> 3. [done] upload binaries
> 4. [done|in-pr] update website
> 5. [done] upload ruby gems
> 6. [ ] upload js packages
> 8. [done] upload C# packages
> 9. [ ] upload rust crates
> 10. [done] update conda recipes (dropped ppc64le support though)
> 11. [done] upload wheels to pypi
> 12. [nealrichardson] update homebrew packages
> 13. [done] update maven artifacts
> 14. [done|in-pr] update msys2
> 15. [nealrichardson] update R packages
> 16. [done|in-pr] update docs
>
> On Tue, May 19, 2020, at 12:06 AM, Krisztián Szűcs wrote:
> > Current status:
> >
> > 1. [done] rebase (not required for a patch release)
> > 2. [done] upload source
> > 3. [done] upload binaries
> > 4. [done|in-pr] update website
> > 5. [done] upload ruby gems
> > 6. [ ] upload js packages
> > 8. [done] upload C# packages
> > 9. [ ] upload rust crates
> > 10. [in-progress|in-pr] update conda recipes
> > 11. [done] upload wheels to pypi
> > 12. [nealrichardson] update homebrew packages
> > 13. [done] update maven artifacts
> > 14. [done|in-pr] update msys2
> > 15. [nealrichardson] update R packages
> > 16. [done|in-pr] update docs
> >
> > On Mon, May 18, 2020 at 11:33 PM Sutou Kouhei wrote:
> > >
> > > >> 14. [ ] update msys2
> > > >
> > > > I'll do this.
> > >
> > > Oh, sorry. Krisztián already did!
> > >
> > > In <20200519.062731.1037230979568376433@clear-code.com>
> > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Tue, 19 May 2020 06:27:31 +0900 (JST),
> > >   Sutou Kouhei wrote:
> > >
> > > >> 14. [ ] update msys2
> > > >
> > > > I'll do this.
> > > >
> > > > In <cahm19a4wsm3hksf0ubixonu4ru+951viuuavdnzky_tynx-...@mail.gmail.com>
> > > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Mon, 18 May 2020 22:37:50 +0200,
> > > >   Krisztián Szűcs wrote:
> > > >
> > > >> 1. [done] rebase (not required for a patch release)
> > > >> 2. [done] upload source
> > > >> 3. [done] upload binaries
> > > >> 4. [done] update website
> > > >> 5. [done] upload ruby gems
> > > >> 6. [ ] upload js packages
> > > >> No javascript changes were applied to the patch release, for
> > > >> consistency we might want to choose to upload a 0.17.1 release though.
> > > >> 8. [done] upload C# packages
> > > >> 9. [ ] upload rust crates
> > > >> @Andy Grove the patch release doesn't affect the rust implementation.
> > > >> We can update the crates despite that no changes were made, not sure
> > > >> what policy should we choose here (same as with JS)
> > > >> 10. [ ] update conda recipes
> > > >> @Uwe Korn seems like arrow-cpp-feedstock have not picked up the new
> > > >> release once again
> > > >> 11. [done] upload wheels to pypi
> > > >> 12. [nealrichardson] update homebrew packages
> > > >> 13. [done] update maven artifacts
> > > >> 14. [ ] update msys2
> > > >> 15. [nealrichardson] update R packages
> > > >> 16. [in-progress] update docs
> > > >>
> > > >> On Mon, May 18, 2020 at 10:29 PM Krisztián Szűcs
> > > >> wrote:
> > > >>>
> > > >>> Current status:
> > > >>>
> > > >>> 1. [done] rebase (not required for a patch release)
> > > >>> 2. [done] upload source
> > > >>> 3. [done] upload binaries
> > > >>> 4. [done] update website
> > > >>> 5. [ ] upload ruby gems
> > > >>> 6. [ ] upload js packages
> > > >>> 8. [ ] upload C# packages
> > > >>> 9. [ ] upload rust crates
> > > >>> 10. [ ] update conda recipes
> > > >>> 11. [done] upload wheels to pypi
> > > >>> 12. [nealrichardson] update homebrew packages
> > > >>> 13. [done] update maven artifacts
> > > >>> 14. [ ] update msys2
> > > >>> 15. [nealrichardson] update R packages
> > > >>> 16. [in-progress] update docs
> > > >>>
> > > >>> On Mon, May 18, 2020 at 9:39 PM Neal Richardson
> > > >>> wrote:
> > > >>> >
> > > >>> > I'm working on the R stuff and can do Homebrew again.
> > > >>> >
> > > >>> > Neal
> > > >>> >
> > > >>> > On Mon, May 18, 2020 at 12:30 PM Krisztián Szűcs <szucs.kriszt...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Any help with the post release tasks is welcome!
> > > >>> > >
> > > >>> > > Checklist:
> > > >>> > > 1. [done] rebase (not required for a patch release)
> > > >>> > > 2. [done] upload source
> > > >>> > > 3. [in-progress] upload binaries
> > > >>> > > 4. [done] update website
> > > >>> > > 5. [ ] upload ruby gems
> > > >>> > > 6. [ ] upload js packages
> > > >>> > > 8. [ ] upload C# packages
> > > >>> > > 9. [ ] upload rust crates
> > > >>> > > 10. [ ] update conda recipes
> > > >>> > > 11. [kszucs] upload wheels to pypi
> > > >>> > > 12. [ ] update homebrew packages
> > > >>> > > 13. [kszucs] update maven artifacts
> > > >>> > > 14. [ ] update msys2
> > > >>> > > 15. [ ] update R packages
> > > >>> > > 16.
Arrow Flight connector for SQL Server
Hi everyone,

I wanted to informally introduce myself. My name is Brendan Niebruegge; I'm a Software Engineer on the SQL Server extensibility team here at Microsoft. I am leading an effort to explore how we could integrate Arrow Flight with SQL Server. We think this could be a very interesting integration that would benefit both SQL Server and the Arrow community. We are very early in our thinking, so I thought it best to reach out here and see if you had any thoughts or suggestions for me. What would be the best way to socialize my thoughts to date? I am keen to learn and deepen my knowledge of Arrow as well, so please let me know how I can be of help to the community.

Please feel free to reach out anytime (email: brn...@microsoft.com)

Thanks,
Brendan Niebruegge
[jira] [Created] (ARROW-8866) [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION
Wes McKinney created ARROW-8866:
---
Summary: [C++] Split Type::UNION into Type::SPARSE_UNION and Type::DENSE_UNION
Key: ARROW-8866
URL: https://issues.apache.org/jira/browse/ARROW-8866
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0

Similar to the recent {{Type::INTERVAL}} split, having these two array types, which have different memory layouts, under the same {{Type::type}} value makes function dispatch somewhat more complicated. This issue is less critical than for INTERVAL, so this may not be urgent, but it seems like a good pre-1.0 change.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
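The dispatch complication can be sketched abstractly (hypothetical Python, not Arrow's actual C++ dispatch machinery): with a single UNION type id, a kernel table keyed on type id still needs a second branch on the union mode, whereas split ids let the table resolve the layout directly.

```python
# With one shared UNION id, a dispatch table keyed on type id is not
# enough: kernels must branch again on the union's memory layout.
def visit_single_id(type_id, mode):
    table = {"INT64": "int kernel", "UNION": None}
    if type_id == "UNION":            # extra branch on the layout
        return "sparse kernel" if mode == "SPARSE" else "dense kernel"
    return table[type_id]

# With split ids, the table alone resolves the layout:
table = {"INT64": "int kernel",
         "SPARSE_UNION": "sparse kernel",
         "DENSE_UNION": "dense kernel"}

print(visit_single_id("UNION", "SPARSE"), table["SPARSE_UNION"])
# sparse kernel sparse kernel
```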
[jira] [Created] (ARROW-8865) windows distribution for 0.17.1 seems broken (conda only?
Maarten Breddels created ARROW-8865:
---
Summary: windows distribution for 0.17.1 seems broken (conda only?
Key: ARROW-8865
URL: https://issues.apache.org/jira/browse/ARROW-8865
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.17.1
Reporter: Maarten Breddels

We just started seeing issues with importing pyarrow on our CI:
https://github.com/vaexio/vaex/pull/749/checks?check_run_id=689857401

Long logs; the issue appears here:

> import pyarrow._parquet as _parquet
[2541|https://github.com/vaexio/vaex/pull/749/checks?check_run_id=689857401#step:15:2541] E ImportError: DLL load failed: The specified procedure could not be found.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8864) [R] Add methods to Table/RecordBatch for consistency with data.frame
Neal Richardson created ARROW-8864:
---
Summary: [R] Add methods to Table/RecordBatch for consistency with data.frame
Key: ARROW-8864
URL: https://issues.apache.org/jira/browse/ARROW-8864
Project: Apache Arrow
Issue Type: New Feature
Components: R
Reporter: Neal Richardson
Assignee: Neal Richardson
Fix For: 1.0.0

Some methods identified in the Feather package test suite

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8862) NumericBuilder does not use MemoryPool passed to CTOR
Simon Watts created ARROW-8862:
---
Summary: NumericBuilder does not use MemoryPool passed to CTOR
Key: ARROW-8862
URL: https://issues.apache.org/jira/browse/ARROW-8862
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.15.0
Reporter: Simon Watts

{{NumericBuilder}} uses the {{pool}} ({{MemoryPool*}}) parameter to initialise the {{ArrayBuilder}} base class, but does not use it to initialise its own internal builder, {{data_builder_}} ({{TypedBufferBuilder}}). For comparison, {{ArrayBuilder}} uses the {{pool}} to initialise its own {{null_bitmap_builder_}} member (also a {{TypedBufferBuilder}}).

Found in version 0.15.0, present in current head.

This effect was observed when trying to switch to a custom {{MemoryPool}} for performance reasons. A hook was used to detect any use of the {{MemoryPool}} provided by {{default_memory_pool()}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
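The detection technique the report describes — hooking the default pool to catch stray allocations — can be sketched as a toy model (hypothetical Python, not Arrow's C++ API; all class and member names here are invented for the illustration):

```python
class CountingPool:
    """Toy memory pool that counts allocations, mimicking the hook
    used to detect stray use of the default pool."""
    def __init__(self):
        self.allocations = 0

    def allocate(self, size):
        self.allocations += 1
        return bytearray(size)

default_pool = CountingPool()   # stands in for default_memory_pool()
custom_pool = CountingPool()

class BuggyBuilder:
    """Mimics the reported bug: the builder accepts a pool and uses it
    for its null bitmap, but its data buffer still allocates from the
    default pool."""
    def __init__(self, pool):
        self.null_bitmap = pool.allocate(64)      # uses the given pool
        self.data = default_pool.allocate(1024)   # bug: ignores `pool`

BuggyBuilder(custom_pool)
print(custom_pool.allocations, default_pool.allocations)
# 1 1  -- the hook on the default pool fires, revealing the bug
```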
Re: Sparse Union format
Hi Ryan,

In addition to the limitations mentioned above, another one is that only one column of each type can participate in the union. There are some old threads on these differences on the mailing list that should be searchable.

Thanks,
Micah

On Tue, May 19, 2020 at 6:44 AM Antoine Pitrou wrote:
>
> Also, you may want to run the integration tests and inspect the
> generated JSON file for union data, it will probably be informative
> (look for type ids).
>
> Regards
>
> Antoine.
>
>
> Le 19/05/2020 à 15:38, Ryan Murray a écrit :
>> Thanks for the clarification! Next time I will read the whole document ;-)
>>
>> On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou wrote:
>>
>>>
>>> As explained in the comment below:
>>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>>>
>>> Regards
>>>
>>> Antoine.
>>>
>>>
>>> Le 19/05/2020 à 14:14, Ryan Murray a écrit :
>>>> Thanks Antoine,
>>>>
>>>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>>>> type ids are strongly coupled to the types and their order in Schema.fbs
>>>> [1]. Do you mean that the order there is only a convention and we can't
>>>> assume that 0 === Null?
>>>>
>>>> Best,
>>>> Ryan
>>>>
>>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>>>
>>>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou wrote:
>>>>
>>>>>
>>>>> Le 19/05/2020 à 13:43, Ryan Murray a écrit :
>>>>>> Hey All,
>>>>>>
>>>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>>>>> that there is a difference between C++ and Java in the way Sparse Unions
>>>>>> are handled. I haven't seen in the format spec which is correct, so I
>>>>>> wanted to check with the wider community.
>>>>>>
>>>>>> C++ (and the integration tests) see sparse unions as:
>>>>>> name
>>>>>> count
>>>>>> VALIDITY[]
>>>>>> TYPE_ID[]
>>>>>> children[]
>>>>>>
>>>>>> and Java as:
>>>>>> name
>>>>>> count
>>>>>> TYPE[]
>>>>>> children[]
>>>>>>
>>>>>> The precise names may only be important for json reading/writing in the
>>>>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>>>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>>>>> understanding is that technically the validity buffer is redundant (0 type
>>>>>> == NULL) so I can see why Java would omit it. My question is then: which
>>>>>> language is 'correct'?
>>>>>
>>>>> Union type ids are logical, so 0 could very well be a valid type id.
>>>>> You can't assume that type 0 means a null entry.
>>>>>
>>>>> Regards
>>>>>
>>>>> Antoine.
[jira] [Created] (ARROW-8861) Memory not released until Plasma process is killed
Chengxin Ma created ARROW-8861:
---
Summary: Memory not released until Plasma process is killed
Key: ARROW-8861
URL: https://issues.apache.org/jira/browse/ARROW-8861
Project: Apache Arrow
Issue Type: Bug
Components: C++ - Plasma
Affects Versions: 0.16.0
Environment: Singularity container (Ubuntu 18.04)
Reporter: Chengxin Ma

Invoking the {{Delete(const ObjectID& object_id)}} method of a Plasma client seems not to really free up the memory used by the object.

To reproduce:
1. use {{htop}} (or other similar tools) to monitor memory usage;
2. start up the Plasma Object Store with {{plasma_store -m 10 -s /tmp/plasma}};
3. use {{put.py}} to put an object into Plasma;
4. compile and run {{delete.cc}} ({{g++ delete.cc `pkg-config --cflags --libs arrow plasma` --std=c++11 -o delete}});
5. kill the {{plasma_store}} process.

Memory usage drops at Step 5, rather than Step 4. How can the memory be freed while keeping the Plasma Object Store running?

{{put.py}}:
{code:python}
from pyarrow import plasma

if __name__ == "__main__":
    client = plasma.connect("/tmp/plasma")
    object_id = plasma.ObjectID(20 * b"a")
    object_size = 5
    buffer = memoryview(client.create(object_id, object_size))
    for i in range(5):
        buffer[i] = i % 128
    client.seal(object_id)
    client.disconnect()
{code}

{{delete.cc}}:
{code:java}
#include "arrow/util/logging.h"
#include <plasma/client.h>

using namespace plasma;

int main(int argc, char **argv) {
  PlasmaClient client;
  ARROW_CHECK_OK(client.Connect("/tmp/plasma"));
  ObjectID object_id = ObjectID::from_binary("");
  client.Delete(object_id);
  ARROW_CHECK_OK(client.Disconnect());
}
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Created] (ARROW-8860) [C++] Compressed Feather file with struct array roundtrips incorrectly
Joris Van den Bossche created ARROW-8860:
---
Summary: [C++] Compressed Feather file with struct array roundtrips incorrectly
Key: ARROW-8860
URL: https://issues.apache.org/jira/browse/ARROW-8860
Project: Apache Arrow
Issue Type: Bug
Components: C++
Reporter: Joris Van den Bossche

When writing a table with a Struct typed column, this is read back with garbage values when using compression (which is the default):

{code:python}
>>> table = pa.table({'col': pa.StructArray.from_arrays([[0, 1, 2], [1, 2, 3]], names=["f1", "f2"])})
>>> table.column("col")
[
  -- is_valid: all not null
  -- child 0 type: int64
    [0, 1, 2]
  -- child 1 type: int64
    [1, 2, 3]
]
# roundtrip through feather
>>> feather.write_feather(table, "test_struct.feather")
>>> table2 = feather.read_table("test_struct.feather")
>>> table2.column("col")
[
  -- is_valid: all not null
  -- child 0 type: int64
    [24, 1261641627085906436, 1369095386551025664]
  -- child 1 type: int64
    [24, 1405756815161762308, 281479842103296]
]
{code}

When not using compression, it is read back correctly:

{code:python}
>>> feather.write_feather(table, "test_struct.feather", compression="uncompressed")
>>> table2 = feather.read_table("test_struct.feather")
>>> table2.column("col")
[
  -- is_valid: all not null
  -- child 0 type: int64
    [0, 1, 2]
  -- child 1 type: int64
    [1, 2, 3]
]
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Sparse Union format
Also, you may want to run the integration tests and inspect the
generated JSON file for union data, it will probably be informative
(look for type ids).

Regards

Antoine.


Le 19/05/2020 à 15:38, Ryan Murray a écrit :
> Thanks for the clarification! Next time I will read the whole document ;-)
>
> On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou wrote:
>
>>
>> As explained in the comment below:
>> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>>
>> Regards
>>
>> Antoine.
>>
>>
>> Le 19/05/2020 à 14:14, Ryan Murray a écrit :
>>> Thanks Antoine,
>>>
>>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>>> type ids are strongly coupled to the types and their order in Schema.fbs
>>> [1]. Do you mean that the order there is only a convention and we can't
>>> assume that 0 === Null?
>>>
>>> Best,
>>> Ryan
>>>
>>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>>
>>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou wrote:
>>>
>>>>
>>>> Le 19/05/2020 à 13:43, Ryan Murray a écrit :
>>>>> Hey All,
>>>>>
>>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>>>> that there is a difference between C++ and Java in the way Sparse Unions
>>>>> are handled. I haven't seen in the format spec which is correct, so I
>>>>> wanted to check with the wider community.
>>>>>
>>>>> C++ (and the integration tests) see sparse unions as:
>>>>> name
>>>>> count
>>>>> VALIDITY[]
>>>>> TYPE_ID[]
>>>>> children[]
>>>>>
>>>>> and Java as:
>>>>> name
>>>>> count
>>>>> TYPE[]
>>>>> children[]
>>>>>
>>>>> The precise names may only be important for json reading/writing in the
>>>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>>>> understanding is that technically the validity buffer is redundant (0 type
>>>>> == NULL) so I can see why Java would omit it. My question is then: which
>>>>> language is 'correct'?
>>>>
>>>> Union type ids are logical, so 0 could very well be a valid type id.
>>>> You can't assume that type 0 means a null entry.
>>>>
>>>> Regards
>>>>
>>>> Antoine.
Re: Sparse Union format
Thanks for the clarification! Next time I will read the whole document ;-)

On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou wrote:
>
> As explained in the comment below:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>
> Regards
>
> Antoine.
>
>
> Le 19/05/2020 à 14:14, Ryan Murray a écrit :
>> Thanks Antoine,
>>
>> Can you just clarify what you mean by 'type ids are logical'? In my mind
>> type ids are strongly coupled to the types and their order in Schema.fbs
>> [1]. Do you mean that the order there is only a convention and we can't
>> assume that 0 === Null?
>>
>> Best,
>> Ryan
>>
>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>>
>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou wrote:
>>
>>>
>>> Le 19/05/2020 à 13:43, Ryan Murray a écrit :
>>>> Hey All,
>>>>
>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>>> that there is a difference between C++ and Java in the way Sparse Unions
>>>> are handled. I haven't seen in the format spec which is correct, so I
>>>> wanted to check with the wider community.
>>>>
>>>> C++ (and the integration tests) see sparse unions as:
>>>> name
>>>> count
>>>> VALIDITY[]
>>>> TYPE_ID[]
>>>> children[]
>>>>
>>>> and Java as:
>>>> name
>>>> count
>>>> TYPE[]
>>>> children[]
>>>>
>>>> The precise names may only be important for json reading/writing in the
>>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>>> understanding is that technically the validity buffer is redundant (0 type
>>>> == NULL) so I can see why Java would omit it. My question is then: which
>>>> language is 'correct'?
>>>
>>> Union type ids are logical, so 0 could very well be a valid type id.
>>> You can't assume that type 0 means a null entry.
>>>
>>> Regards
>>>
>>> Antoine.
[jira] [Created] (ARROW-8859) [Rust] [Integration Testing] Implement --quiet / verbose correctly
Andy Grove created ARROW-8859:
---
Summary: [Rust] [Integration Testing] Implement --quiet / verbose correctly
Key: ARROW-8859
URL: https://issues.apache.org/jira/browse/ARROW-8859
Project: Apache Arrow
Issue Type: Sub-task
Components: Rust
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 1.0.0

The Rust tester has verbose=true hard-coded for now. When running {{archery --quiet}}, the RustTester should receive a {{quiet: Bool}} via [kwargs|https://github.com/apache/arrow/blob/master/dev/archery/archery/integration/runner.py#L335] somewhere.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
Re: Sparse Union format
As explained in the comment below:
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91

Regards

Antoine.


Le 19/05/2020 à 14:14, Ryan Murray a écrit :
> Thanks Antoine,
>
> Can you just clarify what you mean by 'type ids are logical'? In my mind
> type ids are strongly coupled to the types and their order in Schema.fbs
> [1]. Do you mean that the order there is only a convention and we can't
> assume that 0 === Null?
>
> Best,
> Ryan
>
> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
>
> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou wrote:
>
>>
>> Le 19/05/2020 à 13:43, Ryan Murray a écrit :
>>> Hey All,
>>>
>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>>> that there is a difference between C++ and Java in the way Sparse Unions
>>> are handled. I haven't seen in the format spec which is correct, so I
>>> wanted to check with the wider community.
>>>
>>> C++ (and the integration tests) see sparse unions as:
>>> name
>>> count
>>> VALIDITY[]
>>> TYPE_ID[]
>>> children[]
>>>
>>> and Java as:
>>> name
>>> count
>>> TYPE[]
>>> children[]
>>>
>>> The precise names may only be important for json reading/writing in the
>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>>> difference is that Java doesn't have a validity buffer and C++ does. My
>>> understanding is that technically the validity buffer is redundant (0 type
>>> == NULL) so I can see why Java would omit it. My question is then: which
>>> language is 'correct'?
>>
>> Union type ids are logical, so 0 could very well be a valid type id.
>> You can't assume that type 0 means a null entry.
>>
>> Regards
>>
>> Antoine.
Re: [VOTE] Release Apache Arrow 0.17.1 - RC1
Current status:

1. [done] rebase (not required for a patch release)
2. [done] upload source
3. [done] upload binaries
4. [done|in-pr] update website
5. [done] upload ruby gems
6. [ ] upload js packages
8. [done] upload C# packages
9. [ ] upload rust crates
10. [done] update conda recipes (dropped ppc64le support though)
11. [done] upload wheels to pypi
12. [nealrichardson] update homebrew packages
13. [done] update maven artifacts
14. [done|in-pr] update msys2
15. [nealrichardson] update R packages
16. [done|in-pr] update docs

On Tue, May 19, 2020, at 12:06 AM, Krisztián Szűcs wrote:
> Current status:
>
> 1. [done] rebase (not required for a patch release)
> 2. [done] upload source
> 3. [done] upload binaries
> 4. [done|in-pr] update website
> 5. [done] upload ruby gems
> 6. [ ] upload js packages
> 8. [done] upload C# packages
> 9. [ ] upload rust crates
> 10. [in-progress|in-pr] update conda recipes
> 11. [done] upload wheels to pypi
> 12. [nealrichardson] update homebrew packages
> 13. [done] update maven artifacts
> 14. [done|in-pr] update msys2
> 15. [nealrichardson] update R packages
> 16. [done|in-pr] update docs
>
> On Mon, May 18, 2020 at 11:33 PM Sutou Kouhei wrote:
> >
> > >> 14. [ ] update msys2
> > >
> > > I'll do this.
> >
> > Oh, sorry. Krisztián already did!
> >
> > In <20200519.062731.1037230979568376433@clear-code.com>
> >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Tue, 19 May 2020 06:27:31 +0900 (JST),
> >   Sutou Kouhei wrote:
> >
> > >> 14. [ ] update msys2
> > >
> > > I'll do this.
> > >
> > > In
> > >   "Re: [VOTE] Release Apache Arrow 0.17.1 - RC1" on Mon, 18 May 2020 22:37:50 +0200,
> > >   Krisztián Szűcs wrote:
> > >
> > > >> 1. [done] rebase (not required for a patch release)
> > > >> 2. [done] upload source
> > > >> 3. [done] upload binaries
> > > >> 4. [done] update website
> > > >> 5. [done] upload ruby gems
> > > >> 6. [ ] upload js packages
> > > >> No javascript changes were applied to the patch release, for
> > > >> consistency we might want to choose to upload a 0.17.1 release though.
> > > >> 8. [done] upload C# packages
> > > >> 9. [ ] upload rust crates
> > > >> @Andy Grove the patch release doesn't affect the rust implementation.
> > > >> We can update the crates despite that no changes were made, not sure
> > > >> what policy should we choose here (same as with JS)
> > > >> 10. [ ] update conda recipes
> > > >> @Uwe Korn seems like arrow-cpp-feedstock have not picked up the new
> > > >> release once again
> > > >> 11. [done] upload wheels to pypi
> > > >> 12. [nealrichardson] update homebrew packages
> > > >> 13. [done] update maven artifacts
> > > >> 14. [ ] update msys2
> > > >> 15. [nealrichardson] update R packages
> > > >> 16. [in-progress] update docs
> > > >>
> > > >> On Mon, May 18, 2020 at 10:29 PM Krisztián Szűcs
> > > >> wrote:
> > > >>>
> > > >>> Current status:
> > > >>>
> > > >>> 1. [done] rebase (not required for a patch release)
> > > >>> 2. [done] upload source
> > > >>> 3. [done] upload binaries
> > > >>> 4. [done] update website
> > > >>> 5. [ ] upload ruby gems
> > > >>> 6. [ ] upload js packages
> > > >>> 8. [ ] upload C# packages
> > > >>> 9. [ ] upload rust crates
> > > >>> 10. [ ] update conda recipes
> > > >>> 11. [done] upload wheels to pypi
> > > >>> 12. [nealrichardson] update homebrew packages
> > > >>> 13. [done] update maven artifacts
> > > >>> 14. [ ] update msys2
> > > >>> 15. [nealrichardson] update R packages
> > > >>> 16. [in-progress] update docs
> > > >>>
> > > >>> On Mon, May 18, 2020 at 9:39 PM Neal Richardson
> > > >>> wrote:
> > > >>> >
> > > >>> > I'm working on the R stuff and can do Homebrew again.
> > > >>> >
> > > >>> > Neal
> > > >>> >
> > > >>> > On Mon, May 18, 2020 at 12:30 PM Krisztián Szűcs
> > > >>> >
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Any help with the post release tasks is welcome!
> > > >>> > >
> > > >>> > > Checklist:
> > > >>> > > 1. [done] rebase (not required for a patch release)
> > > >>> > > 2. [done] upload source
> > > >>> > > 3. [in-progress] upload binaries
> > > >>> > > 4. [done] update website
> > > >>> > > 5. [ ] upload ruby gems
> > > >>> > > 6. [ ] upload js packages
> > > >>> > > 8. [ ] upload C# packages
> > > >>> > > 9. [ ] upload rust crates
> > > >>> > > 10. [ ] update conda recipes
> > > >>> > > 11. [kszucs] upload wheels to pypi
> > > >>> > > 12. [ ] update homebrew packages
> > > >>> > > 13. [kszucs] update maven artifacts
> > > >>> > > 14. [ ] update msys2
> > > >>> > > 15. [ ] update R packages
> > > >>> > > 16. [in-progress] update docs
> > > >>> > >
> > > >>> > > @Neal Richardson I think you need to handle the R packages.
> > > >>> > >
> > > >>> > > On Mon, May 18, 2020 at 8:08 PM Krisztián Szűcs
> > > >>> > > wrote:
> > > >>> > > >
> > > >>> > > > The VOTE carries with 6 binding +1 votes and 1 non-binding +1
> > > >>> > > > vote.
> > > >>> > > >
> > > >>> > > > I'm starting the post release tasks and keep posted about the
> > > >>> > > > remaining tasks.
> > > >>> > > >
> > > >>> > > > Thanks everyone!
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On
Re: Sparse Union format
Thanks Antoine,

Can you just clarify what you mean by 'type ids are logical'? In my mind type ids are strongly coupled to the types and their order in Schema.fbs [1]. Do you mean that the order there is only a convention and we can't assume that 0 === Null?

Best,
Ryan

[1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235

On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou wrote:
>
> Le 19/05/2020 à 13:43, Ryan Murray a écrit :
>> Hey All,
>>
>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
>> that there is a difference between C++ and Java in the way Sparse Unions
>> are handled. I haven't seen in the format spec which is correct, so I
>> wanted to check with the wider community.
>>
>> C++ (and the integration tests) see sparse unions as:
>> name
>> count
>> VALIDITY[]
>> TYPE_ID[]
>> children[]
>>
>> and Java as:
>> name
>> count
>> TYPE[]
>> children[]
>>
>> The precise names may only be important for json reading/writing in the
>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
>> difference is that Java doesn't have a validity buffer and C++ does. My
>> understanding is that technically the validity buffer is redundant (0 type
>> == NULL) so I can see why Java would omit it. My question is then: which
>> language is 'correct'?
>
> Union type ids are logical, so 0 could very well be a valid type id.
> You can't assume that type 0 means a null entry.
>
> Regards
>
> Antoine.
Re: Sparse Union format
Le 19/05/2020 à 13:43, Ryan Murray a écrit :
> Hey All,
>
> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> that there is a difference between C++ and Java in the way Sparse Unions
> are handled. I haven't seen in the format spec which is correct, so I
> wanted to check with the wider community.
>
> C++ (and the integration tests) see sparse unions as:
> name
> count
> VALIDITY[]
> TYPE_ID[]
> children[]
>
> and Java as:
> name
> count
> TYPE[]
> children[]
>
> The precise names may only be important for json reading/writing in the
> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
> difference is that Java doesn't have a validity buffer and C++ does. My
> understanding is that technically the validity buffer is redundant (0 type
> == NULL) so I can see why Java would omit it. My question is then: which
> language is 'correct'?

Union type ids are logical, so 0 could very well be a valid type id.
You can't assume that type 0 means a null entry.

Regards

Antoine.
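Antoine's point can be sketched as a toy model in plain Python (hypothetical layout for illustration, not Arrow's API): type ids are arbitrary logical labels chosen by the schema, so a separate validity buffer is still needed to mark nulls even when no child carries id 0.

```python
# Sparse union slot resolution where the logical type ids are 5 and 7:
# neither id means "null"; nullity comes from the validity buffer.
type_ids = [5, 7, 5]           # logical ids chosen by the schema
validity = [True, False, True]
children = {
    5: [10, None, 30],         # int child (every slot has an entry: sparse)
    7: ["a", "b", "c"],        # string child
}

def value_at(i):
    """Resolve slot i: check validity first, then dispatch on type id."""
    if not validity[i]:
        return None
    return children[type_ids[i]][i]

print([value_at(i) for i in range(3)])
# [10, None, 30]
```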
[jira] [Created] (ARROW-8858) [FlightRPC] Ensure headers are uniformly exposed
David Li created ARROW-8858: --- Summary: [FlightRPC] Ensure headers are uniformly exposed Key: ARROW-8858 URL: https://issues.apache.org/jira/browse/ARROW-8858 Project: Apache Arrow Issue Type: Bug Components: FlightRPC, Java, Python Affects Versions: 0.17.0 Reporter: David Li Assignee: David Li * Java: MetadataAdapter should support iterating through binary headers * Python: binary headers need to be present in the output -- This message was sent by Atlassian Jira (v8.3.4#803005)
Sparse Union format
Hey All, While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed that there is a difference between C++ and Java in the way Sparse Unions are handled. I haven't seen in the format spec which is correct, so I wanted to check with the wider community. C++ (and the integration tests) see sparse unions as: name count VALIDITY[] TYPE_ID[] children[] and Java as: name count TYPE[] children[] The precise names may only be important for JSON reading/writing in the integration tests, so I will ignore TYPE/TYPE_ID for now. However, the big difference is that Java doesn't have a validity buffer and C++ does. My understanding is that technically the validity buffer is redundant (type 0 == NULL), so I can see why Java would omit it. My question is then: which language is 'correct'? I suppose the actual language implementation is not entirely relevant here; instead, 'correct' refers to what the canonical IPC schema for a sparse union should be. Best, Ryan
[NIGHTLY] Arrow Build Report for Job nightly-2020-05-19-0
Arrow Build Report for Job nightly-2020-05-19-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0 Failed Tasks: - conda-linux-gcc-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py36 - conda-linux-gcc-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py37 - conda-linux-gcc-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-linux-gcc-py38 - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py36 - conda-osx-clang-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py37 - conda-osx-clang-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-osx-clang-py38 - conda-win-vs2015-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py36 - conda-win-vs2015-py37: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py37 - conda-win-vs2015-py38: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-conda-win-vs2015-py38 - homebrew-cpp: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-homebrew-cpp - homebrew-r-autobrew: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-homebrew-r-autobrew - test-conda-python-3.7-spark-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-spark-master - test-conda-python-3.8-dask-master: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.8-dask-master - wheel-manylinux1-cp35m: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-wheel-manylinux1-cp35m - wheel-manylinux2010-cp35m: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-azure-wheel-manylinux2010-cp35m Succeeded Tasks: - centos-6-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-6-amd64 - centos-7-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-centos-7-aarch64 - centos-7-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-7-amd64 - centos-8-aarch64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-centos-8-aarch64 - centos-8-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-centos-8-amd64 - debian-buster-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-debian-buster-amd64 - debian-buster-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-debian-buster-arm64 - debian-stretch-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-debian-stretch-amd64 - debian-stretch-arm64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-debian-stretch-arm64 - gandiva-jar-osx: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-gandiva-jar-osx - gandiva-jar-xenial: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-travis-gandiva-jar-xenial - nuget: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-nuget - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-cpp-valgrind - test-conda-cpp: URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-cpp - test-conda-python-3.6-pandas-0.23: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.6-pandas-0.23 - test-conda-python-3.6: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.6 - test-conda-python-3.7-dask-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-dask-latest - test-conda-python-3.7-hdfs-2.9.2: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-hdfs-2.9.2 - test-conda-python-3.7-kartothek-latest: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-19-0-github-test-conda-python-3.7-kartothek-latest - test-conda-python-3.7-kartothek-master: URL: