Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?
A minor note on the Rust side of things. arrow-rs has a two-week release cycle, but arrow-datafusion mostly releases on demand at the moment. Our most up-to-date release processes are documented at [1] and [2].

[1]: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md
[2]: https://github.com/apache/arrow-datafusion/blob/master/dev/release/README.md

On Tue, Sep 7, 2021 at 4:01 PM Jacob Quinn wrote:
> Thanks kou.
>
> I think the TODO action list looks good.
>
> The one point I think could use some additional discussion is the release cadence: it IS desirable to be able to release more frequently than the parent repo's 3-4 month cadence. But we also haven't had the frequency of commits to necessarily warrant a release every 2 weeks. I can think of two possible options; I'm not sure whether one or the other would be more compatible with the Apache release process:
>
> 1) Allow for release-on-demand; this is idiomatic for most Julia packages I'm aware of. When a particular bug is fixed, or a feature added, a user can request a release, a little discussion happens, and a new release is made. This approach would work well for the "bursty" kind of contributions we've seen to Arrow.jl, where development by certain people will happen frequently for a while, then take a break for other things. This also avoids having "scheduled" releases (every 2 weeks, 3 months, etc.) where there haven't been significant updates to warrant a new release. This approach may also help differentiate between bugfix (patch) releases and new-functionality (minor) releases, since when a release is requested, it could be specified whether it should be patch, minor, or major.
>
> 2) Commit to a scheduled release pattern like every 2 weeks, once a month, etc. This has the advantage of consistency and clearer expectations for the users/devs involved. A release also doesn't need to be requested, because we can just wait for the scheduled time to release. In terms of the "unnecessary releases" mentioned above, it could be as simple as "cancelling" a release if there haven't been significant updates in the elapsed time period.
>
> My preference would be for 1), but that's influenced by what I'm familiar with in the Julia package ecosystem. It seems like it would still fit the Apache way, since we would formally request a new release, wait the required amount of time for voting (24 hours would be preferable), then at the end of the voting period a new release could be made.
>
> Thanks again kou for helping support the Julia implementation here.
>
> -Jacob
>
> On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei wrote:
> > Hi,
> >
> > Sorry for the delay. This is a continuation of the "Status of Arrow Julia implementation?" thread:
> >
> > https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
> >
> > I'll summarize the current status, the next actions, and items to be discussed.
> >
> > The current status:
> >
> > * The Julia Arrow implementation uses https://github.com/JuliaData/Arrow.jl as a "dev branch" instead of creating a branch in https://github.com/apache/arrow
> > * The Julia Arrow implementation wants to use GitHub as the main issue management platform
> > * The Julia Arrow implementation wants to release more frequently than once per 3-4 months
> > * The current workflow of the Rust Arrow implementation will also fit the Julia Arrow implementation
> >
> > The current workflow of the Rust Arrow implementation:
> >
> > https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
> >
> > * Uses apache/arrow-rs and apache/arrow-datafusion instead of apache/arrow as the repository
> > * Uses GitHub instead of JIRA as the issue management platform
> >   https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
> > * Releases a new minor and patch version every 2 weeks in addition to the quarterly release with the other implementations
> >
> > The next actions after we get a consensus in this discussion:
> >
> > 1. Start a vote on moving the Julia Arrow implementation like the Rust one:
> >    https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E
> > 2. Create apache/arrow-julia
> > 3. Start the IP clearance process to import JuliaData/Arrow.jl to apache/arrow-julia (We don't use julia/Arrow/ in apache/arrow.)
> > 4. Import JuliaData/Arrow.jl to apache/arrow-julia
> > 5. Prepare integration test CI in apache/arrow-julia and apache/arrow
> > 6. Prepare releasing tools in apache/arrow-julia and apache/arrow
> > 7. Remove julia/... from apache/arrow and leave
Re: [Question] Allocations along 64 byte cache lines
Thanks Jorge. I'm wondering whether the 64-byte alignment requirement is for the cache or for SIMD registers (AVX-512?). For SIMD, it looks like register-width alignment does help. E.g., _mm_load_si128 can only load 128-bit-aligned data, and it performs better than _mm_loadu_si128, which supports unaligned loads. Again, be very skeptical of the benchmark :) https://quick-bench.com/q/NxyDu89azmKJmiVxF29Ei8FybWk

On 9/7/21 7:16 PM, Jorge Cardoso Leitão wrote:
> Thanks. I think that the alignment requirement in IPC is different from this one: we enforce 8/64-byte alignment when serializing for IPC, but we (only) recommend 64-byte alignment of memory addresses (at least this is my understanding from the above link). I did test adding two arrays, and the result is independent of the alignment (on my machine, compiler, etc.).
>
> Yibo, thanks a lot for that example. I am unsure whether it captures the cache alignment concept, though: in the example we are reading a long (8 bytes) from a pointer that is not aligned to 8 bytes (63 % 8 != 0), which is both slow and often undefined behavior. I think the bench we want is to change 63 to 64-8 (which is still not 64-byte cache-aligned, but is aligned for a long); then the difference vanishes (under the same gotchas that you mentioned): https://quick-bench.com/q/EKIpQFJsAogSHXXLqamoWSTy-eE. Alternatively, add an int32 with an offset of 4. I benched both with explicit (via intrinsics) SIMD and without (i.e. letting the compiler do it for us), and the alignment does not impact the benches.
>
> Best, Jorge
>
> [1] https://stackoverflow.com/a/27184001/931303
>
> On Tue, Sep 7, 2021 at 4:29 AM Yibo Cai wrote:
> > Did a quick bench of accessing a long buffer that is not 8-byte aligned. Given enough conditions, it looks like unaligned access does have some penalty over aligned access. But I don't think this is an issue in practice.
> >
> > Please be very skeptical of this benchmark. It's hard to get it right given the complexity of hardware, compiler, benchmark tool and environment.
> >
> > https://quick-bench.com/q/GmyqRk6saGfRu8XnMUyoSXs4SCk
> >
> > On 9/7/21 7:55 AM, Micah Kornfield wrote:
> > > > My own impression is that the emphasis may be slightly exaggerated. But perhaps some other benchmarks would prove differently.
> > >
> > > This is probably true. [1] is the original mailing list discussion. I think the lack of measurable differences and the high overhead of 64-byte alignment were the reasons for relaxing to 8-byte alignment.
> > >
> > > > Specifically, I performed two types of tests: a "random sum", where we compute the sum of the values taken at random indices, and "sum", where we sum all values of the array (buffer[1] of the primitive array), both for arrays ranging from 2^10 to 2^25 elements. I was expecting that, at least in the latter, prefetching would help, but I do not observe any difference.
> > >
> > > The most likely place where I think this could make a difference would be operations on wider types (Decimal128 and Decimal256). Another place where I think alignment could help is when adding two primitive arrays (it sounds like this was summing a single array?).
> > >
> > > [1] https://lists.apache.org/thread.html/945b65fb4bc8bcdab695b572f9e9c2dca4cd89012fdbd896a6f2d886%401460092304%40%3Cdev.arrow.apache.org%3E
> > >
> > > On Mon, Sep 6, 2021 at 3:05 PM Antoine Pitrou wrote:
> > > > Le 06/09/2021 à 23:20, Jorge Cardoso Leitão a écrit :
> > > > > Thanks a lot Antoine for the pointers. Much appreciated!
> > > > >
> > > > > > Generally, it should not hurt to align allocations to 64 bytes anyway, since you are generally dealing with large enough data that the (small) memory overhead doesn't matter.
> > > > >
> > > > > Not for performance. However, 64-byte alignment in Rust requires maintaining a custom container and a custom allocator, and means losing interoperability with `std::Vec` and the ecosystem based on it, since std::Vec allocates with the alignment of T (e.g. int32), not 64 bytes. For anyone interested, the background for this is this old PR [1] in arrow2 [2].
> > > >
> > > > I see. In the C++ implementation, we are not compatible with the default allocator either (but C++ allocators as defined by the standard library don't support resizing, which doesn't make them terribly useful for Arrow anyway).
> > > >
> > > > > Neither myself in micro benches nor Ritchie from polars (query engine) in large-scale benches observe any difference on the archs we have available. This is not consistent with the emphasis we put on the memory alignment discussion [3], and I am trying to understand the root cause of this inconsistency.
> > > >
> > > > My own impression is that the emphasis may be slightly exaggerated. But perhaps some other benchmarks would prove differently.
> > > >
> > > > > By prefetching I mean implicit; no intrinsics involved.
> > > >
> > > > Well, I'm not aware that implicit prefetching depends on alignment.
> > > >
> > > > Regards
> > > > Antoine.
Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines
As Phillip mentioned, I think there is something powerful in producing a standard serialized representation of compute operations beyond just Arrow and I'd really like to create a broader community around it. This has been something I had been independently thinking about for the last several months. The discussion here has inspired me to start making real progress on this work. As such, I created a new repository and site where I've started to put together work around a new specification for compute. I would love for the people here to help define this and will be looking to a number of other communities to also contribute. One of my goals has been to break the specification into a number of much smaller pieces [1] so that we can make progress on each subsection without being overwhelmed by the amount of content that must be reviewed. Would love to hear people's ideas on this initiative. The site is here: https://substrait.io/ The repo is here: https://github.com/substrait-io/substrait [1] https://substrait.io/spec/specification/#components On Wed, Sep 1, 2021 at 3:26 PM Phillip Cloud wrote: > Hey everyone, > > As many of you know, the compute IR project has a lot of interested parties > and has generated a lot of feedback. In light of some of the feedback we’ve > received, we want to stress that the specification is intended to have > input from many diverse points of view and that we welcome folks outside of > the Arrow community. We think there’s immense potential for a compute IR > that multiple projects--including those outside of the Arrow umbrella--can > leverage. > > With that in mind, Jacques has been working on something outside of the > Arrow repo that’ll be shared in a few days, that is designed to bring those > viewpoints to bear on the problem of generic relational computation that > lives outside of the Arrow project. 
> > Inside Arrow, we think that a version of the in-development IR > specifications from the last several weeks will add a ton of value by > informing this new effort and would like to continue to move forward with a > work-in-progress IR inside of Arrow for the time being to enable some work > on API development (independent of exactly how things are serialized) to > take place. It is very likely that we will adopt this broader specification > once the dust has settled, so the format inside of Arrow will be relatively > unstable for a while and not have backwards compatibility guarantees for > now. > > The primary focus of the Arrow IR will be on shoring up APIs (producers and > consumers), and we will also be moving the compute IR flatbuffers files out > of the format directory into another top-level directory in the repo. > > Thanks, > Phillip > > On Mon, Aug 30, 2021 at 7:30 PM Weston Pace wrote: > > > My (incredibly naive) interpretation is that there are three problems to > > tackle. > > > > 1) How do you represent a graph and relational operators (join, union, > > groupby, etc.) > > - The PR appears to be addressing this question fairly well > > 2) How does a frontend query a backend to know what UDFs are supported. > > - I don't see anything in the spec for this (some comments touch on > > it) but it seems like it would be necessary to build any kind of > > system. > > 3) Is there some well-defined set of canonical UDFs that we can all > > agree on the semantics for (e.g. addition, subtraction, etc.) > > - I thought, from earlier comments in this email thread, that the > > goal was to avoid addressing this. Although I think there is strong > > value here as well. > > > > So what is the scope of this initiative? If it is just #1 for example > > then I don't see any need to put types in the IR (and I've commented > > as such in the PR). From a relational perspective isn't a UDF just a > > black box Table -> UDF -> Table? 
> > > > On Mon, Aug 30, 2021 at 11:10 AM Phillip Cloud > wrote: > > > > > > Hey everyone, > > > > > > There's some interesting discussion around types and where their > location > > > is in the current PR [1] (and in fact whether to store them at all). > > > > > > It would be great to get some community feedback on this [2] part of > the > > PR > > > in particular, because the choice of whether to store types at all has > > > important design consequences. > > > > > > [1]: https://github.com/apache/arrow/pull/10934 > > > [2]: https://github.com/apache/arrow/pull/10934/files#r697025313 > > > > > > On Fri, Aug 27, 2021 at 2:11 AM Micah Kornfield > > > > wrote: > > > > > > > As an FYI, Iceberg is also considering an IR in relation to view > > support > > > > [1]. I chimed in and pointed them to this thread and Wes's doc. > > Phillip > > > > and Jacques chimed in there as well. > > > > > > > > [1] > > > > > > > > > > > https://mail-archives.apache.org/mod_mbox/iceberg-dev/202108.mbox/%3CCAKRVfm6h6WxQtp5fj8Yj8XWR1wFe8VohOkPuoZZGK-UHPhtwjQ%40mail.gmail.com%3E > > > > > > > > On Thu, Aug 26, 2021 at 12:40 PM Phillip Cloud > > wrote: > > > > > >
Re: [DISCUSS][Julia] How to restart at apache/arrow-julia?
Thanks kou.

I think the TODO action list looks good.

The one point I think could use some additional discussion is the release cadence: it IS desirable to be able to release more frequently than the parent repo's 3-4 month cadence. But we also haven't had the frequency of commits to necessarily warrant a release every 2 weeks. I can think of two possible options; I'm not sure whether one or the other would be more compatible with the Apache release process:

1) Allow for release-on-demand; this is idiomatic for most Julia packages I'm aware of. When a particular bug is fixed, or a feature added, a user can request a release, a little discussion happens, and a new release is made. This approach would work well for the "bursty" kind of contributions we've seen to Arrow.jl, where development by certain people will happen frequently for a while, then take a break for other things. This also avoids having "scheduled" releases (every 2 weeks, 3 months, etc.) where there haven't been significant updates to warrant a new release. This approach may also help differentiate between bugfix (patch) releases and new-functionality (minor) releases, since when a release is requested, it could be specified whether it should be patch, minor, or major.

2) Commit to a scheduled release pattern like every 2 weeks, once a month, etc. This has the advantage of consistency and clearer expectations for the users/devs involved. A release also doesn't need to be requested, because we can just wait for the scheduled time to release. In terms of the "unnecessary releases" mentioned above, it could be as simple as "cancelling" a release if there haven't been significant updates in the elapsed time period.

My preference would be for 1), but that's influenced by what I'm familiar with in the Julia package ecosystem. It seems like it would still fit the Apache way, since we would formally request a new release, wait the required amount of time for voting (24 hours would be preferable), then at the end of the voting period a new release could be made.

Thanks again kou for helping support the Julia implementation here.

-Jacob

On Sun, Sep 5, 2021 at 3:25 PM Sutou Kouhei wrote:
> Hi,
>
> Sorry for the delay. This is a continuation of the "Status of Arrow Julia implementation?" thread:
>
> https://lists.apache.org/x/thread.html/r6d91286686d92837fbe21dd042801a57e3a7b00b5903ea90a754ac7b%40%3Cdev.arrow.apache.org%3E
>
> I'll summarize the current status, the next actions, and items to be discussed.
>
> The current status:
>
> * The Julia Arrow implementation uses https://github.com/JuliaData/Arrow.jl as a "dev branch" instead of creating a branch in https://github.com/apache/arrow
> * The Julia Arrow implementation wants to use GitHub as the main issue management platform
> * The Julia Arrow implementation wants to release more frequently than once per 3-4 months
> * The current workflow of the Rust Arrow implementation will also fit the Julia Arrow implementation
>
> The current workflow of the Rust Arrow implementation:
>
> https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
>
> * Uses apache/arrow-rs and apache/arrow-datafusion instead of apache/arrow as the repository
> * Uses GitHub instead of JIRA as the issue management platform
>   https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit
> * Releases a new minor and patch version every 2 weeks in addition to the quarterly release with the other implementations
>
> The next actions after we get a consensus in this discussion:
>
> 1. Start a vote on moving the Julia Arrow implementation like the Rust one:
>    https://lists.apache.org/x/thread.html/r44390a18b3fbb08ddb68aa4d12f37245d948984fae11a41494e5fc1d@%3Cdev.arrow.apache.org%3E
> 2. Create apache/arrow-julia
> 3. Start the IP clearance process to import JuliaData/Arrow.jl to apache/arrow-julia (We don't use julia/Arrow/ in apache/arrow.)
> 4. Import JuliaData/Arrow.jl to apache/arrow-julia
> 5. Prepare integration test CI in apache/arrow-julia and apache/arrow
> 6. Prepare releasing tools in apache/arrow-julia and apache/arrow
> 7. Remove julia/... from apache/arrow and leave julia/README.md pointing to apache/arrow-julia
>
> Items to be discussed:
>
> * Interval of minor and patch releases
>   * The Rust Arrow implementation uses 2 weeks.
>   * Does the Julia Arrow implementation also want to use 2 weeks?
> * Can we stay in accordance with the Apache way with this workflow without pain?
>   The Rust Arrow implementation workflow includes the following for this:
>   https://docs.google.com/document/d/1TyrUP8_UWXqk97a8Hvb1d0UYWigch0HAephIjW7soSI/edit#heading=h.kv1hwbhi3cmi
>   > Contributors will be required to write issues for planned
Re: HDFS ORC to Arrow Dataset
I'll just add that a PR is in progress (thanks Joris!) for adding this adapter: https://github.com/apache/arrow/pull/10991

On Tue, Sep 7, 2021 at 12:05 PM Wes McKinney wrote: > > I'm missing context but if you're talking about C++/Python, we are > currently missing a wrapper interface to the ORC reader in the Arrow > datasets library > > https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset > > We have CSV, Arrow (IPC), and Parquet interfaces. > > But we have an HDFS filesystem implementation and an ORC reader > implementation, so mechanically all of the pieces are there but need > to be connected together. > > Thanks, > Wes > > On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar wrote: > > > > Hi Dev-Community, > > > > Anyone can help me to guide how to read ORC directly from HDFS to an > > arrow dataset. > > > > Thanks > > Manoj
Re: HDFS ORC to Arrow Dataset
I'm missing context but if you're talking about C++/Python, we are currently missing a wrapper interface to the ORC reader in the Arrow datasets library https://github.com/apache/arrow/tree/master/cpp/src/arrow/dataset We have CSV, Arrow (IPC), and Parquet interfaces. But we have an HDFS filesystem implementation and an ORC reader implementation, so mechanically all of the pieces are there but need to be connected together. Thanks, Wes On Tue, Sep 7, 2021 at 8:22 AM Manoj Kumar wrote: > > Hi Dev-Community, > > Anyone can help me to guide how to read ORC directly from HDFS to an > arrow dataset. > > Thanks > Manoj
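[Editor's note] Until the dataset adapter lands, the pieces Wes mentions can be wired together by hand. The C++ sketch below is illustrative only: it is not from the thread, the exact signatures vary between Arrow versions (the Status-style `ORCFileReader::Open` shown here matches roughly the 5.x era), and the HDFS host, port, and path are placeholders. It shows the shape of connecting the HDFS filesystem directly to the ORC reader:

```cpp
#include <memory>
#include <string>

#include <arrow/adapters/orc/adapter.h>
#include <arrow/filesystem/hdfs.h>
#include <arrow/io/interfaces.h>
#include <arrow/result.h>
#include <arrow/status.h>
#include <arrow/table.h>

// Read one ORC file from HDFS into an arrow::Table by combining the HDFS
// filesystem with the ORC adapter -- the two pieces the dataset layer does
// not yet connect. Host/port and signatures are assumptions; check the
// headers of your Arrow version.
arrow::Result<std::shared_ptr<arrow::Table>> ReadOrcFromHdfs(
    const std::string& path) {
  arrow::fs::HdfsOptions options;
  options.ConfigureEndPoint("namenode.example.com", 8020);  // placeholder

  ARROW_ASSIGN_OR_RAISE(auto fs, arrow::fs::HadoopFileSystem::Make(options));
  ARROW_ASSIGN_OR_RAISE(auto input, fs->OpenInputFile(path));

  std::unique_ptr<arrow::adapters::orc::ORCFileReader> reader;
  ARROW_RETURN_NOT_OK(arrow::adapters::orc::ORCFileReader::Open(
      input, arrow::default_memory_pool(), &reader));

  std::shared_ptr<arrow::Table> table;
  ARROW_RETURN_NOT_OK(reader->Read(&table));
  return table;
}
```

Once the ORC dataset adapter from PR 10991 is merged, the dataset API should make this manual wiring unnecessary.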
Re: HTTP traffic of Arrow Flight
Yes, I got it: I have to use "Decode As" and choose the HTTP2 protocol. Thanks a lot.

On 2021/09/07 17:06:10, "David Li" wrote: > Yes and to be extra clear, Flight currently only supports gRPC, and hence > HTTP/2 (barring a few hypothetical configurations), it may also be that you > need to explicitly tell WireShark the protocol in use. > > -David > > On Tue, Sep 7, 2021, at 13:03, Nate Bauernfeind wrote: > > HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more > > specific, or possibly do some more research on your end > > > > Which arrow flight client are you using in your test? Java? C++? Which > > version? Can you provide a simple gRPC server/client example that shows up > > in WireShark as you expect it? > > > > Nate > > > > On Tue, Sep 7, 2021 at 10:40 AM Mohamed Abdelhakem < > > mohamed.abdelha...@incorta.com> wrote: > > > > > When I built a simple FlightServer and FlightClient, I noticed that the > > > traffic captured by WireShark is TCP, not HTTP/2 > > > MY question is how to configure Arrow Flight to use HTTP/2 protocol > > > traffic > > > > > > > > > -- > > >
Re: HTTP traffic of Arrow Flight
I am using the Java Flight client, with Arrow Flight gRPC version 5.0.

On 2021/09/07 17:03:42, Nate Bauernfeind wrote: > HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more > specific, or possibly do some more research on your end > > Which arrow flight client are you using in your test? Java? C++? Which > version? Can you provide a simple gRPC server/client example that shows up > in WireShark as you expect it? > > Nate > > On Tue, Sep 7, 2021 at 10:40 AM Mohamed Abdelhakem < > mohamed.abdelha...@incorta.com> wrote: > > > When I built a simple FlightServer and FlightClient, I noticed that the > > traffic captured by WireShark is TCP, not HTTP/2 > > MY question is how to configure Arrow Flight to use HTTP/2 protocol traffic > > > > > -- >
Re: HTTP traffic of Arrow Flight
Yes, and to be extra clear: Flight currently only supports gRPC, and hence HTTP/2 (barring a few hypothetical configurations). It may also be that you need to explicitly tell Wireshark the protocol in use.

-David

On Tue, Sep 7, 2021, at 13:03, Nate Bauernfeind wrote: > HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more > specific, or possibly do some more research on your end > > Which arrow flight client are you using in your test? Java? C++? Which > version? Can you provide a simple gRPC server/client example that shows up > in WireShark as you expect it? > > Nate > > On Tue, Sep 7, 2021 at 10:40 AM Mohamed Abdelhakem < > mohamed.abdelha...@incorta.com> wrote: > > > When I built a simple FlightServer and FlightClient, I noticed that the > > traffic captured by WireShark is TCP, not HTTP/2 > > MY question is how to configure Arrow Flight to use HTTP/2 protocol traffic > > > > > -- >
Re: HTTP traffic of Arrow Flight
HTTP (and HTTP/2) traffic is sent over TCP, so you might need to be more specific, or possibly do some more research on your end.

Which Arrow Flight client are you using in your test? Java? C++? Which version? Can you provide a simple gRPC server/client example that shows up in Wireshark as you expect it to?

Nate

On Tue, Sep 7, 2021 at 10:40 AM Mohamed Abdelhakem < mohamed.abdelha...@incorta.com> wrote: > When I built a simple FlightServer and FlightClient, I noticed that the > traffic captured by WireShark is TCP, not HTTP/2 > MY question is how to configure Arrow Flight to use HTTP/2 protocol traffic > --
HTTP traffic of Arrow Flight
When I built a simple FlightServer and FlightClient, I noticed that the traffic captured by Wireshark shows up as TCP, not HTTP/2. My question is: how do I configure Arrow Flight to use the HTTP/2 protocol?
Fwd: HDFS ORC to Arrow Dataset
Hi Dev-Community, can anyone help guide me on how to read ORC directly from HDFS into an Arrow dataset? Thanks, Manoj
Re: [Question] Allocations along 64 byte cache lines
Thanks. I think that the alignment requirement in IPC is different from this one: we enforce 8/64-byte alignment when serializing for IPC, but we (only) recommend 64-byte alignment of memory addresses (at least this is my understanding from the above link). I did test adding two arrays, and the result is independent of the alignment (on my machine, compiler, etc.).

Yibo, thanks a lot for that example. I am unsure whether it captures the cache alignment concept, though: in the example we are reading a long (8 bytes) from a pointer that is not aligned to 8 bytes (63 % 8 != 0), which is both slow and often undefined behavior. I think the bench we want is to change 63 to 64-8 (which is still not 64-byte cache-aligned, but is aligned for a long); then the difference vanishes (under the same gotchas that you mentioned): https://quick-bench.com/q/EKIpQFJsAogSHXXLqamoWSTy-eE. Alternatively, add an int32 with an offset of 4. I benched both with explicit (via intrinsics) SIMD and without (i.e. letting the compiler do it for us), and the alignment does not impact the benches.

Best, Jorge

[1] https://stackoverflow.com/a/27184001/931303

On Tue, Sep 7, 2021 at 4:29 AM Yibo Cai wrote:
> Did a quick bench of accessing a long buffer that is not 8-byte aligned. Given enough conditions, it looks like unaligned access does have some penalty over aligned access. But I don't think this is an issue in practice.
>
> Please be very skeptical of this benchmark. It's hard to get it right given the complexity of hardware, compiler, benchmark tool and environment.
>
> https://quick-bench.com/q/GmyqRk6saGfRu8XnMUyoSXs4SCk
>
> On 9/7/21 7:55 AM, Micah Kornfield wrote:
> > > My own impression is that the emphasis may be slightly exaggerated. But perhaps some other benchmarks would prove differently.
> >
> > This is probably true. [1] is the original mailing list discussion. I think the lack of measurable differences and the high overhead of 64-byte alignment were the reasons for relaxing to 8-byte alignment.
> >
> > > Specifically, I performed two types of tests: a "random sum", where we compute the sum of the values taken at random indices, and "sum", where we sum all values of the array (buffer[1] of the primitive array), both for arrays ranging from 2^10 to 2^25 elements. I was expecting that, at least in the latter, prefetching would help, but I do not observe any difference.
> >
> > The most likely place where I think this could make a difference would be operations on wider types (Decimal128 and Decimal256). Another place where I think alignment could help is when adding two primitive arrays (it sounds like this was summing a single array?).
> >
> > [1] https://lists.apache.org/thread.html/945b65fb4bc8bcdab695b572f9e9c2dca4cd89012fdbd896a6f2d886%401460092304%40%3Cdev.arrow.apache.org%3E
> >
> > On Mon, Sep 6, 2021 at 3:05 PM Antoine Pitrou wrote:
> > > Le 06/09/2021 à 23:20, Jorge Cardoso Leitão a écrit :
> > > > Thanks a lot Antoine for the pointers. Much appreciated!
> > > >
> > > > > Generally, it should not hurt to align allocations to 64 bytes anyway, since you are generally dealing with large enough data that the (small) memory overhead doesn't matter.
> > > >
> > > > Not for performance. However, 64-byte alignment in Rust requires maintaining a custom container and a custom allocator, and means losing interoperability with `std::Vec` and the ecosystem based on it, since std::Vec allocates with the alignment of T (e.g. int32), not 64 bytes. For anyone interested, the background for this is this old PR [1] in arrow2 [2].
> > >
> > > I see. In the C++ implementation, we are not compatible with the default allocator either (but C++ allocators as defined by the standard library don't support resizing, which doesn't make them terribly useful for Arrow anyway).
> > >
> > > > Neither myself in micro benches nor Ritchie from polars (query engine) in large-scale benches observe any difference on the archs we have available. This is not consistent with the emphasis we put on the memory alignment discussion [3], and I am trying to understand the root cause of this inconsistency.
> > >
> > > My own impression is that the emphasis may be slightly exaggerated. But perhaps some other benchmarks would prove differently.
> > >
> > > > By prefetching I mean implicit; no intrinsics involved.
> > >
> > > Well, I'm not aware that implicit prefetching depends on alignment.
> > >
> > > Regards
> > > Antoine.