Re: Documentation of concurrency of the compute API?

2022-03-10 Aldrin
I think there's one minor misunderstanding, but I like the essence of the feedback. To clarify, the MeanAggr::Accumulate function is used to gather over points of a sample, where a row is considered a sample, and columns are corresponding values, e.g.: columns (values) | c0 | c1 | c2 | c3

Re: Documentation of concurrency of the compute API?

2022-03-10 Niranda Perera
Okay, one thing I immediately see is that there are a lot of memory allocations/deallocations happening in the approach you have given, IMO. arrow::compute methods are immutable, so when you get an answer, it would be allocated freshly in memory, and when you update an existing shared_ptr, you
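The two points above are that rows are samples and columns are their values, and that each immutable `arrow::compute` call allocates a fresh result. As a rough contrast, here is a stdlib-only C++ sketch that does not use Arrow. `Columns` and `RowMeans` are hypothetical names, not Arrow APIs, and this is not how `MeanAggr::Accumulate` works. The sketch only shows per-row accumulation into one output buffer that is allocated once and then updated in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major table: columns[c][r] is the value of column c in row r.
using Columns = std::vector<std::vector<double>>;

// Per-row mean across all columns (each row treated as one sample).
// The output buffer is allocated once and every column is folded into it in place.
std::vector<double> RowMeans(const Columns& columns) {
  assert(!columns.empty());
  const std::size_t rows = columns.front().size();
  std::vector<double> sums(rows, 0.0);  // the only allocation
  for (const auto& col : columns) {
    assert(col.size() == rows);
    for (std::size_t r = 0; r < rows; ++r) sums[r] += col[r];  // in-place update
  }
  for (double& s : sums) s /= static_cast<double>(columns.size());
  return sums;
}
```

Take a 2-row, 4-column table whose rows are {1, 2, 3, 4} and {10, 20, 30, 40}. For that table, `RowMeans` returns {2.5, 25}.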

Re: Documentation of concurrency of the compute API?

2022-03-10 Aldrin
You're correct with the first clarification. I am not (currently) slicing column-wise. And yes, I am calculating variance, mean, etc. so that I can calculate the t-statistic. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, Mar 10, 2022 at 5:16 PM Niranda Perera wrote: > Or
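The goal here is a t-statistic built from per-group mean and variance. A single-pass accumulator can gather both without materializing intermediate arrays. The sketch below is plain C++ with no Arrow. It uses Welford's online algorithm for mean and variance and assumes Welch's unequal-variance t-test, since the thread does not say which t-test is used. `RunningStats`, `Summarize`, and `WelchT` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical running mean/variance accumulator (Welford's online algorithm).
struct RunningStats {
  std::size_t n = 0;
  double mean = 0.0;
  double m2 = 0.0;  // sum of squared deviations from the current mean

  void Add(double x) {
    ++n;
    const double delta = x - mean;
    mean += delta / static_cast<double>(n);
    m2 += delta * (x - mean);  // uses the updated mean
  }

  // Unbiased sample variance (n - 1 denominator); needs at least two points.
  double Variance() const {
    assert(n >= 2);
    return m2 / static_cast<double>(n - 1);
  }
};

RunningStats Summarize(const std::vector<double>& values) {
  RunningStats s;
  for (double v : values) s.Add(v);
  return s;
}

// Welch's t-statistic for two independent samples with possibly unequal variances.
double WelchT(const RunningStats& a, const RunningStats& b) {
  const double se2 = a.Variance() / static_cast<double>(a.n) +
                     b.Variance() / static_cast<double>(b.n);
  return (a.mean - b.mean) / std::sqrt(se2);
}
```

Take the samples {1, 2, 3, 4} and {2, 4, 6, 8}. Their variances are 5/3 and 20/3, and `WelchT` gives -sqrt(3) (about -1.732).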

Re: Documentation of concurrency of the compute API?

2022-03-10 Niranda Perera
Or are you slicing column-wise? On Thu, Mar 10, 2022 at 8:14 PM Niranda Perera wrote: > From the looks of it, you are trying to calculate variance, mean, etc over > rows, isn't it? > > I need to clarify a bit on this statement. > "Where "by slice" is total time, summed from running the function

Re: Documentation of concurrency of the compute API?

2022-03-10 Niranda Perera
From the looks of it, you are trying to calculate variance, mean, etc. over rows, isn't it? I need to clarify a bit on this statement. "Where "by slice" is total time, summed from running the function on each slice and "by table" is the time of just running the function on the table concatenated

Re: Documentation of concurrency of the compute API?

2022-03-10 Aldrin
Oh, but the short answer is that I'm using: Add, Subtract, Divide, Multiply, Power, and Absolute. Sometimes with both inputs being ChunkedArrays, sometimes with 1 input being a ChunkedArray and the other being a scalar. Aldrin Montana Computer Science PhD Student UC Santa Cruz On Thu, Mar 10,
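This message distinguishes two input shapes for element-wise functions such as Subtract or Power: two chunked inputs, or a chunked input paired with a scalar. The stdlib-only C++ sketch below illustrates both shapes. It does not use Arrow. `Chunked`, `ApplyChunked`, and `ApplyScalar` are hypothetical stand-ins, and the sketch assumes both chunked inputs share the same chunk layout.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a chunked column: a list of contiguous chunks.
using Chunked = std::vector<std::vector<double>>;
using BinaryOp = std::function<double(double, double)>;

// Chunked op chunked: element-wise, chunk by chunk (identical chunk layout assumed).
Chunked ApplyChunked(const Chunked& lhs, const Chunked& rhs, const BinaryOp& op) {
  assert(lhs.size() == rhs.size());
  Chunked out;
  out.reserve(lhs.size());
  for (std::size_t c = 0; c < lhs.size(); ++c) {
    assert(lhs[c].size() == rhs[c].size());
    std::vector<double> chunk(lhs[c].size());  // fresh output allocation per chunk
    for (std::size_t i = 0; i < chunk.size(); ++i) chunk[i] = op(lhs[c][i], rhs[c][i]);
    out.push_back(std::move(chunk));
  }
  return out;
}

// Chunked op scalar: the scalar is broadcast against every element.
Chunked ApplyScalar(const Chunked& lhs, double scalar, const BinaryOp& op) {
  Chunked out;
  out.reserve(lhs.size());
  for (const auto& in : lhs) {
    std::vector<double> chunk(in.size());  // fresh output allocation per chunk
    for (std::size_t i = 0; i < in.size(); ++i) chunk[i] = op(in[i], scalar);
    out.push_back(std::move(chunk));
  }
  return out;
}
```

Squaring deviations this way means calling Subtract with a scalar mean and then Power with a scalar 2. Each step builds a new chunked result, which mirrors the repeated allocation Niranda points out.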

Re: Documentation of concurrency of the compute API?

2022-03-10 Aldrin
Hi Niranda! Sure thing, I've linked to my code. [1] is essentially the function being called, and [2] is an example of a wrapper function (more in that file) I wrote to reduce boilerplate (to make [1] more readable). But, now that I look at [2] again, which I wrote before I really knew much about

Re: Documentation of concurrency of the compute API?

2022-03-10 Niranda Perera
Hi Aldrin, It would be helpful to know what sort of compute operators you are using. On Thu, Mar 10, 2022, 19:12 Aldrin wrote: > I will work on a reproducible example. > > As a sneak peek, what I was seeing was the following (pasted in gmail, see > [1] for markdown version): > > Table ID

Re: Documentation of concurrency of the compute API?

2022-03-10 Aldrin
I will work on a reproducible example. As a sneak peek, what I was seeing was the following (pasted in gmail, see [1] for markdown version):

Table ID        Columns  Rows   Rows (slice)  Slice count  Time (ms) total; by slice  Time (ms) total; by table
E-GEOD-100618   415      20631  299           69           644.065                    410
E-GEOD-76312

Re: Documentation of concurrency of the compute API?

2022-03-10 Weston Pace
As far as I know (and my knowledge here may be dated), the compute kernels themselves do not do any concurrency. There are certainly compute kernels that could benefit from concurrency in this manner (many kernels naively so), and I think things are set up so that, if we decide to tackle this
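Weston notes that the kernels themselves do not parallelize. If that holds, any concurrency has to come from the caller, for example by splitting the rows into slices (as in the "by slice" timings) and processing each slice on its own task. The sketch below is a minimal stdlib-only C++ version. It assumes a plain `std::vector<double>` column and uses a sum as stand-in per-slice work. `ParallelSum` is hypothetical and does not use Arrow's own thread pool.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical caller-side parallelism: split rows into contiguous slices,
// run each slice on its own async task, then combine the partial results.
double ParallelSum(const std::vector<double>& values, std::size_t slice_rows) {
  assert(slice_rows > 0);
  std::vector<std::future<double>> parts;
  for (std::size_t begin = 0; begin < values.size(); begin += slice_rows) {
    const std::size_t end = std::min(begin + slice_rows, values.size());
    // Each task reads only its own [begin, end) range, so no locking is needed.
    parts.push_back(std::async(std::launch::async, [&values, begin, end] {
      auto first = values.begin() + static_cast<std::ptrdiff_t>(begin);
      auto last = values.begin() + static_cast<std::ptrdiff_t>(end);
      return std::accumulate(first, last, 0.0);
    }));
  }
  double total = 0.0;
  for (auto& part : parts) total += part.get();  // wait for and combine partial sums
  return total;
}
```

A plain sum combines trivially. Per-slice statistics such as variance would need a proper merge step instead. On some older Linux toolchains, `std::async` with `std::launch::async` also requires building with `-pthread`.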

Documentation of concurrency of the compute API?

2022-03-10 Aldrin
Hello! I'm wondering if there's any documentation that describes the concurrency/parallelism architecture for the compute API. I'd also be interested if there are recommended approaches for seeing performance of threads used by Arrow--should I try to check a processor ID and infer performance or

Re: [Rust] Unable to read in Python or JS Arrow Stream IPC files written in Rust

2022-03-10 Andrew Lamb
I am glad you got it working! On Thu, Mar 10, 2022 at 12:34 PM Kyle Barron wrote: > Thanks to both! > > I did more debugging last night and I believe the entire issue was `unsafe > { Uint8Array::view() }` >

Re: [Rust] Unable to read in Python or JS Arrow Stream IPC files written in Rust

2022-03-10 Kyle Barron
Thanks to both! I did more debugging last night and I believe the entire issue was that `unsafe { Uint8Array::view() }` was unsafe. I originally copied that from Dominik Moritz's `arrow-wasm`'s

Re: [Java] How to merge/concat VectorSchemaRoot/FieldVector values? Is there an API for this

2022-03-10 Gavin Ray
Yes! There's even a "VectorSchemaRootAppender", thank you! On Thu, Mar 10, 2022 at 11:43 AM David Li wrote: > It appears it's called VectorAppender: > https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/util/VectorAppender.java > > Does this work? > >

Re: [Java] How to merge/concat VectorSchemaRoot/FieldVector values? Is there an API for this

2022-03-10 David Li
It appears it's called VectorAppender: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/util/VectorAppender.java Does this work? On Thu, Mar 10, 2022, at 11:38, Gavin Ray wrote: > Curious how I could combine a list of VectorSchemaRoot's that have the

[Java] How to merge/concat VectorSchemaRoot/FieldVector values? Is there an API for this

2022-03-10 Gavin Ray
Curious how I could combine a list of VectorSchemaRoots that have the same schema, and also a list of FieldVectors for the same column. Have been trying to write this for a few hours, and don't seem to be getting it. The data being columnar instead of row-based, and that the underlying values are

Re: [Rust] Unable to read in Python or JS Arrow Stream IPC files written in Rust

2022-03-10 Andrew Lamb
Sorry Kyle, I totally missed this email. Initially I would say the symptoms sound like "not calling finish() on the writer", but I skimmed some of your linked code and saw at least one call to finish, so maybe this is not the root cause. In terms of reading from a parquet file and returning arrow,