[DISCUSS] Improving Contributor Guidelines

2021-03-04 Thread Micah Kornfield
Hi Everyone,
I am writing to give a bump to some of what was written in reply to
Andrew's thread on auto-creating JIRAs.  I would like to focus on small,
(hopefully) achievable short-term items to make the community friendlier
to newcomers and reduce toil for regular contributors.

1.  I think creating a GitHub action that can automatically copy a GitHub
issue to a JIRA, close the issue, and leave a note [1] would be useful (a
rough sketch follows this list).

The intent is to be friendlier to people interacting with the project for
the first time, letting them decide whether they are invested enough in a
bug to create the necessary credentials to track it.

2.  Guidelines for trivial/minor patches (those not requiring a JIRA), and
updating the PR tool to accept a title indicating them as such.  I would
propose the following fall under the trivial guideline:
a.  Grammar, usage, and spelling fixes affecting no more than N files
b.  Documentation updates affecting no more than N files and no more than
M words.

3.  Guidelines for when to use the auto-create JIRA tool:
a.  Refactors (no functionality change) affecting no more than N [2]
files.  If the coding work required is less than 1 or 2 hours, JIRAs can be
disruptive enough to one's workflow that they don't really contribute to
the "openness" of the project in a meaningful way.  This can be a slippery
slope, but I think we can all be judicious about when to use it.
b.  Small one-off bug fixes by new contributors to the project (ideally
accompanied by a note pointing to the contributors guide).

4.  IIUC, some of the angst on the thread from regular contributors was
about the duplication of effort involved in filling in details in multiple
places.  However, I do think transparent development is important, and
migrating away from our current tooling would be an expensive investment.
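
To make item 1 concrete, here is a rough sketch of such an action (untested;
the JIRA auth scheme, token input names, and field values are my
assumptions, using the GitHub Actions toolkit and JIRA's REST v2 API):

import * as core from "@actions/core";
import * as github from "@actions/github";

async function run(): Promise<void> {
  const issue = github.context.payload.issue;
  if (!issue) return;

  // Create a minimal JIRA issue mirroring the GitHub issue (JIRA REST API v2).
  // Uses the Node 18+ global fetch; Bearer auth is an assumption.
  const resp = await fetch("https://issues.apache.org/jira/rest/api/2/issue", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${core.getInput("jira-token")}`,
    },
    body: JSON.stringify({
      fields: {
        project: { key: "ARROW" },
        issuetype: { name: "Bug" },
        summary: issue.title,
        description: `${issue.body ?? ""}\n\nReported on GitHub: ${issue.html_url}`,
      },
    }),
  });
  const created = await resp.json(); // e.g. { key: "ARROW-1234", ... }

  // Leave the note [1] on the GitHub issue, then close it.
  const octokit = github.getOctokit(core.getInput("github-token"));
  const { owner, repo } = github.context.repo;
  await octokit.rest.issues.createComment({
    owner,
    repo,
    issue_number: issue.number,
    body: `This issue has been copied to JIRA item [${created.key}]. If you wish to discuss or track it further, please do so there.`,
  });
  await octokit.rest.issues.update({
    owner,
    repo,
    issue_number: issue.number,
    state: "closed",
  });
}

run().catch((err) => core.setFailed(String(err)));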

In my mind, JIRA serves as a useful index of the issue/feature backlog that
is tied into our development and release tooling.  I'm not sure it falls
within the "Apache Way", but it seems that as long as the necessary
discussion has already happened on-list (possibly by way of reviewing
PRs/Google Docs and summarizing them back to the list), minimal JIRAs are
sufficient (i.e. title, component, and a link to the discussed artifact).

So, for instance, if we had an RFC process for a major feature, I would
imagine the guideline would be something like:
   a.  Create a JIRA for writing the RFC (this should be fairly minimal; I
imagine most content will actually go into a PR for the RFC).  This gives
others who might be interested in the area knowledge that one is being
written and an opportunity to collaborate ahead of time.
   b.  Send the RFC for review and give a heads-up to the mailing list.
   c.  Gain consensus on the RFC.
   d.  Create minimal JIRAs corresponding to the work items needed to
complete the RFC and link back to it.

For less involved features I think minimal JIRAs are still OK; if other
contributors/observers have particular concerns they can ask for more
details.

For me the key is expressing intent up-front to enable potential
collaboration, discussion and feedback before a lot of time is invested.
After the fact, understanding the rationale that went into decisions is
also useful.

Thoughts?  Are there any other guidelines or norms we should try to
socialize around our development process?

Thanks,
Micah

[1] A note like: "This issue has been copied to JIRA item [ARROW-1].  If
you wish to discuss or track it further, please do so there.  For more
details please see the [contributors guide](
https://arrow.apache.org/docs/developers/contributing.html)"

[2] I used N above because I think it is best to treat the number as a
guideline rather than a rule.  But I would pick N=2 if we wanted to enforce
it as a rule.


Re: [Java] IPC stream write with re-stated dictionaries

2021-03-04 Thread Micah Kornfield
Hi Joris,
I do believe this is missing.  We worked around it for testing by writing
dictionary batches directly to the stream [1].

Thanks,
Micah

[1]
https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java#L614

On Thu, Mar 4, 2021 at 4:06 AM Joris Peeters wrote:

> Hello,
>
> For my use case I'm sending an Arrow IPC-stream from a server to a client,
> with some columns being dictionary-encoded. Dictionary-encoding happens on
> the fly, though, so the full dictionary isn't known yet at the beginning of
> the stream, but rather is computed for every batch, and DictionaryBatches
> are to be emitted prior to every RecordBatch.
>
> However, unless I am mistaken, this is not currently supported in the
> ArrowStreamWriter. The dictionary provider is passed in at construction
> time, the dicts are emitted once, and there is no hook for re-emitting
> these.
>
> I've locally hacked around this by basically copy-pasting ArrowStreamWriter
> and extending it with a `public void writeBatch(DictionaryProvider
> provider)` method that re-emits the dictionaries prior to emitting the
> record batches.
>
> However, I'd of course much prefer it if the provided ArrowStreamWriter
> supported this. If people agree that it's missing (i.e. maybe I'm
> overlooking something obvious) and that it would be useful to have, then
> I'm happy to contribute it myself (not necessarily using the
> aforementioned `writeBatch(provider)` approach, though that seems reasonable).
>
> Cheers,
> -J
>


Re: [Flight Extension] Request for Comments

2021-03-04 Thread Nate Bauernfeind
Regarding the BarrageRecordBatch:

I have been concatenating them; it’s one batch with two sets of Arrow
payloads. They don’t have separate metadata headers; the update is to be
applied atomically. I have only studied the Java Arrow Flight
implementation, and I believe it is usable, perhaps with some minor changes.
The piece of code in Flight that does the deserialization takes two
parallel lists/iterators: a `Buffer` list (these describe the length of a
section of the body payload) and a `FieldNode` list (these describe num
rows and null_count). Each field node has 2-3 buffers depending on the
column type. Buffers are allowed to have a length of 0, to omit their
payloads; this, for example, is how you omit the validity buffer when
null_count is zero.

The proposed barrage payload keeps this structural pattern (list of buffers,
list of field nodes) with the following modifications:
- we only include field nodes / buffers for subscribed columns
- the first set of field nodes is for added rows; these may be omitted if
there are no added rows included in the update
- the second set of field nodes is for modified rows; we omit columns that
have no modifications included in the update

I believe the only thing that is missing is the ability to control the
field types to be deserialized (like a third list/iterator parallel to the
field nodes and buffers).
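
To make that concrete, here is a rough Java sketch of assembling such a
payload with Arrow's message classes (the buffers and row counts are
placeholders, and the column subsetting is Barrage-specific behavior, not
something stock Arrow does):

import java.util.ArrayList;
import java.util.List;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.vector.ipc.message.ArrowFieldNode;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;

static ArrowRecordBatch buildBarragePayload(
    int addedRowCount, ArrowBuf addedValidityBuf, ArrowBuf addedValuesBuf,
    int modifiedRowCount, int modifiedNullCount,
    ArrowBuf modifiedValidityBuf, ArrowBuf modifiedValuesBuf) {
  // Two parallel lists: one ArrowFieldNode per included column, plus that
  // column's buffers (2-3 of them depending on the column type).
  List<ArrowFieldNode> nodes = new ArrayList<>();
  List<ArrowBuf> buffers = new ArrayList<>();

  // First set: added rows (only subscribed columns contribute).
  nodes.add(new ArrowFieldNode(addedRowCount, /*nullCount=*/0));
  buffers.add(addedValidityBuf);  // may be length 0 since null_count is zero
  buffers.add(addedValuesBuf);

  // Second set: modified rows (columns with no modifications are omitted).
  nodes.add(new ArrowFieldNode(modifiedRowCount, modifiedNullCount));
  buffers.add(modifiedValidityBuf);
  buffers.add(modifiedValuesBuf);

  return new ArrowRecordBatch(addedRowCount + modifiedRowCount, nodes, buffers);
}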

Note that the BarrageRecordBatch.addedRowsIncluded,
BarrageFieldNode.addedRows, BarrageFieldNode.modifiedRows and
BarrageFieldNode.includedRows (all part of the flatbuffer metadata) are
intended to be used by code one layer of abstraction higher than the
actual wire-format parser. The parser doesn't really need them except to
know which columns to expect in the payload. Technically, we could encode
the field nodes / buffers as empty, too (but why be wasteful if this
information is already encoded?).

Regarding Browser Flight Support:

Was this company FactSet, by chance? (I saw they are mentioned in the JS
thread that was recently bumped on the dev list.)

I looked at the ticket and wanted to comment on how we are handling
bi-directional streams for our web-ui. We use Arrow Flight's concept of a
Ticket to allow a client to create and identify temporary state (new tables
/ views / REPL sessions / etc). Any bidirectional stream we support also
has a server-streaming-only variant with the ability for the client to
attach a Ticket to reference/identify that stream. The client may then send
a message, out-of-band, to the Ticket. Messages are sequenced by the client
(since gRPC doesn't guarantee ordered delivery) and delivered to the piece
of code controlling that server-stream. It does require that the server be
a bit stateful, but it works =).

On Thu, Mar 4, 2021 at 6:58 AM David Li  wrote:

> Re: the multiple batches, that makes sense. In that case, depending on how
> exactly the two record batches are laid out, I'd suggest considering a
> Union of Struct columns (where a Struct is essentially interchangeable with
> a record batch or table) - that would let you encode two distinct record
> batches inside the same physical batch. Or if the two batches have
> identical schemas, you could just concatenate them and include indices in
> your metadata.
>
> As for browser Flight support - there's an existing ticket:
> https://issues.apache.org/jira/browse/ARROW-9860
>
> I was sure I had seen another organization talking about browser support
> recently, but now I can't find them. I'll update here if I do figure it out.
>
> Best,
> David
>
> On Wed, Mar 3, 2021, at 21:00, Nate Bauernfeind wrote:
> > >  if each payload has two batches with different purposes [...]
> >
> > The purposes of the payloads are slightly different, however they are
> > intended to be applied atomically. If there are guarantees by the table
> > operation generating the updates then those guarantees are only valid on
> > each boundary of applying the update to your local state. In a sense, one
> > is relatively useless without the other. Record batches fit well in
> > map-reduce paradigms / algorithms, but what we have is stateful to
> > enable/support incremental updates. For example, sorting a flight of data
> > is best done map-reduce-style and requires one to re-sort the entire data
> > set when it changes. Our approach focuses on producing incremental updates
> > which are used to manipulate your existing client state using a much
> > smaller footprint (in both time and space). You can imagine, in the sort
> > scenario, if you evaluate the table after adding rows but before modifying
> > existing rows your table won’t be sorted between the two updates. The
> > client would then need to wait until it receives the pair of RecordBatches
> > anyways, so it seems more natural to deliver them together.
> >
> > > As a side note - is said UI browser-based? Another project recently was
> > > planning to look at JavaScript support for Flight (using WebSockets as the
> > > transport, IIRC) and it might make sense to join forces if that’s a path
> > > you were also going to pursue.

Re: [C++] Generating random Date64 & Timestamp arrays

2021-03-04 Thread Wes McKinney
Agreed, though keep in mind that rather than "some form of
reinterpretation at ArrayData level", you can use the Array::View
function, so it would look something like

auto ty = date64();
auto arr = *rag.Int64(...)->View(ty);
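
or, fleshed out a bit (a sketch; I'm assuming RandomArrayGenerator::Int64's
(size, min, max, null_probability) signature, with min_millis/max_millis as
placeholders for the desired bounds):

#include "arrow/array.h"
#include "arrow/testing/random.h"
#include "arrow/type.h"

arrow::random::RandomArrayGenerator rag(/*seed=*/42);
// Generate int64 values already constrained to the desired range...
auto raw = rag.Int64(/*size=*/1000, /*min=*/min_millis, /*max=*/max_millis,
                     /*null_probability=*/0.1);
// ...then view the same physical data as date64 without copying.
auto dates = *raw->View(arrow::date64());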

On Thu, Mar 4, 2021 at 3:47 AM Antoine Pitrou  wrote:
>
>
> Hi Ying,
>
> Yes, this approach sounds reasonable.  It would be useful at some point
> to add random date/timestamp generation to RandomArrayGenerator, though.
>
> Regards
>
> Antoine.
>
>
> On 04/03/2021 at 04:36, Ying Zhou wrote:
> > Hi,
> >
> > I’d like to generate random Date64 & Timestamp arrays with artificial max 
> > and mins. RandomArrayGenerator::ArrayOf in arrow/testing/random.h does not 
> > help. Currently the approach I’d like to take is using 
> > RandomArrayGenerator::Int64 to generate a random int64 array and then 
> > convert it to a date64/timestamp array through some form of 
> > reinterpretation at ArrayData level. Does that work? If so is it the best 
> > approach? Thanks!
> >
> > Ying
> >


Re: [Rust] Arrow in WebAssemby

2021-03-04 Thread Dominik Moritz
I just remembered a bigger issue I ran into. I wanted to read IPC data, but
I don’t have a file; I have the data as [u8] already. The current API
incurs more copies than necessary (I think), and therefore the performance
of reading IPC is worse than in JS. (
https://issues.apache.org/jira/projects/ARROW/issues/ARROW-11696).
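
Concretely, this is roughly how I'm reading the in-memory bytes today (a
sketch; assuming StreamReader's Read-based constructor, with the extra
copies happening inside the reader):

use std::io::Cursor;
use arrow::error::Result;
use arrow::ipc::reader::StreamReader;

fn read_ipc(bytes: &[u8]) -> Result<()> {
    // Cursor gives the in-memory bytes a Read impl; the reader still
    // copies each message into freshly allocated buffers internally.
    let reader = StreamReader::try_new(Cursor::new(bytes))?;
    for batch in reader {
        let batch = batch?;
        println!("read a batch with {} rows", batch.num_rows());
    }
    Ok(())
}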

On Mar 1, 2021 at 23:29:18, Dominik Moritz  wrote:

> I am looking forward to speaking with you then. I’ll talk about the
> motivation.
>
> My experience with the library has been good. I ran into a few limitations
> that I filed Jiras for. I struggled a bit with some of the error handling
> and Arc types, but that’s probably because I am not very experienced with
> Rust and wasm-bindgen doesn’t support all Rust features.
>
> I had some bigger issues with the DataFusion and Parquet libraries as they
> don’t support wasm right now (also filed Jiras for those).
>
> On Feb 27, 2021 at 11:14:27, Andrew Lamb  wrote:
>
>> Hi Dominik,
>>
>> That sounds really interesting -- thank you for the offer
>>
>> I for one would enjoy seeing a demo and suggest that 10 minutes might be a
>> good length. The next call (details are also on the announcement [1]) is
>> scheduled for Wednesday March 10, 2021 at 09:00 PST / 12:00 EST / 17:00
>> UTC. The link is https://meet.google.com/ctp-yujs-aee
>>
>> I would personally be interested in hearing about your experience as a
>> user of the Rust library (what was good, what was challenging, how can we
>> improve).
>>
>> Thanks!
>> Andrew
>>
>> [1]
>>
>> https://lists.apache.org/thread.html/raa72e1a8a3ad5dbb8366e9609a041eccca87f85545c3bc3d85170cfc%40%3Cdev.arrow.apache.org%3E
>>
>> On Fri, Feb 26, 2021 at 4:17 AM Fernando Herrera <
>> fernando.j.herr...@gmail.com> wrote:
>>
>> Hi Dominik,
>>
>> I would be interested in a demo. I'm curious to see your implementation and
>> what advantages you have seen over JavaScript.
>>
>> Thanks,
>> Fernando
>>
>> On Thu, Feb 25, 2021 at 10:39 PM Dominik Moritz  wrote:
>>
>> > Hello Rust Arrow Devs,
>> >
>> > I have been working on a wasm version of Arrow using the Rust library (
>> > https://github.com/domoritz/arrow-wasm). I was wondering whether you
>> > would be interested in having me demo it in the Arrow Rust sync call. If
>> > so, when would be the next one and how much time would you want to
>> > allocate for it? Also, would you be interested in having me dive into
>> > something in particular?
>> >
>> > Cheers,
>> > Dominik


Re: [Flight Extension] Request for Comments

2021-03-04 Thread David Li
Re: the multiple batches, that makes sense. In that case, depending on how 
exactly the two record batches are laid out, I'd suggest considering a Union of 
Struct columns (where a Struct is essentially interchangeable with a record 
batch or table) - that would let you encode two distinct record batches inside 
the same physical batch. Or if the two batches have identical schemas, you 
could just concatenate them and include indices in your metadata.
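
For example, sketched in C++ (hedged; the child fields are made up purely to
show the shape):

#include "arrow/type.h"

// Each logical record batch becomes a struct field; the dense union's type
// codes then tag each row as belonging to one batch or the other.
auto added = arrow::struct_({arrow::field("sym", arrow::utf8()),
                             arrow::field("px", arrow::float64())});
auto modified = arrow::struct_({arrow::field("sym", arrow::utf8()),
                                arrow::field("px", arrow::float64())});
auto update_type = arrow::dense_union(
    {arrow::field("added", added), arrow::field("modified", modified)});
auto schema = arrow::schema({arrow::field("update", update_type)});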

As for browser Flight support - there's an existing ticket: 
https://issues.apache.org/jira/browse/ARROW-9860

I was sure I had seen another organization talking about browser support 
recently, but now I can't find them. I'll update here if I do figure it out.

Best,
David

On Wed, Mar 3, 2021, at 21:00, Nate Bauernfeind wrote:
> >  if each payload has two batches with different purposes [...]
> 
> The purposes of the payloads are slightly different, however they are
> intended to be applied atomically. If there are guarantees by the table
> operation generating the updates then those guarantees are only valid on
> each boundary of applying the update to your local state. In a sense, one
> is relatively useless without the other. Record batches fit well in
> map-reduce paradigms / algorithms, but what we have is stateful to
> enable/support incremental updates. For example, sorting a flight of data
> is best done map-reduce-style and requires one to re-sort the entire data
> set when it changes. Our approach focuses on producing incremental updates
> which are used to manipulate your existing client state using a much
> smaller footprint (in both time and space). You can imagine, in the sort
> scenario, if you evaluate the table after adding rows but before modifying
> existing rows your table won’t be sorted between the two updates. The
> client would then need to wait until it receives the pair of RecordBatches
> anyways, so it seems more natural to deliver them together.
> 
> > As a side note - is said UI browser-based? Another project recently was
> > planning to look at JavaScript support for Flight (using WebSockets as the
> > transport, IIRC) and it might make sense to join forces if that’s a path
> > you were also going to pursue.
> 
> Yes, our UI runs in the browser, although table operations themselves run
> on the server to keep the browser lean and fast. That said, the browser
> isn’t the only target for the API we’re iterating on. We’re engaged in a
> rewrite to unify our “first-class” Java API for intra-engine (server,
> heavyweight client) usage and our cross-language (Javascript/C++/C#/Python)
> “open” API. Our existing customers use the engine to drive multi-process
> data applications, REPL/notebook experiences, and dashboards. We are
> preserving these capabilities as we make the engine available as open
> source software. One goal of the OSS effort is to produce a singular modern
> API that’s more interoperable with the data science and development
> community as a whole. In the interest of minimizing entry/egress points, we
> are migrating to gRPC for everything in addition to the data IPC layer, so
> not just the barrage/arrow-flight piece.
> 
> The point of all this is to make the Deephaven engine as accessible as
> possible for a broad user base, including developers using the API from
> their language of choice or scripts/code running co-located within an
> engine process. Our software can be used to explore or build applications
> and visualizations around static as well as real-time data (imagine joins,
> aggregations, sorts, filters, time-series joins, etc), perform table
> operations with code or with a few clicks in a GUI, or as a building-block
> in a multi-stage data pipeline. We think making ourselves as interoperable
> as possible with tools built on Arrow is an important part of attaining
> this goal.
> 
> That said, we have run into quite a few pain points migrating to gRPC, such
> as 1) client-side streaming is not supported by any browser, 2) today,
> server-side streams require a proxy layer of some sort (such as envoy), 3)
> flatbuffer’s javascript/typescript support is a little weak, and I’m sure
> there are others that aren’t coming to mind at the moment. We have some
> interesting solutions to these problems, but, today, these issues are a
> decent chunk of our focus. In the meantime, the UI is usable today by our
> enterprise clients, but it interacts with the server over websockets and a
> protocol that is heavily influenced by 10-years of existing proprietary
> java-to-java IPC (which are NOT friendly to being robust over intermittent
> failures). Today, we’re just heads-down going the gRPC route and hoping
> that eventually browsers get around to better support for some of this
> stuff (so, maybe one day a proxy isn’t required, etc). Some of our RPCs
> make most sense as bidirectional streams, but to support our web-ui we also
> have a server-streaming variant that we can pass data to “out-of-band” via
> a unary call 

[Java] IPC stream write with re-stated dictionaries

2021-03-04 Thread Joris Peeters
Hello,

For my use case I'm sending an Arrow IPC-stream from a server to a client,
with some columns being dictionary-encoded. Dictionary-encoding happens on
the fly, though, so the full dictionary isn't known yet at the beginning of
the stream, but rather is computed for every batch, and DictionaryBatches
are to be emitted prior to every RecordBatch.

However, unless I am mistaken, this is not currently supported in the
ArrowStreamWriter. The dictionary provider is passed in at construction
time, the dicts are emitted once, and there is no hook for re-emitting
these.

I've locally hacked around this by basically copy-pasting ArrowStreamWriter
and extending it with a `public void writeBatch(DictionaryProvider
provider)` method that re-emits the dictionaries prior to emitting the
record batches.
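
The hack boils down to something like this (a sketch only, not the actual
patch; I'm assuming a MapDictionaryProvider so the dictionary ids are
enumerable, and eliding schema and end-of-stream handling):

import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.dictionary.DictionaryProvider;
import org.apache.arrow.vector.ipc.WriteChannel;
import org.apache.arrow.vector.ipc.message.ArrowDictionaryBatch;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.ipc.message.MessageSerializer;

// Re-emit every dictionary, then the record batch, on each call.
public void writeBatch(WriteChannel out, VectorSchemaRoot root,
    DictionaryProvider.MapDictionaryProvider provider) throws Exception {
  for (long id : provider.getDictionaryIds()) {
    FieldVector dictVector = provider.lookup(id).getVector();
    // Wrap the dictionary values in a single-column root so they can be
    // unloaded into a record batch.
    VectorSchemaRoot dictRoot = VectorSchemaRoot.of(dictVector);
    try (ArrowDictionaryBatch dictBatch = new ArrowDictionaryBatch(
        id, new VectorUnloader(dictRoot).getRecordBatch(), /*isDelta=*/false)) {
      MessageSerializer.serialize(out, dictBatch);
    }
  }
  try (ArrowRecordBatch batch = new VectorUnloader(root).getRecordBatch()) {
    MessageSerializer.serialize(out, batch);
  }
}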

However, I'd of course much prefer it if the provided ArrowStreamWriter
supported this. If people agree that it's missing (i.e. maybe I'm
overlooking something obvious) and that it would be useful to have, then
I'm happy to contribute it myself (not necessarily using the
aforementioned `writeBatch(provider)` approach, though that seems reasonable).

Cheers,
-J


[NIGHTLY] Arrow Build Report for Job nightly-2021-03-04-0

2021-03-04 Thread Crossbow


Arrow Build Report for Job nightly-2021-03-04-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0

Failed Tasks:
- conda-linux-gcc-py37-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-drone-conda-linux-gcc-py37-aarch64
- conda-linux-gcc-py38-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-drone-conda-linux-gcc-py38-aarch64
- test-build-vcpkg-win:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-build-vcpkg-win
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-conda-python-3.8-jpype
- test-r-versions:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-test-r-versions
- test-ubuntu-18.04-docs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-test-ubuntu-18.04-docs
- test-ubuntu-18.04-r-sanitizer:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-test-ubuntu-18.04-r-sanitizer
- wheel-osx-high-sierra-cp36m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-high-sierra-cp36m
- wheel-osx-high-sierra-cp37m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-high-sierra-cp37m
- wheel-osx-high-sierra-cp38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-high-sierra-cp38
- wheel-osx-high-sierra-cp39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-high-sierra-cp39
- wheel-osx-mavericks-cp36m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-mavericks-cp36m
- wheel-osx-mavericks-cp37m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-mavericks-cp37m
- wheel-osx-mavericks-cp38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-mavericks-cp38
- wheel-osx-mavericks-cp39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-wheel-osx-mavericks-cp39

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-clean
- conda-linux-gcc-py36-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-drone-conda-linux-gcc-py36-aarch64
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-cpu-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py37-cpu-r40
- conda-linux-gcc-py37-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py38-cuda
- conda-linux-gcc-py39-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-drone-conda-linux-gcc-py39-aarch64
- conda-linux-gcc-py39-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-azure-conda-linux-gcc-py39-cpu
- conda-linux-gcc-py39-cuda:
  URL: 

Re: [C++] Generating random Date64 & Timestamp arrays

2021-03-04 Thread Antoine Pitrou

Hi Ying,

Yes, this approach sounds reasonable.  It would be useful at some point
to add random date/timestamp generation to RandomArrayGenerator, though.

Regards

Antoine.


On 04/03/2021 at 04:36, Ying Zhou wrote:

> Hi,
>
> I’d like to generate random Date64 & Timestamp arrays with artificial max and
> mins. RandomArrayGenerator::ArrayOf in arrow/testing/random.h does not help.
> Currently the approach I’d like to take is using RandomArrayGenerator::Int64 to
> generate a random int64 array and then convert it to a date64/timestamp array
> through some form of reinterpretation at ArrayData level. Does that work? If so is
> it the best approach? Thanks!
>
> Ying