Re: [C++] Dataset API simplification

2021-03-26 Thread Weston Pace
@david - I think all the readers (CSV, IPC, & parquet) will eventually
have support for some intra-file parallelism.  You're also right I
think that there is some global consideration of max concurrent
operations.  For example, if you readahead 10 files and 10 blocks in
each file but only have 16 cores then how will you divvy up the work?
In some graphs it doesn't matter (e.g. building a table in memory).
In other graphs it might.  A query graph with reduce nodes (or maybe
the right term is "blocking nodes") might be collecting the first
batch from all files.  In that case a breadth-first prioritization
might be better than depth-first. Off the cuff my answer would be that
backpressure applies the appropriate prioritization.  Although I have
a hunch that long term the scanning node will need to be more formally
integrated with the query engine (i.e. read file and read dataset are
two different nodes).

@Wes Yes, I have been aiming to keep all tasks for a block on a single
core (as part of the same thread task even).  I haven't gotten to the
work stealing part but that can be added later.


On Fri, Mar 26, 2021 at 7:22 AM Wes McKinney  wrote:
>
> I agree with making the decomposition of a fragment into tasks an
> internal detail of the scan implementation. It seems that we want to
> be moving toward a world of consuming a stream of
> Future<std::shared_ptr<RecordBatch>> and not pushing the complexity of
> concurrency management (necessarily) onto the consumer. The nature of
> multithreading/scheduling would be pushed higher in the stack -- for
> example, you might decide that a fragment and all its child parallel /
> nested tasks could go into the task queue of a single CPU core, where
> idle CPUs are able to steal work from that queue if they want.
>
> On Fri, Mar 26, 2021 at 11:32 AM David Li  wrote:
> >
> > I agree we should present a simplified interface, and then also make 
> > ScanTask internal, but I think that is orthogonal to whether a fragment 
> > produces one or multiple scan tasks.
> >
> > At first, my worry with having (Parquet)ScanTask handle concurrency itself 
> > was that it does need to coordinate with the overall scanner, right? If you 
> > have two files with 100 row groups each, that's much different than 100 
> > files with two row groups each. With a scan task per row group, a single 
> > readahead naturally handles both cases, but with a single scan task per file, you 
> > have to juggle the exact amount of readahead on an inter- and intra-file 
> > level.
> >
> > That said, there is an issue for making readahead operate by amount of 
> > memory used instead of number of files/tasks which would presumably handle 
> > that just as well. And right now, one (Parquet)ScanTask-per-row group does 
> > lead to some implementation nuisance elsewhere (since all scan tasks for a 
> > file have to share the same Parquet reader and pre-buffering task).
> >
> > Also I realize my example is poor, because you do actually want to separate 
> > intra- and inter-fragment concurrency - you want to at least be buffering 
> > the next files (without decoding them) while decoding the current file. And 
> > the proposed model would make it easier to support a consumer that can 
> > process batches out of order while limiting memory usage (just limit the 
> > inter-scan-task readahead).
> >
> > So on balance I'm in favor of this.
> >
> > I'll also note that there could be other Fragments which may naturally have 
> > intra-fragment parallelism, if the concern is mostly that ParquetScanTask 
> > is a bit of an outlier. For instance, a hypothetical FlightFragment 
> > wrapping a FlightInfo struct could generate multiple scan tasks, one per 
> > FlightEndpoint in the FlightInfo.
> >
> > Best,
> > David
> >
> > On Thu, Mar 25, 2021, at 19:48, Weston Pace wrote:
> > > This is a bit of a follow-up on
> > > https://issues.apache.org/jira/browse/ARROW-11782 and also a bit of a
> > > consequence of my work on
> > > https://issues.apache.org/jira/browse/ARROW-7001 (nested scan
> > > parallelism).
> > >
> > > I think the current dataset interface should be simplified.
> > > Currently, we have Dataset ->* Fragment ->* ScanTask ->* RecordBatch
> > > with the components being...
> > >
> > > Dataset - Binds together a format & fragment discovery
> > > Fragment - Something that maps to an input stream (usually a file)
> > > ScanTask - Created by a format, turns an input stream into record batches.
> > > RecordBatch - I hope I don't need to define this one :)
> > >
> > > The change I'm recommending (and starting to implement in ARROW-7001)
> > > is to change the cardinality of Fragment ->* ScanTask to Fragment ->
> > > ScanTask (i.e. one scan task per fragment instead of many).
> > >
> > > The IPC format and CSV format already do this (one scan task per
> > > fragment).  The only exception is Parquet which maps "scan task" to
> > > "row group" (keeping in mind row groups may correspond to multiple
> > > batches).  However, that feels like it is a detail that can be
> > > encapsulated in ParquetScanTask.

Re: sparse data array

2021-03-26 Thread Micah Kornfield
I made a proposal a while ago that covers a form of RLE encoding [1].  I
haven't had time to work on it, since it is a substantial effort to
implement.

I wouldn't expect an intern to be able to complete the work necessary to
get this merged over the course of a normal 3 month internship.

[1] https://github.com/apache/arrow/pull/4815/files

On Thu, Mar 25, 2021 at 2:17 AM Jorge Cardoso Leitão <
jorgecarlei...@gmail.com> wrote:

> Would it be an option to use a StructArray for that? One array with the
> values, and one with the repetitions:
>
> Int32([1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2]) ->
>
> StructArray([
> "values": Int32([1, 2, 3, 1, 2]),
> "repetitions": UInt32([1, 3, 6, 1, 1]),
> ])
>
> It does not have the same API, but I think that the physical operations
> would be different, anyways: ("array + 2" would only operate on "values").
> I think that a small struct / object with some operator overloading would
> address this, and writing something on the metadata would allow others to
> consume it, a-la extension type?
>
> On a related note, such encoding would address DataFusion's issue of
> representing scalars / constant arrays: a constant array would be
> represented as a repetition. Currently we just unpack (i.e. allocate) a
> constant array when we want to transfer through a RecordBatch.
>
> Best,
> Jorge
>
>
>
>
> On Thu, Mar 25, 2021, 10:03 Kirill Lykov  wrote:
>
> > Thanks for the answer.
> > I asked about it because we need it and I was about to write a summer
> > intern proposal for a student to work on it.
> > Looks like it could work fine.
> >
> > On Wed, Mar 24, 2021 at 3:49 PM Wes McKinney 
> wrote:
> >
> > > The SparseTensor stuff is something else entirely (that's matrices
> > > where the entries are mostly 0)
> > >
> > > There isn't anything to help you right now aside from dictionary
> > > encoding — if your dictionary has 256 elements or less, you can use
> > > uint8 index type and thus have 1 byte per value. We've discussed
> > > implementing RLE in the project and so if we do that in the future
> > > then a random access data structure could be built on top of RLE (in
> > > principle)
> > >
> > > On Wed, Mar 24, 2021 at 8:53 AM Niranda Perera <
> niranda.per...@gmail.com
> > >
> > > wrote:
> > > >
> > > > Hi Lykov,
> > > >
> > > > I believe there's an arrow sparse tensor abstraction.
> > > >
> > > > On Wed, Mar 24, 2021, 05:05 Kirill Lykov 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I wonder if there is an existing way to store floats/ints with many
> > > > > repetitions in some container (not sure about terminology).
> > > > > For example, I might have data like A=[1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 2]
> > > > > and I would like to store only B=[1, 2, 3, 1, 2], but from the user's
> > > > > perspective it behaves like container A. I know I can use a dictionary,
> > > > > but as far as I understand it will internally store indices of the chosen
> > > > > elements, so it makes more sense for binary data or structures.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Kirill Lykov
> > > > >
> > >
> >
> >
> > --
> > Best regards,
> > Kirill Lykov
> >
>


Re: [Java] Source control of generated flatbuffers code

2021-03-26 Thread bobtins
OK, originally this was part of 
https://issues.apache.org/jira/browse/ARROW-12006 and I was going to just add 
some doc on flatc, but I will make this a new bug because it's a little bigger: 
https://issues.apache.org/jira/browse/ARROW-12111

On 2021/03/23 23:40:50, Micah Kornfield  wrote: 
> >
> > I have a concern, though. Four other languages (Java would be five) check
> > in the generated flatbuffers code, and it appears (based on a quick scan of
> > Git logs) that this is done manually. Is there a danger that the binary
> > format could change, but some language might get forgotten, and thus be
> > working with the old format?
> 
> The format changes relatively slowly and any changes at this point should
> be backwards compatible.
> 
> 
> 
> > Or is there enough interop testing that the problem would get caught right
> > away?
> 
> In most cases I would expect integration tests to catch these types of
> error.
> 
> On Tue, Mar 23, 2021 at 4:26 PM bobtins  wrote:
> 
> > I'm happy to check in the generated Java source. I would also update the
> > Java build info to reflect this change and document how to regenerate the
> > source as needed.
> >
> > I have a concern, though. Four other languages (Java would be five) check
> > in the generated flatbuffers code, and it appears (based on a quick scan of
> > Git logs) that this is done manually. Is there a danger that the binary
> > format could change, but some language might get forgotten, and thus be
> > working with the old format? Or is there enough interop testing that the
> > problem would get caught right away?
> >
> > I'm new to the project and don't know how big of an issue this is in
> > practice. Thanks for any enlightenment.
> >
> > On 2021/03/23 07:39:16, Micah Kornfield  wrote:
> > > I think checking in the Java files is fine and probably better than relying
> > > on a third party package.  We should make sure there are instructions on
> > > how to regenerate them along with the PR
> > >
> > > On Monday, March 22, 2021, Antoine Pitrou  wrote:
> > >
> > > >
> > > > On 22/03/2021 at 20:17, bobtins wrote:
> > > >
> > > >> TL;DR: The Java implementation doesn't have generated flatbuffers code
> > > >> under source control, and the code generation depends on an
> > > >> unofficially-maintained Maven artifact. Other language
> > implementations do
> > > >> check in the generated code; would it make sense for this to be done
> > for
> > > >> Java as well?
> > > >>
> > > >> I'm currently focusing on Java development; I started building on
> > Windows
> > > >> and got a failure under java/format, because I couldn't download the
> > > >> flatbuffers compiler (flatc) to generate Java source.
> > > >> The artifact for the flatc binary is provided "unofficially" (not by
> > the
> > > >> flatbuffers project), and there was no Windows version, so I had to
> > jump
> > > >> through hoops to build it and proceed.
> > > >>
> > > >
> > > > While this does not answer the more general question of checking in the
> > > > generated Flatbuffers code (which sounds like a good idea, but I'm not
> > a
> > > > Java developer), note that you could work around this by installing the
> > > > Conda-provided flatbuffers package:
> > > >
> > > >   $ conda install flatbuffers
> > > >
> > > > which should get you the `flatc` compiler, even on Windows.
> > > > (see https://docs.conda.io/projects/conda/en/latest/ for installing
> > conda)
> > > >
> > > > You may also try other package managers such as Chocolatey:
> > > >
> > > >   https://chocolatey.org/packages/flatc
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > >
> >
> 


Re: [C++] Dataset API simplification

2021-03-26 Thread Wes McKinney
I agree with making the decomposition of a fragment into tasks an
internal detail of the scan implementation. It seems that we want to
be moving toward a world of consuming a stream of
Future<std::shared_ptr<RecordBatch>> and not pushing the complexity of
concurrency management (necessarily) onto the consumer. The nature of
multithreading/scheduling would be pushed higher in the stack -- for
example, you might decide that a fragment and all its child parallel /
nested tasks could go into the task queue of a single CPU core, where
idle CPUs are able to steal work from that queue if they want.

On Fri, Mar 26, 2021 at 11:32 AM David Li  wrote:
>
> I agree we should present a simplified interface, and then also make ScanTask 
> internal, but I think that is orthogonal to whether a fragment produces one 
> or multiple scan tasks.
>
> At first, my worry with having (Parquet)ScanTask handle concurrency itself 
> was that it does need to coordinate with the overall scanner, right? If you 
> have two files with 100 row groups each, that's much different than 100 files 
> with two row groups each. With a scan task per row group, a single readahead 
> naturally handles both cases, but with a single scan task per file, you have 
> to juggle the exact amount of readahead on an inter- and intra-file level.
>
> That said, there is an issue for making readahead operate by amount of memory 
> used instead of number of files/tasks which would presumably handle that just 
> as well. And right now, one (Parquet)ScanTask-per-row group does lead to some 
> implementation nuisance elsewhere (since all scan tasks for a file have to 
> share the same Parquet reader and pre-buffering task).
>
> Also I realize my example is poor, because you do actually want to separate 
> intra- and inter-fragment concurrency - you want to at least be buffering the 
> next files (without decoding them) while decoding the current file. And the 
> proposed model would make it easier to support a consumer that can process 
> batches out of order while limiting memory usage (just limit the 
> inter-scan-task readahead).
>
> So on balance I'm in favor of this.
>
> I'll also note that there could be other Fragments which may naturally have 
> intra-fragment parallelism, if the concern is mostly that ParquetScanTask is 
> a bit of an outlier. For instance, a hypothetical FlightFragment wrapping a 
> FlightInfo struct could generate multiple scan tasks, one per FlightEndpoint 
> in the FlightInfo.
>
> Best,
> David
>
> On Thu, Mar 25, 2021, at 19:48, Weston Pace wrote:
> > This is a bit of a follow-up on
> > https://issues.apache.org/jira/browse/ARROW-11782 and also a bit of a
> > consequence of my work on
> > https://issues.apache.org/jira/browse/ARROW-7001 (nested scan
> > parallelism).
> >
> > I think the current dataset interface should be simplified.
> > Currently, we have Dataset ->* Fragment ->* ScanTask ->* RecordBatch
> > with the components being...
> >
> > Dataset - Binds together a format & fragment discovery
> > Fragment - Something that maps to an input stream (usually a file)
> > ScanTask - Created by a format, turns an input stream into record batches.
> > RecordBatch - I hope I don't need to define this one :)
> >
> > The change I'm recommending (and starting to implement in ARROW-7001)
> > is to change the cardinality of Fragment ->* ScanTask to Fragment ->
> > ScanTask (i.e. one scan task per fragment instead of many).
> >
> > The IPC format and CSV format already do this (one scan task per
> > fragment).  The only exception is Parquet which maps "scan task" to
> > "row group" (keeping in mind row groups may correspond to multiple
> > batches).  However, that feels like it is a detail that can be
> > encapsulated in ParquetScanTask (I can implement this in
> > https://issues.apache.org/jira/browse/ARROW-11843).  In other words...
> >
> > The scanner is responsible for managing inter-fragment parallelism
> > (how many files to read at once, pipelining file reads, etc.)
> > The scan task is responsible for managing intra-fragment parallelism
> > (how many row groups to read at once, whether to scan columns in
> > parallel, etc)
> >
> > Then, scan task can be made fully internal (ala ARROW-11782) and the
> > primary external interface would be a record batch iterator.
> >
> > This doesn't just simplify the external interface by removing a type,
> > it actually changes the workflow requirements as well (admittedly,
> > some of this is an inevitable benefit of ARROW-7001 and not directly
> > related to removing scan task).  Currently, if you want maximum
> > performance from a dataset scan, you need to run the scan tasks in
> > parallel.  For example...
> >
> > for scan_task in scanner.scan():
> >   for record_batch in scan_task:
> >     # Do something, but do it very fast or do it on another thread,
> >     # because every ms you spend here is a ms you could be doing I/O
> >
> > With the simplification it should simply be...
> >
> > for record_batch in scanner.scan():
> 

Re: [C++] Dataset API simplification

2021-03-26 Thread David Li
I agree we should present a simplified interface, and then also make ScanTask 
internal, but I think that is orthogonal to whether a fragment produces one or 
multiple scan tasks. 

At first, my worry with having (Parquet)ScanTask handle concurrency itself was 
that it does need to coordinate with the overall scanner, right? If you have 
two files with 100 row groups each, that's much different than 100 files with 
two row groups each. With a scan task per row group, a single readahead 
handles both cases, but with a single scan task per file, you have to juggle 
the exact amount of readahead on an inter- and intra-file level. 

That said, there is an issue for making readahead operate by amount of memory 
used instead of number of files/tasks which would presumably handle that just 
as well. And right now, one (Parquet)ScanTask-per-row group does lead to some 
implementation nuisance elsewhere (since all scan tasks for a file have to 
share the same Parquet reader and pre-buffering task).

Also I realize my example is poor, because you do actually want to separate 
intra- and inter-fragment concurrency - you want to at least be buffering the 
next files (without decoding them) while decoding the current file. And the 
proposed model would make it easier to support a consumer that can process 
batches out of order while limiting memory usage (just limit the 
inter-scan-task readahead).
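
That last point can be sketched with a bounded queue (an illustration of the backpressure idea only, not Arrow's actual machinery; `scan` and `read_fragment` are hypothetical names): the producer stalls once `readahead` undelivered batches accumulate, capping memory regardless of how fast the consumer drains them.

```python
import queue
import threading

def scan(fragments, read_fragment, readahead=4):
    # A bounded queue caps how far scanning runs ahead of the consumer:
    # put() blocks once `readahead` batches are undelivered.
    batches = queue.Queue(maxsize=readahead)
    _DONE = object()

    def producer():
        for frag in fragments:
            for batch in read_fragment(frag):
                batches.put(batch)  # blocks when the queue is full
        batches.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while (item := batches.get()) is not _DONE:
        yield item

# Toy usage: two "fragments", each decoding to two "batches".
out = list(scan([1, 2], lambda f: [f * 10, f * 10 + 1], readahead=2))
assert out == [10, 11, 20, 21]
```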

So on balance I'm in favor of this.

I'll also note that there could be other Fragments which may naturally have 
intra-fragment parallelism, if the concern is mostly that ParquetScanTask is a 
bit of an outlier. For instance, a hypothetical FlightFragment wrapping a 
FlightInfo struct could generate multiple scan tasks, one per FlightEndpoint in 
the FlightInfo.

Best,
David

On Thu, Mar 25, 2021, at 19:48, Weston Pace wrote:
> This is a bit of a follow-up on
> https://issues.apache.org/jira/browse/ARROW-11782 and also a bit of a
> consequence of my work on
> https://issues.apache.org/jira/browse/ARROW-7001 (nested scan
> parallelism).
> 
> I think the current dataset interface should be simplified.
> Currently, we have Dataset ->* Fragment ->* ScanTask ->* RecordBatch
> with the components being...
> 
> Dataset - Binds together a format & fragment discovery
> Fragment - Something that maps to an input stream (usually a file)
> ScanTask - Created by a format, turns an input stream into record batches.
> RecordBatch - I hope I don't need to define this one :)
> 
> The change I'm recommending (and starting to implement in ARROW-7001)
> is to change the cardinality of Fragment ->* ScanTask to Fragment ->
> ScanTask (i.e. one scan task per fragment instead of many).
> 
> The IPC format and CSV format already do this (one scan task per
> fragment).  The only exception is Parquet which maps "scan task" to
> "row group" (keeping in mind row groups may correspond to multiple
> batches).  However, that feels like it is a detail that can be
> encapsulated in ParquetScanTask (I can implement this in
> https://issues.apache.org/jira/browse/ARROW-11843).  In other words...
> 
> The scanner is responsible for managing inter-fragment parallelism
> (how many files to read at once, pipelining file reads, etc.)
> The scan task is responsible for managing intra-fragment parallelism
> (how many row groups to read at once, whether to scan columns in
> parallel, etc)
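
A minimal sketch of that division of labor (illustrative names only, not the dataset API): the scanner fans out over fragments, while each scan task fans out over its own row groups internally and presents a single stream of batches.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model: a fragment is a list of row groups; a row group decodes
# to one batch.
def read_row_group(row_group):
    return [f"batch-{row_group}"]

def scan_fragment(fragment):
    # Intra-fragment parallelism: one task per row group, joined here
    # so the fragment presents a single ordered stream of batches.
    with ThreadPoolExecutor(max_workers=2) as pool:
        return [b for batches in pool.map(read_row_group, fragment)
                for b in batches]

def scan(fragments, max_files=2):
    # Inter-fragment parallelism: at most `max_files` fragments in
    # flight at once, results yielded in fragment order.
    with ThreadPoolExecutor(max_workers=max_files) as pool:
        for batches in pool.map(scan_fragment, fragments):
            yield from batches

out = list(scan([["a", "b"], ["c"]]))
assert out == ["batch-a", "batch-b", "batch-c"]
```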
> 
> Then, scan task can be made fully internal (ala ARROW-11782) and the
> primary external interface would be a record batch iterator.
> 
> This doesn't just simplify the external interface by removing a type,
> it actually changes the workflow requirements as well (admittedly,
> some of this is an inevitable benefit of ARROW-7001 and not directly
> related to removing scan task).  Currently, if you want maximum
> performance from a dataset scan, you need to run the scan tasks in
> parallel.  For example...
> 
> for scan_task in scanner.scan():
>   for record_batch in scan_task:
>     # Do something, but do it very fast or do it on another thread,
>     # because every ms you spend here is a ms you could be doing I/O
> 
> With the simplification it should simply be...
> 
> for record_batch in scanner.scan():
>   # While you are processing this record batch the scanner is going to
>   # continue running on a different thread.  It will be queueing up a
>   # backlog of batches for you to process.  As long as you don't take
>   # "too long" you should be able to keep up.  In other words, as long
>   # as your processing time here + the time it took to decode and
>   # prepare the batch is less than the time it takes to read the batch,
>   # you will never have a break in I/O.
> 
> -Weston
> 


[NIGHTLY] Arrow Build Report for Job nightly-2021-03-26-0

2021-03-26 Thread Crossbow


Arrow Build Report for Job nightly-2021-03-26-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0

Failed Tasks:
- conda-linux-gcc-py37-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-drone-conda-linux-gcc-py37-aarch64
- conda-linux-gcc-py39-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-drone-conda-linux-gcc-py39-aarch64
- gandiva-jar-ubuntu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-gandiva-jar-ubuntu
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-conda-python-3.8-jpype
- test-r-linux-as-cran:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-r-linux-as-cran
- test-ubuntu-16.04-cpp:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-ubuntu-16.04-cpp
- test-ubuntu-18.04-docs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-test-ubuntu-18.04-docs
- test-ubuntu-18.04-r-sanitizer:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-test-ubuntu-18.04-r-sanitizer
- wheel-osx-high-sierra-cp36m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-high-sierra-cp36m
- wheel-osx-high-sierra-cp37m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-high-sierra-cp37m
- wheel-osx-high-sierra-cp38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-high-sierra-cp38
- wheel-osx-high-sierra-cp39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-high-sierra-cp39
- wheel-osx-mavericks-cp36m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-mavericks-cp36m
- wheel-osx-mavericks-cp37m:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-mavericks-cp37m
- wheel-osx-mavericks-cp38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-mavericks-cp38
- wheel-osx-mavericks-cp39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-wheel-osx-mavericks-cp39

Pending Tasks:
- centos-8-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-travis-centos-8-aarch64
- test-r-rstudio-r-base-3.6-opensuse15:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-test-r-rstudio-r-base-3.6-opensuse15
- test-r-versions:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-test-r-versions
- ubuntu-focal-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-travis-ubuntu-focal-arm64
- wheel-manylinux2014-cp37m-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-travis-wheel-manylinux2014-cp37m-arm64

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-conda-clean
- conda-linux-gcc-py36-aarch64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-drone-conda-linux-gcc-py36-aarch64
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-cpu-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-26-0-azure-conda-linux-gcc-py37-cpu-r40
- conda-linux-gcc-py37-cuda:
  URL: