Long title on github page

2021-05-15 Thread Dominik Moritz
Super minor issue but could someone make the description on GitHub shorter?
🙏



GitHub puts the description into the title of the page and makes it hard to
find it in URL autocomplete.


Re: Python Flight example with query command

2021-05-15 Thread David Li
Hey Tanveer,

Something like this should work:

$ python examples/flight/client.py put localhost:1234 foo.csv
File Name: foo.csv
Table rows= 1
   a  b
0  1  2
$ python examples/flight/client.py get localhost:1234 -p foo.csv
Ticket: 

   a  b
0  1  2

Note that Flight itself does not implement SQL query functionality or
anything of the sort. It is a common misconception, I think
exacerbated since Flight is often discussed in the context of products
like Dremio which implement such functionality on top of Flight. But
really, Flight itself is just a 'dumb pipe' for Arrow data for
building such systems.

You may be interested in the FlightSQL proposal which defines at least
an interface for database systems to make themselves available over
Flight and for clients to generically query them. However that
proposal has been stalled for a while.
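
To make the 'dumb pipe' point concrete, here is a minimal toy model of the split between a path request and a command request. It deliberately does not use the pyarrow.flight API: ToyStore, get_by_path, and get_by_command are hypothetical names, and treating the command bytes as SQL (run through the standard-library sqlite3 module) is an assumption chosen by this sketch, not something Flight provides.

```python
import sqlite3


class ToyStore:
    """Hypothetical stand-in for a Flight-style server holding named tables."""

    def __init__(self):
        # table name -> (column names, list of row tuples)
        self.tables = {}

    def put(self, name, columns, rows):
        self.tables[name] = (list(columns), [tuple(row) for row in rows])

    def get_by_path(self, name):
        # A path request just returns the stored table unchanged.
        return self.tables[name]

    def get_by_command(self, command):
        # A command is opaque bytes. This application (not the transport)
        # decides to interpret them as SQL over the stored tables.
        sql = command.decode("utf-8")
        conn = sqlite3.connect(":memory:")
        try:
            for name, (columns, rows) in self.tables.items():
                # Identifiers are quoted but not escaped; fine for a sketch only.
                col_list = ", ".join(f'"{c}"' for c in columns)
                placeholders = ", ".join("?" for _ in columns)
                conn.execute(f'CREATE TABLE "{name}" ({col_list})')
                conn.executemany(
                    f'INSERT INTO "{name}" VALUES ({placeholders})', rows
                )
            cursor = conn.execute(sql)
            out_columns = [d[0] for d in cursor.description]
            return out_columns, cursor.fetchall()
        finally:
            conn.close()


store = ToyStore()
store.put("foo", ["a", "b"], [(1, 2), (3, 4), (5, 6)])

path_result = store.get_by_path("foo")
command_result = store.get_by_command(
    b"SELECT a, b FROM foo WHERE a > 1 ORDER BY a LIMIT 10"
)
```

In this model all of the query logic lives in get_by_command, which is the part an application would have to supply itself on top of the transport.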

Best,
David

On 2021/05/15 12:15:26, Tanveer Ahmad - EWI  wrote: 
> Hi all,
> 
> 
> For Python Flight 
> example, 
> I can start server (python server.py -> Serving on grpc+tcp://localhost:5005) 
> and client can put (python client.py put localhost:5005 mycsv.csv) and also 
> get (python client.py get localhost:5005 -p mycsv.csv) command retrieves data 
> with -p (path) option.
> 
> 
> I am wondering how to query (like python client.py get localhost:5005 -c 
> "select * from ? limit 10") using -c, command this data , which I had already 
> put on server through put command.
> 
> 
> Thanks.
> 
> Regards,
> Tanveer Ahmad
> 
> 


Python Flight example with query command

2021-05-15 Thread Tanveer Ahmad - EWI
Hi all,


For the Python Flight example, I can start the server (python server.py ->
Serving on grpc+tcp://localhost:5005), and the client can put data (python
client.py put localhost:5005 mycsv.csv) and retrieve it with the -p (path)
option (python client.py get localhost:5005 -p mycsv.csv).


I am wondering how to query this data, which I have already put on the
server through the put command, using the -c (command) option (like python
client.py get localhost:5005 -c "select * from ? limit 10").


Thanks.

Regards,
Tanveer Ahmad



Re: [C++][DISCUSS] Implementing interpreted (non-compiled) tests for compute functions

2021-05-15 Thread Antoine Pitrou



I think people who think this would be beneficial should try to devise a 
text format to represent compute test data.  As Eduardo pointed out, 
there are various complications that need to be catered for.


To me, it's not obvious that building the necessary infrastructure in 
C++ to ingest that text format will be more pleasant than our current 
way of writing tests.  As a data point, the JSON integration code in C++ 
is really annoying to maintain.


Regards

Antoine.


On 15/05/2021 at 00:03, Wes McKinney wrote:

In C++, we have the "ArrayFromJSON" function which is an even simpler
way of specifying input data compared with the integration tests.
That's one possible starting point.

The "interpreted tests" could be all specified and driven by minimal
dependency Python code, as one possible way to approach things.
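
A minimal sketch of what such Python-driven, data-specified tests might look like. The case format (function, args, expected) is an assumption made for illustration, and the KERNELS registry holds pure-Python reference functions standing in for the real compute functions; nothing here calls into Arrow.

```python
import json

# Test cases as data. Arrays are written as JSON lists, with null for missing
# values. The "function"/"args"/"expected" keys are a hypothetical layout.
CASES = json.loads("""
[
  {"function": "add",   "args": [[1, 2, null], [10, 20, 30]], "expected": [11, 22, null]},
  {"function": "equal", "args": [[1, 2, 3],    [1, 0, 3]],    "expected": [true, false, true]}
]
""")


def elementwise(op):
    """Lift a scalar operation to lists, propagating None as null."""
    def kernel(left, right):
        return [
            None if a is None or b is None else op(a, b)
            for a, b in zip(left, right)
        ]
    return kernel


# Stand-ins for the functions under test (assumption: binary, elementwise).
KERNELS = {
    "add": elementwise(lambda a, b: a + b),
    "equal": elementwise(lambda a, b: a == b),
}


def run_cases(cases, kernels):
    """Run every case and return a list of (index, function, actual) failures."""
    failures = []
    for index, case in enumerate(cases):
        actual = kernels[case["function"]](*case["args"])
        if actual != case["expected"]:
            failures.append((index, case["function"], actual))
    return failures


failures = run_cases(CASES, KERNELS)
```

Adding a test then means adding a JSON entry rather than recompiling anything, which is the iteration-time benefit discussed in this thread.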

On Fri, May 14, 2021 at 1:57 PM Jorge Cardoso Leitão
 wrote:


Hi,

(this problem also exists in Rust, btw)

Couldn't we use something like we do for our integration tests? Create a
separate binary that would allow us to call e.g.

test-compute --method equal --json-file  --arg "column1" --arg
"column2" --expected "column3"

(or simply pass the input via stdin)

and then use Python to call the binary?

The advantage I see here is that we would compile the binary with flags to
disable unnecessary code, use debug, etc, thereby reducing compile time if
the kernel needs to be changed.

IMO our "json integration" format is a reliable way of passing data across,
it is very easy to read and write, and all our implementations can already
read it for integration tests.

wrt the "cross implementation" point, the equality operation seems a natural
candidate for cross-implementation checks, as that one has important
implications in all our integration tests. filter, take, slice, boolean ops
may also be easy to agree upon. "add" and the like are a bit more difficult
due to how overflow should be handled (abort vs saturate vs None), but
nothing that we can't take. ^_^
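
To illustrate why "add" is harder to agree on, here is a small sketch of the three overflow behaviours mentioned above for an 8-bit signed addition. The function name add_int8 and the on_overflow parameter are hypothetical, and the sketch uses plain Python integers rather than any Arrow implementation.

```python
INT8_MIN, INT8_MAX = -128, 127


def add_int8(a, b, on_overflow="abort"):
    """Add two int8 values under one of three overflow policies."""
    result = a + b
    if INT8_MIN <= result <= INT8_MAX:
        return result
    if on_overflow == "abort":
        # Refuse to produce a value at all.
        raise OverflowError(f"{a} + {b} does not fit in int8")
    if on_overflow == "saturate":
        # Clamp to the nearest representable bound.
        return INT8_MAX if result > INT8_MAX else INT8_MIN
    if on_overflow == "null":
        # Emit a missing value instead of a number.
        return None
    raise ValueError(f"unknown overflow policy: {on_overflow!r}")
```

A shared test case for "add" would have to state which policy it expects, since the same inputs give three different valid answers.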

Best,
Jorge

On Fri, May 14, 2021 at 8:25 PM David Li  wrote:


I think even if it's not (easily) generalizable across languages, it'd
still be a win for C++ (and hopefully languages that bind to
C++). Also, I don't think they're meant to completely replace
language-specific tests, but rather complement them, and make it
easier to add and maintain tests in the overwhelmingly common case.

I do feel it's somewhat painful to write these kinds of tests in C++,
largely because of the iteration time and the difficulty of repeating
tests across various configurations. I also think this could be an
opportunity to leverage things like Hypothesis/property-based testing
or perhaps fuzzing to make the kernels even more robust.

-David

On 2021/05/14 18:09:45, Eduardo Ponce  wrote:

Another aspect to keep in mind is that some tests require internal options
to be changed before executing the compute functions (e.g., check overflow,
allow NaN comparisons, change validity bits, etc.). Also, there are tests
that take randomized inputs and others make use of the min/max values for
each specific data type. Certainly, these details can be generalized across
languages/testing frameworks but not without careful treatment.

Moreover, each language implementation still needs to test
language-specific or internal functions, so having a meta test framework
will not necessarily get rid of language-specific tests.

~Eduardo

On Fri, May 14, 2021 at 1:56 PM Weston Pace wrote:



With that in mind it seems the somewhat recurring discussion on coming
up with a language independent standard for logical query plans
(https://lists.apache.org/thread.html/rfab15e09c97a8fb961d6c5db8b2093824c58d11a51981a40f40cc2c0%40%3Cdev.arrow.apache.org%3E)
would be relevant.  Each test case would then be a triplet of (Input
Dataframe, Logical Plan, Output Dataframe).  So perhaps tackling this
effort would make progress on both fronts.
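
A rough sketch of what one such triplet could look like. The plan encoding here (a list of "filter" and "project" steps) is invented purely for illustration and is not a proposed standard; run_plan is a hypothetical reference interpreter over lists of dicts.

```python
def run_plan(table, plan):
    """Apply each plan step in order to a table given as a list of row dicts."""
    rows = [dict(row) for row in table]
    for step in plan:
        if step["op"] == "filter":
            rows = [r for r in rows if r[step["column"]] > step["greater_than"]]
        elif step["op"] == "project":
            rows = [{c: r[c] for c in step["columns"]} for r in rows]
        else:
            raise ValueError(f"unknown plan step: {step['op']!r}")
    return rows


# One test case: (input dataframe, logical plan, expected output dataframe).
CASE = (
    [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}, {"a": 3, "b": "z"}],
    [
        {"op": "filter", "column": "a", "greater_than": 1},
        {"op": "project", "columns": ["b"]},
    ],
    [{"b": "y"}, {"b": "z"}],
)

input_table, plan, expected = CASE
actual = run_plan(input_table, plan)
```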

On Fri, May 14, 2021 at 7:39 AM Julian Hyde wrote:


Do any of these compute functions have analogs in other implementations
of Arrow (e.g. Rust)?

I believe that as much as possible of Arrow’s compute functionality should
be cross-language. Perhaps there are language-specific differences in how
functions are invoked, but the basic functionality is the same.

If people buy into that premise, then a single suite of tests is a powerful
way to make that happen. The tests can be written in a high-level language
and can generate tests in each implementation language. (For these
purposes, the “high-level language” could be a special text format, could
be a data language such as JSON, or could be a programming language such as
Python; it doesn’t matter much.)

For example,

   assertThatCall(“foo(1, 2)”, returns(“3”))

might actually call foo with arguments 1 and 2, or it might generate a
C++ or Rust test that does the same.
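
Both readings of that assertion can be sketched in a few lines of Python. Everything here is hypothetical: foo is a stand-in that adds its arguments, the helper names mirror the example above, only literal arguments are handled, and the emitted C++ line (using GoogleTest's ASSERT_EQ) is illustrative rather than a real Arrow test.

```python
import ast

# Hypothetical registry of functions under test; foo simply adds its arguments.
FUNCTIONS = {"foo": lambda a, b: a + b}


def returns(expected_text):
    """Parse the expected value from its literal text form."""
    return ast.literal_eval(expected_text)


def parse_call(call_text):
    """Split 'name(arg, ...)' into the function name and literal arguments."""
    node = ast.parse(call_text, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple call: {call_text!r}")
    return node.func.id, [ast.literal_eval(arg) for arg in node.args]


def assert_that_call(call_text, expected):
    """Interpretation 1: actually call the function and compare the result."""
    name, args = parse_call(call_text)
    actual = FUNCTIONS[name](*args)
    assert actual == expected, f"{call_text} returned {actual!r}, expected {expected!r}"


def generate_cpp_test(call_text, expected):
    """Interpretation 2: emit a test line for another language (integers only)."""
    name, args = parse_call(call_text)
    arg_list = ", ".join(repr(a) for a in args)
    return f"ASSERT_EQ({name}({arg_list}), {expected!r});"


assert_that_call("foo(1, 2)", returns("3"))
generated = generate_cpp_test("foo(1, 2)", returns("3"))
```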



Julian



On May 14, 2021, at 8:45 AM, Antoine Pitrou wrote:



On 14/05/2021 at 15:30, Wes

[NIGHTLY] Arrow Build Report for Job nightly-2021-05-15-0

2021-05-15 Thread Crossbow


Arrow Build Report for Job nightly-2021-05-15-0

All tasks: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0

Failed Tasks:
- conda-osx-clang-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-osx-clang-py38
- conda-win-vs2017-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-win-vs2017-py36-r36
- conda-win-vs2017-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-win-vs2017-py37-r40
- conda-win-vs2017-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-win-vs2017-py38
- conda-win-vs2017-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-win-vs2017-py39
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-spark-master:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-test-conda-python-3.8-spark-master
- test-r-devdocs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-test-r-devdocs
- test-r-linux-valgrind:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-test-r-linux-valgrind
- test-r-rstudio-r-base-3.6-opensuse42:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-test-r-rstudio-r-base-3.6-opensuse42
- test-ubuntu-20.10-docs:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-test-ubuntu-20.10-docs

Succeeded Tasks:
- centos-7-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-centos-7-amd64
- centos-8-amd64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-github-centos-8-amd64
- centos-8-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-travis-centos-8-arm64
- conda-clean:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-clean
- conda-linux-gcc-py36-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py36-arm64
- conda-linux-gcc-py36-cpu-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py36-cpu-r36
- conda-linux-gcc-py36-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py36-cuda
- conda-linux-gcc-py37-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py37-arm64
- conda-linux-gcc-py37-cpu-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py37-cpu-r40
- conda-linux-gcc-py37-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py37-cuda
- conda-linux-gcc-py38-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py38-arm64
- conda-linux-gcc-py38-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py38-cpu
- conda-linux-gcc-py38-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py38-cuda
- conda-linux-gcc-py39-arm64:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py39-arm64
- conda-linux-gcc-py39-cpu:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py39-cpu
- conda-linux-gcc-py39-cuda:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-linux-gcc-py39-cuda
- conda-osx-arm64-clang-py38:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-osx-arm64-clang-py38
- conda-osx-arm64-clang-py39:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-osx-arm64-clang-py39
- conda-osx-clang-py36-r36:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-osx-clang-py36-r36
- conda-osx-clang-py37-r40:
  URL: 
https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-05-15-0-azure-conda-osx-clang-py37-r40
- conda-osx-clang-py39:
  URL: 
ht

Re: [VOTE] [RUST] New release process for arrow-rs

2021-05-15 Thread Andrew Lamb
This vote passes (with +5 binding / + 2 non-binding).

We will start implementing the new process and hopefully have our first
trial release shortly.

Thanks again for all your help and patience,
Andrew

On Wed, May 12, 2021 at 5:26 PM Daniël Heres  wrote:

> +1 (non binding)
>
> Thanks!
> This is going to make quite some users / maintainers happy.
>
> On Wed, May 12, 2021, 23:23 Andrew Lamb  wrote:
>
> > Thank you -- I am expecting that the first few times will have some bumps
> > and that we'll optimize the process accordingly after that
> >
> > > On Wed, May 12, 2021 at 9:45 AM Wes McKinney wrote:
> >
> > > +1. I will add that I'm supportive of abbreviated votes for the
> > > biweekly releases, so if you get the votes in a 24 hour window, then
> > > releasing to crates.io sounds fine to me. For major releases with
> > > breaking changes, allowing the full 72 hours seems prudent.
> > >
> > > > On Tue, May 11, 2021 at 11:02 PM Jorge Cardoso Leitão wrote:
> > > >
> > > > +1
> > > >
> > > > Thanks a lot, Andrew!
> > > >
> > > > > On Wed, May 12, 2021 at 2:04 AM Sutou Kouhei wrote:
> > > >
> > > > > +1
> > > > >
> > > > > > In zeke+hb0x9quwdtazn3bx6r1oeecwoaav...@mail.gmail.com
> > > > > >   "[VOTE] [RUST] New release process for arrow-rs" on Tue, 11 May 2021 18:16:14 -0400,
> > > > > >   Andrew Lamb wrote:
> > > > >
> > > > > > > Per previous discussions, I would like to propose a new release process
> > > > > > > for arrow-rs, releasing officially to crates.io every 2 weeks in addition
> > > > > > > to the quarterly release of the other releases.
> > > > > > >
> > > > > > > The proposal is available as [1], based on previous discussions [2][3]
> > > > > > > in the mailing list and comments on the draft document [4].
> > > > > > >
> > > > > > > Please vote in the following manner. The vote will be open for at least
> > > > > > > 72 hours.
> > > > > >
> > > > > > [ ] +1 Implement the release process described in the proposal
> > > > > > [ ] +0
> > > > > > [ ] -1 Do not implement the process because...
> > > > > >
> > > > > > Thank you for your patience and participation,
> > > > > > Andrew
> > > > > >
> > > > > >
> > > > > > [1]
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1tMQ67iu8XyGGZuj--h9WQYB9inCk6c2sL_4xMTwENGc/edit?ts=60961758
> > > > > >
> > > > > > [2]
> > > > > >
> > > > >
> > >
> >
> https://lists.apache.org/thread.html/r6b9baf59e3cd1a91905b5f802057026dfa627f00507638b605a3ff1b%40%3Cdev.arrow.apache.org%3E
> > > > > >
> > > > > > [3]
> > > > > >
> > > > >
> > >
> >
> https://lists.apache.org/thread.html/r832296a5bdf8eb363ef1ed7012b8e2dde3fa6894e7fa925a66c6e791%40%3Cdev.arrow.apache.org%3E
> > > > > >
> > > > > >
> > > > > > [4]
> > > > > >
> > > > >
> > >
> >
> https://docs.google.com/document/d/1QTGah5dkRG0Z6Gny_QCHmqMg7L2HmcbEpRISsfNEhSA/edit?ts=60904ac1
> > > > >
> > >
> >
>