[jira] [Created] (ARROW-8986) [Archery][ursabot] Fix benchmark diff checkout of origin/master

2020-05-30 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-8986:
-

 Summary: [Archery][ursabot] Fix benchmark diff checkout of 
origin/master
 Key: ARROW-8986
 URL: https://issues.apache.org/jira/browse/ARROW-8986
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


https://github.com/apache/arrow/pull/7300#issuecomment-635967095



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-8985) [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers type for forward compatibility

2020-05-30 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8985:
---

 Summary: [Format] Add "byte width" field with default of 16 to 
Decimal Flatbuffers type for forward compatibility
 Key: ARROW-8985
 URL: https://issues.apache.org/jira/browse/ARROW-8985
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Wes McKinney
 Fix For: 1.0.0


This will permit larger or smaller decimals to be added to the format later 
without having to add a new Type union value
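
A minimal sketch of why a defaulted field helps here, assuming the usual Flatbuffers behaviour that a field absent from the buffer reads back as its declared default. The plain dict below only stands in for a serialized Decimal table, and the names `DecimalType`, `decode_decimal` and `byte_width` are hypothetical illustrations, not part of the Arrow format or any Arrow API:

```python
from dataclasses import dataclass

# Assumption from the issue title: the default byte width is 16 (a 128-bit decimal).
DEFAULT_BYTE_WIDTH = 16


@dataclass(frozen=True)
class DecimalType:
    precision: int
    scale: int
    byte_width: int = DEFAULT_BYTE_WIDTH


def decode_decimal(fields: dict) -> DecimalType:
    """Decode hypothetical metadata; a missing byte_width falls back to the default."""
    return DecimalType(
        precision=fields["precision"],
        scale=fields["scale"],
        byte_width=fields.get("byte_width", DEFAULT_BYTE_WIDTH),
    )


# Metadata written before the field existed still decodes as a 16-byte decimal.
old_style = decode_decimal({"precision": 38, "scale": 10})
# A later writer could describe a wider decimal using the same type entry.
wider = decode_decimal({"precision": 76, "scale": 10, "byte_width": 32})
print(old_style.byte_width, wider.byte_width)
```

In this sketch the reader needs no new union member to tell the two apart; the width travels as data on the existing type.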





Writing Parquet datasets using pyarrow.parquet.ParquetWriter

2020-05-30 Thread Palak Harwani
Hi,
I have a few questions about pyarrow.parquet. I want to write a Parquet
dataset that is partitioned by one column. I have a large CSV file and I'm
reading it in chunks with the following code:

# csv_to_parquet.py

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_file = '/path/to/my.tsv'
parquet_file = '/path/to/my.parquet'
chunksize = 100_000

csv_stream = pd.read_csv(csv_file, sep='\t', chunksize=chunksize,
                         low_memory=False)
parquet_writer = None
for i, chunk in enumerate(csv_stream):
    print("Chunk", i)
    if i == 0:
        # Guess the schema of the CSV file from the first chunk
        parquet_schema = pa.Table.from_pandas(df=chunk).schema
        # Open a Parquet file for writing
        parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema,
                                          compression='snappy')
    # Write CSV chunk to the parquet file
    table = pa.Table.from_pandas(chunk, schema=parquet_schema)
    parquet_writer.write_table(table)

# Close the writer only if at least one chunk was read
if parquet_writer is not None:
    parquet_writer.close()



But this code writes a single Parquet file, and I don't see any method in
ParquetWriter for writing to a dataset. It only has the write_table method.
Is there a way to do this?

Also, how do I write the metadata file and the common metadata file for the
example above, and how do I write the metadata files for a partitioned
dataset?

Thanks in advance.

-- 
*Regards,*
*Palak Harwani*


Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-05-30-0

2020-05-30 Thread Uwe L. Korn
https://github.com/apache/arrow/pull/7305 should enable us to upload conda 
packages again.

On Sat, May 30, 2020, at 12:10 PM, Crossbow wrote:
> 
> Arrow Build Report for Job nightly-2020-05-30-0
> 
> All tasks: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0
> 
> Failed Tasks:
> - conda-linux-gcc-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py36
> - conda-linux-gcc-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py37
> - conda-linux-gcc-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py38
> - conda-osx-clang-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py36
> - conda-osx-clang-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py37
> - conda-osx-clang-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py38
> - conda-win-vs2015-py36:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py36
> - conda-win-vs2015-py37:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py37
> - conda-win-vs2015-py38:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py38
> - homebrew-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-cpp
> - homebrew-r-autobrew:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-r-autobrew
> - test-conda-python-3.7-dask-latest:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-dask-latest
> - test-conda-python-3.7-spark-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-spark-master
> - test-conda-python-3.8-dask-master:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-dask-master
> - test-conda-python-3.8-jpype:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-jpype
> 
> Succeeded Tasks:
> - centos-6-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-6-amd64
> - centos-7-aarch64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-7-aarch64
> - centos-7-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-7-amd64
> - centos-8-aarch64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-8-aarch64
> - centos-8-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-8-amd64
> - debian-buster-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-buster-amd64
> - debian-buster-arm64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-buster-arm64
> - debian-stretch-amd64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-stretch-amd64
> - debian-stretch-arm64:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-stretch-arm64
> - gandiva-jar-osx:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-osx
> - gandiva-jar-xenial:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-xenial
> - nuget:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-nuget
> - test-conda-cpp-valgrind:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp-valgrind
> - test-conda-cpp:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp
> - test-conda-python-3.6-pandas-0.23:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6-pandas-0.23
> - test-conda-python-3.6:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6
> - test-conda-python-3.7-hdfs-2.9.2:
>   URL: 
> https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-hdfs-2.9.2
> - 

[NIGHTLY] Arrow Build Report for Job nightly-2020-05-30-0

2020-05-30 Thread Crossbow


Arrow Build Report for Job nightly-2020-05-30-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0

Failed Tasks:
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py38
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-r-autobrew
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-jpype

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-8-amd64
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-xenial
- nuget:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-nuget
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp
- test-conda-python-3.6-pandas-0.23:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest: