[jira] [Created] (ARROW-8986) [Archery][ursabot] Fix benchmark diff checkout of origin/master
Francois Saint-Jacques created ARROW-8986:

Summary: [Archery][ursabot] Fix benchmark diff checkout of origin/master
Key: ARROW-8986
URL: https://issues.apache.org/jira/browse/ARROW-8986
Project: Apache Arrow
Issue Type: Bug
Reporter: Francois Saint-Jacques

https://github.com/apache/arrow/pull/7300#issuecomment-635967095

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-8985) [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers type for forward compatibility
Wes McKinney created ARROW-8985:

Summary: [Format] Add "byte width" field with default of 16 to Decimal Flatbuffers type for forward compatibility
Key: ARROW-8985
URL: https://issues.apache.org/jira/browse/ARROW-8985
Project: Apache Arrow
Issue Type: Improvement
Components: Format
Reporter: Wes McKinney
Fix For: 1.0.0

This will permit larger or smaller decimals to be added to the format later without having to add a new Type union value.
Writing Parquet datasets using pyarrow.parquet.ParquetWriter
Hi,

I had a few questions regarding pyarrow.parquet. I want to write a Parquet dataset that is partitioned on one column. I have a large CSV file, which I read in chunks with the following code:

# csv_to_parquet.py
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

csv_file = '/path/to/my.tsv'
parquet_file = '/path/to/my.parquet'
chunksize = 100_000

csv_stream = pd.read_csv(csv_file, sep='\t', chunksize=chunksize, low_memory=False)

parquet_writer = None  # created when the first chunk arrives
for i, chunk in enumerate(csv_stream):
    print("Chunk", i)
    if i == 0:
        # Guess the schema of the CSV file from the first chunk
        parquet_schema = pa.Table.from_pandas(df=chunk).schema
        # Open a Parquet file for writing
        parquet_writer = pq.ParquetWriter(parquet_file, parquet_schema, compression='snappy')
    # Convert the CSV chunk using the first chunk's schema and append it to the file
    table = pa.Table.from_pandas(chunk, schema=parquet_schema)
    parquet_writer.write_table(table)

# Guard against an empty input file, where no writer was ever opened
if parquet_writer is not None:
    parquet_writer.close()

But this code writes a single Parquet file, and I don't see any method on ParquetWriter for writing to a dataset; it only has the write_table method. Is there a way to do this?

Also, how do I write the _metadata file for the example above, and the _common_metadata file? And how do I write these metadata files for a partitioned dataset?

Thanks in advance.

--
*Regards,*
*Palak Harwani*
Re: [NIGHTLY] Arrow Build Report for Job nightly-2020-05-30-0
https://github.com/apache/arrow/pull/7305 should enable us to upload conda packages again.

On Sat, May 30, 2020, at 12:10 PM, Crossbow wrote:
> Arrow Build Report for Job nightly-2020-05-30-0
[NIGHTLY] Arrow Build Report for Job nightly-2020-05-30-0
Arrow Build Report for Job nightly-2020-05-30-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0

Failed Tasks:
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-azure-conda-win-vs2015-py38
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-homebrew-r-autobrew
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.8-dask-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.8-jpype

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-centos-8-amd64
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-travis-gandiva-jar-xenial
- nuget:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-nuget
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-cpp
- test-conda-python-3.6-pandas-0.23:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.6
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-kartothek-latest
- test-conda-python-3.7-kartothek-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-30-0-github-test-conda-python-3.7-kartothek-master
- test-conda-python-3.7-pandas-latest: