commit python-fastparquet for openSUSE:Factory

Source-Sync Thu, 12 Aug 2021 00:02:41 -0700

Script 'mail_helper' called by obssrc
Hello community,

here is the log from the commit of package python-fastparquet for 
openSUSE:Factory checked in at 2021-08-12 09:01:23
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-fastparquet (Old)
 and      /work/SRC/openSUSE:Factory/.python-fastparquet.new.1899 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Package is "python-fastparquet"

Thu Aug 12 09:01:23 2021 rev:18 rq:911011 version:0.7.1

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-fastparquet/python-fastparquet.changes    2021-05-19 17:49:48.645426287 +0200
+++ /work/SRC/openSUSE:Factory/.python-fastparquet.new.1899/python-fastparquet.changes  2021-08-12 09:02:16.194093327 +0200
@@ -1,0 +2,76 @@
+Sun Aug  8 15:13:55 UTC 2021 - Ben Greiner <[email protected]>
+
+- Update to version 0.7.1
+  * Back compile for older versions of numpy
+  * Make pandas nullable types opt-out. The old behaviour (casting
+    to float) is still available with ParquetFile(...,
+    pandas_nulls=False).
+  * Fix time field regression: IsAdjustedToUTC will be False when
+    there is no timezone
+  * Micro improvements to the speed of ParquetFile creation by
+    using simple string ops instead of regex and
+    regularising filenames once at the start. Affects datasets with
+    many files.
+- Release 0.7.0
+  * This version institutes major, breaking changes, listed here,
+    and incremental fixes and additions.
+  * Reading a directory without a _metadata summary file now works
+    by providing only the directory, instead of a list of
+    constituent files. This change also makes direct use of
+    fsspec filesystems, if given, to load the footer metadata
+    areas of the files concurrently, if the storage backend
+    supports it, without directly instantiating intermediate
+    ParquetFile instances.
+  * row-level filtering of the data. Whereas previously, only full 
+    row-groups could be excluded on the basis of their parquet 
+    metadata statistics (if present), filtering can now be done 
+    within row-groups too. The syntax is the same as before, 
+    allowing for multiple column expressions to be combined with 
+    AND|OR, depending on the list structure. This mechanism 
+    requires two passes: one to load the columns needed to create 
+    the boolean mask, and another to load the columns actually 
+    needed in the output. This will not be faster, and may be 
+    slower, but in some cases can save significant memory 
+    footprint, if a small fraction of rows are considered good and 
+    the columns for the filter expression are not in the output. 
+    Not currently supported for reading with DataPageV2.
+  * DELTA integer encoding (read-only): experimentally working, 
+    but we only have one test file to verify against, since it is 
+    not trivial to persuade Spark to produce files encoded this 
+    way. DELTA can be an extremely compact representation for 
+    slowly varying and/or monotonically increasing integers.
+  * nanosecond resolution times: the new extended "logical" types 
+    system supports nanoseconds alongside the previous millis and 
+    micros. We now emit these for the default pandas time type, 
+    and produce full parquet schema including both "converted" and 
+    "logical" type information. Note that all output has 
+    isAdjustedToUTC=True, i.e., these are timestamps rather than 
+    local time. The time-zone is stored in the metadata, as 
+    before, and will be successfully recreated only in fastparquet 
+    and (py)arrow. Otherwise, the times will appear to be UTC. For 
+    compatibility with Spark, you may still want to use 
+    times="int96" when writing.
+  * DataPageV2 writing: now we support both reading and writing.
+    Writing can be enabled with the environment variable
+    FASTPARQUET_DATAPAGE_V2 or the module global
+    fastparquet.writer.DATAPAGE_VERSION, and is off by default. It
+    will become on by default in the future. In many cases, V2 will
+    result in better read performance, because the data and page
+    headers are encoded separately, so data can be read directly
+    into the output without additional allocation/copies. This
+    feature is considered experimental, but we believe it is working
+    well for most use cases (i.e., our test suite) and should be
+    readable by all modern parquet frameworks including arrow and spark.
+  * pandas nullable types: pandas supports "masked" extension 
+    arrays for types that previously could not support NULL at 
+    all: ints and bools. Fastparquet used to cast such columns to 
+    float, so that we could represent NULLs as NaN; now we use the 
+    new(er) masked types by default. This means faster reading of 
+    such columns, as there is no conversion. If the metadata 
+    guarantees that there are no nulls, we still use the 
+    non-nullable variant unless the data was written with 
+    fastparquet/pyarrow, and the metadata indicates that the 
+    original datatype was nullable. We already handled writing of 
+    nullable columns.
+
+-------------------------------------------------------------------
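
The row-level filtering entry above describes a two-pass scheme and a
filter syntax whose AND/OR meaning depends on the list structure. The
following is a minimal pure-Python sketch of that idea, not fastparquet's
implementation. It assumes the convention that a flat list of
(column, op, value) tuples is ANDed together, while a list of such lists
is ORed. The names build_mask, select, and the sample table are
hypothetical.

```python
import operator

# Comparison operators allowed in (column, op, value) filter tuples.
OPS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, ">=": operator.ge,
    "<": operator.lt, "<=": operator.le,
    "in": lambda a, b: a in b,
    "not in": lambda a, b: a not in b,
}


def build_mask(columns, filters):
    """Pass 1: read only the filter columns and build a boolean row mask."""
    n_rows = len(next(iter(columns.values())))
    if not filters:
        # No filters: keep every row.
        return [True] * n_rows
    # A flat list of tuples is a single AND group; wrap it so both
    # shapes are evaluated as an OR over AND groups.
    if isinstance(filters[0], tuple):
        filters = [filters]
    return [
        any(
            all(OPS[op](columns[col][i], val) for col, op, val in group)
            for group in filters
        )
        for i in range(n_rows)
    ]


def select(columns, wanted, mask):
    """Pass 2: load only the requested columns, keeping the masked rows."""
    return {
        name: [v for v, keep in zip(columns[name], mask) if keep]
        for name in wanted
    }


# Hypothetical in-memory stand-in for one row group.
table = {
    "id": [1, 2, 3, 4],
    "score": [0.5, 0.9, 0.1, 0.7],
    "tag": ["a", "b", "a", "c"],
}

# Flat list: score > 0.4 AND tag != "b".
and_mask = build_mask(table, [("score", ">", 0.4), ("tag", "!=", "b")])

# List of lists: score > 0.8 OR tag == "c".
or_mask = build_mask(table, [[("score", ">", 0.8)], [("tag", "==", "c")]])

# The filter columns do not have to appear in the output.
result = select(table, ["id"], or_mask)
```

Because the mask is built from the filter columns alone, the output
columns are only materialised for the surviving rows, which is where the
memory saving mentioned in the changelog would come from.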

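To illustrate why the changelog calls DELTA a compact representation for
slowly varying or monotonically increasing integers, here is a simplified
sketch of the underlying idea: store the first value and then only the
differences. This is not the on-disk Parquet layout, which additionally
groups deltas into blocks and bit-packs them. The function names and
sample values are hypothetical.

```python
def delta_encode(values):
    """Return (first_value, deltas) for a non-empty list of integers."""
    first = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return first, deltas


def delta_decode(first, deltas):
    """Rebuild the original values by accumulating the deltas."""
    out = [first]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def bits_needed(ints):
    """Bits per value for fixed-width packing of non-negative integers."""
    return max(ints, default=0).bit_length()


# Hypothetical monotonically increasing values, e.g. epoch seconds
# sampled every 10 seconds.
timestamps = [1628750000 + 10 * i for i in range(8)]

first, deltas = delta_encode(timestamps)
roundtrip = delta_decode(first, deltas)

raw_bits = bits_needed(timestamps)  # width needed for the raw values
delta_bits = bits_needed(deltas)    # width needed for the deltas
```

With these sample values the raw integers need 31 bits each, while every
delta fits in 4 bits.
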
Old:
----
  fastparquet-0.6.3.tar.gz

New:
----
  fastparquet-0.7.1.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-fastparquet.spec ++++++
--- /var/tmp/diff_new_pack.yqSY0Y/_old  2021-08-12 09:02:16.798092383 +0200
+++ /var/tmp/diff_new_pack.yqSY0Y/_new  2021-08-12 09:02:16.802092377 +0200
@@ -21,7 +21,7 @@
 %define         skip_python2 1
 %define         skip_python36 1
 Name:           python-fastparquet
-Version:        0.6.3
+Version:        0.7.1
 Release:        0
 Summary:        Python support for Parquet file format
 License:        Apache-2.0
@@ -29,8 +29,9 @@
 Source:         https://github.com/dask/fastparquet/archive/%{version}.tar.gz#/fastparquet-%{version}.tar.gz
 BuildRequires:  %{python_module Cython}
 BuildRequires:  %{python_module cramjam >= 2.3.0}
-BuildRequires:  %{python_module fsspec}
-BuildRequires:  %{python_module numpy-devel >= 1.11}
+# version requirement not declared for runtime, but necessary for tests.
+BuildRequires:  %{python_module fsspec >= 2021.6.0}
+BuildRequires:  %{python_module numpy-devel >= 1.18}
 BuildRequires:  %{python_module pandas >= 1.1.0}
 BuildRequires:  %{python_module pytest}
 BuildRequires:  %{python_module python-lzo}
@@ -40,7 +41,7 @@
 BuildRequires:  python-rpm-macros
 Requires:       python-cramjam >= 2.3.0
 Requires:       python-fsspec
-Requires:       python-numpy >= 1.11
+Requires:       python-numpy >= 1.18
 Requires:       python-pandas >= 1.1.0
 Requires:       python-thrift >= 0.11.0
 Recommends:     python-python-lzo
@@ -54,6 +55,8 @@
 %setup -q -n fastparquet-%{version}
 # remove pytest-runner from setup_requires
 sed -i "s/'pytest-runner',//" setup.py
+# this is not meant for setup.py
+sed -i "s/oldest-supported-numpy/numpy/" setup.py
 # the tests import the fastparquet.test module and we need to import from sitearch, so install it.
 sed -i -e "s/^\s*packages=\[/&'fastparquet.test', /" -e "/exclude_package_data/ d" setup.py
 

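The sed calls in the %prep section above rewrite setup.py in place. As a
rough illustration of their combined effect, the sketch below applies the
same substitutions in Python to a made-up setup.py fragment. The fragment
is an assumption for demonstration only and is not the real file from the
fastparquet tarball. The helper name patch is hypothetical.

```python
import re

# Hypothetical stand-in for the relevant part of setup.py.
setup_py = """\
setup(
    name='fastparquet',
    packages=['fastparquet'],
    setup_requires=['pytest-runner', 'oldest-supported-numpy'],
    exclude_package_data={'fastparquet': ['test/*']},
)
"""


def patch(text):
    """Mimic the spec file's sed edits, line by line."""
    out = []
    for line in text.splitlines():
        # sed -e "/exclude_package_data/ d": drop matching lines.
        if "exclude_package_data" in line:
            continue
        # sed "s/'pytest-runner',//": first match on each line only.
        line = line.replace("'pytest-runner',", "", 1)
        # sed "s/oldest-supported-numpy/numpy/"
        line = line.replace("oldest-supported-numpy", "numpy", 1)
        # sed "s/^\s*packages=\[/&'fastparquet.test', /": '&' is the match.
        line = re.sub(
            r"^\s*packages=\[",
            lambda m: m.group(0) + "'fastparquet.test', ",
            line,
            count=1,
        )
        out.append(line)
    return "\n".join(out) + "\n"


patched = patch(setup_py)
print(patched)
```

The spec runs the sed commands as separate passes over the file; applying
them per line in one loop gives the same result for this fragment.
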
++++++ fastparquet-0.6.3.tar.gz -> fastparquet-0.7.1.tar.gz ++++++
/work/SRC/openSUSE:Factory/python-fastparquet/fastparquet-0.6.3.tar.gz /work/SRC/openSUSE:Factory/.python-fastparquet.new.1899/fastparquet-0.7.1.tar.gz differ: char 13, line 1
