Samtools (and HTSlib and BCFtools) version 1.23 is now available from
GitHub and SourceForge.

https://github.com/samtools/htslib/releases/tag/1.23
https://github.com/samtools/samtools/releases/tag/1.23
https://github.com/samtools/bcftools/releases/tag/1.23
https://sourceforge.net/projects/samtools/

The main changes are listed below:

------------------------------------------------------------------------------
htslib - changes v1.23
------------------------------------------------------------------------------

Updates
-------

* HTSlib 1.22 changed the VCF reader so that it stored GT prefixed phasing
  information, but only for files specifying `fileformat=VCFv4.4` or higher.
  This caused problems when merging files with different versions, so the
  VCF reader will now store prefixed phasing information irrespective of the
  VCF version listed in the file headers.  For files up to VCFv4.3, the
  first phasing bit will be set if all other alleles are phased, and cleared
  otherwise (following the rules for VCFv4.4 onwards where no explicit
  phasing symbol is present).  This will also happen when reading BCF.

  When accessing GT data, it is no longer safe to assume that the phasing
  bit of the first allele is zero, even if the file reports a version earlier
  than VCFv4.4.  Interfaces such as `bcf_gt_allele()` should always be used
  to access GT allele data.

  For compatibility, prefixed phasing will be stripped when writing VCF files
  with version 4.3 or earlier. (PR #1938, fixes #1932)

* Add support for VCFv4.4 / VCFv4.5 "Number=" fields. (PR #1874)

* Consolidate and simplify SAM header parsing.  This considerably speeds up
  parsing files with many SQ lines. (PR #1947. PR #1953 fixes oss-fuzz issues
  444492071, 444492076, 444547724, 444490034, PR #1977)

* Switch from strtol to hts_str2uint in base modification parsing to improve
  speed.  (PR #1957.  Thanks to Chris Wright)

* Add UMI support to FASTQ input and output.  See samtools/samtools#2270.
  (PR #1960, fixes samtools/samtools#2259.  Requested by Poshi)

* Removed direct access to htsFile struct members in some sample functions.
  (PR #1963, fixes #1961.  Reported by John Marshall)

* Improved operation of filters that work with header data.  Filter
  expressions set as an `HTS_OPT_FILTER` on a BAM or CRAM iterator failed
  to return records matching on `rname`, `mrname`, `rnext` or `library`.
  (PR #1959)

* Add Type to the INFO/FORMAT sanity check.  This produces a warning on
  incorrect Type usage. (PR #1967, fixes #1937 and samtools/bcftools#2431.
  Reported by Jukka Matilainen)

* S3 reading code now reads in chunks to limit the amount of data read (and
  therefore egress costs) from the object store when doing a range request.
  This change also combines the reading, writing and authorisation code into
  a single file. (PR #1958, fixes #1670.  Reported by Stephan Drukewitz)
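
To make the GT phasing change above concrete, here is a minimal Python sketch of the integer encoding used for GT values. It mirrors the bit layout of the `bcf_gt_phased()`, `bcf_gt_unphased()`, `bcf_gt_is_phased()` and `bcf_gt_allele()` macros in HTSlib's `vcf.h` (allele index plus one, shifted left by one bit, with the low bit as the phasing flag). The Python function names, and `encode_pre44_gt()` in particular, are hypothetical illustrations of the rule described for VCFv4.3 and earlier, not HTSlib code.

```python
# Hypothetical Python mirror of the GT bit layout used by HTSlib's vcf.h
# macros; not HTSlib code.  Low bit = phasing flag, upper bits = allele + 1.

def gt_unphased(allele: int) -> int:
    return (allele + 1) << 1


def gt_phased(allele: int) -> int:
    return ((allele + 1) << 1) | 1


def gt_is_phased(val: int) -> bool:
    return bool(val & 1)


def gt_allele(val: int) -> int:
    # Discards the phasing bit, so it gives the same answer whether or not
    # the first allele carries an explicit or inferred phasing flag.
    return (val >> 1) - 1


def encode_pre44_gt(alleles: list[int], phased_after: list[bool]) -> list[int]:
    """Encode a VCFv4.3-style GT such as '0|1' or '0/1'.

    phased_after[i] describes the separator before alleles[i + 1]
    ('|' -> True, '/' -> False).  Following the rule for files up to
    VCFv4.3, the first allele's phasing bit is set only if all the other
    alleles are phased.
    """
    first_phased = all(phased_after)
    out = [gt_phased(alleles[0]) if first_phased else gt_unphased(alleles[0])]
    for allele, phased in zip(alleles[1:], phased_after):
        out.append(gt_phased(allele) if phased else gt_unphased(allele))
    return out
```

For `0|1` the first encoded value is now 3 rather than 2, so code that compared raw values would see a difference, while `gt_allele()` returns 0 in both cases.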

Build Changes
-------------

* Change the optimisation level for the -fsanitize=address,undefined test
  build to counter slow builds and high compiler memory use. (PR #1924)

* Fix compilation failure on MacOS X 10.9 (and likely other very old
  platforms). (PR #1945, fixes #1941.  Reported by Ryan Carsten Schmidt)

* Fix the htslib.map update, which broke due to a recent change in nm
  behaviour. (PR #1975, fixes #1971.  Reported by John Marshall)

* The htscodecs submodule is updated to v1.6.5. This includes a fix to the
  rANS encoder when running on x86-64 hardware with some SIMD features
  disabled. (Fixes samtools/samtools#2256. Reported by Ran Fan)

Bug fixes
---------

* Fix a segfault on an empty but valid MM tag. (PR #1939, fixes #1936.
  Reported by John Marshall)

* Fix bam_next_basemod when used with the HTS_MOD_REPORT_UNCHECKED flag.
  (PR #1946, fixes #1943)

* For the VCF rlen calculation, only use SVLEN for DEL, DUP and CNV symbolic
  alleles.  A bug is also fixed on big-endian platforms where INFO and FORMAT
  values were being accessed incorrectly. (PR #1942, fixes #1940)

* Correct TLEN assignment in CRAM decode.  Also improve the decoder's
  handling of multiple secondary alignments.  See also samtools/hts-specs#842.
  (PR #1951, fixes #1948.  Reported by Matt Sexton)

* Make tabix skip comments (-c) wherever they occur, not just at the start of
  the file. (PR #1952, fixes #1950.  Reported by Victor Negîrneac)

* Update htscodecs for better AVX2 / AVX512 runtime detection. (PR #1954,
  fixes samtools/samtools#2256.  Reported by Ran Fan)

* Fix embed_ref=2 with SEQ * and an MD:Z tag.  The combination of no sequence
  and MD:Z with embed_ref=2 caused the slice extents to be miscalculated, so
  invalid CRAM output was written. (PR #1964, fixes samtools/samtools#2277.
  Reported by fo40225)

* Try to ensure CSI indexes are built with valid parameters.  Adjusts the
  min_shift and n_lvls to cover the size of the genome.  This may override
  the user setting of min_shift (with warning) if needed. (PR #1968, fixes
  #1966. Reported by Marc Sturm)

* Fix a bug where multi-threaded CRAM iterators could drop long alignments
  starting significantly before, but overlapping, the region of interest.
  (PR #1973, fixes samtools/samtools#2285.  Reported by Nick Owens)
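
The CSI index fix above depends on how many bases a CSI binning scheme can address: with `min_shift` bits for the smallest bin and `n_lvls` levels, the top-level bin spans 2^(min_shift + 3*n_lvls) bases. The Python sketch below picks the smallest `n_lvls` that covers the longest reference sequence for a given `min_shift`. The function names are hypothetical, and HTSlib's own adjustment (including the cases where it overrides `min_shift`) may apply margins or limits not modelled here.

```python
def csi_levels(max_len: int, min_shift: int = 14) -> int:
    """Smallest n_lvls such that 2**(min_shift + 3*n_lvls) >= max_len.

    Each extra level multiplies the span of the top-level bin by 8 (3 bits).
    Hypothetical helper, not HTSlib code.
    """
    if min_shift < 0:
        raise ValueError("min_shift must be non-negative")
    n_lvls, span = 0, 1 << min_shift
    while max_len > span:
        n_lvls += 1
        span <<= 3
    return n_lvls


def csi_span(min_shift: int, n_lvls: int) -> int:
    """Number of bases addressable with these CSI parameters."""
    return 1 << (min_shift + 3 * n_lvls)
```

For example, a 248,956,422 bp sequence (the length of GRCh38 chr1) needs five levels at `min_shift=14`, which is the same 2^29 = 536,870,912 bp limit as BAI; anything longer needs a sixth level or a larger `min_shift`.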

Documentation updates
---------------------

* Add support information and the samtools email address for reporting
  security issues. (PR #1956)

* Fix the spelling of a function name in sam.h. (PR #1972.  Thanks to Jack
  Turpitt)

------------------------------------------------------------------------------
samtools - changes v1.23
------------------------------------------------------------------------------

New work and changes:

* New reference statistics in `samtools stats`.  The first line of the RFS
  section gives the total sequence count, the count of regions, the average
  GC, and the min, max, average and total counts.  Subsequent lines give the
  regions, lengths, GC and unknown base count. (PR #2224, implements #2139.
  Requested by Filipe G. Vieira)

* New, faster Python version of seq_cache_populate to create and update
  REF_CACHE. (PR #2231.  Thanks to Ruben Vorderman)

* Add a minimum depth (`--min-depth`) option to `samtools coverage`.
  (PR #2235, implements #1563.  Requested by Charles Foster)

* Add an option (`--exclude-no-read-group`) to `samtools view` to exclude
  reads that have no read group when the `-r` (or `-R`) options are used.
  (PR #2271, fixes #2265.  Reported by Matt Sexton)

* Add UMI support to `samtools fastq` and `samtools import`.  See
  samtools/htslib#1960. (PR #2270, fixes #2259 and #2262.  Requested by Poshi)

* Optionally trim soft clips from reads in `samtools fastq` output.
  (PR #2233, fixes #1275.  Requested by Torsten Seemann)

* If the input file is sorted by tag, `samtools split` will output data
  sequentially to avoid having many output files open simultaneously.
  (PR #2281, fixes #2276.  Requested by Clint Valentine)
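
Trimming soft clips, as in the new `samtools fastq` option listed above, amounts to dropping the SEQ and QUAL bases covered by leading and trailing `S` CIGAR operations. A minimal Python sketch follows, assuming a SAM-orientation record with both SEQ and QUAL present; `trim_soft_clips()` is a hypothetical name, this is not samtools code, and it does not show how the samtools option itself is invoked.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def trim_soft_clips(cigar: str, seq: str, qual: str) -> tuple[str, str]:
    """Drop SEQ/QUAL bases covered by leading and trailing soft clips.

    Hard clips (H) consume no SEQ bases, so they are ignored.  Works in SAM
    orientation, i.e. before any reverse-complementing for FASTQ output.
    A CIGAR of "*" yields no operations, so nothing is trimmed.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return seq, qual
    start = ops[0][0] if ops[0][1] == "S" else 0
    end_clip = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    end = len(seq) - end_clip
    return seq[start:end], qual[start:end]
```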

Documentation:

* Add a link in the command help output to the global options section of
  the samtools.1 page on the [HTSlib](https://www.htslib.org/) site.
  (PR #2258, addresses #2236.  Reported by Chris Saunders)

* Add a support section to README.md.  This mentions the GitHub issue tracker
  and an email address for security issues. (PR #2267)

Bug fixes:

* Prevent `samtools coverage` from printing a coverage table on failure. (PR
  #2247, fixes #2242.  Reported by Georges Kanaan)

* Remove deprecated line style commands from plot-bamstats. (PR #2251, fixes
  #2243.  Reported by Suhas Srinivasan)

* Add missing sam_global_args_free calls to address (harmless) memory leaks.
  (PR #2274)

* Fix `samtools consensus` crash when used with threads and iterators.  See
  also samtools/htslib#1959. (PR #2269)

Non user-visible changes and build improvements:

* Ignore and testclean test/stat/*.fa.fai (PR #2241. Thanks to John Marshall)

* Remove use of C variables starting with _ from bam_consensus.c. (PR #2250,
  fixes #2248.  Reported by Ghanji125)

* Add an exit check for Replace RG and add some comments to bam_addrprg.c.
  (PR #2254.  Thanks to Martin Pollard)

------------------------------------------------------------------------------
bcftools - changes v1.23
------------------------------------------------------------------------------

Changes affecting the whole of bcftools, or multiple commands:

* The `-i/-e` filtering expressions and `-f` formatting in `query`

    - Add a new function `smpl_COUNT()/sCOUNT()` which returns the number of
      elements (#2423)

Changes affecting specific commands:

* bcftools annotate

    - Make dynamic variables read from a tab-delimited annotation file
      (#2151) work for regions as well. For example, while the first command
      below was functional, the second was not (#2441)

      bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,SCORE,~STR \
         -i'TAG={STR}' -k in.vcf
      bcftools annotate -a ann.tsv.gz -c CHROM,BEG,END,SCORE,~STR \
         -i'TAG={STR}' -k in.vcf

* bcftools consensus

    - Fix a bug which prevented FASTA files containing empty lines from being
      read in their entirety (#2424)

    - Fix a bug which caused `--absent` to miss some absent positions

* bcftools csq

    - Add support for complex substitutions, such as AC>TAA

* bcftools +fill-tags

    - Fix a header formatting error for INFO/F_MISSING, which must be
      Number=1 (#2442)

    - Make `-t 'F_MISSING'` work with `-S groups.txt` (#2447)

* bcftools gtcheck

    - The program is now able to process gVCF blocks.  Monoallelic sites are
      now excluded only when the site is monoallelic in both the query and
      the genotype file.  The new option `--keep-refs` allows monoallelic
      sites to always be included.

    - Fix an error in parsing -i/-e command line options where the `qry:` and
      `gt:` prefixes were not stripped (#2432)

* bcftools mpileup

    - Make `-d, --max-depth 0` set the depth to unlimited (#2435)

* bcftools norm

    - Make the -i/-e filtering option work with all other options, such as
      line merging and duplicate removal (#2415)

* bcftools query

    - Numerical functions, such as SUM(INFO/DP), would previously return the
      value 0 when executed on missing values.  This was incorrect; a missing
      value is now printed.

* bcftools reheader

    - Add options `--samples-list` and `--samples-file` to allow renaming
      samples from a list of samples given on the command line, as an
      alternative to a file of sample names (#2383)

* bcftools +split-vep

    - Fix the option `-A, --all-fields`, which was not working properly and
      could lead to a segfault (#2473)
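
The new site-selection rule in `bcftools gtcheck` described above can be written as a small predicate. This Python sketch restates the rule from these notes only; `gtcheck_uses_site()` is a hypothetical name and it is not bcftools code.

```python
def gtcheck_uses_site(qry_monoallelic: bool, gt_monoallelic: bool,
                      keep_refs: bool = False) -> bool:
    """Decide whether a site contributes to the comparison.

    With --keep-refs every site is kept; otherwise a site is dropped only
    when it is monoallelic in BOTH the query and the genotype file.
    """
    if keep_refs:
        return True
    return not (qry_monoallelic and gt_monoallelic)


if __name__ == "__main__":
    # Print the full truth table: query, genotype, default, --keep-refs.
    for q in (False, True):
        for g in (False, True):
            print(q, g, gtcheck_uses_site(q, g),
                  gtcheck_uses_site(q, g, keep_refs=True))
```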
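
The change to numerical functions in `bcftools query` noted above can be illustrated with a missing-aware SUM. In this Python sketch `None` stands for a VCF missing value (`.`); `vcf_sum()` and `format_value()` are hypothetical names, and skipping individual missing elements when some values are present is an assumption of the sketch, not documented bcftools behaviour.

```python
from typing import Optional, Sequence


def vcf_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """SUM() over possibly-missing values (None stands for '.').

    All-missing input yields None (missing) instead of 0.  Skipping missing
    elements when some values are present is an assumption of this sketch.
    """
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present)


def format_value(v: Optional[float]) -> str:
    """Render a value the way VCF text prints missing data."""
    return "." if v is None else f"{v:g}"
```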

_______________________________________________
Samtools-help mailing list
samtools-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/samtools-help