from:"Steven Hartland"

Re: svn commit: r356755 - in head/sys: net netinet netinet6 netpfil/ipfw/nat64 sys

2020-01-14 Thread Steven Hartland

Aren’t the current and suggested the same there or do I need more coffee
this morning?

On Wed, 15 Jan 2020 at 06:10, Gleb Smirnoff  wrote:

>   Hi,
>
> On Wed, Jan 15, 2020 at 06:05:20AM +, Gleb Smirnoff wrote:
> T> Log:
> T>   Introduce NET_EPOCH_CALL() macro and use it everywhere where we free
> T>   data based on the network epoch.   The macro reverses the argument
> T>   order of epoch_call(9) - first function, then its argument. NFC
>
> I really want to reverse the argument order of epoch_call() as well.
> The current order is really backwards:
>
>  void
>  epoch_call(epoch_t epoch, epoch_context_t ctx,
>  void (*callback)(epoch_context_t));
>
> Suggested declaration is:
>
>  void
>  epoch_call(epoch_t epoch, epoch_context_t ctx,
> void (*callback)(epoch_context_t));
>
> This will be a very easy change, since today function is
> used just in few places.
>
> Before branching stable/12 we intentionally put this
> note in epoch.9 manual page:
>
> NOTES
>  The epoch kernel programming interface is under development and is
>  subject to change.
>
> Any objections?
>
> --
> Gleb Smirnoff
>
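
For anyone not familiar with the macro from the quoted log, below is a minimal userland sketch of a wrapper that takes the function first and its argument second. The struct definitions, the net_epoch variable and the stubbed epoch_call() (which runs the callback immediately instead of deferring it) are stand-ins invented for illustration and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct epoch_context { int freed; };
typedef struct epoch_context *epoch_context_t;
typedef struct epoch { int id; } *epoch_t;
typedef void (*epoch_callback_t)(epoch_context_t);

/*
 * Stub using the context-then-callback order quoted above.  The real
 * epoch_call(9) defers the callback; this stub simply runs it at once.
 */
static void
epoch_call(epoch_t epoch, epoch_context_t ctx, epoch_callback_t callback)
{
	(void)epoch;
	assert(callback != NULL);
	callback(ctx);
}

static struct epoch net_epoch_storage = { 1 };
static epoch_t net_epoch = &net_epoch_storage;	/* hypothetical global */

/* Wrapper that takes the function first and its argument second. */
#define	NET_EPOCH_CALL(f, c)	epoch_call(net_epoch, (c), (f))

static void
free_ctx(epoch_context_t ctx)
{
	ctx->freed = 1;
}

/* Returns 1 if the callback ran against the supplied context. */
static int
demo_net_epoch_call(void)
{
	struct epoch_context ctx = { 0 };

	NET_EPOCH_CALL(free_ctx, &ctx);
	return (ctx.freed);
}
```

Call sites then read "function, argument", the same order as a normal call, regardless of the order the underlying function takes.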
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r355831 - head/sys/cam/nvme

2019-12-18 Thread Steven Hartland

Thanks for all the feedback, Warner. Some more comments inline below;
I'd be interested in your thoughts.


On 17/12/2019 02:53, Warner Losh wrote:
> On Mon, Dec 16, 2019, 5:28 PM Steven Hartland
> <steven.hartl...@multiplay.co.uk> wrote:
>
> > Be aware that ZFS already does a pretty decent job of this, so the
> > statement about upper layers isn't true for all. It even has
> > different priorities for different request types, so I'm a little
> > concerned that doing it at both layers could cause issues.
>
> ZFS' BIO_DELETE scheduling works well for enterprise drives, but needs
> tuning the further away you get from enterprise performance. I don't
> anticipate any effect on performance here since this is not enabled by
> default, unless I've messed something up (and if I have screwed this
> up, please let me know). I've honestly not tried to enable these
> things on ZFS.
>
> > In addition to this, if it's anything like SSDs, the number of
> > requests is only a small part of the story, with total trim size
> > being the other one. In this case you could hit the total desired
> > size with just one BIO_DELETE request.
> >
> > With this code what's the impact of this?
>
> You're correct.  It tends to be the number of segments and/or the size
> of the segment. This steers cases where the number of segments
> dominates. For cases where total size dominates, you're often better
> off using the I/O scheduler to rate limit the size of the trims.
> This is also one of the reasons I introduced
> kern.geom.dev.delete_max_sectors.

It would be worth at some point writing up a guide to all the logic in
the various layers with regard to how we treat TRIM requests. There are
quite a few elements now and I don't believe it's clear where they all
are and what they are trying to achieve, which makes it easy for them to
start fighting against each other.

> This feature is designed to allow a large number of files to be
> deleted at once while doing the trims from them a little at a time to
> even the load out.

That's pretty similar in concept to our current ZFS TRIM code; only time
will tell, once the new upstream gets merged, whether this is still the
case.


   Regards
   Steve


Re: svn commit: r355837 - head/sys/cam

2019-12-16 Thread Steven Hartland


Sticky keyboard there Warner?

On a more serious note, because controllers lie about the underlying
location of data, skipping TRIM requests can have a much more serious
impact than one might think, depending on the drive, so this type of
optimisation can significantly harm performance instead of increasing it.

This was the main reason we sponsored the initial ZFS TRIM
implementation: drive performance got so bad with no TRIM that SSDs
performed worse than HDDs.

Now obviously this was some time ago, but I wouldn't be surprised if
there's bad hardware / firmware like this still being produced.

Given that, it might be a good idea to make this optional, possibly even
opt-in rather than opt-out?

    Regards
    Steve

On 17/12/2019 00:13, Warner Losh wrote:

Author: imp
Date: Tue Dec 17 00:13:45 2019
New Revision: 355837
URL: https://svnweb.freebsd.org/changeset/base/355837

Log:
   Implement bio_speedup
   
   React to the BIO_SPEED command in the cam io scheduler by completing

   as successful BIO_DELETE commands that are pending, up to the length
   passed down in the BIO_SPEEDUP cmomand. The length passed down is a
   hint for how much space on the drive needs to be recovered. By
   completing the BIO_DELETE comomands, this allows the upper layers to
   allocate and write to the blocks that were about to be trimmed. Since
   FreeBSD implements TRIMSs as advisory, we can eliminliminate them and
   go directly to writing.
   
   The biggest benefit from TRIMS coomes ffrom the drive being able t

   ooptimize its free block pool inthe log run. There's little nto no
   bene3efit in the shoort term. , sepeciall whn the trim is followed by
   a write. Speedup lets  us make this tradeoff.
   
   Reviewed by: kirk, kib

   Sponsored by: Netflix
   Differential Revision: https://reviews.freebsd.org/D18351

Modified:
   head/sys/cam/cam_iosched.c

Modified: head/sys/cam/cam_iosched.c
==
--- head/sys/cam/cam_iosched.c  Tue Dec 17 00:13:40 2019(r355836)
+++ head/sys/cam/cam_iosched.c  Tue Dec 17 00:13:45 2019(r355837)
@@ -1534,6 +1534,41 @@ cam_iosched_queue_work(struct cam_iosched_softc *isc,
  {
  
  	/*

+* A BIO_SPEEDUP from the uppper layers means that they have a block
+* shortage. At the present, this is only sent when we're trying to
+* allocate blocks, but have a shortage before giving up. bio_length is
+* the size of their shortage. We will complete just enough BIO_DELETEs
+* in the queue to satisfy the need. If bio_length is 0, we'll complete
+* them all. This allows the scheduler to delay BIO_DELETEs to improve
+* read/write performance without worrying about the upper layers. When
+* it's possibly a problem, we respond by pretending the BIO_DELETEs
+* just worked. We can't do anything about the BIO_DELETEs in the
+* hardware, though. We have to wait for them to complete.
+*/
+   if (bp->bio_cmd == BIO_SPEEDUP) {
+   off_t len;
+   struct bio *nbp;
+
+   len = 0;
+   while (bioq_first(&isc->trim_queue) &&
+   (bp->bio_length == 0 || len < bp->bio_length)) {
+   nbp = bioq_takefirst(&isc->trim_queue);
+   len += nbp->bio_length;
+   nbp->bio_error = 0;
+   biodone(nbp);
+   }
+   if (bp->bio_length > 0) {
+   if (bp->bio_length > len)
+   bp->bio_resid = bp->bio_length - len;
+   else
+   bp->bio_resid = 0;
+   }
+   bp->bio_error = 0;
+   biodone(bp);
+   return;
+   }
+
+   /*
 * If we get a BIO_FLUSH, and we're doing delayed BIO_DELETEs then we
 * set the last tick time to one less than the current ticks minus the
 * delay to force the BIO_DELETEs to be presented to the client driver.
@@ -1919,8 +1954,8 @@ DB_SHOW_COMMAND(iosched, cam_iosched_db_show)
db_printf("Trim Q len %d\n", biolen(&isc->trim_queue));
db_printf("read_bias: %d\n", isc->read_bias);
db_printf("current_read_bias: %d\n", isc->current_read_bias);
-   db_printf("Trims active   %d\n", isc->pend_trim);
-   db_printf("Max trims active   %d\n", isc->max_trim);
+   db_printf("Trims active   %d\n", isc->pend_trims);
+   db_printf("Max trims active   %d\n", isc->max_trims);
  }
  #endif
  #endif
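
The accounting in the BIO_SPEEDUP hunk above can be modelled in userland roughly as follows. This is only a sketch: the array of queued trim lengths, the struct and the function name are all made up for illustration, and only the arithmetic (complete trims until the requested length is covered, report any shortfall as a residual) is taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userland model of the BIO_SPEEDUP handling in the diff above.
 * trim_len[] holds the byte lengths of queued trims, oldest first.
 * None of these names exist in the kernel.
 */
struct speedup_result {
	size_t	completed;	/* number of queued trims completed early */
	int64_t	resid;		/* part of the request left unsatisfied */
};

static struct speedup_result
speedup_trims(const int64_t *trim_len, size_t ntrims, int64_t wanted)
{
	struct speedup_result r = { 0, 0 };
	int64_t len = 0;

	assert(wanted >= 0);
	/* wanted == 0 means "complete everything that is queued". */
	while (r.completed < ntrims && (wanted == 0 || len < wanted)) {
		len += trim_len[r.completed];
		r.completed++;
	}
	/* Report how much of the request could not be covered. */
	if (wanted > 0 && wanted > len)
		r.resid = wanted - len;
	return (r);
}
```

With queued trims of 4096, 8192 and 4096 bytes, a request for 10000 bytes completes the first two and leaves no residual, while a request for 20000 bytes completes all three and leaves 3616 bytes outstanding.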



Re: svn commit: r355832 - head/sys/cam

2019-12-16 Thread Steven Hartland


What, if any, is the impact on request ordering with this new delayed TRIM?

On 17/12/2019 00:13, Warner Losh wrote:

Author: imp
Date: Tue Dec 17 00:13:21 2019
New Revision: 355832
URL: https://svnweb.freebsd.org/changeset/base/355832

Log:
   Add rate limiters to TRIM.
   
   Add rate limiters to trims. Trims are a bit different than reads or

   writes in that they can be combined, so some care needs to be taken
   where we rate limit them. Additional work will be needed to push the
   working rate limit below the I/O quanta rate for things like IOPS.
   
   Sponsored by: Netflix


Modified:
   head/sys/cam/cam_iosched.c

Modified: head/sys/cam/cam_iosched.c
==
--- head/sys/cam/cam_iosched.c  Tue Dec 17 00:11:48 2019(r355831)
+++ head/sys/cam/cam_iosched.c  Tue Dec 17 00:13:21 2019(r355832)
@@ -755,7 +755,20 @@ cam_iosched_has_io(struct cam_iosched_softc *isc)
  static inline bool
  cam_iosched_has_more_trim(struct cam_iosched_softc *isc)
  {
+   struct bio *bp;
  
+	bp = bioq_first(&isc->trim_queue);

+#ifdef CAM_IOSCHED_DYNAMIC
+   if (do_dynamic_iosched) {
+   /*
+* If we're limiting trims, then defer action on trims
+* for a bit.
+*/
+   if (bp == NULL || cam_iosched_limiter_caniop(&isc->trim_stats, bp) != 0)
+   return false;
+   }
+#endif
+
/*
 * If we've set a trim_goal, then if we exceed that allow trims
 * to be passed back to the driver. If we've also set a tick timeout
@@ -771,8 +784,8 @@ cam_iosched_has_more_trim(struct cam_iosched_softc *is
return false;
}
  
-	return !(isc->flags & CAM_IOSCHED_FLAG_TRIM_ACTIVE) &&
-   bioq_first(&isc->trim_queue);
+   /* NB: Should perhaps have a max trim active independent of I/O limiters */
+   return !(isc->flags & CAM_IOSCHED_FLAG_TRIM_ACTIVE) && bp != NULL;
  }
  
  #define cam_iosched_sort_queue(isc)	((isc)->sort_io_queue >= 0 ?	\

@@ -1389,10 +1402,17 @@ cam_iosched_next_trim(struct cam_iosched_softc *isc)
  struct bio *
  cam_iosched_get_trim(struct cam_iosched_softc *isc)
  {
+#ifdef CAM_IOSCHED_DYNAMIC
+   struct bio *bp;
+#endif
  
  	if (!cam_iosched_has_more_trim(isc))

return NULL;
  #ifdef CAM_IOSCHED_DYNAMIC
+   bp  = bioq_first(&isc->trim_queue);
+   if (bp == NULL)
+   return NULL;
+
/*
 * If pending read, prefer that based on current read bias setting. The
 * read bias is shared for both writes and TRIMs, but on TRIMs the bias
@@ -1414,6 +1434,26 @@ cam_iosched_get_trim(struct cam_iosched_softc *isc)
 */
isc->current_read_bias = isc->read_bias;
}
+
+   /*
+* See if our current limiter allows this I/O. Because we only call this
+* here, and not in next_trim, the 'bandwidth' limits for trims won't
+* work, while the iops or max queued limits will work. It's tricky
+* because we want the limits to be from the perspective of the
+* "commands sent to the device." To make iops work, we need to check
+* only here (since we want all the ops we combine to count as one). To
+* make bw limits work, we'd need to check in next_trim, but that would
+* have the effect of limiting the iops as seen from the upper layers.
+*/
+   if (cam_iosched_limiter_iop(&isc->trim_stats, bp) != 0) {
+   if (iosched_debug)
+   printf("Can't trim because limiter says no.\n");
+   isc->trim_stats.state_flags |= IOP_RATE_LIMITED;
+   return NULL;
+   }
+   isc->current_read_bias = isc->read_bias;
+   isc->trim_stats.state_flags &= ~IOP_RATE_LIMITED;
+   /* cam_iosched_next_trim below keeps proper book */
  #endif
return cam_iosched_next_trim(isc);
  }
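
The comment in the last hunk explains why the limiter is consulted only where the device command is built: several upper-layer trims combined into one command should be charged as a single op. A minimal sketch of that idea follows, assuming a simple per-quantum budget. The struct and function names are hypothetical and this is not the cam_iosched limiter code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-quantum IOPS limiter; not the cam_iosched structures. */
struct iops_limiter {
	int	max_per_quantum;	/* commands allowed per quantum */
	int	left;			/* commands still allowed this quantum */
};

static void
limiter_init(struct iops_limiter *l, int max_per_quantum)
{
	assert(max_per_quantum > 0);
	l->max_per_quantum = max_per_quantum;
	l->left = max_per_quantum;
}

/* Called once per quantum (for example from a timer) to refill the budget. */
static void
limiter_tick(struct iops_limiter *l)
{
	l->left = l->max_per_quantum;
}

/*
 * Charge one device command.  A command that combines several upper-layer
 * trims is still charged once, which is why the check sits where the
 * combined command is built rather than where each trim is dequeued.
 */
static bool
limiter_take(struct iops_limiter *l)
{
	if (l->left <= 0)
		return (false);	/* rate limited: defer the trim */
	l->left--;
	return (true);
}
```

A bandwidth limit would instead have to charge each trim by its length as it is dequeued, which is the tension the comment in the diff describes.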



Re: svn commit: r355831 - head/sys/cam/nvme

2019-12-16 Thread Steven Hartland

Be aware that ZFS already does a pretty decent job of this, so the
statement about upper layers isn't true for all. It even has different
priorities for different request types, so I'm a little concerned that
doing it at both layers could cause issues.

In addition to this, if it's anything like SSDs, the number of requests
is only a small part of the story, with total trim size being the other
one. In this case you could hit the total desired size with just one
BIO_DELETE request.

With this code what's the impact of this?

On 17/12/2019 00:11, Warner Losh wrote:

Author: imp
Date: Tue Dec 17 00:11:48 2019
New Revision: 355831
URL: https://svnweb.freebsd.org/changeset/base/355831

Log:
   NVME trim stuff.
   
   Add two sysctls to control pacing of nvme trims.
   kern.cam.nda.X.goal_trim is the number of upper layer BIO_DELETE
   requests to try to collect before sending TRIM down to the nvme
   drive. trim_ticks is the number of ticks, at most, to wait for at
   least goal_trim BIO_DELETE requests to come in.
   
   Trim pacing is useful when a large number of disjoint trims are
   coming in from the upper layers. Since we have no way to chain
   together trims from the upper layers that are sent down, this acts as
   a heuristic to group trims into reasonable sized chunks. What's
   reasonable varies from drive to drive.
   
   Sponsored by: Netflix


Modified:
   head/sys/cam/nvme/nvme_da.c

Modified: head/sys/cam/nvme/nvme_da.c
==
--- head/sys/cam/nvme/nvme_da.c Tue Dec 17 00:10:19 2019(r355830)
+++ head/sys/cam/nvme/nvme_da.c Tue Dec 17 00:11:48 2019(r355831)
@@ -177,6 +177,14 @@ static int nda_max_trim_entries = NDA_MAX_TRIM_ENTRIES
  SYSCTL_INT(_kern_cam_nda, OID_AUTO, max_trim, CTLFLAG_RDTUN,
  &nda_max_trim_entries, NDA_MAX_TRIM_ENTRIES,
  "Maximum number of BIO_DELETE to send down as a DSM TRIM.");
+static int nda_goal_trim_entries = NDA_MAX_TRIM_ENTRIES / 2;
+SYSCTL_INT(_kern_cam_nda, OID_AUTO, goal_trim, CTLFLAG_RDTUN,
+&nda_goal_trim_entries, NDA_MAX_TRIM_ENTRIES / 2,
+"Number of BIO_DELETE to try to accumulate before sending a DSM TRIM.");
+static int nda_trim_ticks = 50;/* 50ms ~ 1000 Hz */
+SYSCTL_INT(_kern_cam_nda, OID_AUTO, trim_ticks, CTLFLAG_RDTUN,
+&nda_trim_ticks, 50,
+"Number of ticks to hold BIO_DELETEs before sending down a trim");
  
  /*

   * All NVMe media is non-rotational, so all nvme device instances
@@ -741,6 +749,9 @@ ndaregister(struct cam_periph *periph, void *arg)
free(softc, M_DEVBUF);
return(CAM_REQ_CMP_ERR);
}
+   /* Statically set these for the moment */
+   cam_iosched_set_trim_goal(softc->cam_iosched, nda_goal_trim_entries);
+   cam_iosched_set_trim_ticks(softc->cam_iosched, nda_trim_ticks);
  
  	/* ident_data parsing */
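
As a rough illustration of the pacing heuristic described in the commit log (collect up to a goal number of BIO_DELETEs, but wait at most a given number of ticks), here is a small userland sketch. The struct, its field names and the exact comparison operators are assumptions made for the example and are not taken from cam_iosched.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pacing state; field names are illustrative only. */
struct trim_pacer {
	int	goal;		/* BIO_DELETEs to collect before sending */
	int	max_ticks;	/* longest wait, in ticks, for the goal */
	int	queued;		/* BIO_DELETEs currently held */
	int	first_tick;	/* tick at which the oldest one arrived */
};

static bool
pacer_should_send(const struct trim_pacer *p, int now)
{
	assert(p->goal > 0 && p->max_ticks >= 0);
	if (p->queued == 0)
		return (false);		/* nothing to send */
	if (p->queued >= p->goal)
		return (true);		/* collected enough to batch */
	/* Too few collected: send anyway once the oldest has waited enough. */
	return (now - p->first_tick >= p->max_ticks);
}
```

Note that a decision like this only looks at the count of requests. It says nothing about total trim size, which is the concern raised at the top of this message.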
  



Re: svn commit: r355430 - head/sys/cam/scsi

2019-12-06 Thread Steven Hartland

If the illegal chars were removed or replaced, would the result be useful?
If so, might that be a better approach?

On Fri, 6 Dec 2019 at 00:06, Alan Somers  wrote:

> Author: asomers
> Date: Fri Dec  6 00:06:05 2019
> New Revision: 355430
> URL: https://svnweb.freebsd.org/changeset/base/355430
>
> Log:
>   ses: sanitize illegal strings in SES element descriptors
>
>   The SES4r3 standard requires that element descriptors may only contain
> ASCII
>   characters in the range 0x20 to 0x7e.  Some SuperMicro expanders violate
>   that rule.  This patch adds a sanity check to ses(4).  Descriptors in
>   violation will be replaced by "".
>
>   This patch fixes "sesutil --libxo xml" on such systems.  Previously it
> would
>   generate non-well-formed XML output.
>
>   PR:   241929
>   Reviewed by:  allanjude
>   MFC after:2 weeks
>   Sponsored by: Axcient
>
> Modified:
>   head/sys/cam/scsi/scsi_enc_ses.c
>
> Modified: head/sys/cam/scsi/scsi_enc_ses.c
>
> ==
> --- head/sys/cam/scsi/scsi_enc_ses.cThu Dec  5 19:39:51 2019
> (r355429)
> +++ head/sys/cam/scsi/scsi_enc_ses.cFri Dec  6 00:06:05 2019
> (r355430)
> @@ -110,7 +110,7 @@ typedef struct ses_addl_status {
>  typedef struct ses_element {
> uint8_t eip;/* eip bit is set */
> uint16_t descr_len; /* length of the descriptor */
> -   char *descr;/* descriptor for this object */
> +   const char *descr;  /* descriptor for this object */
> struct ses_addl_status addl;/* additional status info */
>  } ses_element_t;
>
> @@ -1977,6 +1977,35 @@ ses_publish_cache(enc_softc_t *enc, struct
> enc_fsm_sta
> return (0);
>  }
>
> +/*
> + * \brief Sanitize an element descriptor
> + *
> + * The SES4r3 standard, sections 3.1.2 and 6.1.10, specifies that element
> + * descriptors may only contain ASCII characters in the range 0x20 to
> 0x7e.
> + * But some vendors violate that rule.  Ensure that we only expose
> compliant
> + * descriptors to userland.
> + *
> + * \param desc SES element descriptor as reported by the hardware
> + * \param len  Length of desc in bytes, not necessarily including
> + * trailing NUL.  It will be modified if desc is
> invalid.
> + */
> +static const char*
> +ses_sanitize_elm_desc(const char *desc, uint16_t *len)
> +{
> +   const char *invalid = "";
> +   int i;
> +
> +   for (i = 0; i < *len; i++) {
> +   if (desc[i] < 0x20 || desc[i] > 0x7e) {
> +   *len = strlen(invalid);
> +   return (invalid);
> +   } else if (desc[i] == 0) {
> +   break;
> +   }
> +   }
> +   return (desc);
> +}
> +
>  /**
>   * \brief Parse the descriptors for each object.
>   *
> @@ -2061,7 +2090,8 @@ ses_process_elm_descs(enc_softc_t *enc, struct
> enc_fsm
> if (length > 0) {
> elmpriv = element->elm_private;
> elmpriv->descr_len = length;
> -   elmpriv->descr = [offset];
> +   elmpriv->descr =
> ses_sanitize_elm_desc([offset],
> +   >descr_len);
> }
>
> /* skip over the descriptor itself */
>
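
To make the suggestion at the top of this message concrete, here is a sketch of the replace-instead-of-discard approach: copy the descriptor and substitute a placeholder for each byte outside 0x20 to 0x7e. The function name, the separate output buffer and the choice of placeholder are all hypothetical; this is not ses(4) code. It tests for NUL before the range check, since a NUL byte would otherwise fall under the "< 0x20" test.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy desc into out, replacing bytes outside 0x20..0x7e with repl.
 * Stops at the first NUL or after len bytes, and always NUL-terminates.
 * Returns the number of bytes replaced.  Hypothetical helper, not ses(4).
 */
static size_t
sanitize_replace(const char *desc, size_t len, char *out, size_t outsz,
    char repl)
{
	size_t i, replaced = 0;

	assert(outsz > 0);
	for (i = 0; i < len && i + 1 < outsz && desc[i] != '\0'; i++) {
		/* Compare as unsigned so bytes >= 0x80 are caught too. */
		unsigned char c = (unsigned char)desc[i];

		if (c < 0x20 || c > 0x7e) {
			out[i] = repl;
			replaced++;
		} else {
			out[i] = (char)c;
		}
	}
	out[i] = '\0';
	return (replaced);
}
```

Whether the partially replaced string is more useful than an empty one depends on how badly the expander mangles its descriptors, which is the open question here.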

Re: svn commit: r354283 - in head: stand/libsa/zfs sys/cddl/boot/zfs

2019-11-03 Thread Steven Hartland

Pretty sure we had at least two systems using root with log just fine, so
would be interested to know why this isn’t supported anymore?

On Sun, 3 Nov 2019 at 13:26, Toomas Soome  wrote:

> Author: tsoome
> Date: Sun Nov  3 13:25:47 2019
> New Revision: 354283
> URL: https://svnweb.freebsd.org/changeset/base/354283
>
> Log:
>   loader: we do not support booting from pool with log device
>
>   If pool has log device, stop there and tell about it.
>
> Modified:
>   head/stand/libsa/zfs/zfs.c
>   head/stand/libsa/zfs/zfsimpl.c
>   head/sys/cddl/boot/zfs/zfsimpl.h
>
> Modified: head/stand/libsa/zfs/zfs.c
>
> ==
> --- head/stand/libsa/zfs/zfs.c  Sun Nov  3 13:03:47 2019(r354282)
> +++ head/stand/libsa/zfs/zfs.c  Sun Nov  3 13:25:47 2019(r354283)
> @@ -668,6 +668,11 @@ zfs_dev_open(struct open_file *f, ...)
> spa = spa_find_by_guid(dev->pool_guid);
> if (!spa)
> return (ENXIO);
> +   if (spa->spa_with_log) {
> +   printf("Reading pool %s is not supported due to log
> device.\n",
> +   spa->spa_name);
> +   return (ENXIO);
> +   }
> mount = malloc(sizeof(*mount));
> if (mount == NULL)
> return (ENOMEM);
>
> Modified: head/stand/libsa/zfs/zfsimpl.c
>
> ==
> --- head/stand/libsa/zfs/zfsimpl.c  Sun Nov  3 13:03:47 2019
> (r354282)
> +++ head/stand/libsa/zfs/zfsimpl.c  Sun Nov  3 13:25:47 2019
> (r354283)
> @@ -1109,6 +1109,7 @@ vdev_init_from_nvlist(const unsigned char *nvlist,
> vde
> const unsigned char *kids;
> int nkids, i, is_new;
> uint64_t is_offline, is_faulted, is_degraded, is_removed,
> isnt_present;
> +   uint64_t is_log;
>
> if (nvlist_find(nvlist, ZPOOL_CONFIG_GUID, DATA_TYPE_UINT64,
> NULL, )
> @@ -1132,17 +1133,20 @@ vdev_init_from_nvlist(const unsigned char *nvlist,
> vde
> }
>
> is_offline = is_removed = is_faulted = is_degraded = isnt_present
> = 0;
> +   is_log = 0;
>
> nvlist_find(nvlist, ZPOOL_CONFIG_OFFLINE, DATA_TYPE_UINT64, NULL,
> -   &is_offline);
> +   &is_offline);
> nvlist_find(nvlist, ZPOOL_CONFIG_REMOVED, DATA_TYPE_UINT64, NULL,
> -   &is_removed);
> +   &is_removed);
> nvlist_find(nvlist, ZPOOL_CONFIG_FAULTED, DATA_TYPE_UINT64, NULL,
> -   &is_faulted);
> +   &is_faulted);
> nvlist_find(nvlist, ZPOOL_CONFIG_DEGRADED, DATA_TYPE_UINT64, NULL,
> -   &is_degraded);
> +   &is_degraded);
> nvlist_find(nvlist, ZPOOL_CONFIG_NOT_PRESENT, DATA_TYPE_UINT64, NULL,
> -   &isnt_present);
> +   &isnt_present);
> +   nvlist_find(nvlist, ZPOOL_CONFIG_IS_LOG, DATA_TYPE_UINT64, NULL,
> +   &is_log);
>
> vdev = vdev_find(guid);
> if (!vdev) {
> @@ -1217,6 +1221,7 @@ vdev_init_from_nvlist(const unsigned char *nvlist,
> vde
> return (ENOMEM);
> vdev->v_name = name;
> }
> +   vdev->v_islog = is_log == 1;
> } else {
> is_new = 0;
> }
> @@ -1433,6 +1438,12 @@ vdev_status(vdev_t *vdev, int indent)
>  {
> vdev_t *kid;
> int ret;
> +
> +   if (vdev->v_islog) {
> +   (void)pager_output("logs\n");
> +   indent++;
> +   }
> +
> ret = print_state(indent, vdev->v_name, vdev->v_state);
> if (ret != 0)
> return (ret);
> @@ -1737,6 +1748,12 @@ vdev_probe(vdev_phys_read_t *_read, void
> *read_priv, s
> printf("ZFS: inconsistent nvlist contents\n");
> return (EIO);
> }
> +
> +   /*
> +* We do not support reading pools with log device.
> +*/
> +   if (vdev->v_islog)
> +   spa->spa_with_log = vdev->v_islog;
>
> /*
>  * Re-evaluate top-level vdev state.
>
> Modified: head/sys/cddl/boot/zfs/zfsimpl.h
>
> ==
> --- head/sys/cddl/boot/zfs/zfsimpl.hSun Nov  3 13:03:47 2019
> (r354282)
> +++ head/sys/cddl/boot/zfs/zfsimpl.hSun Nov  3 13:25:47 2019
> (r354283)
> @@ -1670,6 +1670,7 @@ typedef struct vdev {
> vdev_phys_read_t *v_phys_read;  /* read from raw leaf vdev */
> vdev_read_t *v_read;/* read from vdev */
> void*v_read_priv;   /* private data for read function
> */
> +   boolean_t   v_islog;
> struct spa  *spa;   /* link to spa */
> /*
>  * Values stored in the config for an indirect or removing vdev.
> @@ -1694,6 +1695,7 @@ typedef struct spa {
> zio_cksum_salt_t spa_cksum_salt;/* secret salt for cksum */
> void

svn commit: r346594 - head/sbin/camcontrol

2019-09-03 Thread Steven Hartland

Author: smh
Date: Tue Apr 23 07:46:38 2019
New Revision: 346594
URL: https://svnweb.freebsd.org/changeset/base/346594

Log:
  Add ATA power mode support to camcontrol
  
  Add the ability to report ATA device power mode with the command 'powermode'
  to complement the existing ability to set it using idle, standby and sleep
  commands.
  
  MFC after:2 weeks
  Sponsored by: Multiplay

Modified:
  head/sbin/camcontrol/camcontrol.8
  head/sbin/camcontrol/camcontrol.c

Modified: head/sbin/camcontrol/camcontrol.8
==
--- head/sbin/camcontrol/camcontrol.8   Tue Apr 23 06:36:32 2019
(r346593)
+++ head/sbin/camcontrol/camcontrol.8   Tue Apr 23 07:46:38 2019
(r346594)
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd March 12, 2019
+.Dd April 22, 2019
 .Dt CAMCONTROL 8
 .Os
 .Sh NAME
@@ -243,6 +243,10 @@
 .Op device id
 .Op generic args
 .Nm
+.Ic powermode
+.Op device id
+.Op generic args
+.Nm
 .Ic apm
 .Op device id
 .Op generic args
@@ -1388,6 +1392,8 @@ Value 0 disables timer.
 Put ATA device into SLEEP state.
 Note that the only way get device out of
 this state may be reset.
+.It Ic powermode
+Report ATA device power mode.
 .It Ic apm
 It optional parameter
 .Pq Fl l

Modified: head/sbin/camcontrol/camcontrol.c
==
--- head/sbin/camcontrol/camcontrol.c   Tue Apr 23 06:36:32 2019
(r346593)
+++ head/sbin/camcontrol/camcontrol.c   Tue Apr 23 07:46:38 2019
(r346594)
@@ -109,7 +109,8 @@ typedef enum {
CAM_CMD_ZONE= 0x0026,
CAM_CMD_EPC = 0x0027,
CAM_CMD_TIMESTAMP   = 0x0028,
-   CAM_CMD_MMCSD_CMD   = 0x0029
+   CAM_CMD_MMCSD_CMD   = 0x0029,
+   CAM_CMD_POWER_MODE  = 0x002a,
 } cam_cmdmask;
 
 typedef enum {
@@ -236,6 +237,7 @@ static struct camcontrol_opts option_table[] = {
{"idle", CAM_CMD_IDLE, CAM_ARG_NONE, "t:"},
{"standby", CAM_CMD_STANDBY, CAM_ARG_NONE, "t:"},
{"sleep", CAM_CMD_SLEEP, CAM_ARG_NONE, ""},
+   {"powermode", CAM_CMD_POWER_MODE, CAM_ARG_NONE, ""},
{"apm", CAM_CMD_APM, CAM_ARG_NONE, "l:"},
{"aam", CAM_CMD_AAM, CAM_ARG_NONE, "l:"},
{"fwdownload", CAM_CMD_DOWNLOAD_FW, CAM_ARG_NONE, "f:qsy"},
@@ -8885,6 +8887,61 @@ bailout:
 }
 
 static int
+atapm_proc_resp(struct cam_device *device, union ccb *ccb)
+{
+struct ata_res *res;
+
+res = &ccb->ataio.res;
+if (res->status & ATA_STATUS_ERROR) {
+if (arglist & CAM_ARG_VERBOSE) {
+cam_error_print(device, ccb, CAM_ESF_ALL,
+CAM_EPF_ALL, stderr);
+printf("error = 0x%02x, sector_count = 0x%04x, "
+   "device = 0x%02x, status = 0x%02x\n",
+   res->error, res->sector_count,
+   res->device, res->status);
+}
+
+return (1);
+}
+
+if (arglist & CAM_ARG_VERBOSE) {
+fprintf(stdout, "%s%d: Raw native check power data:\n",
+device->device_name, device->dev_unit_num);
+/* res is 4 byte aligned */
+dump_data((uint16_t*)(uintptr_t)res, sizeof(struct ata_res));
+
+printf("error = 0x%02x, sector_count = 0x%04x, device = 0x%02x, "
+   "status = 0x%02x\n", res->error, res->sector_count,
+   res->device, res->status);
+}
+
+printf("%s%d: ", device->device_name, device->dev_unit_num);
+switch (res->sector_count) {
+case 0x00:
+   printf("Standby mode\n");
+   break;
+case 0x40:
+   printf("NV Cache Power Mode and the spindle is spun down or spinning down\n");
+   break;
+case 0x41:
+   printf("NV Cache Power Mode and the spindle is spun up or spinning up\n");
+   break;
+case 0x80:
+   printf("Idle mode\n");
+   break;
+case 0xff:
+   printf("Active or Idle mode\n");
+   break;
+default:
+   printf("Unknown mode 0x%02x\n", res->sector_count);
+   break;
+}
+
+return (0);
+}
+
+static int
 atapm(struct cam_device *device, int argc, char **argv,
 char *combinedopt, int retry_count, int timeout)
 {
@@ -8892,6 +8949,7 @@ atapm(struct cam_device *device, int argc, char **argv
int retval = 0;
int t = -1;
int c;
+   u_int8_t ata_flags = 0;
u_char cmd, sc;
 
ccb = cam_getccb(device);
@@ -8920,6 +8978,10 @@ atapm(struct cam_device *device, int argc, char **argv
cmd = ATA_STANDBY_IMMEDIATE;
else
cmd = ATA_STANDBY_CMD;
+   } else if (strcmp(argv[1], "powermode") == 0) {
+   cmd = ATA_CHECK_POWER_MODE;
+   ata_flags = AP_FLAG_CHK_COND;
+   t = -1;
} else {
cmd = ATA_SLEEP;
t = -1;
@@ -8937,11 +8999,12 @@ atapm(struct cam_device *device, int argc, char **argv
else

Re: svn commit: r348255 - head/sys/kern

2019-05-24 Thread Steven Hartland

Just wanted to say I really appreciate the details in this commit message.

It's often the case that the commit message gets overlooked when it comes to
the time needed to write something truly useful to others, and this is a
great example of the quality we should all try to follow.

  Regards
  Steve

On Fri, 24 May 2019 at 23:33, Conrad Meyer  wrote:

> Author: cem
> Date: Fri May 24 22:33:14 2019
> New Revision: 348255
> URL: https://svnweb.freebsd.org/changeset/base/348255
>
> Log:
>   Disable intr_storm_threshold mechanism by default
>
>   The ixl.4 manual page has documented that the threshold falsely detects
>   interrupt storms on 40Gbit NICs as long ago as 2015, and we have seen
>   similar false positives with the ioat(4) DMA device (which can push
> GB/s).
>
>   For example, synthetic load can be generated with tools/tools/ioat
>   'ioatcontrol 0 200 8192 1 1000' (allocate 200x8kB buffers, generate an
>   interrupt for each one, and do this for 1000 milliseconds).  With
>   storm-detection disabled, the Broadwell-EP version of this device is
> capable
>   of generating ~350k real interrupts per second.
>
>   The following historical context comes from jhb@: Originally, the
> threshold
>   worked around incorrect routing of PCI INTx interrupts on single-CPU
> systems
>   which would end up in a hard hang during boot.  Since the threshold was
>   added, our PCI interrupt routing was improved, most PCI interrupts use
>   edge-triggered MSI instead of level-triggered INTx, and typical systems
> have
>   multiple CPUs available to service interrupts.
>
>   On the off chance that the threshold is useful in the future, it remains
>   available as a tunable and sysctl.
>
>   Reviewed by:  jhb
>   Sponsored by: Dell EMC Isilon
>   Differential Revision:https://reviews.freebsd.org/D20401
>
> Modified:
>   head/sys/kern/kern_intr.c
>
> Modified: head/sys/kern/kern_intr.c
>
> ==
> --- head/sys/kern/kern_intr.c   Fri May 24 22:30:40 2019(r348254)
> +++ head/sys/kern/kern_intr.c   Fri May 24 22:33:14 2019(r348255)
> @@ -91,7 +91,7 @@ struct proc *intrproc;
>
>  static MALLOC_DEFINE(M_ITHREAD, "ithread", "Interrupt Threads");
>
> -static int intr_storm_threshold = 1000;
> +static int intr_storm_threshold = 0;
>  SYSCTL_INT(_hw, OID_AUTO, intr_storm_threshold, CTLFLAG_RWTUN,
>  &intr_storm_threshold, 0,
>  "Number of consecutive interrupts before storm protection is enabled");
>
>
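
For context on what the threshold in the quoted commit controls, here is a simplified userland model of a consecutive-interrupt storm detector in which a threshold of 0 turns detection off. The struct, the function names and the reset-on-idle behaviour are assumptions for illustration only and do not mirror kern_intr.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a consecutive-interrupt storm detector. */
struct storm_detector {
	int	threshold;	/* 0 disables detection, as in the new default */
	int	consecutive;	/* interrupts seen without an idle gap */
};

/* Record one interrupt; returns true if the source should be throttled. */
static bool
storm_on_interrupt(struct storm_detector *d)
{
	assert(d->threshold >= 0);
	if (d->threshold == 0)
		return (false);
	if (d->consecutive < d->threshold)
		d->consecutive++;
	return (d->consecutive >= d->threshold);
}

/* Called when the handler finds no further work pending. */
static void
storm_on_idle(struct storm_detector *d)
{
	d->consecutive = 0;
}
```

A device that legitimately raises hundreds of thousands of interrupts per second, like the ones mentioned in the log, would trip a small fixed threshold almost immediately, which is the false positive the commit removes by default.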

svn commit: r347384 - stable/12/sbin/camcontrol

2019-05-09 Thread Steven Hartland

Author: smh
Date: Thu May  9 08:35:50 2019
New Revision: 347384
URL: https://svnweb.freebsd.org/changeset/base/347384

Log:
  MFC r346594: Add ATA power mode support to camcontrol
  
  Sponsored by: Multiplay

Modified:
  stable/12/sbin/camcontrol/camcontrol.8
  stable/12/sbin/camcontrol/camcontrol.c
Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/sbin/camcontrol/camcontrol.8
==
--- stable/12/sbin/camcontrol/camcontrol.8  Thu May  9 07:57:33 2019
(r347383)
+++ stable/12/sbin/camcontrol/camcontrol.8  Thu May  9 08:35:50 2019
(r347384)
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd May 3, 2017
+.Dd May 9, 2019
 .Dt CAMCONTROL 8
 .Os
 .Sh NAME
@@ -242,6 +242,10 @@
 .Op device id
 .Op generic args
 .Nm
+.Ic powermode
+.Op device id
+.Op generic args
+.Nm
 .Ic apm
 .Op device id
 .Op generic args
@@ -1382,6 +1386,8 @@ Value 0 disables timer.
 Put ATA device into SLEEP state.
 Note that the only way get device out of
 this state may be reset.
+.It Ic powermode
+Report ATA device power mode.
 .It Ic apm
 It optional parameter
 .Pq Fl l

Modified: stable/12/sbin/camcontrol/camcontrol.c
==
--- stable/12/sbin/camcontrol/camcontrol.c  Thu May  9 07:57:33 2019
(r347383)
+++ stable/12/sbin/camcontrol/camcontrol.c  Thu May  9 08:35:50 2019
(r347384)
@@ -109,7 +109,8 @@ typedef enum {
CAM_CMD_ZONE= 0x0026,
CAM_CMD_EPC = 0x0027,
CAM_CMD_TIMESTAMP   = 0x0028,
-   CAM_CMD_MMCSD_CMD   = 0x0029
+   CAM_CMD_MMCSD_CMD   = 0x0029,
+   CAM_CMD_POWER_MODE  = 0x002a,
 } cam_cmdmask;
 
 typedef enum {
@@ -236,6 +237,7 @@ static struct camcontrol_opts option_table[] = {
{"idle", CAM_CMD_IDLE, CAM_ARG_NONE, "t:"},
{"standby", CAM_CMD_STANDBY, CAM_ARG_NONE, "t:"},
{"sleep", CAM_CMD_SLEEP, CAM_ARG_NONE, ""},
+   {"powermode", CAM_CMD_POWER_MODE, CAM_ARG_NONE, ""},
{"apm", CAM_CMD_APM, CAM_ARG_NONE, "l:"},
{"aam", CAM_CMD_AAM, CAM_ARG_NONE, "l:"},
{"fwdownload", CAM_CMD_DOWNLOAD_FW, CAM_ARG_NONE, "f:qsy"},
@@ -8876,6 +8878,61 @@ bailout:
 }
 
 static int
+atapm_proc_resp(struct cam_device *device, union ccb *ccb)
+{
+struct ata_res *res;
+
+res = &ccb->ataio.res;
+if (res->status & ATA_STATUS_ERROR) {
+if (arglist & CAM_ARG_VERBOSE) {
+cam_error_print(device, ccb, CAM_ESF_ALL,
+CAM_EPF_ALL, stderr);
+printf("error = 0x%02x, sector_count = 0x%04x, "
+   "device = 0x%02x, status = 0x%02x\n",
+   res->error, res->sector_count,
+   res->device, res->status);
+}
+
+return (1);
+}
+
+if (arglist & CAM_ARG_VERBOSE) {
+fprintf(stdout, "%s%d: Raw native check power data:\n",
+device->device_name, device->dev_unit_num);
+/* res is 4 byte aligned */
+dump_data((uint16_t*)(uintptr_t)res, sizeof(struct ata_res));
+
+printf("error = 0x%02x, sector_count = 0x%04x, device = 0x%02x, "
+   "status = 0x%02x\n", res->error, res->sector_count,
+   res->device, res->status);
+}
+
+printf("%s%d: ", device->device_name, device->dev_unit_num);
+switch (res->sector_count) {
+case 0x00:
+   printf("Standby mode\n");
+   break;
+case 0x40:
+   printf("NV Cache Power Mode and the spindle is spun down or spinning down\n");
+   break;
+case 0x41:
+   printf("NV Cache Power Mode and the spindle is spun up or spinning up\n");
+   break;
+case 0x80:
+   printf("Idle mode\n");
+   break;
+case 0xff:
+   printf("Active or Idle mode\n");
+   break;
+default:
+   printf("Unknown mode 0x%02x\n", res->sector_count);
+   break;
+}
+
+return (0);
+}
+
+static int
 atapm(struct cam_device *device, int argc, char **argv,
 char *combinedopt, int retry_count, int timeout)
 {
@@ -8883,6 +8940,7 @@ atapm(struct cam_device *device, int argc, char **argv
int retval = 0;
int t = -1;
int c;
+   u_int8_t ata_flags = 0;
u_char cmd, sc;
 
ccb = cam_getccb(device);
@@ -8911,6 +8969,10 @@ atapm(struct cam_device *device, int argc, char **argv
cmd = ATA_STANDBY_IMMEDIATE;
else
cmd = ATA_STANDBY_CMD;
+   } else if (strcmp(argv[1], "powermode") == 0) {
+   cmd = ATA_CHECK_POWER_MODE;
+   ata_flags = AP_FLAG_CHK_COND;
+   t = -1;
} else {
cmd = ATA_SLEEP;
t = -1;
@@ -8928,11 +8990,12 @@ atapm(struct cam_device *device, int argc, char **argv
else
sc = 253;
 
-   retval = ata_do_28bit_cmd(device,
+   retval

svn commit: r346594 - head/sbin/camcontrol

2019-04-23 Thread Steven Hartland

Author: smh
Date: Tue Apr 23 07:46:38 2019
New Revision: 346594
URL: https://svnweb.freebsd.org/changeset/base/346594

Log:
  Add ATA power mode support to camcontrol
  
  Add the ability to report ATA device power mode with the command 'powermode'
  to complement the existing ability to set it using idle, standby and sleep
  commands.
  
  MFC after:2 weeks
  Sponsored by: Multiplay
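
As a rough illustration of what the new subcommand reports, the sketch below maps the sector count value returned by CHECK POWER MODE to the strings used in the diff that follows. The helper name power_mode_name is hypothetical and is not part of camcontrol; only the value-to-string mapping comes from the commit itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of camcontrol): map the sector count
 * value returned by ATA CHECK POWER MODE to a description. The values
 * and strings mirror the switch statement in atapm_proc_resp().
 */
static const char *
power_mode_name(uint16_t sector_count)
{
	switch (sector_count) {
	case 0x00:
		return ("Standby mode");
	case 0x40:
		return ("NV Cache Power Mode and the spindle is spun down or spinning down");
	case 0x41:
		return ("NV Cache Power Mode and the spindle is spun up or spinning up");
	case 0x80:
		return ("Idle mode");
	case 0xff:
		return ("Active or Idle mode");
	default:
		/* The real code also prints the raw value here. */
		return ("Unknown mode");
	}
}
```

With this sketch, power_mode_name(0x80) yields "Idle mode"; any value not listed falls through to the unknown case.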

Modified:
  head/sbin/camcontrol/camcontrol.8
  head/sbin/camcontrol/camcontrol.c

Modified: head/sbin/camcontrol/camcontrol.8
==
--- head/sbin/camcontrol/camcontrol.8   Tue Apr 23 06:36:32 2019
(r346593)
+++ head/sbin/camcontrol/camcontrol.8   Tue Apr 23 07:46:38 2019
(r346594)
@@ -27,7 +27,7 @@
 .\"
 .\" $FreeBSD$
 .\"
-.Dd March 12, 2019
+.Dd April 22, 2019
 .Dt CAMCONTROL 8
 .Os
 .Sh NAME
@@ -243,6 +243,10 @@
 .Op device id
 .Op generic args
 .Nm
+.Ic powermode
+.Op device id
+.Op generic args
+.Nm
 .Ic apm
 .Op device id
 .Op generic args
@@ -1388,6 +1392,8 @@ Value 0 disables timer.
 Put ATA device into SLEEP state.
 Note that the only way get device out of
 this state may be reset.
+.It Ic powermode
+Report ATA device power mode.
 .It Ic apm
 It optional parameter
 .Pq Fl l

Modified: head/sbin/camcontrol/camcontrol.c
==
--- head/sbin/camcontrol/camcontrol.c   Tue Apr 23 06:36:32 2019
(r346593)
+++ head/sbin/camcontrol/camcontrol.c   Tue Apr 23 07:46:38 2019
(r346594)
@@ -109,7 +109,8 @@ typedef enum {
CAM_CMD_ZONE= 0x0026,
CAM_CMD_EPC = 0x0027,
CAM_CMD_TIMESTAMP   = 0x0028,
-   CAM_CMD_MMCSD_CMD   = 0x0029
+   CAM_CMD_MMCSD_CMD   = 0x0029,
+   CAM_CMD_POWER_MODE  = 0x002a,
 } cam_cmdmask;
 
 typedef enum {
@@ -236,6 +237,7 @@ static struct camcontrol_opts option_table[] = {
{"idle", CAM_CMD_IDLE, CAM_ARG_NONE, "t:"},
{"standby", CAM_CMD_STANDBY, CAM_ARG_NONE, "t:"},
{"sleep", CAM_CMD_SLEEP, CAM_ARG_NONE, ""},
+   {"powermode", CAM_CMD_POWER_MODE, CAM_ARG_NONE, ""},
{"apm", CAM_CMD_APM, CAM_ARG_NONE, "l:"},
{"aam", CAM_CMD_AAM, CAM_ARG_NONE, "l:"},
{"fwdownload", CAM_CMD_DOWNLOAD_FW, CAM_ARG_NONE, "f:qsy"},
@@ -8885,6 +8887,61 @@ bailout:
 }
 
 static int
+atapm_proc_resp(struct cam_device *device, union ccb *ccb)
+{
+struct ata_res *res;
+
+res = &ccb->ataio.res;
+if (res->status & ATA_STATUS_ERROR) {
+if (arglist & CAM_ARG_VERBOSE) {
+cam_error_print(device, ccb, CAM_ESF_ALL,
+CAM_EPF_ALL, stderr);
+printf("error = 0x%02x, sector_count = 0x%04x, "
+   "device = 0x%02x, status = 0x%02x\n",
+   res->error, res->sector_count,
+   res->device, res->status);
+}
+
+return (1);
+}
+
+if (arglist & CAM_ARG_VERBOSE) {
+fprintf(stdout, "%s%d: Raw native check power data:\n",
+device->device_name, device->dev_unit_num);
+/* res is 4 byte aligned */
+dump_data((uint16_t*)(uintptr_t)res, sizeof(struct ata_res));
+
+printf("error = 0x%02x, sector_count = 0x%04x, device = 0x%02x, "
+   "status = 0x%02x\n", res->error, res->sector_count,
+   res->device, res->status);
+}
+
+printf("%s%d: ", device->device_name, device->dev_unit_num);
+switch (res->sector_count) {
+case 0x00:
+   printf("Standby mode\n");
+   break;
+case 0x40:
+   printf("NV Cache Power Mode and the spindle is spun down or spinning down\n");
+   break;
+case 0x41:
+   printf("NV Cache Power Mode and the spindle is spun up or spinning up\n");
+   break;
+case 0x80:
+   printf("Idle mode\n");
+   break;
+case 0xff:
+   printf("Active or Idle mode\n");
+   break;
+default:
+   printf("Unknown mode 0x%02x\n", res->sector_count);
+   break;
+}
+
+return (0);
+}
+
+static int
 atapm(struct cam_device *device, int argc, char **argv,
 char *combinedopt, int retry_count, int timeout)
 {
@@ -8892,6 +8949,7 @@ atapm(struct cam_device *device, int argc, char **argv
int retval = 0;
int t = -1;
int c;
+   u_int8_t ata_flags = 0;
u_char cmd, sc;
 
ccb = cam_getccb(device);
@@ -8920,6 +8978,10 @@ atapm(struct cam_device *device, int argc, char **argv
cmd = ATA_STANDBY_IMMEDIATE;
else
cmd = ATA_STANDBY_CMD;
+   } else if (strcmp(argv[1], "powermode") == 0) {
+   cmd = ATA_CHECK_POWER_MODE;
+   ata_flags = AP_FLAG_CHK_COND;
+   t = -1;
} else {
cmd = ATA_SLEEP;
t = -1;
@@ -8937,11 +8999,12 @@ atapm(struct cam_device *device, int argc, char **argv
else

svn commit: r345129 - stable/12/stand/libsa/zfs

2019-03-14 Thread Steven Hartland

Author: smh
Date: Thu Mar 14 10:06:46 2019
New Revision: 345129
URL: https://svnweb.freebsd.org/changeset/base/345129

Log:
  Revert zfsimpl.c accidentally committed in r345128
  
  Revert an unrelated change to zfsimpl.c accidentally committed in r345128.
  
  Sponsored by: Multiplay

Modified:
  stable/12/stand/libsa/zfs/zfsimpl.c

Modified: stable/12/stand/libsa/zfs/zfsimpl.c
==
--- stable/12/stand/libsa/zfs/zfsimpl.c Thu Mar 14 10:03:04 2019
(r345128)
+++ stable/12/stand/libsa/zfs/zfsimpl.c Thu Mar 14 10:06:46 2019
(r345129)
@@ -2076,7 +2076,6 @@ zfs_mount_dataset(const spa_t *spa, uint64_t objnum, o
 {
dnode_phys_t dataset;
dsl_dataset_phys_t *ds;
-   int err;
 
if (objset_get_dnode(spa, &spa->spa_mos, objnum, &dataset)) {
printf("ZFS: can't find dataset %ju\n", (uintmax_t)objnum);
@@ -2084,9 +2083,9 @@ zfs_mount_dataset(const spa_t *spa, uint64_t objnum, o
}
 
ds = (dsl_dataset_phys_t *) &dataset.dn_bonus;
-   if ((err = zio_read(spa, &ds->ds_bp, objset)) != 0) {
-   printf("ZFS: can't read object set for dataset %ju (error %d)\n",
-   (uintmax_t)objnum, err);
+   if (zio_read(spa, &ds->ds_bp, objset)) {
+   printf("ZFS: can't read object set for dataset %ju\n",
+   (uintmax_t)objnum);
return (EIO);
}
 
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

svn commit: r345128 - in stable/12: sbin/camcontrol stand/libsa/zfs

2019-03-14 Thread Steven Hartland

Author: smh
Date: Thu Mar 14 10:03:04 2019
New Revision: 345128
URL: https://svnweb.freebsd.org/changeset/base/345128

Log:
  MFC r344701: Fix incorrect / unused sector_count for identify requests
  
  Fix unused sector_count for identify requests from camcontrol by changing
  to zero which is a more appropriate value when the parameter is unused.
  
  Sponsored by: Multiplay
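
One way to see why the old argument was misleading: casting a size to u_int8_t keeps only its low 8 bits. The sketch below assumes a 512-byte identify structure (the usual size of ATA IDENTIFY data). The type fake_ata_params is a stand-in for illustration, not the real struct ata_params.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct ata_params; the 512-byte size is an assumption. */
struct fake_ata_params {
	uint8_t raw[512];
};

/* What the old call site computed: the size truncated to 8 bits. */
static uint8_t
old_sector_count(void)
{
	return ((uint8_t)sizeof(struct fake_ata_params));
}
```

Under that assumption the old expression would already have evaluated to 0, so passing a literal 0 states the intent directly rather than relying on truncation.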

Modified:
  stable/12/sbin/camcontrol/camcontrol.c
  stable/12/stand/libsa/zfs/zfsimpl.c
Directory Properties:
  stable/12/   (props changed)

Modified: stable/12/sbin/camcontrol/camcontrol.c
==
--- stable/12/sbin/camcontrol/camcontrol.c  Thu Mar 14 09:18:54 2019
(r345127)
+++ stable/12/sbin/camcontrol/camcontrol.c  Thu Mar 14 10:03:04 2019
(r345128)
@@ -2292,7 +2292,7 @@ ata_do_identify(struct cam_device *device, int retry_c
 /*command*/command,
 /*features*/0,
 /*lba*/0,
-/*sector_count*/(u_int8_t)sizeof(struct ata_params),
+/*sector_count*/0,
 /*data_ptr*/(u_int8_t *)ptr,
 /*dxfer_len*/sizeof(struct ata_params),
 /*timeout*/timeout ? timeout : 30 * 1000,
@@ -2312,8 +2312,7 @@ ata_do_identify(struct cam_device *device, int retry_c
 /*command*/retry_command,
 /*features*/0,
 /*lba*/0,
-/*sector_count*/(u_int8_t)
-sizeof(struct ata_params),
+/*sector_count*/0,
 /*data_ptr*/(u_int8_t *)ptr,
 /*dxfer_len*/sizeof(struct ata_params),
 /*timeout*/timeout ? timeout : 30 * 1000,

Modified: stable/12/stand/libsa/zfs/zfsimpl.c
==
--- stable/12/stand/libsa/zfs/zfsimpl.c Thu Mar 14 09:18:54 2019
(r345127)
+++ stable/12/stand/libsa/zfs/zfsimpl.c Thu Mar 14 10:03:04 2019
(r345128)
@@ -2076,6 +2076,7 @@ zfs_mount_dataset(const spa_t *spa, uint64_t objnum, o
 {
dnode_phys_t dataset;
dsl_dataset_phys_t *ds;
+   int err;
 
if (objset_get_dnode(spa, &spa->spa_mos, objnum, &dataset)) {
printf("ZFS: can't find dataset %ju\n", (uintmax_t)objnum);
@@ -2083,9 +2084,9 @@ zfs_mount_dataset(const spa_t *spa, uint64_t objnum, o
}
 
ds = (dsl_dataset_phys_t *) &dataset.dn_bonus;
-   if (zio_read(spa, &ds->ds_bp, objset)) {
-   printf("ZFS: can't read object set for dataset %ju\n",
-   (uintmax_t)objnum);
+   if ((err = zio_read(spa, &ds->ds_bp, objset)) != 0) {
+   printf("ZFS: can't read object set for dataset %ju (error %d)\n",
+   (uintmax_t)objnum, err);
return (EIO);
}
 

Re: svn commit: r344701 - head/sbin/camcontrol

2019-03-03 Thread Steven Hartland


Not really much more to say that isn't explained by that and the code.

Sure I could have used a different sentence structure for the body but 
it wouldn't add anything IMO, thoughts?


On 02/03/2019 10:49, Alexey Dokuchaev wrote:

On Fri, Mar 01, 2019 at 02:39:15PM +, Steven Hartland wrote:

New Revision: 344701
URL: https://svnweb.freebsd.org/changeset/base/344701

Log:
   Fix incorrect / unused sector_count for identify requests
   
   Fix incorrect / unused sector_count for identify requests from camcontrol.
   
   Submitted by:	Alexey Dokuchaev

Thanks, although commit message is a bit scarce.  Also, for some reason,
it consists of two nearly identical lines -- unnoticed copy paste error?

./danfe



svn commit: r344701 - head/sbin/camcontrol

2019-03-01 Thread Steven Hartland

Author: smh
Date: Fri Mar  1 14:39:15 2019
New Revision: 344701
URL: https://svnweb.freebsd.org/changeset/base/344701

Log:
  Fix incorrect / unused sector_count for identify requests
  
  Fix incorrect / unused sector_count for identify requests from camcontrol.
  
  Submitted by: Alexey Dokuchaev
  Reported by:  Alexey Dokuchaev
  MFC after:1 week
  Sponsored by: Multiplay
  Differential Revision:https://reviews.freebsd.org/D19408

Modified:
  head/sbin/camcontrol/camcontrol.c

Modified: head/sbin/camcontrol/camcontrol.c
==
--- head/sbin/camcontrol/camcontrol.c   Fri Mar  1 14:33:20 2019
(r344700)
+++ head/sbin/camcontrol/camcontrol.c   Fri Mar  1 14:39:15 2019
(r344701)
@@ -2292,7 +2292,7 @@ ata_do_identify(struct cam_device *device, int retry_c
 /*command*/command,
 /*features*/0,
 /*lba*/0,
-/*sector_count*/(u_int8_t)sizeof(struct ata_params),
+/*sector_count*/0,
 /*data_ptr*/(u_int8_t *)ptr,
 /*dxfer_len*/sizeof(struct ata_params),
 /*timeout*/timeout ? timeout : 30 * 1000,
@@ -2312,8 +2312,7 @@ ata_do_identify(struct cam_device *device, int retry_c
 /*command*/retry_command,
 /*features*/0,
 /*lba*/0,
-/*sector_count*/(u_int8_t)
-sizeof(struct ata_params),
+/*sector_count*/0,
 /*data_ptr*/(u_int8_t *)ptr,
 /*dxfer_len*/sizeof(struct ata_params),
 /*timeout*/timeout ? timeout : 30 * 1000,

Re: svn commit: r343745 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2019-02-04 Thread Steven Hartland





On 04/02/2019 16:13, Alexander Motin wrote:

Author: mav
Date: Mon Feb  4 16:13:41 2019
New Revision: 343745
URL: https://svnweb.freebsd.org/changeset/base/343745

Log:
   Add missed tunables/sysctls for some new vdev variables.
   
   While there, make few existing sysctls writeable, since there is no reason

   not to.
   
   MFC after:	1 week


Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c  Mon Feb  4 
16:02:03 2019(r343744)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c  Mon Feb  4 
16:13:41 2019(r343745)
@@ -165,29 +165,38 @@ static vdev_ops_t *vdev_ops_table[] = {
  
  /* target number of metaslabs per top-level vdev */

  int vdev_max_ms_count = 200;
-SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, max_ms_count, CTLFLAG_RDTUN,
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, max_ms_count, CTLFLAG_RWTUN,
  &vdev_max_ms_count, 0,
-"Maximum number of metaslabs per top-level vdev");
+"Target number of metaslabs per top-level vdev");
  
  /* minimum number of metaslabs per top-level vdev */

  int vdev_min_ms_count = 16;
-SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, min_ms_count, CTLFLAG_RDTUN,
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, min_ms_count, CTLFLAG_RWTUN,
  &vdev_min_ms_count, 0,
  "Minimum number of metaslabs per top-level vdev");
  
  /* practical upper limit of total metaslabs per top-level vdev */

  int vdev_ms_count_limit = 1ULL << 17;
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, max_ms_count_limit, CTLFLAG_RWTUN,
+&vdev_ms_count_limit, 0,
+"Maximum number of metaslabs per top-level vdev");
  
  /* lower limit for metaslab size (512M) */

  int vdev_default_ms_shift = 29;
-SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, default_ms_shift, CTLFLAG_RDTUN,
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, default_ms_shift, CTLFLAG_RWTUN,
  &vdev_default_ms_shift, 0,
-"Shift between vdev size and number of metaslabs");
+"Default shift between vdev size and number of metaslabs");
  
  /* upper limit for metaslab size (256G) */

  int vdev_max_ms_shift = 38;
+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, max_ms_shift, CTLFLAG_RWTUN,
+&vdev_max_ms_shift, 0,
+"Maximal shift between vdev size and number of metaslabs");
It's just a nit, but I believe this should be Maximum, like the others, 
instead of Maximal.
  
  boolean_t vdev_validate_skip = B_FALSE;

+SYSCTL_INT(_vfs_zfs_vdev, OID_AUTO, validate_skip, CTLFLAG_RWTUN,
+&vdev_validate_skip, 0,
+"Bypass vdev validation");
  
  /*

   * Since the DTL space map of a vdev is not expected to have a lot of




Re: svn: head/usr.bin: . trim

2018-11-30 Thread Steven Hartland


On 30/11/2018 22:09, Eugene Grosbein wrote:

01.12.2018 4:29, Steven Hartland wrote:


On 30/11/2018 21:16, Eugene Grosbein wrote:

30.11.2018 21:23, Warner Losh wrote:


So I'm back to my point: we should just put it into dd and move on with our 
lives. It's really the right place for it.

Why can't we have two implementations? Diversity is good thing.

I can imagine erasing a partition with ZFS Cache or ZIL inside and
"trim /dev/da0p2 /dev/da0p3" looks much better :-)

ZFS already does that, so there's no need for a separate tool.

Think of media taken out of (possibly already dead) ZFS-based to UFS-only 
system.

By the way, how exactly do you trim a former ZIL partition within a working 
ZFS-based system?

You could use camcontrol, which can perform a secure erase on the device, 
but that's obviously device-wide, not limited to a specific partition.


What I was referring to is that ZFS performs a delete of blocks when it 
initializes a volume, so there's usually no need to perform a manual 
step there.


For reference this behavior can be disabled by setting 
vfs.zfs.vdev.trim_on_init=0


    Regards
    Steve

Re: svn: head/usr.bin: . trim

2018-11-30 Thread Steven Hartland


ZFS already does that, so there's no need for a separate tool.

On 30/11/2018 21:16, Eugene Grosbein wrote:

30.11.2018 21:23, Warner Losh wrote:


So I'm back to my point: we should just put it into dd and move on with our 
lives. It's really the right place for it.

Why can't we have two implementations? Diversity is good thing.

I can imagine erasing a partition with ZFS Cache or ZIL inside and
"trim /dev/da0p2 /dev/da0p3" looks much better :-)





Re: svn: head/usr.bin: . trim

2018-11-30 Thread Steven Hartland

Personally I disagree: the chances of people finding that option in dd are 
slim, so a dedicated trim utility makes much more sense to me. Sure, have 
both, that's cool, but keeping trim would be my vote.

On 30/11/2018 01:17, Cy Schubert wrote:

Agreed.

---
Sent using a tiny phone keyboard.
Apologies for any typos and autocorrect.
Also, this old phone only supports top post. Apologies.

Cy Schubert
 or 
The need of the many outweighs the greed of the few.
---

From: Alexey Dokuchaev
Sent: 29/11/2018 17:01
To: Maxim Sobolev
Cc: eu...@freebsd.org; svn-src-h...@freebsd.org; 
svn-src-all@freebsd.org; src-committers

Subject: Re: svn: head/usr.bin: . trim

On Thu, Nov 29, 2018 at 10:36:02AM -0800, Maxim Sobolev wrote:
> Interesting. I have a similar functionality implemented as an option for
> the dd utility in my pipeline (conv=erase).

Which probably makes a better place rather than adding 4-letter program,
commonly named ("trim" is a simple word), into global namespace. :-/

./danfe


svn commit: r339035 - stable/11/sys/netinet

2018-10-01 Thread Steven Hartland

Author: smh
Date: Mon Oct  1 07:49:16 2018
New Revision: 339035
URL: https://svnweb.freebsd.org/changeset/base/339035

Log:
  MFC r336165:
  
  Removed pointless NULL check in rip_pcblist.
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/netinet/raw_ip.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/netinet/raw_ip.c
==
--- stable/11/sys/netinet/raw_ip.c  Mon Oct  1 04:08:47 2018
(r339034)
+++ stable/11/sys/netinet/raw_ip.c  Mon Oct  1 07:49:16 2018
(r339035)
@@ -1053,8 +1053,6 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
return (error);
 
inp_list = malloc(n * sizeof *inp_list, M_TEMP, M_WAITOK);
-   if (inp_list == NULL)
-   return (ENOMEM);
 
INP_INFO_RLOCK(&V_ripcbinfo);
for (inp = LIST_FIRST(V_ripcbinfo.ipi_listhead), i = 0; inp && i < n;

svn commit: r336165 - head/sys/netinet

2018-07-10 Thread Steven Hartland

Author: smh
Date: Tue Jul 10 08:05:32 2018
New Revision: 336165
URL: https://svnweb.freebsd.org/changeset/base/336165

Log:
  Removed pointless NULL check
  
  Removed pointless NULL check after malloc with M_WAITOK which can never
  return NULL.
  
  Sponsored by: Multiplay

Modified:
  head/sys/netinet/raw_ip.c

Modified: head/sys/netinet/raw_ip.c
==
--- head/sys/netinet/raw_ip.c   Tue Jul 10 07:29:51 2018(r336164)
+++ head/sys/netinet/raw_ip.c   Tue Jul 10 08:05:32 2018(r336165)
@@ -1069,8 +1069,6 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
return (error);
 
inp_list = malloc(n * sizeof *inp_list, M_TEMP, M_WAITOK);
-   if (inp_list == NULL)
-   return (ENOMEM);
 
INP_INFO_RLOCK_ET(&V_ripcbinfo, et);
for (inp = CK_LIST_FIRST(V_ripcbinfo.ipi_listhead), i = 0; inp && i < n;

Re: svn commit: r335856 - in head/sys: netinet sys

2018-07-10 Thread Steven Hartland

Sorry guys, I didn't spot that it was just a revert, as it was tagged on to the 
end of the description; I would have expected that to be in the subject.


What do others think, is there a recommended style for revert commit 
messages?


    Regards
    Steve

On 02/07/2018 17:30, Rodney W. Grimes wrote:

[ Charset UTF-8 unsupported, converting... ]

On Mon, Jul 2, 2018 at 10:44 AM Steven Hartland <
steven.hartl...@multiplay.co.uk> wrote:

You have M_WAITOK and a null check in this change

And, that's the same as the way it was before his commits. So, he did
exactly what he said he was doing and reverted his commits. I don't think
it is good practice to mix reverts with other changes.

It is a very bad practice to mix a revert with anything.


Since you've noticed this, I think you should feel free to make the change.
Jonathan



Re: svn commit: r335856 - in head/sys: netinet sys

2018-07-02 Thread Steven Hartland

You have M_WAITOK and a null check in this change

On Mon, 2 Jul 2018 at 06:20, Matt Macy  wrote:

> Author: mmacy
> Date: Mon Jul  2 05:19:44 2018
> New Revision: 335856
> URL: https://svnweb.freebsd.org/changeset/base/335856
>
> Log:
>   inpcb: don't gratuitously defer frees
>
>   Don't defer frees in sysctl handlers. It isn't necessary
>   and it just confuses things.
>   revert: r333911, r334104, and r334125
>
>   Requested by: jtl
>
> Modified:
>   head/sys/netinet/ip_divert.c
>   head/sys/netinet/raw_ip.c
>   head/sys/netinet/tcp_subr.c
>   head/sys/netinet/udp_usrreq.c
>   head/sys/sys/malloc.h
>
> Modified: head/sys/netinet/ip_divert.c
>
> ==
> --- head/sys/netinet/ip_divert.cMon Jul  2 01:30:33 2018
> (r335855)
> +++ head/sys/netinet/ip_divert.cMon Jul  2 05:19:44 2018
> (r335856)
> @@ -552,7 +552,6 @@ div_detach(struct socket *so)
> KASSERT(inp != NULL, ("div_detach: inp == NULL"));
> INP_INFO_WLOCK(&V_divcbinfo);
> INP_WLOCK(inp);
> -   /* XXX defer destruction to epoch_call */
> in_pcbdetach(inp);
> in_pcbfree(inp);
> INP_INFO_WUNLOCK(&V_divcbinfo);
> @@ -632,7 +631,6 @@ static int
>  div_pcblist(SYSCTL_HANDLER_ARGS)
>  {
> int error, i, n;
> -   struct in_pcblist *il;
> struct inpcb *inp, **inp_list;
> inp_gen_t gencnt;
> struct xinpgen xig;
> @@ -672,8 +670,9 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
> if (error)
> return error;
>
> -   il = malloc(sizeof(struct in_pcblist) + n * sizeof(struct inpcb
> *), M_TEMP, M_WAITOK|M_ZERO_INVARIANTS);
> -   inp_list = il->il_inp_list;
> +   inp_list = malloc(n * sizeof *inp_list, M_TEMP, M_WAITOK);
> +   if (inp_list == NULL)
> +   return ENOMEM;
>
> INP_INFO_RLOCK(&V_divcbinfo);
> for (inp = CK_LIST_FIRST(V_divcbinfo.ipi_listhead), i = 0; inp &&
> i < n;
> @@ -702,9 +701,14 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
> } else
> INP_RUNLOCK(inp);
> }
> -   il->il_count = n;
> -   il->il_pcbinfo = &V_divcbinfo;
> -   epoch_call(net_epoch_preempt, &il->il_epoch_ctx,
> in_pcblist_rele_rlocked);
> +   INP_INFO_WLOCK(&V_divcbinfo);
> +   for (i = 0; i < n; i++) {
> +   inp = inp_list[i];
> +   INP_RLOCK(inp);
> +   if (!in_pcbrele_rlocked(inp))
> +   INP_RUNLOCK(inp);
> +   }
> +   INP_INFO_WUNLOCK(&V_divcbinfo);
>
> if (!error) {
> /*
> @@ -721,6 +725,7 @@ div_pcblist(SYSCTL_HANDLER_ARGS)
> INP_INFO_RUNLOCK(&V_divcbinfo);
> error = SYSCTL_OUT(req, &xig, sizeof xig);
> }
> +   free(inp_list, M_TEMP);
> return error;
>  }
>
> @@ -800,7 +805,6 @@ div_modevent(module_t mod, int type, void *unused)
> break;
> }
> ip_divert_ptr = NULL;
> -   /* XXX defer to epoch_call ? */
> err = pf_proto_unregister(PF_INET, IPPROTO_DIVERT,
> SOCK_RAW);
> INP_INFO_WUNLOCK(&V_divcbinfo);
>  #ifndef VIMAGE
>
> Modified: head/sys/netinet/raw_ip.c
>
> ==
> --- head/sys/netinet/raw_ip.c   Mon Jul  2 01:30:33 2018(r335855)
> +++ head/sys/netinet/raw_ip.c   Mon Jul  2 05:19:44 2018(r335856)
> @@ -863,7 +863,6 @@ rip_detach(struct socket *so)
> ip_rsvp_force_done(so);
> if (so == V_ip_rsvpd)
> ip_rsvp_done();
> -   /* XXX defer to epoch_call */
> in_pcbdetach(inp);
> in_pcbfree(inp);
> INP_INFO_WUNLOCK(&V_ripcbinfo);
> @@ -1033,7 +1032,6 @@ static int
>  rip_pcblist(SYSCTL_HANDLER_ARGS)
>  {
> int error, i, n;
> -   struct in_pcblist *il;
> struct inpcb *inp, **inp_list;
> inp_gen_t gencnt;
> struct xinpgen xig;
> @@ -1068,8 +1066,9 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
> if (error)
> return (error);
>
> -   il = malloc(sizeof(struct in_pcblist) + n * sizeof(struct inpcb
> *), M_TEMP, M_WAITOK|M_ZERO_INVARIANTS);
> -   inp_list = il->il_inp_list;
> +   inp_list = malloc(n * sizeof *inp_list, M_TEMP, M_WAITOK);
> +   if (inp_list == NULL)
> +   return (ENOMEM);
>
> INP_INFO_RLOCK(&V_ripcbinfo);
> for (inp = CK_LIST_FIRST(V_ripcbinfo.ipi_listhead), i = 0; inp &&
> i < n;
> @@ -1098,9 +1097,14 @@ rip_pcblist(SYSCTL_HANDLER_ARGS)
> } else
> INP_RUNLOCK(inp);
> }
> -   il->il_count = n;
> -   il->il_pcbinfo = &V_ripcbinfo;
> -   epoch_call(net_epoch_preempt, &il->il_epoch_ctx,
> in_pcblist_rele_rlocked);
> +   INP_INFO_WLOCK(&V_ripcbinfo);
> +   for (i = 0; i < n; i++) {
> +   inp = inp_list[i];
> +   INP_RLOCK(inp);
> +   if

Re: svn commit: r335171 - head/sys/vm

2018-06-15 Thread Steven Hartland


On 15/06/2018 00:07, Alan Cox wrote:


On Jun 14, 2018, at 5:54 PM, Steven Hartland 
<mailto:steven.hartl...@multiplay.co.uk>> wrote:


Out of interest, how would this exhibit itself?



A panic in vm_page_insert_after().

So just to confirm this couldn't cause random memory corruption of the 
parent process?


    Regards
    Steve

Re: svn commit: r335171 - head/sys/vm

2018-06-14 Thread Steven Hartland


Out of interest, how would this exhibit itself?

On 14/06/2018 20:41, Konstantin Belousov wrote:

Author: kib
Date: Thu Jun 14 19:41:02 2018
New Revision: 335171
URL: https://svnweb.freebsd.org/changeset/base/335171

Log:
   Handle the race between fork/vm_object_split() and faults.
   
   If fault started before vmspace_fork() locked the map, and then during

   fork, vm_map_copy_entry()->vm_object_split() is executed, it is
   possible that the fault instantiate the page into the original object
   when the page was already copied into the new object (see
   vm_map_split() for the orig/new objects terminology). This can happen
   if split found a busy page (e.g. from the fault) and slept dropping
   the objects lock, which allows the swap pager to instantiate
   read-behind pages for the fault.  Then the restart of the scan can see
   a page in the scanned range, where it was already copied to the upper
   object.
   
   Fix it by instantiating the read-ahead pages before

   swap_pager_getpages() method drops the lock to allocate pbuf.  The
   object scan would see the whole range prefilled with the busy pages
   and not proceed the range.
   
   Note that vm_fault rechecks the map generation count after the object

   unlock, so that it restarts the handling if raced with split, and
   re-lookups the right page from the upper object.
   
   In collaboration with:	alc

   Tested by:   pho
   Sponsored by:The FreeBSD Foundation
   MFC after:   1 week

Modified:
   head/sys/vm/swap_pager.c

Modified: head/sys/vm/swap_pager.c
==
--- head/sys/vm/swap_pager.cThu Jun 14 19:01:40 2018(r335170)
+++ head/sys/vm/swap_pager.cThu Jun 14 19:41:02 2018(r335171)
@@ -1096,21 +1096,24 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
  int *rahead)
  {
struct buf *bp;
-   vm_page_t mpred, msucc, p;
+   vm_page_t bm, mpred, msucc, p;
vm_pindex_t pindex;
daddr_t blk;
-   int i, j, maxahead, maxbehind, reqcount, shift;
+   int i, maxahead, maxbehind, reqcount;
  
  	reqcount = count;
  
-	VM_OBJECT_WUNLOCK(object);

-   bp = getpbuf(&nsw_rcount);
-   VM_OBJECT_WLOCK(object);
-
-   if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead)) {
-   relpbuf(bp, &nsw_rcount);
+   /*
+* Determine the final number of read-behind pages and
+* allocate them BEFORE releasing the object lock.  Otherwise,
+* there can be a problematic race with vm_object_split().
+* Specifically, vm_object_split() might first transfer pages
+* that precede ma[0] in the current object to a new object,
+* and then this function incorrectly recreates those pages as
+* read-behind pages in the current object.
+*/
+   if (!swap_pager_haspage(object, ma[0]->pindex, &maxbehind, &maxahead))
return (VM_PAGER_FAIL);
-   }
  
  	/*

 * Clip the readahead and readbehind ranges to exclude resident pages.
@@ -1132,35 +1135,31 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,
*rbehind = pindex - mpred->pindex - 1;
}
  
+	bm = ma[0];

+   for (i = 0; i < count; i++)
+   ma[i]->oflags |= VPO_SWAPINPROG;
+
/*
 * Allocate readahead and readbehind pages.
 */
-   shift = rbehind != NULL ? *rbehind : 0;
-   if (shift != 0) {
-   for (i = 1; i <= shift; i++) {
+   if (rbehind != NULL) {
+   for (i = 1; i <= *rbehind; i++) {
p = vm_page_alloc(object, ma[0]->pindex - i,
VM_ALLOC_NORMAL);
-   if (p == NULL) {
-   /* Shift allocated pages to the left. */
-   for (j = 0; j < i - 1; j++)
-   bp->b_pages[j] =
-   bp->b_pages[j + shift - i + 1];
+   if (p == NULL)
break;
-   }
-   bp->b_pages[shift - i] = p;
+   p->oflags |= VPO_SWAPINPROG;
+   bm = p;
}
-   shift = i - 1;
-   *rbehind = shift;
+   *rbehind = i - 1;
}
-   for (i = 0; i < reqcount; i++)
-   bp->b_pages[i + shift] = ma[i];
if (rahead != NULL) {
for (i = 0; i < *rahead; i++) {
p = vm_page_alloc(object,
ma[reqcount - 1]->pindex + i + 1, VM_ALLOC_NORMAL);
if (p == NULL)
break;
-   bp->b_pages[shift + reqcount + i] = p;
+   p->oflags |= VPO_SWAPINPROG;
}
*rahead = i;
}
@@ -1171,15 +1170,18 @@ swap_pager_getpages(vm_object_t object, vm_page_t *ma,

Re: svn commit: r333267 - head/sys/kern

2018-05-04 Thread Steven Hartland

Again why?

On Fri, 4 May 2018 at 23:48, Mateusz Guzik  wrote:

> Author: mjg
> Date: Fri May  4 22:48:10 2018
> New Revision: 333267
> URL: https://svnweb.freebsd.org/changeset/base/333267
>
> Log:
>   tc: bcopy -> memcpy
>
> Modified:
>   head/sys/kern/kern_tc.c
>
> Modified: head/sys/kern/kern_tc.c
>
> ==
> --- head/sys/kern/kern_tc.c Fri May  4 22:41:12 2018(r333266)
> +++ head/sys/kern/kern_tc.c Fri May  4 22:48:10 2018(r333267)
> @@ -1352,7 +1352,7 @@ tc_windup(struct bintime *new_boottimebin)
> ogen = th->th_generation;
> th->th_generation = 0;
> atomic_thread_fence_rel();
> -   bcopy(tho, th, offsetof(struct timehands, th_generation));
> +   memcpy(th, tho, offsetof(struct timehands, th_generation));
> if (new_boottimebin != NULL)
> th->th_boottime = *new_boottimebin;
>
>
>

Re: svn commit: r333266 - head/sys/amd64/amd64

2018-05-04 Thread Steven Hartland

Can we get the why in commit messages please?

This sort of message doesn’t provide anything more than can be obtained from
reading the diff, which just leaves us wondering why.

I’m sure there is a good reason, but without confirmation we’re just left
guessing. The knock-on effect is that if some assumption behind the why later
changes, anyone looking at this will not be able to make an informed
decision as to whether that was the case.
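
For anyone reading these diffs without the background, the one detail that is visible is the argument order: bcopy(3) takes (src, dst, len) while memcpy(3) takes (dst, src, len), which is why the first two arguments swap in the change. A minimal userland sketch follows; the copy_with_* wrapper names are made up for illustration and the kernel versions of these routines may differ in other respects.

```c
#include <string.h>
#include <strings.h>

/* bcopy: source first, destination second. */
static void
copy_with_bcopy(const void *src, void *dst, size_t len)
{
	bcopy(src, dst, len);
}

/* memcpy: destination first, source second. Regions must not overlap. */
static void
copy_with_memcpy(const void *src, void *dst, size_t len)
{
	memcpy(dst, src, len);
}
```

This shows only the mechanical difference; it does not answer the question of why the change was made.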

On Fri, 4 May 2018 at 23:41, Mateusz Guzik  wrote:

> Author: mjg
> Date: Fri May  4 22:41:12 2018
> New Revision: 333266
> URL: https://svnweb.freebsd.org/changeset/base/333266
>
> Log:
>   amd64: syscall path bcopy -> memcpy
>
> Modified:
>   head/sys/amd64/amd64/trap.c
>
> Modified: head/sys/amd64/amd64/trap.c
>
> ==
> --- head/sys/amd64/amd64/trap.c Fri May  4 22:33:54 2018(r333265)
> +++ head/sys/amd64/amd64/trap.c Fri May  4 22:41:12 2018(r333266)
> @@ -908,7 +908,7 @@ cpu_fetch_syscall_args(struct thread *td)
> error = 0;
> argp = &frame->tf_rdi;
> argp += reg;
> -   bcopy(argp, sa->args, sizeof(sa->args[0]) * 6);
> +   memcpy(sa->args, argp, sizeof(sa->args[0]) * 6);
> if (sa->narg > regcnt) {
> KASSERT(params != NULL, ("copyin args with no params!"));
> error = copyin(params, &sa->args[regcnt],
>
>
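
For readers comparing the two calls in the diff above: bcopy() takes the source first and memcpy() takes the destination first, so the conversion swaps the first two arguments. A minimal userland sketch of that swap follows; `legacy_bcopy` and `copies_agree` are hypothetical names made up for illustration, not the kernel routines.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in with bcopy(3)'s argument order: source first,
 * then destination.  Like bcopy, it tolerates overlapping buffers.
 */
void
legacy_bcopy(const void *src, void *dst, size_t len)
{
	memmove(dst, src, len);
}

/* Copy up to six register-sized arguments both ways and compare. */
int
copies_agree(const long *argp, size_t n)
{
	long via_bcopy[6] = { 0 };
	long via_memcpy[6] = { 0 };

	if (n > 6)
		return (0);
	legacy_bcopy(argp, via_bcopy, sizeof(argp[0]) * n);
	/* Same copy, but destination and source trade places. */
	memcpy(via_memcpy, argp, sizeof(argp[0]) * n);
	return (memcmp(via_bcopy, via_memcpy, sizeof(via_bcopy)) == 0);
}
```

The one behavioural difference to keep in mind is overlap: bcopy() is defined for overlapping buffers while memcpy() is not, so the swap is only safe where the two regions are known to be distinct.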

Re: svn commit: r332523 - in head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs: . sys

2018-04-16 Thread Steven Hartland

Hey Mav, this seems like an important one to get in for 11.2 so just 
wanted to check if that was your intention as there's no MFC tag on the 
commit?


On 16/04/2018 01:54, Alexander Motin wrote:

Author: mav
Date: Mon Apr 16 00:54:58 2018
New Revision: 332523
URL: https://svnweb.freebsd.org/changeset/base/332523

Log:
   9433 Fix ARC hit rate
   
   When the compressed ARC feature was added in commit d3c2ae1
   the method of reference counting in the ARC was modified.  As
   part of this accounting change the arc_buf_add_ref() function
   was removed entirely.

   This would have been fine but the arc_buf_add_ref() function
   served a second undocumented purpose of updating the ARC access
   information when taking a hold on a dbuf.  Without this logic
   in place a cached dbuf would not migrate its associated
   arc_buf_hdr_t to the MFU list.  This would negatively impact
   the ARC hit rate, particularly on systems with a small ARC.

   This change reinstates the missing call to arc_access() from
   dbuf_hold() by implementing a new arc_buf_access() function.
   
   Reviewed-by: Giuseppe Di Natale 

   Reviewed-by: Tony Hutter 
   Reviewed-by: Tim Chase 
   Reviewed by: George Wilson 
   Reviewed-by: George Melikov 
   Signed-off-by: Brian Behlendorf 
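
The locking shape described in the log (a cheap check first, then a second check once the hash lock is held) can be sketched in userland roughly as follows. This is only an illustration under stated assumptions: pthread mutexes stand in for kmutex_t, every `sketch_` name is hypothetical, and the promotion step is a drastic simplification of what arc_access() does.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_state { ST_ANON, ST_MRU, ST_MFU };

struct sketch_hdr {
	pthread_mutex_t hash_lock;
	enum sketch_state state;
};

struct sketch_buf {
	pthread_mutex_t evict_lock;
	struct sketch_hdr *hdr;
};

struct sketch_stats {
	atomic_uint access_skip;
	atomic_uint hits;
};

bool
sketch_buf_access(struct sketch_buf *buf, struct sketch_stats *st)
{
	pthread_mutex_lock(&buf->evict_lock);
	struct sketch_hdr *hdr = buf->hdr;

	/* Cheap first check, made before taking the hash lock. */
	if (hdr->state == ST_ANON) {
		pthread_mutex_unlock(&buf->evict_lock);
		atomic_fetch_add(&st->access_skip, 1);
		return (false);
	}

	pthread_mutex_lock(&hdr->hash_lock);

	/* Re-check under the hash lock: the header may have been released. */
	if (hdr->state == ST_ANON) {
		pthread_mutex_unlock(&hdr->hash_lock);
		pthread_mutex_unlock(&buf->evict_lock);
		atomic_fetch_add(&st->access_skip, 1);
		return (false);
	}
	pthread_mutex_unlock(&buf->evict_lock);

	/* Simplified stand-in for arc_access(): treat the header as MFU. */
	hdr->state = ST_MFU;
	pthread_mutex_unlock(&hdr->hash_lock);

	atomic_fetch_add(&st->hits, 1);
	return (true);
}
```

The second check is what makes the early, lock-free exit safe: a header that changes state between the two checks is counted as a skip rather than promoted.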

Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/arc.h

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Mon Apr 16 
00:42:45 2018(r332522)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Mon Apr 16 
00:54:58 2018(r332523)
@@ -540,8 +540,13 @@ typedef struct arc_stats {
 */
kstat_named_t arcstat_mutex_miss;
/*
+* Number of buffers skipped when updating the access state due to the
+* header having already been released after acquiring the hash lock.
+*/
+   kstat_named_t arcstat_access_skip;
+   /*
 * Number of buffers skipped because they have I/O in progress, are
-* indrect prefetch buffers that have not lived long enough, or are
+* indirect prefetch buffers that have not lived long enough, or are
 * not from the spa we're trying to evict from.
 */
kstat_named_t arcstat_evict_skip;
@@ -796,6 +801,7 @@ static arc_stats_t arc_stats = {
{ "allocated",KSTAT_DATA_UINT64 },
{ "deleted",  KSTAT_DATA_UINT64 },
{ "mutex_miss",   KSTAT_DATA_UINT64 },
+   { "access_skip",  KSTAT_DATA_UINT64 },
{ "evict_skip",   KSTAT_DATA_UINT64 },
{ "evict_not_enough", KSTAT_DATA_UINT64 },
{ "evict_l2_cached",  KSTAT_DATA_UINT64 },
@@ -5063,6 +5069,51 @@ arc_access(arc_buf_hdr_t *hdr, kmutex_t *hash_lock)
} else {
ASSERT(!"invalid arc state");
}
+}
+
+/*
+ * This routine is called by dbuf_hold() to update the arc_access() state
+ * which otherwise would be skipped for entries in the dbuf cache.
+ */
+void
+arc_buf_access(arc_buf_t *buf)
+{
+   mutex_enter(&buf->b_evict_lock);
+   arc_buf_hdr_t *hdr = buf->b_hdr;
+
+   /*
+* Avoid taking the hash_lock when possible as an optimization.
+* The header must be checked again under the hash_lock in order
+* to handle the case where it is concurrently being released.
+*/
+   if (hdr->b_l1hdr.b_state == arc_anon || HDR_EMPTY(hdr)) {
+   mutex_exit(&buf->b_evict_lock);
+   ARCSTAT_BUMP(arcstat_access_skip);
+   return;
+   }
+
+   kmutex_t *hash_lock = HDR_LOCK(hdr);
+   mutex_enter(hash_lock);
+
+   if (hdr->b_l1hdr.b_state == arc_anon || HDR_EMPTY(hdr)) {
+   mutex_exit(hash_lock);
+   mutex_exit(&buf->b_evict_lock);
+   ARCSTAT_BUMP(arcstat_access_skip);
+   return;
+   }
+
+   mutex_exit(&buf->b_evict_lock);
+
+   ASSERT(hdr->b_l1hdr.b_state == arc_mru ||
+   hdr->b_l1hdr.b_state == arc_mfu);
+
+   DTRACE_PROBE1(arc__hit, arc_buf_hdr_t *, hdr);
+   arc_access(hdr, hash_lock);
+   mutex_exit(hash_lock);
+
+   ARCSTAT_BUMP(arcstat_hits);
+   ARCSTAT_CONDSTAT(!HDR_PREFETCH(hdr),
+   demand, prefetch, !HDR_ISTYPE_METADATA(hdr), data, metadata, hits);
  }
  
  /* a generic arc_done_func_t which you can use */


Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c

svn commit: r332318 - in stable/11: . sys/net

2018-04-09 Thread Steven Hartland

Author: smh
Date: Mon Apr  9 08:25:29 2018
New Revision: 332318
URL: https://svnweb.freebsd.org/changeset/base/332318

Log:
  MFC r327559:
  
  Disabled the use of flowid for lagg by default
  
  Sponsored by: Multiplay

Modified:
  stable/11/UPDATING
  stable/11/sys/net/if_lagg.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/UPDATING
==
--- stable/11/UPDATING  Mon Apr  9 05:48:12 2018(r332317)
+++ stable/11/UPDATING  Mon Apr  9 08:25:29 2018(r332318)
@@ -16,6 +16,14 @@ from older versions of FreeBSD, try WITHOUT_CLANG and 
 the tip of head, and then rebuild without this option. The bootstrap process
 from older version of current across the gcc/clang cutover is a bit fragile.
 
+20180409:
+   The use of RSS hash from the network card aka flowid has been
+   disabled by default for lagg(4) as it's currently incompatible with
+   the lacp and loadbalance protocols.
+
+   This can be re-enabled by setting the following in loader.conf:
+   net.link.lagg.default_use_flowid="1"
+
 20180331:
Clang, llvm, lld, lldb, compiler-rt and libc++ have been upgraded to
6.0.0.  Please see the 20141231 entry below for information about

Modified: stable/11/sys/net/if_lagg.c
==
--- stable/11/sys/net/if_lagg.c Mon Apr  9 05:48:12 2018(r332317)
+++ stable/11/sys/net/if_lagg.c Mon Apr  9 08:25:29 2018(r332318)
@@ -238,7 +238,7 @@ SYSCTL_INT(_net_link_lagg, OID_AUTO, failover_rx_all, 
 "Accept input from any interface in a failover lagg");
 
 /* Default value for using flowid */
-static VNET_DEFINE(int, def_use_flowid) = 1;
+static VNET_DEFINE(int, def_use_flowid) = 0;
 #defineV_def_use_flowidVNET(def_use_flowid)
 SYSCTL_INT(_net_link_lagg, OID_AUTO, default_use_flowid, CTLFLAG_RWTUN,
 &VNET_NAME(def_use_flowid), 0,

Re: svn commit: r332285 - head/sys/kern

2018-04-08 Thread Steven Hartland


Worth making these sysctls so they can be tuned to the HW / use case?

On 08/04/2018 17:34, Mateusz Guzik wrote:

Author: mjg
Date: Sun Apr  8 16:34:10 2018
New Revision: 332285
URL: https://svnweb.freebsd.org/changeset/base/332285

Log:
   locks: tweak backoff a little bit
   
   Previous limits were chosen when locking primitives had spurious lock
   accesses.

   Flipping the starting point to 1 (or rather 2 as the first call shifts it)
   provides a modest win when mild contention is seen while not hurting worse
   cases. Tested on a bunch of one, two and four socket old and new systems
   (Westmere, Skylake, Threadripper and others) by doing concurrent page faults,
   buildkernel/buildworld and other stuff (although not all systems got all the
   tests).

   Another thing is the upper limit. It is semi-arbitrarily chosen as it was
   getting out of hand for slightly less small systems (e.g. a 128-thread one).

   Note that backoff is fundamentally a speculative bandaid and this change just
   makes it fit a little bit better. It remains completely oblivious to the
   hardware topology or the contention pattern. This is being experimented with.
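
As a rough illustration of the base/max arithmetic the log describes, here is a userland sketch. It assumes each backoff step doubles the previous delay, as the "first call shifts it" remark suggests. All `sketch_` names are hypothetical, and `SKETCH_DELAY_CAP` is a power-of-two cap picked for the sketch rather than the literal used in the diff.

```c
#define SKETCH_DELAY_CAP 32768u	/* hypothetical cap, chosen for the sketch */

struct sketch_delay_config {
	unsigned int base;
	unsigned int max;
};

/* Round up to the next power of two (expects x >= 1). */
unsigned int
sketch_roundup_2(unsigned int x)
{
	unsigned int p = 1;

	while (p < x)
		p <<= 1;
	return (p);
}

/* Start at 1 and scale the ceiling with the CPU count, up to the cap. */
void
sketch_delay_init(struct sketch_delay_config *lc, unsigned int ncpus)
{
	lc->base = 1;
	lc->max = sketch_roundup_2(ncpus) * 256;
	if (lc->max > SKETCH_DELAY_CAP)
		lc->max = SKETCH_DELAY_CAP;
}

/* Double the previous delay, clamped to the configured maximum. */
unsigned int
sketch_delay_next(const struct sketch_delay_config *lc, unsigned int delay)
{
	delay <<= 1;
	if (delay > lc->max)
		delay = lc->max;
	return (delay);
}
```

With four CPUs this gives a ceiling of 1024 and a first delay of 2; the cap only starts to bite at 128 CPUs and above.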

Modified:
   head/sys/kern/subr_lock.c

Modified: head/sys/kern/subr_lock.c
==
--- head/sys/kern/subr_lock.c   Sun Apr  8 16:29:24 2018(r332284)
+++ head/sys/kern/subr_lock.c   Sun Apr  8 16:34:10 2018(r332285)
@@ -156,8 +156,10 @@ void
  lock_delay_default_init(struct lock_delay_config *lc)
  {
  
-	lc->base = lock_roundup_2(mp_ncpus) / 4;

-   lc->max = lc->base * 1024;
+   lc->base = 1;
+   lc->max = lock_roundup_2(mp_ncpus) * 256;
+   if (lc->max > 32678)
+   lc->max = 32678;
  }
  
  #ifdef DDB





Re: svn commit: r327559 - in head: . sys/net

2018-03-31 Thread Steven Hartland


On 05/01/2018 13:11, Slawa Olhovchenkov wrote:

On Fri, Jan 05, 2018 at 03:50:31AM +0700, Eugene Grosbein wrote:


05.01.2018 3:05, Steven Hartland wrote:


Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
   Disabled the use of flowid for lagg by default
   
   Disabled the use of RSS hash from the network card aka flowid for
   lagg(4) interfaces by default as it's currently incompatible with
   the lacp and loadbalance protocols.

   The incompatibility is due to the fact that the flowid isn't known
   for the first packet of a new outbound stream which can result in
   the hash calculation method changing and hence a stream being
   incorrectly split across multiple interfaces during normal
   operation.

   This can be re-enabled by setting the following in loader.conf:
   net.link.lagg.default_use_flowid="1"
   
   Discussed with: kmacy

   Sponsored by:Multiplay

RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?

Defaults should not be changed so easily just because they are not optimal
for some specific case. Each lagg has its own setting for flowid usage
and why one cannot just use "ifconfig lagg0 -use_flowid" for such cases?

Irrelevant to RSS and etc. flowid distribution in lacp case work very
bad. This is good and must be MFC (IMHO).

There was no concrete conclusion to this thread and I've not had time to 
look into this more, but it's on my open list to MFC to stable/11 in time 
for 11.2.


Even given the drop in performance, I think we should prefer correctness 
over increased performance and given the new default can still be 
overridden in loader.conf I'm looking to MFC this shortly unless I get 
any strong objections with a clear path forward.
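
To make the correctness concern concrete, here is a small sketch of how a stream can end up on two ports when the first packet has no flowid. It is not the if_lagg.c code: the packet structure, the toy software hash and every `sketch_` name are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_pkt {
	bool has_flowid;	/* true once a hardware hash is available */
	uint32_t flowid;	/* hash supplied by the network card */
	uint32_t src_ip;
	uint32_t dst_ip;
	uint16_t src_port;
	uint16_t dst_port;
};

/* Toy software hash over the addresses and ports. */
uint32_t
sketch_sw_hash(const struct sketch_pkt *p)
{
	return (p->src_ip ^ p->dst_ip ^
	    (((uint32_t)p->src_port << 16) | p->dst_port));
}

/* Pick an output port: the flowid if allowed and present, else the hash. */
unsigned int
sketch_select_port(const struct sketch_pkt *p, bool use_flowid,
    unsigned int nports)
{
	uint32_t h;

	if (use_flowid && p->has_flowid)
		h = p->flowid;
	else
		h = sketch_sw_hash(p);
	return (h % nports);
}
```

With use_flowid enabled, the first packet of a stream is placed by the software hash and later packets by the flowid, and nothing guarantees the two agree. With it disabled, every packet of the stream goes through the same hash and so lands on the same port.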


    Regards
    Steve

svn commit: r331851 - stable/11/usr.sbin/bsdinstall/scripts

2018-03-31 Thread Steven Hartland

Author: smh
Date: Sat Mar 31 19:21:57 2018
New Revision: 331851
URL: https://svnweb.freebsd.org/changeset/base/331851

Log:
  MFC r320138:
  
  Fixed bsdinstall location of vfs.zfs.min_auto_ashift
  
  Sponsored by: Multiplay

Modified:
  stable/11/usr.sbin/bsdinstall/scripts/config
  stable/11/usr.sbin/bsdinstall/scripts/zfsboot
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/usr.sbin/bsdinstall/scripts/config
==
--- stable/11/usr.sbin/bsdinstall/scripts/configSat Mar 31 19:19:22 
2018(r331850)
+++ stable/11/usr.sbin/bsdinstall/scripts/configSat Mar 31 19:21:57 
2018(r331851)
@@ -32,7 +32,7 @@
 cat $BSDINSTALL_TMPETC/rc.conf.* >> $BSDINSTALL_TMPETC/rc.conf
 rm $BSDINSTALL_TMPETC/rc.conf.*
 
-cat $BSDINSTALL_CHROOT/etc/sysctl.conf 
$BSDINSTALL_TMPETC/sysctl.conf.hardening >> $BSDINSTALL_TMPETC/sysctl.conf
+cat $BSDINSTALL_CHROOT/etc/sysctl.conf $BSDINSTALL_TMPETC/sysctl.conf.* >> 
$BSDINSTALL_TMPETC/sysctl.conf
 rm $BSDINSTALL_TMPETC/sysctl.conf.*
 
 cp $BSDINSTALL_TMPETC/* $BSDINSTALL_CHROOT/etc

Modified: stable/11/usr.sbin/bsdinstall/scripts/zfsboot
==
--- stable/11/usr.sbin/bsdinstall/scripts/zfsboot   Sat Mar 31 19:19:22 
2018(r331850)
+++ stable/11/usr.sbin/bsdinstall/scripts/zfsboot   Sat Mar 31 19:21:57 
2018(r331851)
@@ -1446,7 +1446,7 @@ zfs_create_boot()
if [ "$ZFSBOOT_FORCE_4K_SECTORS" ]; then
f_eval_catch $funcname echo "$ECHO_APPEND" \
 'vfs.zfs.min_auto_ashift=12' \
-$BSDINSTALL_TMPBOOT/loader.conf.zfs || return $FAILURE
+$BSDINSTALL_TMPETC/sysctl.conf.zfs || return $FAILURE
fi
 
if [ "$ZFSBOOT_SWAP_MIRROR" ]; then

svn commit: r331850 - stable/11/sys/net

2018-03-31 Thread Steven Hartland

Author: smh
Date: Sat Mar 31 19:19:22 2018
New Revision: 331850
URL: https://svnweb.freebsd.org/changeset/base/331850

Log:
  MFC r328321:
  
  Added missing CTLFLAG_VNET to lacp default_strict_mode
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/net/ieee8023ad_lacp.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/net/ieee8023ad_lacp.c
==
--- stable/11/sys/net/ieee8023ad_lacp.c Sat Mar 31 19:18:07 2018
(r331849)
+++ stable/11/sys/net/ieee8023ad_lacp.c Sat Mar 31 19:19:22 2018
(r331850)
@@ -197,8 +197,8 @@ SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, debug, CTLFL
 &VNET_NAME(lacp_debug), 0, "Enable LACP debug logging (1=debug, 2=trace)");
 
 static VNET_DEFINE(int, lacp_default_strict_mode) = 1;
-SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, default_strict_mode, CTLFLAG_RWTUN,
-&VNET_NAME(lacp_default_strict_mode), 0,
+SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, default_strict_mode,
+CTLFLAG_RWTUN | CTLFLAG_VNET, &VNET_NAME(lacp_default_strict_mode), 0,
 "LACP strict protocol compliance default");
 
 #define LACP_DPRINTF(a) if (V_lacp_debug & 0x01) { lacp_dprintf a ; }

svn commit: r331849 - stable/11/sys/dev/mps

2018-03-31 Thread Steven Hartland

Author: smh
Date: Sat Mar 31 19:18:07 2018
New Revision: 331849
URL: https://svnweb.freebsd.org/changeset/base/331849

Log:
  MFC r330951:
  
  Fix mps deadlock when handling panic
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/dev/mps/mps_sas_lsi.c
  stable/11/sys/dev/mps/mpsvar.h
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/dev/mps/mps_sas_lsi.c
==
--- stable/11/sys/dev/mps/mps_sas_lsi.c Sat Mar 31 19:16:25 2018
(r331848)
+++ stable/11/sys/dev/mps/mps_sas_lsi.c Sat Mar 31 19:18:07 2018
(r331849)
@@ -50,6 +50,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -124,7 +125,7 @@ int mpssas_get_sas_address_for_sata_disk(struct mps_so
 u64 *sas_address, u16 handle, u32 device_info, u8 *is_SATA_SSD);
 static int mpssas_volume_add(struct mps_softc *sc,
 u16 handle);
-static void mpssas_SSU_to_SATA_devices(struct mps_softc *sc);
+static void mpssas_SSU_to_SATA_devices(struct mps_softc *sc, int howto);
 static void mpssas_stop_unit_done(struct cam_periph *periph,
 union ccb *done_ccb);
 
@@ -1112,7 +1113,7 @@ out:
  * Return nothing.
  */
 static void
-mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
+mpssas_SSU_to_SATA_devices(struct mps_softc *sc, int howto)
 {
struct mpssas_softc *sassc = sc->sassc;
union ccb *ccb;
@@ -1120,7 +1121,7 @@ mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
target_id_t targetid;
struct mpssas_target *target;
char path_str[64];
-   struct timeval cur_time, start_time;
+   int timeout;
 
/*
 * For each target, issue a StartStopUnit command to stop the device.
@@ -1183,17 +1184,23 @@ mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
}
 
/*
-* Wait until all of the SSU commands have completed or time has
-* expired (60 seconds).  Pause for 100ms each time through.  If any
-* command times out, the target will be reset in the SCSI command
-* timeout routine.
+* Timeout after 60 seconds by default or 10 seconds if howto has
+* RB_NOSYNC set which indicates we're likely handling a panic.
 */
-   getmicrotime(&start_time);
-   while (sc->SSU_refcount) {
+   timeout = 600;
+   if (howto & RB_NOSYNC)
+   timeout = 100;
+
+   /*
+* Wait until all of the SSU commands have completed or timeout has
+* expired.  Pause for 100ms each time through.  If any command
+* times out, the target will be reset in the SCSI command timeout
+* routine.
+*/
+   while (sc->SSU_refcount > 0) {
pause("mpswait", hz/10);

-   getmicrotime(&cur_time);
-   if ((cur_time.tv_sec - start_time.tv_sec) > 60) {
+   if (--timeout == 0) {
mps_dprint(sc, MPS_FAULT, "Time has expired waiting "
"for SSU commands to complete.\n");
break;
@@ -1235,7 +1242,7 @@ mpssas_stop_unit_done(struct cam_periph *periph, union
  * Return nothing.
  */
 void
-mpssas_ir_shutdown(struct mps_softc *sc)
+mpssas_ir_shutdown(struct mps_softc *sc, int howto)
 {
u16 volume_mapping_flags;
u16 ioc_pg8_flags = le16toh(sc->ioc_pg8.Flags);
@@ -1340,5 +1347,5 @@ out:
}
}
}
-   mpssas_SSU_to_SATA_devices(sc);
+   mpssas_SSU_to_SATA_devices(sc, howto);
 }

Modified: stable/11/sys/dev/mps/mpsvar.h
==
--- stable/11/sys/dev/mps/mpsvar.h  Sat Mar 31 19:16:25 2018
(r331848)
+++ stable/11/sys/dev/mps/mpsvar.h  Sat Mar 31 19:18:07 2018
(r331849)
@@ -722,7 +722,7 @@ int mps_config_get_volume_wwid(struct mps_softc *sc, u
 int mps_config_get_raid_pd_pg0(struct mps_softc *sc,
 Mpi2ConfigReply_t *mpi_reply, Mpi2RaidPhysDiskPage0_t *config_page,
 u32 page_address);
-void mpssas_ir_shutdown(struct mps_softc *sc);
+void mpssas_ir_shutdown(struct mps_softc *sc, int howto);
 
 int mps_reinit(struct mps_softc *sc);
 void mpssas_handle_reinit(struct mps_softc *sc);

svn commit: r331848 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2018-03-31 Thread Steven Hartland

Author: smh
Date: Sat Mar 31 19:16:25 2018
New Revision: 331848
URL: https://svnweb.freebsd.org/changeset/base/331848

Log:
  MFC r330950:
  
  Prevent ZFS TRIM breaking VTOC8 partitions
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
==
--- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c   
Sat Mar 31 17:28:30 2018(r331847)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c   
Sat Mar 31 19:16:25 2018(r331848)
@@ -728,7 +728,9 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
}
 
/*
-* TRIM the whole thing so that we start with a clean slate.
+* TRIM the whole thing, excluding the blank space and boot header
+* as specified by ZFS On-Disk Specification (section 1.3), so that
+* we start with a clean slate.
 * It's just an optimization, so we don't care if it fails.
 * Don't TRIM if removing so that we don't interfere with zpool
 * disaster recovery.
@@ -736,7 +738,8 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
if (zfs_trim_enabled && vdev_trim_on_init && !vd->vdev_notrim && 
(reason == VDEV_LABEL_CREATE || reason == VDEV_LABEL_SPARE ||
reason == VDEV_LABEL_L2CACHE))
-   zio_wait(zio_trim(NULL, spa, vd, 0, vd->vdev_psize));
+   zio_wait(zio_trim(NULL, spa, vd, VDEV_SKIP_SIZE,
+   vd->vdev_psize - VDEV_SKIP_SIZE));
 
/*
 * Initialize its label.

Re: svn commit: r331209 - head

2018-03-22 Thread Steven Hartland

I think it would be worth specifically detailing the steps to achieve 
this, as it's not immediately obvious how this would be done.


On 19/03/2018 15:27, Kyle Evans wrote:

Author: kevans
Date: Mon Mar 19 15:27:53 2018
New Revision: 331209
URL: https://svnweb.freebsd.org/changeset/base/331209

Log:
   Add note to UPDATING about UEFI changes requiring loader(8) update
   
   These problems have only been observed with boards using U-Boot (e.g. ARM)

   where virtual addresses are already set in the memory map by the firmware
   and the firmware is expecting a call to SetVirtualAddressMap to be made.
   I refrain from mentioning this in the note because this could also be the
   case on some not-yet-tested firmware on amd64 and it's not a bad
   recommendation for the general case.

Modified:
   head/UPDATING

Modified: head/UPDATING
==
--- head/UPDATING   Mon Mar 19 15:11:10 2018(r331208)
+++ head/UPDATING   Mon Mar 19 15:27:53 2018(r331209)
@@ -51,6 +51,13 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 12.x IS SLOW:
  
  ** SPECIAL WARNING: **
  
+20180319:

+   For UEFI systems: the UEFI loader(8), loader.efi, should be updated in
+   conjunction with installing a new kernel after r330868. The kernel,
+   after this revision, will be more lenient when mapping addresses for
+   UEFI Runtime Services and this may result in a kernel panic without the
+   corresponding loader(8) update.
+
  20180212:
FreeBSD boot loader enhanced with Lua scripting. It's purely opt-in for
now by building WITH_LOADER_LUA and WITHOUT_FORTH in /etc/src.conf.




svn commit: r330951 - head/sys/dev/mps

2018-03-14 Thread Steven Hartland

Author: smh
Date: Wed Mar 14 21:32:23 2018
New Revision: 330951
URL: https://svnweb.freebsd.org/changeset/base/330951

Log:
  Fix mps deadlock when handling panic
  
  During shutdown mps waits for its SSU requests to complete however when
  performing a reboot after handling a panic the scheduler is stopped so
  getmicrotime which is used can be non-functional.
  
  Switch to using the same method as shutdown_panic to ensure we actually
  complete.
  
  In addition reduce the timeout when RB_NOSYNC is set in howto as we expect
  this to fail.
  
  Reviewed by:  slm
  MFC after:1 week
  Sponsored by: Multiplay
  Differential Revision:https://reviews.freebsd.org/D12776
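
The change from wall-clock time to counting pauses can be sketched as below. This is a simplified userland model, not the driver code: `SKETCH_NOSYNC` is a hypothetical stand-in for RB_NOSYNC, and `sketch_pause` only simulates a 100ms pause during which some commands may complete.

```c
#include <stdbool.h>

#define SKETCH_NOSYNC 0x1	/* hypothetical stand-in for RB_NOSYNC */

struct sketch_dev {
	int pending;		/* outstanding commands */
	int done_per_pause;	/* commands completing during each pause */
	int pauses;		/* number of simulated 100ms pauses taken */
};

/* Simulate one 100ms pause; no clock is consulted. */
void
sketch_pause(struct sketch_dev *d)
{
	d->pauses++;
	d->pending -= d->done_per_pause;
}

/* Wait for outstanding commands, giving up after a fixed pause count. */
bool
sketch_wait(struct sketch_dev *d, int howto)
{
	int timeout = 600;		/* 600 x 100ms = 60 seconds */

	if (howto & SKETCH_NOSYNC)
		timeout = 100;		/* 100 x 100ms = 10 seconds */

	while (d->pending > 0) {
		sketch_pause(d);
		if (--timeout == 0)
			return (false);	/* timed out */
	}
	return (true);
}
```

Because the loop counts its own iterations, it terminates even when the time source stops advancing, which is the situation the log describes after a panic.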

Modified:
  head/sys/dev/mps/mps_sas_lsi.c
  head/sys/dev/mps/mpsvar.h

Modified: head/sys/dev/mps/mps_sas_lsi.c
==
--- head/sys/dev/mps/mps_sas_lsi.c  Wed Mar 14 21:21:03 2018
(r330950)
+++ head/sys/dev/mps/mps_sas_lsi.c  Wed Mar 14 21:32:23 2018
(r330951)
@@ -52,6 +52,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -126,7 +127,7 @@ int mpssas_get_sas_address_for_sata_disk(struct mps_so
 u64 *sas_address, u16 handle, u32 device_info, u8 *is_SATA_SSD);
 static int mpssas_volume_add(struct mps_softc *sc,
 u16 handle);
-static void mpssas_SSU_to_SATA_devices(struct mps_softc *sc);
+static void mpssas_SSU_to_SATA_devices(struct mps_softc *sc, int howto);
 static void mpssas_stop_unit_done(struct cam_periph *periph,
 union ccb *done_ccb);
 
@@ -1122,7 +1123,7 @@ out:
  * Return nothing.
  */
 static void
-mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
+mpssas_SSU_to_SATA_devices(struct mps_softc *sc, int howto)
 {
struct mpssas_softc *sassc = sc->sassc;
union ccb *ccb;
@@ -1130,7 +1131,7 @@ mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
target_id_t targetid;
struct mpssas_target *target;
char path_str[64];
-   struct timeval cur_time, start_time;
+   int timeout;
 
/*
 * For each target, issue a StartStopUnit command to stop the device.
@@ -1193,17 +1194,23 @@ mpssas_SSU_to_SATA_devices(struct mps_softc *sc)
}
 
/*
-* Wait until all of the SSU commands have completed or time has
-* expired (60 seconds).  Pause for 100ms each time through.  If any
-* command times out, the target will be reset in the SCSI command
-* timeout routine.
+* Timeout after 60 seconds by default or 10 seconds if howto has
+* RB_NOSYNC set which indicates we're likely handling a panic.
 */
-   getmicrotime(&start_time);
-   while (sc->SSU_refcount) {
+   timeout = 600;
+   if (howto & RB_NOSYNC)
+   timeout = 100;
+
+   /*
+* Wait until all of the SSU commands have completed or timeout has
+* expired.  Pause for 100ms each time through.  If any command
+* times out, the target will be reset in the SCSI command timeout
+* routine.
+*/
+   while (sc->SSU_refcount > 0) {
pause("mpswait", hz/10);

-   getmicrotime(&cur_time);
-   if ((cur_time.tv_sec - start_time.tv_sec) > 60) {
+   if (--timeout == 0) {
mps_dprint(sc, MPS_FAULT, "Time has expired waiting "
"for SSU commands to complete.\n");
break;
@@ -1245,7 +1252,7 @@ mpssas_stop_unit_done(struct cam_periph *periph, union
  * Return nothing.
  */
 void
-mpssas_ir_shutdown(struct mps_softc *sc)
+mpssas_ir_shutdown(struct mps_softc *sc, int howto)
 {
u16 volume_mapping_flags;
u16 ioc_pg8_flags = le16toh(sc->ioc_pg8.Flags);
@@ -1350,5 +1357,5 @@ out:
}
}
}
-   mpssas_SSU_to_SATA_devices(sc);
+   mpssas_SSU_to_SATA_devices(sc, howto);
 }

Modified: head/sys/dev/mps/mpsvar.h
==
--- head/sys/dev/mps/mpsvar.h   Wed Mar 14 21:21:03 2018(r330950)
+++ head/sys/dev/mps/mpsvar.h   Wed Mar 14 21:32:23 2018(r330951)
@@ -772,7 +772,7 @@ int mps_config_get_volume_wwid(struct mps_softc *sc, u
 int mps_config_get_raid_pd_pg0(struct mps_softc *sc,
 Mpi2ConfigReply_t *mpi_reply, Mpi2RaidPhysDiskPage0_t *config_page,
 u32 page_address);
-void mpssas_ir_shutdown(struct mps_softc *sc);
+void mpssas_ir_shutdown(struct mps_softc *sc, int howto);
 
 int mps_reinit(struct mps_softc *sc);
 void mpssas_handle_reinit(struct mps_softc *sc);

Re: svn commit: r330950 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2018-03-14 Thread Steven Hartland


Missed the differential review: https://reviews.freebsd.org/D14695

On 14/03/2018 21:21, Steven Hartland wrote:

Author: smh
Date: Wed Mar 14 21:21:03 2018
New Revision: 330950
URL: https://svnweb.freebsd.org/changeset/base/330950

Log:
   Prevent ZFS TRIM breaking VTOC8 partitions
   
   Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
   documented by the ZFS On-Disk Specification section 1.3

   Before this a zpool create on a VTOC8 partitioned device would overwrite the
   partition metadata.
   
   Reported by:	marius

   Reviewed by: marius agv
   MFC after:   1 week
   Sponsored by:Multiplay

Modified:
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.cWed Mar 
14 21:11:41 2018(r330949)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.cWed Mar 
14 21:21:03 2018(r330950)
@@ -802,7 +802,9 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
}
  
  	/*

-* TRIM the whole thing so that we start with a clean slate.
+* TRIM the whole thing, excluding the blank space and boot header
+* as specified by ZFS On-Disk Specification (section 1.3), so that
+* we start with a clean slate.
 * It's just an optimization, so we don't care if it fails.
 * Don't TRIM if removing so that we don't interfere with zpool
 * disaster recovery.
@@ -810,7 +812,8 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
if (zfs_trim_enabled && vdev_trim_on_init && !vd->vdev_notrim &&
(reason == VDEV_LABEL_CREATE || reason == VDEV_LABEL_SPARE ||
reason == VDEV_LABEL_L2CACHE))
-   zio_wait(zio_trim(NULL, spa, vd, 0, vd->vdev_psize));
+   zio_wait(zio_trim(NULL, spa, vd, VDEV_SKIP_SIZE,
+   vd->vdev_psize - VDEV_SKIP_SIZE));
  
  	/*

 * Initialize its label.




svn commit: r330950 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2018-03-14 Thread Steven Hartland

Author: smh
Date: Wed Mar 14 21:21:03 2018
New Revision: 330950
URL: https://svnweb.freebsd.org/changeset/base/330950

Log:
  Prevent ZFS TRIM breaking VTOC8 partitions
  
  Update the ZFS TRIM code to ensure it respects VTOC8 partition headers as
  documented by the ZFS On-Disk Specification section 1.3
  
  Before this a zpool create on a VTOC8 partitioned device would overwrite the
  partition metadata.
  
  Reported by:  marius
  Reviewed by:  marius agv
  MFC after:1 week
  Sponsored by: Multiplay
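
The range arithmetic behind the fix can be sketched as follows. The 16 KiB value for `SKETCH_SKIP_SIZE` is an assumption standing in for VDEV_SKIP_SIZE (blank space plus boot header), the undersized-device guard is an addition for the sketch, and all `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size of the blank space plus boot header at the device start. */
#define SKETCH_SKIP_SIZE (16u * 1024u)

struct sketch_trim_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * Compute the range to TRIM at label init: everything after the
 * skipped region.  Returns false if the device is too small to hold
 * anything beyond it (a guard added for the sketch).
 */
bool
sketch_label_trim_range(uint64_t psize, struct sketch_trim_range *out)
{
	if (psize <= SKETCH_SKIP_SIZE)
		return (false);
	out->offset = SKETCH_SKIP_SIZE;
	out->length = psize - SKETCH_SKIP_SIZE;
	return (true);
}
```

Starting at the skip offset alone would not be enough; the length has to shrink by the same amount so that the range still ends at the device's physical size.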

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.cWed Mar 
14 21:11:41 2018(r330949)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_label.cWed Mar 
14 21:21:03 2018(r330950)
@@ -802,7 +802,9 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
}
 
/*
-* TRIM the whole thing so that we start with a clean slate.
+* TRIM the whole thing, excluding the blank space and boot header
+* as specified by ZFS On-Disk Specification (section 1.3), so that
+* we start with a clean slate.
 * It's just an optimization, so we don't care if it fails.
 * Don't TRIM if removing so that we don't interfere with zpool
 * disaster recovery.
@@ -810,7 +812,8 @@ vdev_label_init(vdev_t *vd, uint64_t crtxg, vdev_label
if (zfs_trim_enabled && vdev_trim_on_init && !vd->vdev_notrim && 
(reason == VDEV_LABEL_CREATE || reason == VDEV_LABEL_SPARE ||
reason == VDEV_LABEL_L2CACHE))
-   zio_wait(zio_trim(NULL, spa, vd, 0, vd->vdev_psize));
+   zio_wait(zio_trim(NULL, spa, vd, VDEV_SKIP_SIZE,
+   vd->vdev_psize - VDEV_SKIP_SIZE));
 
/*
 * Initialize its label.

Re: svn commit: r329812 - head/sys/cam

2018-02-22 Thread Steven Hartland

In our experience this is very device dependent; what led you to this 
conclusion?


On 22/02/2018 05:43, Warner Losh wrote:

Author: imp
Date: Thu Feb 22 05:43:20 2018
New Revision: 329812
URL: https://svnweb.freebsd.org/changeset/base/329812

Log:
   Don't sort TRIMs.
   
   While the code for ada and da both assume that the trim list is
   ordered when coalescing the TRIMs, it turns out that
   creating the sorted list uses more resources than are saved by having
   slightly fewer trims sent to the device.
   
   Sponsored by: Netflix
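
The trade-off in the log is between a constant-time append and an insert that has to walk the queue. A rough sketch with a singly linked list is below. It uses plain ascending-offset order as a simplified stand-in for bioq_disksort(), and all `sketch_` names are hypothetical.

```c
#include <stddef.h>

struct sketch_req {
	unsigned long offset;
	struct sketch_req *next;
};

struct sketch_queue {
	struct sketch_req *head;
	struct sketch_req *tail;
};

/* O(1): append without looking at what is already queued. */
void
sketch_insert_tail(struct sketch_queue *q, struct sketch_req *r)
{
	r->next = NULL;
	if (q->tail != NULL)
		q->tail->next = r;
	else
		q->head = r;
	q->tail = r;
}

/* O(n): keep ascending offset order; returns the number of nodes visited. */
size_t
sketch_insert_sorted(struct sketch_queue *q, struct sketch_req *r)
{
	struct sketch_req **pp = &q->head;
	size_t visited = 0;

	while (*pp != NULL && (*pp)->offset <= r->offset) {
		pp = &(*pp)->next;
		visited++;
	}
	r->next = *pp;
	*pp = r;
	if (r->next == NULL)
		q->tail = r;
	return (visited);
}
```

Whether the extra walk pays for itself depends on how many adjacent requests the device can then merge, which is the device-dependent part of the question raised above.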


Modified:
   head/sys/cam/cam_iosched.c

Modified: head/sys/cam/cam_iosched.c
==
--- head/sys/cam/cam_iosched.c  Thu Feb 22 04:30:52 2018(r329811)
+++ head/sys/cam/cam_iosched.c  Thu Feb 22 05:43:20 2018(r329812)
@@ -1392,7 +1392,7 @@ cam_iosched_queue_work(struct cam_iosched_softc *isc,
 * the work on the bio queue.
 */
if (bp->bio_cmd == BIO_DELETE) {
-   bioq_disksort(&isc->trim_queue, bp);
+   bioq_insert_tail(&isc->trim_queue, bp);
  #ifdef CAM_IOSCHED_DYNAMIC
isc->trim_stats.in++;
isc->trim_stats.queued++;




Re: svn commit: r328996 - head/sys/kern

2018-02-07 Thread Steven Hartland

What would be the expected behavior if this was triggered, app crash or 
kernel panic...?


On 07/02/2018 21:52, Andriy Gapon wrote:

Author: avg
Date: Wed Feb  7 21:51:59 2018
New Revision: 328996
URL: https://svnweb.freebsd.org/changeset/base/328996

Log:
   exec_map_first_page: fix an inverse condition introduced in r254138
   
   While the bug itself was serious, as we could either pass a non-busied

   page to vm_pager_get_pages() or leak a busy page, it could only be
   triggered under a very rare condition where the page is already inserted
   into the object, but it is not valid yet.
   
   Reviewed by:	kib

   MFC after:   2 weeks
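
The inverted test is easy to misread, so here is a single-threaded sketch of the intended loop shape. It assumes a try-busy call that returns true on success; the page structure and all `sketch_` names are hypothetical and model none of the real VM locking.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_page {
	bool valid;	/* contents are already valid */
	bool busy;	/* exclusively busied by someone */
};

/* Returns true when the caller successfully busied the page. */
bool
sketch_tryxbusy(struct sketch_page *p)
{
	if (p->busy)
		return (false);
	p->busy = true;
	return (true);
}

/*
 * Count the leading run of pages that are not yet valid and that the
 * caller managed to busy.
 */
size_t
sketch_claim_run(struct sketch_page *pages, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages[i].valid)
			break;
		if (!sketch_tryxbusy(&pages[i]))	/* stop on failure */
			break;
	}
	return (i);
}
```

Dropping the negation produces both failure modes named in the log: the loop would stop right after busying a page, leaving it busied but outside the run, and it would carry on past a page it failed to busy, putting a non-busied page inside the run.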

Modified:
   head/sys/kern/kern_exec.c

Modified: head/sys/kern/kern_exec.c
==
--- head/sys/kern/kern_exec.c   Wed Feb  7 20:36:37 2018(r328995)
+++ head/sys/kern/kern_exec.c   Wed Feb  7 21:51:59 2018(r328996)
@@ -1009,7 +1009,7 @@ exec_map_first_page(imgp)
if ((ma[i] = vm_page_next(ma[i - 1])) != NULL) {
if (ma[i]->valid)
break;
-   if (vm_page_tryxbusy(ma[i]))
+   if (!vm_page_tryxbusy(ma[i]))
break;
} else {
ma[i] = vm_page_alloc(object, i,




Re: svn commit: r328625 - in head/sys: amd64/amd64 amd64/ia32 amd64/include dev/cpuctl i386/i386 x86/include x86/x86

2018-01-31 Thread Steven Hartland

Pretty sure I’ve seen that too

On Wed, 31 Jan 2018 at 18:05, Rodney W. Grimes <
free...@pdx.rh.cn85.dnsmgr.net> wrote:

> > On Wed, Jan 31, 2018 at 02:56:24PM +, Bjoern A. Zeeb wrote:
> > > On 31 Jan 2018, at 14:36, Konstantin Belousov wrote:
> > >
> > > > Author: kib
> > > > Date: Wed Jan 31 14:36:27 2018
> > > > New Revision: 328625
> > > > URL: https://svnweb.freebsd.org/changeset/base/328625
> > > >
> > > > Log:
> > > >   IBRS support, AKA Spectre hardware mitigation.
> > >
> > > >   For existing processors, you need a microcode update which adds IBRS
> > > >   CPU features, and to manually enable it by setting the tunable/sysctl
> > > >   hw.ibrs_disable to 0.  Current status can be checked in sysctl
> > > >   hw.ibrs_active.  The mitigation might be inactive if the CPU feature
> > >
> > > Can you change the tunable/sysctl to hw.ibrs_enable[d] (and toggle the
> > > default setting along).
> > This is done consistently with the hw.clflush_disable.
> > Anyway, the intent is that the knob will be used for disabling,
> > since defaults are going to be changed in the near future.
>
> I thought we had something some place that said negative assertions
> should be avoided if possible.
>
> > > I find it highly confusing to have two different sysctls "disable"
> > > and "active", and a lot
> > > of people (and cultures) have trouble with the double negative.
> > > Also the "enable[d]" variant seems to be predominant in the kernel.
> > >
> > > Also can we spell IBRS in the sysctl description as "Indirect Branch
> > > Restricted Speculation (IBRS)"?
> > Will do in half an hour.
>
>
> --
> Rod Grimes
> rgri...@freebsd.org
>
>

svn commit: r328321 - head/sys/net

2018-01-24 Thread Steven Hartland

Author: smh
Date: Wed Jan 24 10:13:14 2018
New Revision: 328321
URL: https://svnweb.freebsd.org/changeset/base/328321

Log:
  Added missing CTLFLAG_VNET to lacp default_strict_mode
  
  Added CTLFLAG_VNET to net.link.lagg.lacp.default_strict_mode which was missed
  in r290450.
  
  Reported by:  julian@
  MFC after:1 week
  Sponsored by: Multiplay

Modified:
  head/sys/net/ieee8023ad_lacp.c

Modified: head/sys/net/ieee8023ad_lacp.c
==
--- head/sys/net/ieee8023ad_lacp.c  Wed Jan 24 07:54:05 2018	(r328320)
+++ head/sys/net/ieee8023ad_lacp.c  Wed Jan 24 10:13:14 2018	(r328321)
@@ -201,8 +201,8 @@ SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, debug, CTLFL
 &VNET_NAME(lacp_debug), 0, "Enable LACP debug logging (1=debug, 2=trace)");
 
 static VNET_DEFINE(int, lacp_default_strict_mode) = 1;
-SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, default_strict_mode, CTLFLAG_RWTUN,
-&VNET_NAME(lacp_default_strict_mode), 0,
+SYSCTL_INT(_net_link_lagg_lacp, OID_AUTO, default_strict_mode,
+CTLFLAG_RWTUN | CTLFLAG_VNET, &VNET_NAME(lacp_default_strict_mode), 0,
 "LACP strict protocol compliance default");
 
 #define LACP_DPRINTF(a) if (V_lacp_debug & 0x01) { lacp_dprintf a ; }

Re: svn commit: r328136 - in head/etc: defaults rc.d

2018-01-18 Thread Steven Hartland

Did you intend to add the growfs option at the same time? It wasn’t
mentioned in the commit msg.

On Thu, 18 Jan 2018 at 20:46, Brad Davis  wrote:

> Author: brd (doc,ports committer)
> Date: Thu Jan 18 20:45:41 2018
> New Revision: 328136
> URL: https://svnweb.freebsd.org/changeset/base/328136
>
> Log:
>   Teach the resolv startup script to respect its enable flag.
>
>   Reviewed by:  will, imp
>   Approved by:  imp
>
> Modified:
>   head/etc/defaults/rc.conf
>   head/etc/rc.d/resolv
>
> Modified: head/etc/defaults/rc.conf
>
> ==
> --- head/etc/defaults/rc.conf   Thu Jan 18 20:12:12 2018(r328135)
> +++ head/etc/defaults/rc.conf   Thu Jan 18 20:45:41 2018(r328136)
> @@ -96,6 +96,7 @@ fsck_y_enable="NO"# Set to YES to do fsck -y if the i
>  fsck_y_flags="-T ffs:-R -T ufs:-R" # Additional flags for fsck -y
>  background_fsck="YES"  # Attempt to run fsck in the background where
> possible.
>  background_fsck_delay="60" # Time to wait (seconds) before starting the
> fsck.
> +growfs_enable="NO" # Set to YES to attempt to grow the root
> filesystem on boot
>  netfs_types="nfs:NFS smbfs:SMB" # Net filesystems.
>  extra_netfs_types="NO" # List of network extra filesystem types for
> delayed
> # mount at startup (or NO).
> @@ -276,6 +277,7 @@ ctld_enable="NO"# CAM Target Layer / iSCSI
> target da
>  local_unbound_enable="NO"  # local caching resolver
>  blacklistd_enable="NO" # Run blacklistd daemon (YES/NO).
>  blacklistd_flags=""# Optional flags for blacklistd(8).
> +resolv_enable="YES"# Enable resolv / resolvconf
>
>  #
>  # kerberos. Do not run the admin daemons on slave servers
>
> Modified: head/etc/rc.d/resolv
>
> ==
> --- head/etc/rc.d/resolvThu Jan 18 20:12:12 2018(r328135)
> +++ head/etc/rc.d/resolvThu Jan 18 20:45:41 2018(r328136)
> @@ -35,6 +35,7 @@
>
>  name="resolv"
>  desc="Create /etc/resolv.conf from kenv"
> +start_cmd="${name}_start"
>  stop_cmd=':'
>
>  load_rc_config $name
> @@ -42,17 +43,20 @@ load_rc_config $name
>  # if the info is available via dhcp/kenv
>  # build the resolv.conf
>  #
> -if [ -n "`/bin/kenv dhcp.domain-name-servers 2> /dev/null`" ]; then
> -   interface="`/bin/kenv boot.netif.name`"
> -   (
> -   if [ -n "`/bin/kenv dhcp.domain-name 2> /dev/null`" ]; then
> -   echo domain `/bin/kenv dhcp.domain-name`
> +resolv_start()
> +{
> +   if [ -n "`/bin/kenv dhcp.domain-name-servers 2> /dev/null`" ]; then
> +   interface="`/bin/kenv boot.netif.name`"
> +   (
> +   if [ -n "`/bin/kenv dhcp.domain-name 2> /dev/null`" ]; then
> +   echo domain `/bin/kenv dhcp.domain-name`
> +   fi
> +
> +   set -- `/bin/kenv dhcp.domain-name-servers`
> +   for ns in `IFS=','; echo $*`; do
> +   echo nameserver $ns
> +   done
> +   ) | /sbin/resolvconf -a ${interface}:dhcp4
> fi
> -
> -   set -- `/bin/kenv dhcp.domain-name-servers`
> -   for ns in `IFS=','; echo $*`; do
> -   echo nameserver $ns
> -   done
> -   ) | /sbin/resolvconf -a ${interface}:dhcp4
> -fi
> +}
>
>
>

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 23:30, Scott Long wrote:



On Jan 5, 2018, at 11:20 AM, Eugene Grosbein  wrote:

CC'ng scottl@ as author of the change in question.

06.01.2018 0:39, Matt Joras wrote:


For what it's worth, this was the conclusion I came to, and at Isilon
we've made the same change being discussed here. For the case of
drivers that end up using a queue index for the flowid, you end up
with pathological behavior on the lagg; the flowid ends up getting
right shifted by (default) 16. So in the case of e.g. two bxe(4)
interfaces with 4 queues, you always end up choosing the interface in
the lagg at index 0.
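
To make the arithmetic above concrete, here is a minimal sketch in plain C.
It assumes, as described in this thread, that the port is chosen by shifting
the flowid right by flowid_shift and taking the result modulo the number of
ports. select_port() and count_on_port0() are hypothetical helpers written
for illustration only; they are not functions from if_lagg.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from if_lagg): shift the flowid right, then
 * take it modulo the number of ports to get a port index.
 */
static unsigned
select_port(uint32_t flowid, unsigned flowid_shift, unsigned nports)
{
	assert(nports > 0);
	return ((flowid >> flowid_shift) % nports);
}

/*
 * Treat the flowid as a plain queue index (0 .. nqueues - 1) and count
 * how many of those queues end up on port index 0.
 */
static unsigned
count_on_port0(unsigned nqueues, unsigned flowid_shift, unsigned nports)
{
	unsigned q, hits = 0;

	for (q = 0; q < nqueues; q++)
		if (select_port(q, flowid_shift, nports) == 0)
			hits++;
	return (hits);
}
```

With four queue-index flowids, two ports and a shift of 16, every flowid
shifts down to zero, so count_on_port0(4, 16, 2) reports all four queues on
port 0. With a shift of 0 the same call spreads them two and two.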

Then why does if_lagg shift 16 bits by default? It seems senseless.
This was introduced with r260070 by scottl:


At the time, we were using cxgbe interfaces which inserted a reasonable RSS
hash into the flowid field.  The shift was done to expose different bits to the
index/modulo scheme used by the LACP module vs the cxgbe module.  In
hindsight, I should not have set a default value of 16, I should have left it at
zero so that default behavior would be preserved.


Multi-queue NIC drivers and multi-port lagg tend to use the same lower
bits of the flowid as each other, resulting in a poor distribution of
packets among queues in certain cases.  Work around this by adding a
set of sysctls for controlling a bit-shift on the flowid when doing
multi-port aggregation in lagg and lacp.  By default, lagg/lacp will
now use bits 16 and higher instead of 0 and higher.

Reviewed by:max
Obtained from:  Netflix
MFC after:  3 days

This commit message does not point to an example of a NIC driver that would set
non-zero bits 16 and higher for flowid so that the shift result would be non-zero.
Do we really have such a driver?


Yes.


Anyway, this lagg's default seems to be very driver-centric.

For example, the Intel driver family also does not use such high bits for flowid,
just like the bxe(4) mentioned above.


scottl@moe:~/svn/head/sys/dev % grep -r iri_flowid *
bnxt/bnxt_txrx.c:   ri->iri_flowid = le32toh(rcp->rss_hash);
bnxt/bnxt_txrx.c:   ri->iri_flowid = le32toh(tpas->low.rss_hash);
e1000/em_txrx.c:ri->iri_flowid = le32toh(rxd->wb.lower.hi_dword.rss);
e1000/igb_txrx.c:   ri->iri_flowid =
ixgbe/ix_txrx.c:ri->iri_flowid = le32toh(rxd->wb.lower.hi_dword.rss);

The drivers that set m_pkthdr.flowid directly to an RSS hash look
to be:

cxgb
cxgbe
mlx4
mlx5
qlnx
qlxgbe
qlxge
vmxnet3

Maybe the hardware doesn’t do a great job with generating a useful RSS hash,
but that’s tangential to this conversation.


We should change flowid_shift default to 0 for if_lagg(4), shouldn't we?


In the short term, yes.  What I see is that it’s too expensive to do a hash
calculation on every TX packet in LACP (for anything resembling line rate),
and flowid is unreliable when a connection is initiated without an RX packet
triggering it.  I don’t understand the consequences of the TX initiation
problem well enough to offer a solution.  For the problem of flowid being
used inconsistently by drivers (i.e. not populating all 32 bits or using a
weak RSS), that’s really a driver problem.

What I’d recommend is that the LACP code check for M_HASHTYPE_NONE and
M_HASHTYPE_OPAQUE and calculate a new hash if either is set (effectively
ignoring the flowid).  It’s then up to the drivers to set the correct hash
type that corresponds with what they’re putting into the flowid field.  An
opaque type would mean that the value is useful to the driver but should not
be considered useful anywhere else.  You’ll get more correct and less
surprising behavior from that, at the penalty of every TX packet requiring a
hash calculation for hardware/drivers that are crummy.
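
The recommendation above can be sketched in a few lines of plain C. This is
only an illustration under stated assumptions: the enum values are
simplified stand-ins for the kernel's M_HASHTYPE_* constants, and
effective_hash() is a hypothetical name, not an existing lagg or LACP
function.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for M_HASHTYPE_NONE, M_HASHTYPE_OPAQUE and any RSS type. */
enum hashtype { HASH_NONE, HASH_OPAQUE, HASH_RSS };

/*
 * Hypothetical helper: decide which value the port selection should use.
 * A missing or opaque hash type means the flowid is ignored and the
 * software-computed hash is used instead.
 */
static uint32_t
effective_hash(enum hashtype type, uint32_t flowid, uint32_t software_hash)
{
	switch (type) {
	case HASH_NONE:
	case HASH_OPAQUE:
		return (software_hash);
	default:
		return (flowid);
	}
}
```

The cost noted above shows up in the first two cases: every packet from a
driver that reports no hash type, or an opaque one, pays for a software hash
calculation.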


Mixing the hash methods causes problems with out-of-order packets even
just for the first packet, and using a hash which is not what's
configured by lagghash is confusing at best, so that could be fixed to
say "flowid" if that's what's going to happen, or perhaps update it to the
hash type that flowid represents if that's possible.


LACP already checks for M_HASHTYPE_NONE if use_flowid is enabled and
manually calculates a hash, which is what leads to the first-packet
port selection inconsistency.


It's not clear what all the implications of the first-packet port
inconsistency are; they will likely depend on a large number of factors
including network hardware, layout and dest host + config, but when
we've seen this in the 3rd and 4th packet of a stream it leads to the
destination sending RST, dropping the stream on the floor for 2% of all
streams on our test box with 2 x ixgbe interfaces.


Questions:

1. Is it possible to determine the hash method used by the HW and use
   that for all first packets?
2. Is it possible to significantly improve the performance of the CPU hashing?
3. Is it possible to tell from the mbuf that it's part of a connection
   instigated from the current host?

    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland




On 05/01/2018 17:39, Matt Joras wrote:

On Fri, Jan 5, 2018 at 9:32 AM, Eugene Grosbein  wrote:

06.01.2018 0:28, Matt Joras wrote:


For what it's worth, this was the conclusion I came to, and at Isilon
we've made the same change being discussed here. For the case of
drivers that end up using a queue index for the flowid, you end up
with pathological behavior on the lagg; the flowid ends up getting
right shifted by (default) 16. So in the case of e.g. two bxe(4)
interfaces with 4 queues, you always end up choosing the interface in
the lagg at index 0.

Not all drivers have this bug. These are drivers that need to be fixed to not
shift by 16, not lagg.


I don't follow. It is if_lagg that does the shifting. For loadbalance
it is done directly in lagg_snd_tag_alloc, and for LACP it is done in
a separate function, lacp_select_tx_port_by_hash. For both it shifts
the flowid by the flowid_shift set on the lagg sc, which defaults to
16.
For reference lacp_select_tx_port is the normal method, 
lacp_select_tx_port_by_hash is only used if RATELIMIT is enabled. They 
both do the same shift though, so ...

You could make the argument that we should fix every driver that sets
a queue index to instead use an RSS hash, but that seems like more
work than simply disabling the use of flowid in if_lagg by default.
For cases where this has an appreciable impact on forwarding
performance the sysctl can be flipped back. That seems more reasonable
to me than making laggs effectively useless for anyone using any one
of a random set of drivers that set the flowid to a queue index (grep
for "flowid =" and you can see which drivers do this).

Matt



Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 17:06, Eugene Grosbein wrote:

05.01.2018 23:11, Steven Hartland wrote:


What do others think, am I missing something?

You still consider only the TCP case, missing the IP forwarding case where all IP packets
are transit, coming in from lagg0 and going out via lagg1.

Just going out via a laggX


IP forwarding case benefits from pre-computed RSS flowid since 8.0-RELEASE
and your change breaks it.

Is there a way to determine if the mbuf is a forwarded mbuf or not?

I know I've said it before but just to be totally clear, changing the
default was done to prevent broken behavior; if you're not concerned
about the issue or you know you're not affected you can enable
use_flowid to restore the original behavior.


This doesn't have to be the final fix; if there are improvements that
can be made to make the default more intelligent, for example to use
flowid if it's known to be good, then that can be looked into. In the
meantime the new "default" will prevent others from configuring lagg(4) with
LACP or loadbalance and ending up with problems; yes this may mean that
IP forwarding in HEAD will use manual hashing and hence will perform a
little worse for now, but that's the lesser of two evils.


    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 17:16, Eugene Grosbein wrote:



That is, there is no guarantee of persistence of flowid of incoming packets
as they can be received with distinct ports of lagg being distinct hardware
computing flowid differently. Some ports may not support RSS at all.
We should not use incoming hardware flowid for anything by default in case of
TCP.

I don't believe your statement about persistence of flowid due to the use of 
variant ports is correct
as LACP states that packets from the same flow "should" under normal conditions 
(no failure) be received on the same port.

It still does not guarantee that and you miss the possibility of network failures
that can easily change flowid of incoming packets.

Correct, but that's not the normal behavior so the chances of seeing any
impact of out of order packets are very small.



In the case where the HW doesn't support RSS, then flowid should either always 
be unset or be set by OS to consistent value hence that should function as 
expected.

That said I don't disagree that all hostA -> hostB should use Manual hash, as I
can't see any way to use the HW hash,
however the ports in your example are wrong

Yes, I stand corrected (just copied your example and adjusted it incompletely).


Why do you mix flowid of incoming stream with flowid of outgoing stream?


I expect this was done so we don't have the overhead of calculating a packet 
hash for every outgoing packet
i.e. it's an optimization, however I believe this is only possible for the 
destination host which always
has a valid flowid, and not for the source host.

How do you know that flowid of incoming packet is preserved on outgoing path? 
It should not.

https://github.com/freebsd/freebsd/blob/master/sys/netinet/ip_output.c#L234

    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland




On 05/01/2018 17:02, Eugene Grosbein wrote:

05.01.2018 22:13, Steven Hartland wrote:


I hope there's some improvements that can be made, for example if we can 
determine
the stream was instigated remotely then flowid would always be valid hence we 
can use it assuming it
matches the requested spec or if we can make it clear to the user that 
laggproto is not the one they requested, I'm open to ideas?

We just need to clear flow id from incoming TCP segments and always generate 
new flow id for responses
keeping old flow id for IP forwarding case. Please back out your change to not 
degrade IP forwarding performance.

Sorry I don't follow you. You seem to be inferring that we can easily generate 
a flowid without involving the sending hardware

RSS has nothing to do with sending hardware. It's operating system's job to 
choose outgoing port, not hardware's job.

The OS is deciding which outgoing port to use, however it's using the hash based on the
flowid to do so

It should use flowid for transit forwarding IP packet only. It should not use 
flowid from incoming TCP segment.
Not sure I follow your meaning, LACP has nothing to do with incoming
TCP; its balancing, and hence hashing, is performed on outbound (tx)
traffic only.



I can't see how that is possible as that's chicken and egg i.e. you can't get 
the HW interface
to generate the flowid without sending a packet and you can't send a packet
until you have the flowid to decide which interface to send it from.

Outgoing packet flow does not and should not depend on incoming flow,
they are independent things in case of LACP. There is no "chicken and egg" 
problem at all.


But this is how it works ATM, it uses the flowid which is only valid after the 
first rx.

Then this is a bug that should be fixed to solve your problem,
instead of change of lagg defaults that degrades IP forwarding performance.

You seem to be confusing IP forwarding with choice of port in the lagg 
interface?


Once lagg (lacp in this case) has chosen the port then the stack 
continues as it always has, if this means using flowid to balance queues 
then that's fine. This change only changes the hash calculation which is 
used to determine the port that's used.


    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 09:41, hiren panchasara wrote:

IIRC, with 'RSS' in kernconf, most NIC drivers and stack should do the
right thing. Look at drivers and also conn startup code in TCP as I
recall it doing the flowid mapping correctly when stream originated from
the other side and had flowid assigned to it by the NIC.

I am mostly concerned about the overhead of manual calculation but my
knowledge is a bit rusty right now and lagg has always been special so
please try this out and see.



I've not been able to find any such option:
head:src> grep -ri rss sys/amd64/conf/
head:src>

Any other ideas on where it might be or is it just the default on HEAD?

That said the more I think / talk about this the more I believe manual 
calculation is the right option for LACP.


The reason I believe this is:

 * When configuring LACP in a network knowing the hash method is
   important, so using an unknown "flowid" based hash could produce
   unexpected results.
 * There's no easy way (possibly no way at all) to determine the flowid
   from the HW for the first packet of a new outbound connection
 * Having the hash algorithm vary for inbound and outbound connections
   increases the chance of unexpected results.
 * LACP combines NICs of equal speed, however they can be different HW
   so there's no guarantee that the participating ports use the same flowid
   calculation, again increasing the chance of a problem.

So as mentioned in a previous reply, the more I think about it the more I
believe flowid can't be successfully used as a hash source for LACP or
loadbalance.


What do others think, am I missing something?

    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 14:38, Slawa Olhovchenkov wrote:

On Fri, Jan 05, 2018 at 08:36:48PM +0700, Eugene Grosbein wrote:


05.01.2018 20:11, Slawa Olhovchenkov wrote:


Regardless of RSS etc., flowid distribution in the lacp case works very
badly. This change is good and must be MFCed (IMHO).

It may work badly depending on NIC and/or traffic type.
It works just fine in the common case of IP forwarding for packets with TCP/UDP
inside.

It can be easily disabled locally for specific cases when it does not work.


Packet distribution on network equipment (lacp case) with flowid enabled causes
uneven queue distribution. Yes, this may be disabled locally, but
diagnosing this root cause needs uncommon skills.

Indeed, the same for the packet ordering issue; it took a good amount of
effort here from multiple parties to determine that there was a bug in
the FreeBSD LACP implementation due to the use of flowid, which is why I
opted to disable it by default.


    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 13:49, Eugene Grosbein wrote:

05.01.2018 16:26, Steven Hartland wrote:

On 05/01/2018 02:01, Eugene Grosbein wrote:

05.01.2018 4:52, Steven Hartland wrote:


RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?

Yes RSS is indeed a received stream but that is used by lagg for lacp and
loadbalance protocols to decide which port of the lagg to "send" the packet
out of. As the flowid is not known when a new "output" stream is instigated,
the current code falls back to manual hash calculation to determine which
port to send the initial packet from. Once a response is received a tx then
uses the flowid. This change of hash calculation method can result in the
initial packet being sent from a different port than the rest of the stream;
this is what I meant by "incorrectly split".

See the following:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_lagg.c#L2066
https://github.com/freebsd/freebsd/blob/master/sys/net/ieee8023ad_lacp.c#L846

I still do not get what is "output stream" for you.

If you are talking on forwarding (routing) transit packets at IP layer,
they all have flowid from the beginning and first packet does not differ from 
others at all.

At the simplest level its a tcp stream that is started from the host. So given 
we have hostA (src) and hostB (dest), the output stream is one started by hostA 
with a destination of hostB where hostA is configured with lagg.

In this case with use_flowid we've confirmed we get the following (the
interfaces used vary per flow of course):
hostA - SYN     (ix0) -> hostB # Manual hash calculated
hostB - SYN,ACK (ix0) -> hostA # flowid used
hostA - ACK     (ix1) -> hostB # flowid used
hostA - Data    (ix1) -> hostB # flowid used
hostB - ACK     (ix0) -> hostA # flowid used
...

Here hostA and hostB both had lagg0 comprising ix0 and ix1.
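
The port change between the SYN and the later packets can be reproduced
with a small sketch in plain C. It is a simplification under stated
assumptions: struct pkt and pick_port() are hypothetical, has_flowid stands
in for the mbuf hash type being something other than M_HASHTYPE_NONE, the
flowid shift is left out, and the hash values used in the note below are
made up purely to show the effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of what the port selection looks at. */
struct pkt {
	bool     has_flowid;   /* false until a hash type has been set */
	uint32_t flowid;       /* hash supplied by the receiving NIC */
	uint32_t manual_hash;  /* hash computed in software from the headers */
};

/*
 * Pick a port index: use the flowid when allowed and present, otherwise
 * fall back to the manual hash.
 */
static unsigned
pick_port(const struct pkt *p, bool use_flowid, unsigned nports)
{
	assert(nports > 0);
	if (use_flowid && p->has_flowid)
		return (p->flowid % nports);
	return (p->manual_hash % nports);
}
```

Take a SYN with no flowid and a manual hash of 6, and a later ACK of the
same flow carrying a flowid of 7. With use_flowid enabled and two ports the
SYN goes out on port 0 and the ACK on port 1. With use_flowid disabled both
use the manual hash and stay on port 0.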

It should be:

hostA - SYN     (ix0) -> hostB # Manual hash (1) calculated
hostB - SYN,ACK (ix0) -> hostA # hardware flowid (2) received
hostA - ACK     (ix1) -> hostB # Manual hash (1) calculated
hostA - Data    (ix1) -> hostB # hardware flowid (2 or 3) received
hostB - ACK     (ix0) -> hostA # Manual hash (1) calculated

That is, there is no guarantee of persistence of flowid of incoming packets
as they can be received with distinct ports of lagg being distinct hardware
computing flowid differently. Some ports may not support RSS at all.
We should not use incoming hardware flowid for anything by default in case of
TCP.

I don't believe your statement about persistence of flowid due to the
use of variant ports is correct as LACP states that packets from the
same flow "should" under normal conditions (no failure) be received on
the same port.


In the case where the HW doesn't support RSS, then flowid should either 
always be unset or be set by OS to consistent value hence that should 
function as expected.


That said I don't disagree that all hostA -> hostB should use Manual
hash, as I can't see any way to use the HW hash; however the ports in your
example are wrong: all hostA -> hostB should be sent from the same ixY
and all hostB -> hostA should be sent from the same ixZ (under normal
circumstances of course).

If you are talking about a locally originated (not transit) data stream from a
local TCP socket being sent in response to corresponding incoming TCP
segments, then these outgoing packets should have their own fixed flow id by
default in case of LACP, and this flow id should not depend on the (possibly
ever changing) flow id of incoming TCP segments.

Nope, in this case we have all the information needed, but I don't believe we
can tell that's the case.

If you insist that flow id of outgoing packets does depend on ever changing 
incoming packet's flow id,
then this is the bug that should be fixed and not lagg's defaults.

As detailed above once the session is established then the flowid remains fixed.

Why do you mix flowid of incoming stream with flowid of outgoing stream?

I expect this was done so we don't have the overhead of calculating a 
packet hash for every outgoing packet i.e. it's an optimization, however 
I believe this is only possible for the destination host which always 
has a valid flowid, and not for the source host.


My current thinking is that flowid shouldn't be used for either LACP or 
loadbalance protocols as doing so will almost certainly lead to 
unexpected behavior (the stated lagghash may not be valid).


    Regards
    Steve


Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 13:41, Eugene Grosbein wrote:

05.01.2018 16:34, Steven Hartland wrote:


I hope there's some improvements that can be made, for example if we can 
determine
the stream was instigated remotely then flowid would always be valid hence we 
can use it assuming it
matches the requested spec or if we can make it clear to the user that 
laggproto is not the one they requested, I'm open to ideas?

We just need to clear flow id from incoming TCP segments and always generate 
new flow id for responses
keeping old flow id for IP forwarding case. Please back out your change to not 
degrade IP forwarding performance.

Sorry I don't follow you. You seem to be inferring that we can easily generate 
a flowid without involving the sending hardware

RSS has nothing to do with sending hardware. It's operating system's job to 
choose outgoing port, not hardware's job.

The OS is deciding which outgoing port to use, however it's using the hash
based on the flowid to do so, which is only valid after the first rx, hence
the problem, as this results in the hash calculation being different for the
first packet.



I can't see how that is possible as that's chicken and egg i.e. you can't get 
the HW interface
to generate the flowid without sending a packet and you can't send a packet
until you have the flowid to decide which interface to send it from.

Outgoing packet flow does not and should not depend on incoming flow,
they are independent things in case of LACP. There is no "chicken and egg" 
problem at all.

But this is how it works ATM, it uses the flowid which is only valid 
after the first rx.

___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland

I found https://wiki.freebsd.org/NetworkRSS but I couldn't see any 
options mentioned, is there a sysctl or kernel option for that Adrian?


For reference our current test is on a production LB running 
11.0-RELEASE. We're in the process of updating our HEAD box for 
additional testing.


On 05/01/2018 02:55, Adrian Chadd wrote:

does it also happen when you actually enable RSS in the kernel? Since
like I went through a whole lot of pain to assign a flowid at
connection setup time.



-a


On 4 January 2018 at 15:37, Steven Hartland <ste...@multiplay.co.uk> wrote:


On 04/01/2018 22:42, hiren panchasara wrote:

On 01/04/18 at 09:52P, Steven Hartland wrote:

On 04/01/2018 20:50, Eugene Grosbein wrote:

05.01.2018 3:05, Steven Hartland wrote:

Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
Disabled the use of flowid for lagg by default

Disabled the use of RSS hash from the network card aka flowid for
lagg(4) interfaces by default as it's currently incompatible with
the lacp and loadbalance protocols.

The incompatibility is due to the fact that the flowid isn't known
for the first packet of a new outbound stream which can result in
the hash calculation method changing and hence a stream being
incorrectly split across multiple interfaces during normal
operation.

This can be re-enabled by setting the following in loader.conf:
net.link.lagg.default_use_flowid="1"

Discussed with: kmacy
Sponsored by:   Multiplay

RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?

Yes RSS is indeed a received stream but that is used by lagg for lacp
and loadbalance protocols to decide which port of the lagg to "send" the
packet out of. As the flowid is not known when a new "output" stream is
instigated the current code falls back to manual hash calculation to
determine which port to send the initial packet from. Once a response is
received a tx then uses the flowid. This change of hash calculation
method can result in the initial packet being sent from a different port
than the rest of the stream; this is what I meant by "incorrectly split".

For my understanding, is this just an issue for the first packet when we
originate the flow? Once we have a response and if flowid is there, we'd
use it, right? OR am I missing something?

Initially yes, but that can cause a whole cascading set of problems. If the
source machine sends from two different ports then flow can traverse across
the network using different paths and hence arrive at the destination on
different ports too, causing the corresponding  issue on the other side.

And with this change, we'd always go and do manual calculation even when
we have a valid flowid (i.e. we didn't initiate a connection)?

Correct, but there's potentially no easy way to correctly determine what the
flowid and hence hash should be in this case, likely impossible if the lagg
consists of different interface types.

In addition if the hardware hash doesn't match the requested one as per
laggproto then additional issues could also be triggered.

Our TCP stack seems fragile during setup to out of order packets which this
multipath behavior causes, we've seen this on our loadbalancers which is
what triggered the investigation. The concrete result is many aborted TCP
connections, over 300k ~2% on the machine I'm looking at.

I hope there's some improvements that can be made, for example if we can
determine the stream was instigated remotely then flowid would always be
valid hence we can use it assuming it matches the requested spec or if we
can make it clear to the user that laggproto is not the one they requested,
I'm open to ideas?

 Regards
 Steve



___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland


On 05/01/2018 02:09, Eugene Grosbein wrote:

05.01.2018 6:37, Steven Hartland wrote:


Our TCP stack seems fragile during setup to out of order packets
which this multipath behavior causes, we've seen this on our loadbalancers
which is what triggered the investigation. The concrete result is many aborted 
TCP connections,
over 300k ~2% on the machine I'm looking at.

This is another problem that needs to be fixed in general and not hidden under 
the carpet.
Meantime, practical problems you see can be solved locally with any settings 
you like.

While it may seem like it, there's no denying that the problem is
caused by the fact that the packets for a single flow arrive on two
different interfaces in normal (non-failure) workflow, which
contravenes 802.3ad which states:


43.2.4 Frame Distributor
…
This standard does not mandate any particular distribution algorithm(s); 
however, any distribution algorithm shall ensure that, when frames are 
received by a Frame Collector as specified in 43.2.3, the algorithm 
shall not cause

a) Mis-ordering of frames that are part of any given conversation, or
b) Duplication of frames.
The above requirement to maintain frame ordering is met by *ensuring 
that all frames that compose a given conversation are transmitted on a 
single link in the order* that they are generated by the MAC Client; 
hence, this requirement does not involve the addition (or modification) 
of any information to the MAC frame, nor any buffering or processing on 
the part of the corresponding Frame Collector in order to re-order frames.
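
To illustrate the quoted requirement, here is a minimal sketch of a distributor that picks the link purely from fields that are constant for a conversation, so every frame of that conversation leaves on the same link. The struct, conv_hash() and pick_link() are hypothetical names made up for this example; this is not lagg(4) code and the mixing function is just one deterministic choice.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conversation key: fields that never change within one flow. */
struct conv {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/* Illustrative FNV-1a style mix over the key; any deterministic hash works. */
static uint32_t
conv_hash(const struct conv *c)
{
	uint32_t words[3] = { c->src_ip, c->dst_ip,
	    ((uint32_t)c->src_port << 16) | c->dst_port };
	uint32_t h = 2166136261u;

	for (int i = 0; i < 3; i++) {
		h ^= words[i];
		h *= 16777619u;
	}
	return (h);
}

/* Same conversation in, same link out: frames are never split across links. */
static unsigned
pick_link(const struct conv *c, unsigned nlinks)
{
	assert(nlinks > 0);
	return (conv_hash(c) % nlinks);
}
```

The point is only that the input to the choice must not change during the life of the conversation; switching hash source part way through breaks that.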





I hope there's some improvements that can be made, for example if we can 
determine
the stream was instigated remotely then flowid would always be valid hence we 
can use it assuming it
matches the requested spec or if we can make it clear to the user that 
laggproto is not the one they requested, I'm open to ideas?

We just need to clear flow id from incoming TCP segments and always generate 
new flow id for responses
keeping old flow id for IP forwarding case. Please back out your change to not 
degrade IP forwarding performance.

Sorry I don't follow you. You seem to be implying that we can easily
generate a flowid without involving the sending hardware; I can't see
how that is possible as that's chicken and egg, i.e. you can't get the HW
interface to generate the flowid without sending a packet and you can't
send a packet until you have the flowid to decide which interface to
send it from.


    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-05 Thread Steven Hartland



On 05/01/2018 02:01, Eugene Grosbein wrote:

05.01.2018 4:52, Steven Hartland wrote:


RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?

Yes RSS is indeed a received stream but that is used by lagg for lacp and 
loadbalance protocols
to decide which port of the lagg to "send" the packet out of.
As the flowid is not known when a new "output" stream is instigated the current 
code
falls back to manual hash calculation to determine which port to send the 
initial packet from.
Once a response is received a tx then uses the flowid.
This change of hash calculation method can result in the initial packet being 
sent
from a different port than the rest of the stream; this is what I meant by 
"incorrectly split".

See the following:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_lagg.c#L2066
https://github.com/freebsd/freebsd/blob/master/sys/net/ieee8023ad_lacp.c#L846

I still do not get what is "output stream" for you.

If you are talking on forwarding (routing) transit packets at IP layer,
they all have flowid from the beginning and first packet does not differ from 
others at all.

At the simplest level it's a TCP stream that is started from the host. So
given we have hostA (src) and hostB (dest), the output stream is one
started by hostA with a destination of hostB where hostA is configured
with lagg.


In this case with use_flowid we've confirmed we get the following (the
interfaces used vary per flow of course):

hostA - SYN     (ix0) -> hostB  # Manual hash calculated
hostB - SYN,ACK (ix0) -> hostA  # flowid used
hostA - ACK     (ix1) -> hostB  # flowid used
hostA - Data    (ix1) -> hostB  # flowid used
hostB - ACK     (ix0) -> hostA  # flowid used
...

Here hostA and hostB both had lagg0 comprising ix0 and ix1.

I believe you're referring to packets flowing through the physical
interface; if so then this is too late, as for LACP the flowid would need
to be pre-calculated for the first packet in order to make the decision
on which port to send it on. Unless I'm missing something, this is a
chicken and egg situation.
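
The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not the if_lagg.c code: struct pkt, its fields and select_port() are hypothetical, and the real driver works on mbufs and may transform the flowid before reducing it to a port index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet metadata; not the real struct mbuf. */
struct pkt {
	bool     has_flowid;	/* set once the NIC has supplied an RSS hash */
	uint32_t flowid;	/* hardware-provided hash, valid if has_flowid */
	uint32_t sw_hash;	/* software hash of the headers, always available */
};

/* Model of the selection: prefer the flowid, else fall back to software. */
static unsigned
select_port(const struct pkt *p, bool use_flowid, unsigned nports)
{
	assert(nports > 0);
	if (use_flowid && p->has_flowid)
		return (p->flowid % nports);
	return (p->sw_hash % nports);
}
```

With use_flowid on, a SYN that has no flowid yet is placed by sw_hash while later packets of the same flow are placed by flowid, and nothing guarantees the two agree. With use_flowid off every packet of the flow goes through sw_hash and so lands on one port.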



If you are talking about a locally originated (not transit) data stream from a
local TCP socket being sent in response to corresponding incoming TCP segments,
then these outgoing packets should have their own fixed flow id by default in
case of LACP and this flow id should not depend on (possibly ever changing)
flow id of incoming TCP segments.

Nope, in this case we have all the information needed, but I don't
believe we can tell that's the case.

If you insist that flow id of outgoing packets does depend on ever changing 
incoming packet's flow id,
then this is the bug that should be fixed and not lagg's defaults.
As detailed above once the session is established then the flowid 
remains fixed.


    Regards
    Steve

Re: svn commit: r327559 - in head: . sys/net

2018-01-04 Thread Steven Hartland




On 04/01/2018 22:42, hiren panchasara wrote:

On 01/04/18 at 09:52P, Steven Hartland wrote:

On 04/01/2018 20:50, Eugene Grosbein wrote:

05.01.2018 3:05, Steven Hartland wrote:


Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
Disabled the use of flowid for lagg by default

Disabled the use of RSS hash from the network card aka flowid for

lagg(4) interfaces by default as it's currently incompatible with
the lacp and loadbalance protocols.

The incompatibility is due to the fact that the flowid isn't known

for the first packet of a new outbound stream which can result in
the hash calculation method changing and hence a stream being
incorrectly split across multiple interfaces during normal
operation.

This can be re-enabled by setting the following in loader.conf:

net.link.lagg.default_use_flowid="1"

Discussed with: kmacy

Sponsored by:   Multiplay

RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?

Yes RSS is indeed a received stream but that is used by lagg for lacp
and loadbalance protocols to decide which port of the lagg to "send" the
packet out of. As the flowid is not known when a new "output" stream is
instigated the current code falls back to manual hash calculation to
determine which port to send the initial packet from. Once a response is
received a tx then uses the flowid. This change of hash calculation
method can result in the initial packet being sent from a different port
than the rest of the stream; this is what I meant by "incorrectly split".

For my understanding, is this just an issue for the first packet when we
originate the flow? Once we have a response and if flowid is there, we'd
use it, right? OR am I missing something?
Initially yes, but that can cause a whole cascading set of problems. If 
the source machine sends from two different ports then flow can traverse 
across the network using different paths and hence arrive at the 
destination on different ports too, causing the corresponding  issue on 
the other side.

And with this change, we'd always go and do manual calculation even when
we have a valid flowid (i.e. we didn't initiate a connection)?
Correct, but there's potentially no easy way to correctly determine what 
the flowid and hence hash should be in this case, likely impossible if 
the lagg consists of different interface types.


In addition if the hardware hash doesn't match the requested one as per 
laggproto then additional issues could also be triggered.


Our TCP stack seems fragile during setup to out of order packets which 
this multipath behavior causes, we've seen this on our loadbalancers 
which is what triggered the investigation. The concrete result is many 
aborted TCP connections, over 300k ~2% on the machine I'm looking at.


I hope there's some improvements that can be made, for example if we can 
determine the stream was instigated remotely then flowid would always be 
valid hence we can use it assuming it matches the requested spec or if 
we can make it clear to the user that laggproto is not the one they 
requested, I'm open to ideas?


    Regards
    Steve


Re: svn commit: r327559 - in head: . sys/net

2018-01-04 Thread Steven Hartland


On 04/01/2018 20:50, Eugene Grosbein wrote:

05.01.2018 3:05, Steven Hartland wrote:


Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
   Disabled the use of flowid for lagg by default
   
   Disabled the use of RSS hash from the network card aka flowid for

   lagg(4) interfaces by default as it's currently incompatible with
   the lacp and loadbalance protocols.
   
   The incompatibility is due to the fact that the flowid isn't known

   for the first packet of a new outbound stream which can result in
   the hash calculation method changing and hence a stream being
   incorrectly split across multiple interfaces during normal
   operation.
   
   This can be re-enabled by setting the following in loader.conf:

   net.link.lagg.default_use_flowid="1"
   
   Discussed with: kmacy

   Sponsored by:Multiplay

RSS by definition has meaning to received stream. What is "outbound" stream
in this context, why can the hash calculation method change and what exactly
does it mean "a stream being incorrectly split"?
Yes RSS is indeed a received stream but that is used by lagg for lacp 
and loadbalance protocols to decide which port of the lagg to "send" the 
packet out of. As the flowid is not known when a new "output" stream is 
instigated the current code falls back to manual hash calculation to 
determine which port to send the initial packet from. Once a response is 
received a tx then uses the flowid. This change of hash calculation 
method can result in the initial packet being sent from a different port 
than the rest of the stream; this is what I meant by "incorrectly split".


See the following:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_lagg.c#L2066
https://github.com/freebsd/freebsd/blob/master/sys/net/ieee8023ad_lacp.c#L846


Defaults should not be changed so easily just because they are not optimal
for some specific case. Each lagg has its own setting for flowid usage
and why one cannot just use "ifconfig lagg0 -use_flowid" for such cases?

Yes we're already using -use_flowid to mitigate the problem, but the 
defaults should never result in broken behavior hence the change, at 
least for now.


For reference I did look at keeping the default of 1 but only using that
for protocols which weren't affected by the issue, and introducing a 2
to force it for those that are, but as it's defined as acting on creation
and we always create lagg interfaces as failover and then amend them, that
wasn't possible without making more invasive changes.


    Regards
    Steve

svn commit: r327559 - in head: . sys/net

2018-01-04 Thread Steven Hartland

Author: smh
Date: Thu Jan  4 20:05:47 2018
New Revision: 327559
URL: https://svnweb.freebsd.org/changeset/base/327559

Log:
  Disabled the use of flowid for lagg by default
  
  Disabled the use of RSS hash from the network card aka flowid for
  lagg(4) interfaces by default as it's currently incompatible with
  the lacp and loadbalance protocols.
  
  The incompatibility is due to the fact that the flowid isn't known
  for the first packet of a new outbound stream which can result in
  the hash calculation method changing and hence a stream being
  incorrectly split across multiple interfaces during normal
  operation.
  
  This can be re-enabled by setting the following in loader.conf:
  net.link.lagg.default_use_flowid="1"
  
  Discussed with: kmacy
  Sponsored by: Multiplay

Modified:
  head/UPDATING
  head/sys/net/if_lagg.c

Modified: head/UPDATING
==
--- head/UPDATING   Thu Jan  4 19:47:01 2018(r327558)
+++ head/UPDATING   Thu Jan  4 20:05:47 2018(r327559)
@@ -51,6 +51,14 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 12.x IS SLOW:
 
 ** SPECIAL WARNING: **
 
+20180104:
+   The use of RSS hash from the network card aka flowid has been
+   disabled by default for lagg(4) as it's currently incompatible with
+   the lacp and loadbalance protocols.
+
+   This can be re-enabled by setting the following in loader.conf:
+   net.link.lagg.default_use_flowid="1"
+
 20180102:
The SW_WATCHDOG option is no longer necessary to enable the
hardclock-based software watchdog if no hardware watchdog is

Modified: head/sys/net/if_lagg.c
==
--- head/sys/net/if_lagg.c  Thu Jan  4 19:47:01 2018(r327558)
+++ head/sys/net/if_lagg.c  Thu Jan  4 20:05:47 2018(r327559)
@@ -244,7 +244,7 @@ SYSCTL_INT(_net_link_lagg, OID_AUTO, failover_rx_all, 
 "Accept input from any interface in a failover lagg");
 
 /* Default value for using flowid */
-static VNET_DEFINE(int, def_use_flowid) = 1;
+static VNET_DEFINE(int, def_use_flowid) = 0;
 #define	V_def_use_flowid	VNET(def_use_flowid)
 SYSCTL_INT(_net_link_lagg, OID_AUTO, default_use_flowid, CTLFLAG_RWTUN,
     &VNET_NAME(def_use_flowid), 0,

svn commit: r327520 - stable/10/sys/netinet

2018-01-03 Thread Steven Hartland

Author: smh
Date: Wed Jan  3 16:16:20 2018
New Revision: 327520
URL: https://svnweb.freebsd.org/changeset/base/327520

Log:
  MFC r322812:
  
  Avoid TCP log messages which are false positives.
  
  Sponsored by: Multiplay

Modified:
  stable/10/sys/netinet/tcp_input.c
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/sys/netinet/tcp_input.c
==
--- stable/10/sys/netinet/tcp_input.c   Wed Jan  3 15:01:31 2018
(r327519)
+++ stable/10/sys/netinet/tcp_input.c   Wed Jan  3 16:16:20 2018
(r327520)
@@ -1647,25 +1647,6 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru
to.to_tsecr = 0;
}
/*
-* If timestamps were negotiated during SYN/ACK they should
-* appear on every segment during this session and vice versa.
-*/
-   if ((tp->t_flags & TF_RCVD_TSTMP) && !(to.to_flags & TOF_TS)) {
-   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
-   log(LOG_DEBUG, "%s; %s: Timestamp missing, "
-   "no action\n", s, __func__);
-   free(s, M_TCPLOG);
-   }
-   }
-   if (!(tp->t_flags & TF_RCVD_TSTMP) && (to.to_flags & TOF_TS)) {
-   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
-   log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
-   "no action\n", s, __func__);
-   free(s, M_TCPLOG);
-   }
-   }
-
-   /*
 * Process options only when we get SYN/ACK back. The SYN case
 * for incoming connections is handled in tcp_syncache.
 * According to RFC1323 the window field in a SYN (i.e., a 
@@ -1693,6 +1674,25 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru
if ((tp->t_flags & TF_SACK_PERMIT) &&
(to.to_flags & TOF_SACKPERM) == 0)
tp->t_flags &= ~TF_SACK_PERMIT;
+   }
+
+   /*
+* If timestamps were negotiated during SYN/ACK they should
+* appear on every segment during this session and vice versa.
+*/
+   if ((tp->t_flags & TF_RCVD_TSTMP) && !(to.to_flags & TOF_TS)) {
+   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: Timestamp missing, "
+   "no action\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
+   }
+   if (!(tp->t_flags & TF_RCVD_TSTMP) && (to.to_flags & TOF_TS)) {
+   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
+   "no action\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
}
 
/*
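
The diff above moves the timestamp consistency check so that it runs after the SYN/ACK option processing instead of before it. A minimal sketch of why the order could matter, assuming (the log does not say this) that option processing is where the connection records that timestamps were negotiated. The flag values and the function names ts_check() and process_synack_options() are illustrative only, not the kernel's.

```c
#include <stdbool.h>

#define TF_RCVD_TSTMP	0x1	/* illustrative bit values, not the kernel's */
#define TOF_TS		0x1

enum ts_verdict { TS_OK, TS_MISSING, TS_UNEXPECTED };

/* The two log conditions from the diff, reduced to a pure function. */
static enum ts_verdict
ts_check(unsigned t_flags, unsigned to_flags)
{
	bool negotiated = (t_flags & TF_RCVD_TSTMP) != 0;
	bool present = (to_flags & TOF_TS) != 0;

	if (negotiated && !present)
		return (TS_MISSING);
	if (!negotiated && present)
		return (TS_UNEXPECTED);
	return (TS_OK);
}

/* Assumed behaviour: a SYN/ACK carrying timestamps marks them negotiated. */
static unsigned
process_synack_options(unsigned t_flags, unsigned to_flags)
{
	if (to_flags & TOF_TS)
		t_flags |= TF_RCVD_TSTMP;
	return (t_flags);
}
```

Under that assumption, checking before process_synack_options() reports "not expected" for a perfectly valid SYN/ACK, while checking afterwards does not.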

svn commit: r327519 - stable/11/sys/netinet

2018-01-03 Thread Steven Hartland

Author: smh
Date: Wed Jan  3 15:01:31 2018
New Revision: 327519
URL: https://svnweb.freebsd.org/changeset/base/327519

Log:
  MFC r322812:
  
  Avoid TCP log messages which are false positives.
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/netinet/tcp_input.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/netinet/tcp_input.c
==
--- stable/11/sys/netinet/tcp_input.c   Wed Jan  3 12:18:55 2018
(r327518)
+++ stable/11/sys/netinet/tcp_input.c   Wed Jan  3 15:01:31 2018
(r327519)
@@ -1686,25 +1686,6 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru
to.to_tsecr = 0;
}
/*
-* If timestamps were negotiated during SYN/ACK they should
-* appear on every segment during this session and vice versa.
-*/
-   if ((tp->t_flags & TF_RCVD_TSTMP) && !(to.to_flags & TOF_TS)) {
-   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
-   log(LOG_DEBUG, "%s; %s: Timestamp missing, "
-   "no action\n", s, __func__);
-   free(s, M_TCPLOG);
-   }
-   }
-   if (!(tp->t_flags & TF_RCVD_TSTMP) && (to.to_flags & TOF_TS)) {
-   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
-   log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
-   "no action\n", s, __func__);
-   free(s, M_TCPLOG);
-   }
-   }
-
-   /*
 * Process options only when we get SYN/ACK back. The SYN case
 * for incoming connections is handled in tcp_syncache.
 * According to RFC1323 the window field in a SYN (i.e., a 
@@ -1732,6 +1713,25 @@ tcp_do_segment(struct mbuf *m, struct tcphdr *th, stru
if ((tp->t_flags & TF_SACK_PERMIT) &&
(to.to_flags & TOF_SACKPERM) == 0)
tp->t_flags &= ~TF_SACK_PERMIT;
+   }
+
+   /*
+* If timestamps were negotiated during SYN/ACK they should
+* appear on every segment during this session and vice versa.
+*/
+   if ((tp->t_flags & TF_RCVD_TSTMP) && !(to.to_flags & TOF_TS)) {
+   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: Timestamp missing, "
+   "no action\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
+   }
+   if (!(tp->t_flags & TF_RCVD_TSTMP) && (to.to_flags & TOF_TS)) {
+   if ((s = tcp_log_addrs(inc, th, NULL, NULL))) {
+   log(LOG_DEBUG, "%s; %s: Timestamp not expected, "
+   "no action\n", s, __func__);
+   free(s, M_TCPLOG);
+   }
}
 
/*

Re: svn commit: r325092 - head/usr.bin/fortune/datfiles

2017-10-29 Thread Steven Hartland

I’ve still had to use rehash on several occasions for it to detect new
apps, so removing that reference might be a mistake.

On Sun, 29 Oct 2017 at 18:51, Cy Schubert  wrote:

> In message
>  om>
> , Warner Losh writes:
> >
> > On Sun, Oct 29, 2017 at 8:26 AM, Ed Maste  wrote:
> >
> > > On 29 October 2017 at 00:53, Eitan Adler  wrote:
> > > > Author: eadler
> > > > Date: Sun Oct 29 04:53:33 2017
> > > > New Revision: 325092
> > > > URL: https://svnweb.freebsd.org/changeset/base/325092
> > > >
> > > > Log:
> > > >   Modernize freebsd-tips a bit
> > > ...
> > > >  %
> > > >  Want to run the same command again?
> > > > -In tcsh you can type "!!".
> > > > +Type "!!".
> > > >  %
> > >
> > > $ !!
> > > sh: !!: not found
> > >
> > > I doubt many people use /bin/sh as an interactive shell, but the tip
> > > ought not lead those who do astray
> > >
> >
> > Yes. /bin/sh on FreeBSD doesn't grok it, though bash and some other
> shells
> > available as ports do. I think that the old text was a bit better.
>
> Or better yet, ctrl-r in bash and zsh, or up-arrow in tcsh.
>
>
> --
> Cheers,
> Cy Schubert 
> FreeBSD UNIX:     Web:  http://www.FreeBSD.org
>
> The need of the many outweighs the greed of the few.
>
>
>
>

Re: svn commit: r324983 - in head: lib/libc/sys sys/sys

2017-10-25 Thread Steven Hartland

Personally I would expect the fallback to be reboot, as without the
ability to power back on remotely (e.g. via IPMI) a power off could render
the machine inaccessible, which is not ideal. Thoughts?


On 25/10/2017 16:30, Warner Losh wrote:

Author: imp
Date: Wed Oct 25 15:30:20 2017
New Revision: 324983
URL: https://svnweb.freebsd.org/changeset/base/324983

Log:
   Define RB_POWERCYCLE
   
   RB_POWERCYCLE instructs the platform to power off and then power back

   on a short time later, if that's possible. Otherwise, degrade to the
   RB_POWEROFF behavior.
   
   Sponsored by: Netflix


Modified:
   head/lib/libc/sys/reboot.2
   head/sys/sys/reboot.h

Modified: head/lib/libc/sys/reboot.2
==
--- head/lib/libc/sys/reboot.2  Wed Oct 25 15:28:05 2017(r324982)
+++ head/lib/libc/sys/reboot.2  Wed Oct 25 15:30:20 2017(r324983)
@@ -28,7 +28,7 @@
  .\" @(#)reboot.2 8.1 (Berkeley) 6/4/93
  .\" $FreeBSD$
  .\"
-.Dd September 18, 2015
+.Dd October 24, 2017
  .Dt REBOOT 2
  .Os
  .Sh NAME
@@ -84,6 +84,14 @@ for more information.
  .It Dv RB_HALT
  The processor is simply halted; no reboot takes place.
  This option should be used with caution.
+.It Dv RB_POWERCYCLE
+After halting, the shutdown code will do what it can to turn
+off the power and then turn the power back on.
+This requires hardware support, usually an auxiliary microprocessor
+that can sequence the power supply.
+At present only the
+.Xr ipmi 4
+driver implements this feature.
  .It Dv RB_POWEROFF
  After halting, the shutdown code will do what it can to turn
  off the power.

Modified: head/sys/sys/reboot.h
==
--- head/sys/sys/reboot.h   Wed Oct 25 15:28:05 2017(r324982)
+++ head/sys/sys/reboot.h   Wed Oct 25 15:30:20 2017(r324983)
@@ -60,6 +60,7 @@
  #define   RB_RESERVED2    0x80000 /* reserved for internal use of boot blocks */
  #define   RB_PAUSE        0x100000 /* pause after each output line during probe */
  #define   RB_REROOT       0x200000 /* unmount the rootfs and mount it again */
+#define    RB_POWERCYCLE   0x400000 /* Power cycle if possible */
  #define   RB_MULTIPLE     0x20000000  /* use multiple consoles */
  
  #define	RB_BOOTINFO	0x80000000	/* have `struct bootinfo *' arg */
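
The two fallback policies under discussion (degrade to RB_POWEROFF as the commit log says, or degrade to a plain reboot as suggested in the reply) can be compared with a small sketch. This is not the kernel's shutdown path: effective_howto(), enum pc_fallback and the can_powercycle argument are hypothetical, and the constants are written out here on the assumption that they match sys/sys/reboot.h.

```c
#include <stdbool.h>

#define RB_AUTOBOOT	0		/* assumed to match sys/sys/reboot.h */
#define RB_POWEROFF	0x4000
#define RB_POWERCYCLE	0x400000

/* Hypothetical policy knob: what to do when power cycling is unsupported. */
enum pc_fallback { FALLBACK_POWEROFF, FALLBACK_REBOOT };

static int
effective_howto(int howto, bool can_powercycle, enum pc_fallback fb)
{
	/* Nothing to degrade if no power cycle was asked for or it works. */
	if ((howto & RB_POWERCYCLE) == 0 || can_powercycle)
		return (howto);
	howto &= ~RB_POWERCYCLE;
	if (fb == FALLBACK_POWEROFF)
		howto |= RB_POWEROFF;
	return (howto);	/* no power bits left means an ordinary reboot */
}
```

With FALLBACK_POWEROFF a machine without the hardware support stays off until someone can reach it; with FALLBACK_REBOOT it comes back on its own, which is the concern raised in the reply.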





Re: svn commit: r318751 - in head/sys: kern sys

2017-10-21 Thread Steven Hartland

Personally I hate that idea as I like being able to see all the processes
from the host.

I have a similar hate of Linux containers where you have to jump through
hoops just to see what's really happening on the host.

On Sat, 21 Oct 2017 at 20:29, Allan Jude  wrote:

> On 2017-05-23 12:59, Steve Wills wrote:
> > Author: swills (ports committer)
> > Date: Tue May 23 16:59:24 2017
> > New Revision: 318751
> > URL: https://svnweb.freebsd.org/changeset/base/318751
> >
> > Log:
> >   Add security.bsd.see_jail_proc
> >
> >   Add security.bsd.see_jail_proc sysctl to hide jail processes from
> non-root
> >   users
> >
> >   Reviewed by:jamie
> >   Approved by:allanjude
> >   Relnotes:   yes
> >   Differential Revision:  https://reviews.freebsd.org/D10770
> >
> A user was asking about this issue on IRC today.
>
> I think I have changed my mind a bit.
>
> I think we should make the default be off (so you can't see processes in
> a jail from the host) by default in 12.
>
> And that we should MFC this sysctl to stable/11, but not change the
> default behaviour there.
>
> Anyone else have thoughts?
>
> --
> Allan Jude
>
>

Re: svn commit: r324449 - in head/sys/boot: arm/uboot efi/boot1 sparc64/loader

2017-10-09 Thread Steven Hartland


Yeah, even without -j it fails :(

On 10/10/2017 01:01, Warner Losh wrote:
Oh, killed /usr/include/stand.h and found it. I'll post a fix when I 
get back.


On Mon, Oct 9, 2017 at 6:00 PM, Warner Losh <i...@bsdimp.com> wrote:


Can you find out? A clean build works for me. Chances are good
that sys/boot/efi/boot1/Makefile needs a line like
CFLAGS+=-I${SASRC} or similar. I have to go out for 2 hours, but
will look into when I get back if you can't make progress. I don't
see one there and I had to add it a couple of other places.

Warner

On Mon, Oct 9, 2017 at 5:56 PM, Steven Hartland
<steven.hartl...@multiplay.co.uk> wrote:

Not sure which of these sets of changes caused the issue but a
clean build from scratch is currently failing here with:

In file included from
/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/ufs_module.c:41:
In file included from
/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/boot_module.h:35:

/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/../include/efilib.h:33:10:
fatal error: 'stand.h' file not found
#include <stand.h>
 ^

Build was with -j24 in case it matters, going to try without
-j but that will take many hours


On 09/10/2017 23:11, Warner Losh wrote:

Author: imp
Date: Mon Oct  9 22:11:57 2017
New Revision: 324449
URL: https://svnweb.freebsd.org/changeset/base/324449

Log:
   Prefer ${LIBSTAND} to -lstand
   
   Sponsored by: Netflix


Modified:
   head/sys/boot/arm/uboot/Makefile
   head/sys/boot/efi/boot1/Makefile
   head/sys/boot/sparc64/loader/Makefile

Modified: head/sys/boot/arm/uboot/Makefile

==
--- head/sys/boot/arm/uboot/MakefileMon Oct  9 21:06:16 2017
(r324448)
+++ head/sys/boot/arm/uboot/MakefileMon Oct  9 22:11:57 2017
(r324449)
@@ -121,7 +121,7 @@ CFLAGS+=-fPIC
  NO_WERROR.clang=
  
  DPADD=		${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} ${LIBSTAND}

-LDADD= ${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} -lstand
+LDADD= ${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} ${LIBSTAND}
  
  OBJS+=  ${SRCS:N*.h:R:S/$/.o/g}
  


Modified: head/sys/boot/efi/boot1/Makefile

==
--- head/sys/boot/efi/boot1/MakefileMon Oct  9 21:06:16 2017
(r324448)
+++ head/sys/boot/efi/boot1/MakefileMon Oct  9 22:11:57 2017
(r324449)
@@ -91,7 +91,7 @@ LIBEFI=   ${.OBJDIR}/../libefi/libefi.a
  # as well as required string and memory functions for all platforms.
  #
  DPADD+=   ${LIBEFI} ${LIBSTAND}
-LDADD+=${LIBEFI} -lstand
+LDADD+=${LIBEFI} ${LIBSTAND}
  
  DPADD+=		${LDSCRIPT}
  


Modified: head/sys/boot/sparc64/loader/Makefile

==
--- head/sys/boot/sparc64/loader/Makefile   Mon Oct  9 21:06:16 
2017(r324448)
+++ head/sys/boot/sparc64/loader/Makefile   Mon Oct  9 22:11:57 
2017(r324449)
@@ -86,7 +86,7 @@ CFLAGS+=  -I${.CURDIR}/../../../../lib/libstand/
  CFLAGS+=  -I${SRCTOP}/sys
  
  DPADD=		${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} ${LIBSTAND}

-LDADD= ${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} -lstand
+LDADD= ${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} ${LIBSTAND}
  
  loader.help: help.common help.sparc64

cat ${.ALLSRC} | \








Re: svn commit: r324449 - in head/sys/boot: arm/uboot efi/boot1 sparc64/loader

2017-10-09 Thread Steven Hartland

Not sure which of these sets of changes caused the issue but a clean 
build from scratch is currently failing here with:


In file included from 
/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/ufs_module.c:41:
In file included from 
/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/boot_module.h:35:
/usr/home/smh/freebsd/base/head/sys/boot/efi/boot1/../include/efilib.h:33:10: 
fatal error: 'stand.h' file not found

#include <stand.h>
 ^

Build was with -j24 in case it matters, going to try without -j but that 
will take many hours


On 09/10/2017 23:11, Warner Losh wrote:

Author: imp
Date: Mon Oct  9 22:11:57 2017
New Revision: 324449
URL: https://svnweb.freebsd.org/changeset/base/324449

Log:
   Prefer ${LIBSTAND} to -lstand
   
   Sponsored by: Netflix


Modified:
   head/sys/boot/arm/uboot/Makefile
   head/sys/boot/efi/boot1/Makefile
   head/sys/boot/sparc64/loader/Makefile

Modified: head/sys/boot/arm/uboot/Makefile
==
--- head/sys/boot/arm/uboot/MakefileMon Oct  9 21:06:16 2017
(r324448)
+++ head/sys/boot/arm/uboot/MakefileMon Oct  9 22:11:57 2017
(r324449)
@@ -121,7 +121,7 @@ CFLAGS+=-fPIC
  NO_WERROR.clang=
  
  DPADD=		${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} ${LIBSTAND}

-LDADD= ${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} -lstand
+LDADD= ${LIBFICL} ${LIBUBOOT} ${LIBFDT} ${LIBUBOOT_FDT} ${LIBSTAND}
  
  OBJS+=  ${SRCS:N*.h:R:S/$/.o/g}
  


Modified: head/sys/boot/efi/boot1/Makefile
==
--- head/sys/boot/efi/boot1/MakefileMon Oct  9 21:06:16 2017
(r324448)
+++ head/sys/boot/efi/boot1/MakefileMon Oct  9 22:11:57 2017
(r324449)
@@ -91,7 +91,7 @@ LIBEFI=   ${.OBJDIR}/../libefi/libefi.a
  # as well as required string and memory functions for all platforms.
  #
  DPADD+=   ${LIBEFI} ${LIBSTAND}
-LDADD+=${LIBEFI} -lstand
+LDADD+=${LIBEFI} ${LIBSTAND}
  
  DPADD+=		${LDSCRIPT}
  


Modified: head/sys/boot/sparc64/loader/Makefile
==
--- head/sys/boot/sparc64/loader/Makefile   Mon Oct  9 21:06:16 2017
(r324448)
+++ head/sys/boot/sparc64/loader/Makefile   Mon Oct  9 22:11:57 2017
(r324449)
@@ -86,7 +86,7 @@ CFLAGS+=  -I${.CURDIR}/../../../../lib/libstand/
  CFLAGS+=  -I${SRCTOP}/sys
  
  DPADD=		${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} ${LIBSTAND}

-LDADD= ${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} -lstand
+LDADD= ${LIBFICL} ${LIBZFSBOOT} ${LIBOFW} ${LIBSTAND}
  
  loader.help: help.common help.sparc64

cat ${.ALLSRC} | \




Re: svn commit: r323566 - head/sys/kern

2017-09-14 Thread Steven Hartland


Is this something that will be MFC'ed to 11 or is this 12 / CURRENT only?

On 13/09/2017 23:11, Gleb Smirnoff wrote:

Author: glebius
Date: Wed Sep 13 22:11:05 2017
New Revision: 323566
URL: https://svnweb.freebsd.org/changeset/base/323566

Log:
   Use soref() in sendfile(2) instead of fhold() to reference a socket.
   
   The problem is that fdrop() requires syscall context, as it may
   enter sleep in some cases.  The reason to use it in the original
   non-blocking sendfile implementation was to avoid use of the global
   ACCEPT_LOCK() on every I/O completion.  Now in head sorele() no
   longer requires this lock.

Modified:
   head/sys/kern/kern_sendfile.c

Modified: head/sys/kern/kern_sendfile.c
==
--- head/sys/kern/kern_sendfile.c   Wed Sep 13 21:56:49 2017
(r323565)
+++ head/sys/kern/kern_sendfile.c   Wed Sep 13 22:11:05 2017
(r323566)
@@ -80,7 +80,7 @@ struct sf_io {
volatile u_int  nios;
u_int   error;
int npages;
-   struct file *sock_fp;
+   struct socket   *so;
struct mbuf *m;
vm_page_t   pa[];
  };
@@ -255,7 +255,7 @@ static void
  sendfile_iodone(void *arg, vm_page_t *pg, int count, int error)
  {
struct sf_io *sfio = arg;
-   struct socket *so;
+   struct socket *so = sfio->so;
  
  	for (int i = 0; i < count; i++)

if (pg[i] != bogus_page)
@@ -267,8 +267,6 @@ sendfile_iodone(void *arg, vm_page_t *pg, int count, i
if (!refcount_release(&sfio->nios))
return;
  
-	so = sfio->sock_fp->f_data;

-
if (sfio->error) {
struct mbuf *m;
  
@@ -296,8 +294,8 @@ sendfile_iodone(void *arg, vm_page_t *pg, int count, i

CURVNET_RESTORE();
}
  
-	/* XXXGL: curthread */

-   fdrop(sfio->sock_fp, curthread);
+   SOCK_LOCK(so);
+   sorele(so);
free(sfio, M_TEMP);
  }
  
@@ -724,6 +722,7 @@ retry_space:

sfio = malloc(sizeof(struct sf_io) +
npages * sizeof(vm_page_t), M_TEMP, M_WAITOK);
refcount_init(&sfio->nios, 1);
+   sfio->so = so;
sfio->error = 0;
  
  		nios = sendfile_swapin(obj, sfio, off, space, npages, rhpages,

@@ -858,9 +857,8 @@ prepend_header:
error = (*so->so_proto->pr_usrreqs->pru_send)
(so, 0, m, NULL, NULL, td);
} else {
-   sfio->sock_fp = sock_fp;
sfio->npages = npages;
-   fhold(sock_fp);
+   soref(so);
error = (*so->so_proto->pr_usrreqs->pru_send)
(so, PRUS_NOTREADY, m, NULL, NULL, td);
sendfile_iodone(sfio, NULL, 0, 0);
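
The log above describes holding a reference on the socket itself for the lifetime of an asynchronous request and dropping it from the completion handler. A minimal userland sketch of that hold/release pattern follows. It is not the kernel's soref()/sorele() code: struct conn, struct io_req and all function names here are hypothetical, and C11 atomics stand in for the kernel's locking.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a socket; not the kernel's struct socket. */
struct conn {
	atomic_uint refs;	/* number of outstanding references */
};

/* Hypothetical per-request state, loosely modelled on struct sf_io. */
struct io_req {
	struct conn *c;		/* held until the request completes */
};

static struct conn *
conn_alloc(void)
{
	struct conn *c = malloc(sizeof(*c));

	if (c != NULL)
		atomic_init(&c->refs, 1);	/* the creator's reference */
	return (c);
}

/* Take an extra reference before handing the object to async code. */
static void
conn_ref(struct conn *c)
{
	atomic_fetch_add(&c->refs, 1);
}

/* Drop a reference; returns true if it was the last one and c was freed. */
static bool
conn_rele(struct conn *c)
{
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		free(c);
		return (true);
	}
	return (false);
}

/* Issue path: store the pointer in the request and hold it. */
static struct io_req *
io_start(struct conn *c)
{
	struct io_req *req = malloc(sizeof(*req));

	if (req == NULL)
		return (NULL);
	conn_ref(c);
	req->c = c;
	return (req);
}

/* Completion path: drop the reference taken in io_start(). */
static bool
io_done(struct io_req *req)
{
	bool freed = conn_rele(req->c);

	free(req);
	return (freed);
}
```

Storing the object pointer directly in the request, as the diff does with sfio->so, means the completion path needs nothing beyond the request itself to find and release the object.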




svn commit: r322881 - head/usr.bin/calendar/calendars

2017-08-25 Thread Steven Hartland

Author: smh
Date: Fri Aug 25 08:21:02 2017
New Revision: 322881
URL: https://svnweb.freebsd.org/changeset/base/322881

Log:
  Add myself (smh) to calendar.freebsd
  
  Sponsored by: Multiplay

Modified:
  head/usr.bin/calendar/calendars/calendar.freebsd

Modified: head/usr.bin/calendar/calendars/calendar.freebsd
==
--- head/usr.bin/calendar/calendars/calendar.freebsd	Fri Aug 25 07:49:51 2017	(r322880)
+++ head/usr.bin/calendar/calendars/calendar.freebsd	Fri Aug 25 08:21:02 2017	(r322881)
@@ -333,6 +333,7 @@
 09/07  Chris Rees <cr...@freebsd.org> born in Kettering, United Kingdom, 1987
 09/08  Boris Samorodov <b...@freebsd.org> born in Krasnodar, Russian Federation, 1963
 09/09  Yoshio Mita <m...@freebsd.org> born in Hiroshima, Japan, 1972
+09/09  Steven Hartland <s...@freebsd.org> born in Wordsley, United Kingdom, 1973
 09/10  Wesley R. Peters <w...@freebsd.org> born in Hartford, Alabama, United States, 1961
 09/12  Weongyo Jeong <weon...@freebsd.org> born in Haman, Korea, 1980
 09/12  Benedict Christopher Reuschling <b...@freebsd.org> born in Darmstadt, Germany, 1981

Re: svn commit: r322619 - stable/11/usr.bin/grep

2017-08-17 Thread Steven Hartland


This seems a little quick considering it only hit head 8 mins ago.

On 17/08/2017 14:48, Kyle Evans wrote:

Author: kevans
Date: Thu Aug 17 13:48:46 2017
New Revision: 322619
URL: https://svnweb.freebsd.org/changeset/base/322619

Log:
   bsdgrep: fix build when linking against libgnuregex
   
   MFC r322618: bsdgrep: cast pmatch.rm_so to fix build when linking against
   libgnuregex
   
   Approved by:	emaste (mentor)


Modified:
   stable/11/usr.bin/grep/util.c
Directory Properties:
   stable/11/   (props changed)

Modified: stable/11/usr.bin/grep/util.c
==
--- stable/11/usr.bin/grep/util.c   Thu Aug 17 13:40:45 2017
(r322618)
+++ stable/11/usr.bin/grep/util.c   Thu Aug 17 13:48:46 2017
(r322619)
@@ -450,7 +450,7 @@ procline(struct parsec *pc)
 */
if (r == REG_NOMATCH &&
(retry == pc->lnstart ||
-   pmatch.rm_so + 1 < retry))
+   (unsigned int)pmatch.rm_so + 1 < retry))
retry = pmatch.rm_so + 1;
if (r == REG_NOMATCH)
continue;




svn commit: r320138 - head/usr.sbin/bsdinstall/scripts

2017-06-20 Thread Steven Hartland

Author: smh
Date: Tue Jun 20 08:03:50 2017
New Revision: 320138
URL: https://svnweb.freebsd.org/changeset/base/320138

Log:
  Fixed bsdinstall location of vfs.zfs.min_auto_ashift
  
  vfs.zfs.min_auto_ashift is only a sysctl, not a loader tunable, so update
  bsdinstall to write it to the correct location, /etc/sysctl.conf, instead
  of /boot/loader.conf.
  
  Reported by:  Aaron Caza
  Reviewed by:  allanjude
  MFC after:    2 days
  Sponsored by: Multiplay
  Differential Revision:  https://reviews.freebsd.org/D11278

Modified:
  head/usr.sbin/bsdinstall/scripts/config
  head/usr.sbin/bsdinstall/scripts/zfsboot

Modified: head/usr.sbin/bsdinstall/scripts/config
==
--- head/usr.sbin/bsdinstall/scripts/config Tue Jun 20 08:01:13 2017
(r320137)
+++ head/usr.sbin/bsdinstall/scripts/config Tue Jun 20 08:03:50 2017
(r320138)
@@ -32,7 +32,7 @@
 cat $BSDINSTALL_TMPETC/rc.conf.* >> $BSDINSTALL_TMPETC/rc.conf
 rm $BSDINSTALL_TMPETC/rc.conf.*
 
-cat $BSDINSTALL_CHROOT/etc/sysctl.conf $BSDINSTALL_TMPETC/sysctl.conf.hardening >> $BSDINSTALL_TMPETC/sysctl.conf
+cat $BSDINSTALL_CHROOT/etc/sysctl.conf $BSDINSTALL_TMPETC/sysctl.conf.* >> $BSDINSTALL_TMPETC/sysctl.conf
 rm $BSDINSTALL_TMPETC/sysctl.conf.*
 
 cp $BSDINSTALL_TMPETC/* $BSDINSTALL_CHROOT/etc

Modified: head/usr.sbin/bsdinstall/scripts/zfsboot
==
--- head/usr.sbin/bsdinstall/scripts/zfsboot	Tue Jun 20 08:01:13 2017	(r320137)
+++ head/usr.sbin/bsdinstall/scripts/zfsboot	Tue Jun 20 08:03:50 2017	(r320138)
@@ -1446,7 +1446,7 @@ zfs_create_boot()
if [ "$ZFSBOOT_FORCE_4K_SECTORS" ]; then
f_eval_catch $funcname echo "$ECHO_APPEND" \
 'vfs.zfs.min_auto_ashift=12' \
-$BSDINSTALL_TMPBOOT/loader.conf.zfs || return $FAILURE
+$BSDINSTALL_TMPETC/sysctl.conf.zfs || return $FAILURE
fi
 
if [ "$ZFSBOOT_SWAP_MIRROR" ]; then

svn commit: r318438 - in stable/10: cddl/lib/libdtrace sys/netinet

2017-05-17 Thread Steven Hartland

Author: smh
Date: Thu May 18 03:32:01 2017
New Revision: 318438
URL: https://svnweb.freebsd.org/changeset/base/318438

Log:
  Revert the partial MFC of r313045 which broke dtrace
  
  This removes the mbuf to ipinfo_t translator and switches tcp_autorcvbuf to
  use the older mtod macro.
  
  This was originally merged to stable/10 as part of r317375.
  
  Reported by:  markj
  Reviewed by:  markj, hiren
  Sponsored by: Multiplay
  Differential Revision:  https://reviews.freebsd.org/D10769

Modified:
  stable/10/cddl/lib/libdtrace/ip.d
  stable/10/sys/netinet/in_kdtrace.c
  stable/10/sys/netinet/tcp_input.c
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/cddl/lib/libdtrace/ip.d
==
--- stable/10/cddl/lib/libdtrace/ip.d   Thu May 18 01:46:30 2017
(r318437)
+++ stable/10/cddl/lib/libdtrace/ip.d   Thu May 18 03:32:01 2017
(r318438)
@@ -240,24 +240,6 @@ translator ipinfo_t < uint8_t *p > {
 #pragma D binding "1.0" IFF_LOOPBACK
 inline int IFF_LOOPBACK =  0x8;
 
-#pragma D binding "1.13" translator
-translator ipinfo_t < struct mbuf *m > {
-   ip_ver =m == NULL ? 0 : ((struct ip *)m->m_data)->ip_v;
-   ip_plength =m == NULL ? 0 :
-   ((struct ip *)m->m_data)->ip_v == 4 ?
-   ntohs(((struct ip *)m->m_data)->ip_len) - 
-   (((struct ip *)m->m_data)->ip_hl << 2):
-   ntohs(((struct ip6_hdr *)m->m_data)->ip6_ctlun.ip6_un1.ip6_un1_plen);
-   ip_saddr =  m == NULL ? 0 :
-   ((struct ip *)m->m_data)->ip_v == 4 ?
-   inet_ntoa(&((struct ip *)m->m_data)->ip_src.s_addr) :
-   inet_ntoa6(&((struct ip6_hdr *)m->m_data)->ip6_src);
-   ip_daddr =  m == NULL ? 0 :
-   ((struct ip *)m->m_data)->ip_v == 4 ?
-   inet_ntoa(&((struct ip *)m->m_data)->ip_dst.s_addr) :
-   inet_ntoa6(&((struct ip6_hdr *)m->m_data)->ip6_dst);
-};
-
 #pragma D binding "1.0" translator
 translator ifinfo_t < struct ifnet *p > {
if_name =   p->if_xname;

Modified: stable/10/sys/netinet/in_kdtrace.c
==
--- stable/10/sys/netinet/in_kdtrace.c  Thu May 18 01:46:30 2017
(r318437)
+++ stable/10/sys/netinet/in_kdtrace.c  Thu May 18 03:32:01 2017
(r318438)
@@ -58,28 +58,28 @@ SDT_PROBE_DEFINE6_XLATE(ip, , , send,
 SDT_PROBE_DEFINE5_XLATE(tcp, , , accept__established,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , accept__refused,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfo_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , connect__established,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , connect__refused,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
@@ -93,7 +93,7 @@ SDT_PROBE_DEFINE5_XLATE(tcp, , , connect
 SDT_PROBE_DEFINE5_XLATE(tcp, , , receive,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
@@ -115,7 +115,7 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
 SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
 "void *", "void *",
 "struct tcpcb *", "csinfo_t *",
-"struct mbuf *", "ipinfo_t *",
+"uint8_t *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *",
 "int", "int");

Modified: stable/10/sys/netinet/tcp_input.c
==
--- stable/10/sys/netinet/tcp_input.c   Thu May 18 01:46:30 2017
(r318437)
+++ stable/10/sys/netinet/tcp_input.c   Thu May 18 03:32:01 2017
(r318438)
@@ -1519,7 +1519,8 @@ tcp_autorcvbuf(struct mbuf *m, struct tc
newsize = min(so->so_rcv.sb_hiwat +
V_tcp_autorcvbuf_inc, V_tcp_autorcvbuf_max);
}
-   TCP_PROBE6(receive__autoresize, NULL, tp, m, tp, th, newsize);
+   TCP_PROBE6(receive__autoresize, NULL, tp, mtod(m, const char *),
+   tp, th, newsize);
 
/* Start over with next RTT. */
tp->rfbuf_ts = 0;

svn commit: r317470 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2017-04-26 Thread Steven Hartland

Author: smh
Date: Wed Apr 26 22:25:01 2017
New Revision: 317470
URL: https://svnweb.freebsd.org/changeset/base/317470

Log:
  MFC r315449:
  
  Reduce ARC fragmentation threshold
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==
--- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Wed Apr 26 22:23:42 2017	(r317469)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c	Wed Apr 26 22:25:01 2017	(r317470)
@@ -3978,7 +3978,7 @@ arc_available_memory(void)
 * Start aggressive reclamation if too little sequential KVA left.
 */
if (lowest > 0) {
-   n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
+   n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
-((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
INT64_MAX;
if (n < lowest) {

svn commit: r317469 - stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2017-04-26 Thread Steven Hartland

Author: smh
Date: Wed Apr 26 22:23:42 2017
New Revision: 317469
URL: https://svnweb.freebsd.org/changeset/base/317469

Log:
  MFC r316460:
  
  Fix expandsz 16.0E vals and vdev_min_asize of RAIDZ children
  
  Sponsored by: Multiplay

Modified:
  stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
==
--- stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	Wed Apr 26 22:17:54 2017	(r317468)
+++ stable/11/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c	Wed Apr 26 22:23:42 2017	(r317469)
@@ -228,7 +228,8 @@ vdev_get_min_asize(vdev_t *vd)
 * so each child must provide at least 1/Nth of its asize.
 */
if (pvd->vdev_ops == &vdev_raidz_ops)
-   return (pvd->vdev_min_asize / pvd->vdev_children);
+   return ((pvd->vdev_min_asize + pvd->vdev_children - 1) /
+   pvd->vdev_children);
 
return (pvd->vdev_min_asize);
 }
@@ -1376,7 +1377,7 @@ vdev_open(vdev_t *vd)
vd->vdev_psize = psize;
 
/*
-* Make sure the allocatable size hasn't shrunk.
+* Make sure the allocatable size hasn't shrunk too much.
 */
if (asize < vd->vdev_min_asize) {
vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
@@ -1416,12 +1417,21 @@ vdev_open(vdev_t *vd)
}
 
/*
-* If all children are healthy and the asize has increased,
-* then we've experienced dynamic LUN growth.  If automatic
-* expansion is enabled then use the additional space.
-*/
-   if (vd->vdev_state == VDEV_STATE_HEALTHY && asize > vd->vdev_asize &&
-   (vd->vdev_expanding || spa->spa_autoexpand))
+* If all children are healthy we update asize if either:
+* The asize has increased, due to a device expansion caused by dynamic
+* LUN growth or vdev replacement, and automatic expansion is enabled;
+* making the additional space available.
+*
+* The asize has decreased, due to a device shrink usually caused by a
+* vdev replace with a smaller device. This ensures that calculations
+* based of max_asize and asize e.g. esize are always valid. It's safe
+* to do this as we've already validated that asize is greater than
+* vdev_min_asize.
+*/
+   if (vd->vdev_state == VDEV_STATE_HEALTHY &&
+   ((asize > vd->vdev_asize &&
+   (vd->vdev_expanding || spa->spa_autoexpand)) ||
+   (asize < vd->vdev_asize)))
vd->vdev_asize = asize;
 
vdev_set_min_asize(vd);
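
The two changes in this diff come down to a rounded-up division and a small predicate. The sketch below restates both as standalone C so the arithmetic can be checked in isolation. The function names and the flattened boolean parameters are hypothetical and do not appear in vdev.c; only the expressions are taken from the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: per-child minimum size, rounded up so that
 * children * result >= total.  Assumes children > 0 and that
 * total + children - 1 does not overflow.
 */
static uint64_t
min_asize_per_child(uint64_t total, uint64_t children)
{
	return ((total + children - 1) / children);
}

/*
 * Hypothetical restatement of the new vdev_open() condition: accept the
 * new size when the vdev is healthy and either it grew with expansion
 * allowed, or it shrank.
 */
static bool
should_update_asize(bool healthy, uint64_t new_asize, uint64_t cur_asize,
    bool expanding, bool autoexpand)
{
	return (healthy &&
	    ((new_asize > cur_asize && (expanding || autoexpand)) ||
	    new_asize < cur_asize));
}
```

With truncating division a total of 10 over 3 children gives 3 per child, and 3 * 3 is less than 10; the rounded-up form gives 4.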

svn commit: r317375 - in stable/10: cddl/lib/libdtrace sys/netinet

2017-04-24 Thread Steven Hartland

Author: smh
Date: Mon Apr 24 16:31:28 2017
New Revision: 317375
URL: https://svnweb.freebsd.org/changeset/base/317375

Log:
  Partial MFC r316676 and the required r313045
  
  MFC r316676:
  
  Use estimated RTT for receive buffer auto resizing instead of timestamps.
  This is a partial MFC as stable/10 doesn't include the TCP stack
  modularisation.
  
  MFC r313045:
  
  Add an mbuf to ipinfo_t translator to finish cleanup of mbuf passing to TCP
  probes. This is a partial MFC (missing debug__output & debug__drop changes)
  due to the massive amount of additional dtrace changes that would be
  required for a full MFC.
  
  Relnotes: Yes
  Sponsored by: Multiplay

Modified:
  stable/10/cddl/lib/libdtrace/ip.d
  stable/10/sys/netinet/in_kdtrace.c
  stable/10/sys/netinet/in_kdtrace.h
  stable/10/sys/netinet/tcp_input.c
  stable/10/sys/netinet/tcp_output.c
  stable/10/sys/netinet/tcp_var.h
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/cddl/lib/libdtrace/ip.d
==
--- stable/10/cddl/lib/libdtrace/ip.d   Mon Apr 24 16:07:30 2017
(r317374)
+++ stable/10/cddl/lib/libdtrace/ip.d   Mon Apr 24 16:31:28 2017
(r317375)
@@ -240,6 +240,24 @@ translator ipinfo_t < uint8_t *p > {
 #pragma D binding "1.0" IFF_LOOPBACK
 inline int IFF_LOOPBACK =  0x8;
 
+#pragma D binding "1.13" translator
+translator ipinfo_t < struct mbuf *m > {
+   ip_ver =m == NULL ? 0 : ((struct ip *)m->m_data)->ip_v;
+   ip_plength =m == NULL ? 0 :
+   ((struct ip *)m->m_data)->ip_v == 4 ?
+   ntohs(((struct ip *)m->m_data)->ip_len) - 
+   (((struct ip *)m->m_data)->ip_hl << 2):
+   ntohs(((struct ip6_hdr *)m->m_data)->ip6_ctlun.ip6_un1.ip6_un1_plen);
+   ip_saddr =  m == NULL ? 0 :
+   ((struct ip *)m->m_data)->ip_v == 4 ?
+   inet_ntoa(&((struct ip *)m->m_data)->ip_src.s_addr) :
+   inet_ntoa6(&((struct ip6_hdr *)m->m_data)->ip6_src);
+   ip_daddr =  m == NULL ? 0 :
+   ((struct ip *)m->m_data)->ip_v == 4 ?
+   inet_ntoa(&((struct ip *)m->m_data)->ip_dst.s_addr) :
+   inet_ntoa6(&((struct ip6_hdr *)m->m_data)->ip6_dst);
+};
+
 #pragma D binding "1.0" translator
 translator ifinfo_t < struct ifnet *p > {
if_name =   p->if_xname;

Modified: stable/10/sys/netinet/in_kdtrace.c
==
--- stable/10/sys/netinet/in_kdtrace.c  Mon Apr 24 16:07:30 2017
(r317374)
+++ stable/10/sys/netinet/in_kdtrace.c  Mon Apr 24 16:31:28 2017
(r317375)
@@ -58,28 +58,28 @@ SDT_PROBE_DEFINE6_XLATE(ip, , , send,
 SDT_PROBE_DEFINE5_XLATE(tcp, , , accept__established,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"uint8_t *", "ipinfo_t *",
+"struct mbuf *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , accept__refused,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"uint8_t *", "ipinfo_t *",
+"struct mbuf *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfo_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , connect__established,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"uint8_t *", "ipinfo_t *",
+"struct mbuf *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
 SDT_PROBE_DEFINE5_XLATE(tcp, , , connect__refused,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"uint8_t *", "ipinfo_t *",
+"struct mbuf *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
@@ -93,7 +93,7 @@ SDT_PROBE_DEFINE5_XLATE(tcp, , , connect
 SDT_PROBE_DEFINE5_XLATE(tcp, , , receive,
 "void *", "pktinfo_t *",
 "struct tcpcb *", "csinfo_t *",
-"uint8_t *", "ipinfo_t *",
+"struct mbuf *", "ipinfo_t *",
 "struct tcpcb *", "tcpsinfo_t *" ,
 "struct tcphdr *", "tcpinfoh_t *");
 
@@ -112,6 +112,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
 "void *", "void *",
 "int", "tcplsinfo_t *");
 
+SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
+"void *", "void *",
+"struct tcpcb *", "csinfo_t *",
+"struct mbuf *", "ipinfo_t *",
+"struct tcpcb *", "tcpsinfo_t *" ,
+"struct tcphdr *", "tcpinfoh_t *",
+"int", "int");
+
 SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
 "void *", "pktinfo_t *",
 "struct inpcb *", "csinfo_t *",

Modified: stable/10/sys/netinet/in_kdtrace.h
==
--- stable/10/sys/netinet/in_kdtrace.h  Mon Apr 24 16:07:30 2017
(r317374)
+++ stable/10/sys/netinet/in_kdtrace.h  Mon Apr 24 16:31:28 2017
(r317375)
@@ -52,6 +52,7 @@ SDT_PROBE_DECLARE(tcp, , , connect__requ

svn commit: r317368 - in stable/11/sys/netinet: . tcp_stacks

2017-04-24 Thread Steven Hartland

Author: smh
Date: Mon Apr 24 11:34:02 2017
New Revision: 317368
URL: https://svnweb.freebsd.org/changeset/base/317368

Log:
  MFC r316676:
  
  Use estimated RTT for receive buffer auto resizing instead of timestamps
  
  Relnotes: Yes
  Sponsored by: Multiplay

Modified:
  stable/11/sys/netinet/in_kdtrace.c
  stable/11/sys/netinet/in_kdtrace.h
  stable/11/sys/netinet/tcp_input.c
  stable/11/sys/netinet/tcp_output.c
  stable/11/sys/netinet/tcp_stacks/fastpath.c
  stable/11/sys/netinet/tcp_var.h
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/netinet/in_kdtrace.c
==
--- stable/11/sys/netinet/in_kdtrace.c  Mon Apr 24 11:22:06 2017
(r317367)
+++ stable/11/sys/netinet/in_kdtrace.c  Mon Apr 24 11:34:02 2017
(r317368)
@@ -132,6 +132,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
 "void *", "void *",
 "int", "tcplsinfo_t *");
 
+SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
+"void *", "void *",
+"struct tcpcb *", "csinfo_t *",
+"struct mbuf *", "ipinfo_t *",
+"struct tcpcb *", "tcpsinfo_t *" ,
+"struct tcphdr *", "tcpinfoh_t *",
+"int", "int");
+
 SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
 "void *", "pktinfo_t *",
 "struct inpcb *", "csinfo_t *",

Modified: stable/11/sys/netinet/in_kdtrace.h
==
--- stable/11/sys/netinet/in_kdtrace.h  Mon Apr 24 11:22:06 2017
(r317367)
+++ stable/11/sys/netinet/in_kdtrace.h  Mon Apr 24 11:34:02 2017
(r317368)
@@ -65,6 +65,7 @@ SDT_PROBE_DECLARE(tcp, , , debug__input)
 SDT_PROBE_DECLARE(tcp, , , debug__output);
 SDT_PROBE_DECLARE(tcp, , , debug__user);
 SDT_PROBE_DECLARE(tcp, , , debug__drop);
+SDT_PROBE_DECLARE(tcp, , , receive__autoresize);
 
 SDT_PROBE_DECLARE(udp, , , receive);
 SDT_PROBE_DECLARE(udp, , , send);

Modified: stable/11/sys/netinet/tcp_input.c
==
--- stable/11/sys/netinet/tcp_input.c   Mon Apr 24 11:22:06 2017
(r317367)
+++ stable/11/sys/netinet/tcp_input.c   Mon Apr 24 11:34:02 2017
(r317368)
@@ -1473,6 +1473,68 @@ drop:
return (IPPROTO_DONE);
 }
 
+/*
+ * Automatic sizing of receive socket buffer.  Often the send
+ * buffer size is not optimally adjusted to the actual network
+ * conditions at hand (delay bandwidth product).  Setting the
+ * buffer size too small limits throughput on links with high
+ * bandwidth and high delay (eg. trans-continental/oceanic links).
+ *
+ * On the receive side the socket buffer memory is only rarely
+ * used to any significant extent.  This allows us to be much
+ * more aggressive in scaling the receive socket buffer.  For
+ * the case that the buffer space is actually used to a large
+ * extent and we run out of kernel memory we can simply drop
+ * the new segments; TCP on the sender will just retransmit it
+ * later.  Setting the buffer size too big may only consume too
+ * much kernel memory if the application doesn't read() from
+ * the socket or packet loss or reordering makes use of the
+ * reassembly queue.
+ *
+ * The criteria to step up the receive buffer one notch are:
+ *  1. Application has not set receive buffer size with
+ * SO_RCVBUF. Setting SO_RCVBUF clears SB_AUTOSIZE.
+ *  2. the number of bytes received during the time it takes
+ * one timestamp to be reflected back to us (the RTT);
+ *  3. received bytes per RTT is within seven eighth of the
+ * current socket buffer size;
+ *  4. receive buffer size has not hit maximal automatic size;
+ *
+ * This algorithm does one step per RTT at most and only if
+ * we receive a bulk stream w/o packet losses or reorderings.
+ * Shrinking the buffer during idle times is not necessary as
+ * it doesn't consume any memory when idle.
+ *
+ * TODO: Only step up if the application is actually serving
+ * the buffer to better manage the socket buffer resources.
+ */
+int
+tcp_autorcvbuf(struct mbuf *m, struct tcphdr *th, struct socket *so,
+struct tcpcb *tp, int tlen)
+{
+   int newsize = 0;
+
+   if (V_tcp_do_autorcvbuf && (so->so_rcv.sb_flags & SB_AUTOSIZE) &&
+   tp->t_srtt != 0 && tp->rfbuf_ts != 0 &&
+   TCP_TS_TO_TICKS(tcp_ts_getticks() - tp->rfbuf_ts) >
+   (tp->t_srtt >> TCP_RTT_SHIFT)) {
+   if (tp->rfbuf_cnt > (so->so_rcv.sb_hiwat / 8 * 7) &&
+   so->so_rcv.sb_hiwat < V_tcp_autorcvbuf_max) {
+   newsize = min(so->so_rcv.sb_hiwat +
+   V_tcp_autorcvbuf_inc, V_tcp_autorcvbuf_max);
+   }
+   TCP_PROBE6(receive__autoresize, NULL, tp, m, tp, th, newsize);
+
+   /* Start over with next RTT. */
+   tp->rfbuf_ts = 0;
+   tp->rfbuf_cnt = 0;
+   } else {
+   tp->rfbuf_cnt += tlen;  /* add up */
+   }
+
+   return
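
The step-up criteria in the comment above (received bytes per RTT above seven eighths of the current buffer, and the buffer still below the automatic maximum) can be sketched as a pure function. This covers only the size calculation from the diff, with a hypothetical name and plain integer parameters in place of the socket and tcpcb fields; the RTT timing check is left out. Returning 0 for "no change" is an assumption based on newsize being initialised to 0 in the diff.

```c
#include <stdint.h>

/*
 * Hypothetical helper: next receive buffer size, or 0 for "leave as is".
 *   hiwat          current buffer high-water mark
 *   bytes_per_rtt  bytes received during the last RTT
 *   inc            growth step
 *   max            automatic sizing limit
 */
static uint32_t
next_rcvbuf_size(uint32_t hiwat, uint32_t bytes_per_rtt, uint32_t inc,
    uint32_t max)
{
	uint64_t grown;

	/* Same threshold expression as the diff: hiwat / 8 * 7. */
	if (bytes_per_rtt <= hiwat / 8 * 7 || hiwat >= max)
		return (0);

	/* 64-bit sum so hiwat + inc cannot wrap before clamping. */
	grown = (uint64_t)hiwat + inc;
	return (grown < max ? (uint32_t)grown : max);
}
```

For example, with a 65536 byte buffer the threshold is 57344 bytes per RTT; above it the buffer grows by one step, clamped to the maximum.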

svn commit: r316944 - in stable/11: . sys/netinet sys/netinet6

2017-04-14 Thread Steven Hartland

Author: smh
Date: Fri Apr 14 22:02:08 2017
New Revision: 316944
URL: https://svnweb.freebsd.org/changeset/base/316944

Log:
  MFC r316313, r316328:
  
  Allow explicitly assigned IPv4 & IPv6 loopback addresses to be used in
  jails.
  
  Relnotes: Yes
  Sponsored by: Multiplay

Modified:
  stable/11/UPDATING
  stable/11/sys/netinet/in_jail.c
  stable/11/sys/netinet6/in6_jail.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/UPDATING
==
--- stable/11/UPDATING	Fri Apr 14 21:49:20 2017	(r316943)
+++ stable/11/UPDATING	Fri Apr 14 22:02:08 2017	(r316944)
@@ -16,6 +16,11 @@ from older versions of FreeBSD, try WITH
 the tip of head, and then rebuild without this option. The bootstrap process
 from older version of current across the gcc/clang cutover is a bit fragile.
 
+20170414:
+   Binds and sends to the loopback addresses, IPv6 and IPv4, will now
+   use any explicitly assigned loopback address available in the jail
+   instead of using the first assigned address of the jail.
+
 20170402:
Clang, llvm, lldb, compiler-rt and libc++ have been upgraded to 4.0.0.
Please see the 20141231 entry below for information about prerequisites

Modified: stable/11/sys/netinet/in_jail.c
==
--- stable/11/sys/netinet/in_jail.c Fri Apr 14 21:49:20 2017
(r316943)
+++ stable/11/sys/netinet/in_jail.c Fri Apr 14 22:02:08 2017
(r316944)
@@ -306,11 +306,6 @@ prison_local_ip4(struct ucred *cred, str
}
 
ia0.s_addr = ntohl(ia->s_addr);
-   if (ia0.s_addr == INADDR_LOOPBACK) {
-   ia->s_addr = pr->pr_ip4[0].s_addr;
-   mtx_unlock(&pr->pr_mtx);
-   return (0);
-   }
 
if (ia0.s_addr == INADDR_ANY) {
/*
@@ -323,6 +318,11 @@ prison_local_ip4(struct ucred *cred, str
}
 
error = prison_check_ip4_locked(pr, ia);
+   if (error == EADDRNOTAVAIL && ia0.s_addr == INADDR_LOOPBACK) {
+   ia->s_addr = pr->pr_ip4[0].s_addr;
+   error = 0;
+   }
+
mtx_unlock(&pr->pr_mtx);
return (error);
 }
@@ -354,7 +354,8 @@ prison_remote_ip4(struct ucred *cred, st
return (EAFNOSUPPORT);
}
 
-   if (ntohl(ia->s_addr) == INADDR_LOOPBACK) {
+   if (ntohl(ia->s_addr) == INADDR_LOOPBACK &&
+   prison_check_ip4_locked(pr, ia) == EADDRNOTAVAIL) {
ia->s_addr = pr->pr_ip4[0].s_addr;
mtx_unlock(&pr->pr_mtx);
return (0);
@@ -370,9 +371,8 @@ prison_remote_ip4(struct ucred *cred, st
 /*
  * Check if given address belongs to the jail referenced by cred/prison.
  *
- * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail,
- * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail
- * doesn't allow IPv4.  Address passed in in NBO.
+ * Returns 0 if address belongs to jail,
+ * EADDRNOTAVAIL if the address doesn't belong to the jail.
  */
 int
 prison_check_ip4_locked(const struct prison *pr, const struct in_addr *ia)

Modified: stable/11/sys/netinet6/in6_jail.c
==
--- stable/11/sys/netinet6/in6_jail.c   Fri Apr 14 21:49:20 2017
(r316943)
+++ stable/11/sys/netinet6/in6_jail.c   Fri Apr 14 22:02:08 2017
(r316944)
@@ -293,12 +293,6 @@ prison_local_ip6(struct ucred *cred, str
return (EAFNOSUPPORT);
}
 
-   if (IN6_IS_ADDR_LOOPBACK(ia6)) {
-   bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
-   mtx_unlock(&pr->pr_mtx);
-   return (0);
-   }
-
if (IN6_IS_ADDR_UNSPECIFIED(ia6)) {
/*
 * In case there is only 1 IPv6 address, and v6only is true,
@@ -311,6 +305,11 @@ prison_local_ip6(struct ucred *cred, str
}
 
error = prison_check_ip6_locked(pr, ia6);
+   if (error == EADDRNOTAVAIL && IN6_IS_ADDR_LOOPBACK(ia6)) {
+   bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
+   error = 0;
+   }
+
mtx_unlock(&pr->pr_mtx);
return (error);
 }
@@ -341,7 +340,8 @@ prison_remote_ip6(struct ucred *cred, st
return (EAFNOSUPPORT);
}
 
-   if (IN6_IS_ADDR_LOOPBACK(ia6)) {
+   if (IN6_IS_ADDR_LOOPBACK(ia6) &&
+prison_check_ip6_locked(pr, ia6) == EADDRNOTAVAIL) {
bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
mtx_unlock(&pr->pr_mtx);
return (0);
@@ -357,9 +357,8 @@ prison_remote_ip6(struct ucred *cred, st
 /*
  * Check if given address belongs to the jail referenced by cred/prison.
  *
- * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail,
- * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail
- * doesn't allow IPv6.
+ * Returns 0
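
The behaviour change in prison_local_ip4() can be sketched in userland C: check the requested address against the jail's list first, and only fall back to the jail's first address when the request was for loopback and loopback is not assigned. struct toy_jail and both function names are hypothetical; locking, the INADDR_ANY case and the IPv6 variant are omitted, and the jail is assumed to have at least one address.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>

/* Hypothetical, simplified jail: just its IPv4 addresses (network order). */
struct toy_jail {
	const struct in_addr *ip4;
	size_t ip4s;		/* assumed to be at least 1 */
};

/* Returns 0 if ia is one of the jail's addresses, else EADDRNOTAVAIL. */
static int
toy_check_ip4(const struct toy_jail *j, const struct in_addr *ia)
{
	for (size_t i = 0; i < j->ip4s; i++)
		if (j->ip4[i].s_addr == ia->s_addr)
			return (0);
	return (EADDRNOTAVAIL);
}

/*
 * Resolve a local address for the jail.  An explicitly assigned loopback
 * address is used as is; otherwise a loopback request is rewritten to
 * the jail's first address.
 */
static int
toy_local_ip4(const struct toy_jail *j, struct in_addr *ia)
{
	int error = toy_check_ip4(j, ia);

	if (error == EADDRNOTAVAIL && ntohl(ia->s_addr) == INADDR_LOOPBACK) {
		*ia = j->ip4[0];
		error = 0;
	}
	return (error);
}
```

Doing the membership check before the loopback special case is what lets a jail with 127.0.0.1 explicitly assigned keep using it, which matches the UPDATING entry in this commit.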

svn commit: r316943 - in stable/11/sys: conf kern netinet netinet6 sys

2017-04-14 Thread Steven Hartland

Author: smh
Date: Fri Apr 14 21:49:20 2017
New Revision: 316943
URL: https://svnweb.freebsd.org/changeset/base/316943

Log:
  MFC r303863:
  
  Move IPv4 & IPv6 specific jail functions to netinet and netinet6 files.
  
  Sponsored by: Multiplay

Added:
  stable/11/sys/netinet/in_jail.c
 - copied unchanged from r303863, head/sys/netinet/in_jail.c
  stable/11/sys/netinet6/in6_jail.c
 - copied unchanged from r303863, head/sys/netinet6/in6_jail.c
Modified:
  stable/11/sys/conf/files
  stable/11/sys/kern/kern_jail.c
  stable/11/sys/sys/jail.h
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/sys/conf/files
==
--- stable/11/sys/conf/files	Fri Apr 14 21:42:27 2017	(r316942)
+++ stable/11/sys/conf/files	Fri Apr 14 21:49:20 2017	(r316943)
@@ -3805,6 +3805,7 @@ netinet/in_fib.c  optional inet
 netinet/in_gif.c   optional gif inet | netgraph_gif inet
 netinet/ip_gre.c   optional gre inet
 netinet/ip_id.c	optional inet
+netinet/in_jail.c  optional inet
 netinet/in_mcast.c optional inet
 netinet/in_pcb.c   optional inet | inet6
 netinet/in_pcbgroup.c  optional inet pcbgroup | inet6 pcbgroup
@@ -3871,6 +3872,7 @@ netinet6/in6_cksum.c  optional inet6
 netinet6/in6_fib.c optional inet6
 netinet6/in6_gif.c optional gif inet6 | netgraph_gif inet6
 netinet6/in6_ifattach.c	optional inet6
+netinet6/in6_jail.c	optional inet6
 netinet6/in6_mcast.c   optional inet6
 netinet6/in6_pcb.c optional inet6
 netinet6/in6_pcbgroup.c	optional inet6 pcbgroup

Modified: stable/11/sys/kern/kern_jail.c
==
--- stable/11/sys/kern/kern_jail.c  Fri Apr 14 21:42:27 2017
(r316942)
+++ stable/11/sys/kern/kern_jail.c  Fri Apr 14 21:49:20 2017
(r316943)
@@ -130,14 +130,6 @@ static void prison_racct_attach(struct p
 static void prison_racct_modify(struct prison *pr);
 static void prison_racct_detach(struct prison *pr);
 #endif
-#ifdef INET
-static int _prison_check_ip4(const struct prison *, const struct in_addr *);
-static int prison_restrict_ip4(struct prison *pr, struct in_addr *newip4);
-#endif
-#ifdef INET6
-static int _prison_check_ip6(struct prison *pr, struct in6_addr *ia6);
-static int prison_restrict_ip6(struct prison *pr, struct in6_addr *newip6);
-#endif
 
 /* Flags for prison_deref */
 #define	PD_DEREF	0x01
@@ -252,54 +244,6 @@ prison0_init(void)
strlcpy(prison0.pr_osrelease, osrelease, sizeof(prison0.pr_osrelease));
 }
 
-#ifdef INET
-static int
-qcmp_v4(const void *ip1, const void *ip2)
-{
-   in_addr_t iaa, iab;
-
-   /*
-* We need to compare in HBO here to get the list sorted as expected
-* by the result of the code.  Sorting NBO addresses gives you
-* interesting results.  If you do not understand, do not try.
-*/
-   iaa = ntohl(((const struct in_addr *)ip1)->s_addr);
-   iab = ntohl(((const struct in_addr *)ip2)->s_addr);
-
-   /*
-* Do not simply return the difference of the two numbers, the int is
-* not wide enough.
-*/
-   if (iaa > iab)
-   return (1);
-   else if (iaa < iab)
-   return (-1);
-   else
-   return (0);
-}
-#endif
-
-#ifdef INET6
-static int
-qcmp_v6(const void *ip1, const void *ip2)
-{
-   const struct in6_addr *ia6a, *ia6b;
-   int i, rc;
-
-   ia6a = (const struct in6_addr *)ip1;
-   ia6b = (const struct in6_addr *)ip2;
-
-   rc = 0;
-   for (i = 0; rc == 0 && i < sizeof(struct in6_addr); i++) {
-   if (ia6a->s6_addr[i] > ia6b->s6_addr[i])
-   rc = 1;
-   else if (ia6a->s6_addr[i] < ia6b->s6_addr[i])
-   rc = -1;
-   }
-   return (rc);
-}
-#endif
-
 /*
  * struct jail_args {
  * struct jail *jail;
@@ -845,7 +789,8 @@ kern_jail_set(struct thread *td, struct 
 * address to connect from.
 */
if (ip4s > 1)
-   qsort(ip4 + 1, ip4s - 1, sizeof(*ip4), qcmp_v4);
+   qsort(ip4 + 1, ip4s - 1, sizeof(*ip4),
+   prison_qcmp_v4);
/*
 * Check for duplicate addresses and do some simple
 * zero and broadcast checks. If users give other bogus
@@ -893,7 +838,8 @@ kern_jail_set(struct thread *td, struct 
ip6 = malloc(ip6s * sizeof(*ip6), M_PRISON, M_WAITOK);
bcopy(op, ip6, ip6s * sizeof(*ip6));
if (ip6s > 1)
-   qsort(ip6 + 1, ip6s - 1,

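The comment on the removed qcmp_v4 above makes two points: compare in host byte order, and do not return the raw difference of the two values. A minimal userland sketch of both follows. It assumes the standard arpa/inet.h helpers rather than the kernel headers, and cmp_v4_hbo and sort_addrs are hypothetical names, not the kernel's prison_qcmp_v4.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Compare two IPv4 addresses that are stored in network byte order. */
static int
cmp_v4_hbo(const void *ip1, const void *ip2)
{
	uint32_t a = ntohl(((const struct in_addr *)ip1)->s_addr);
	uint32_t b = ntohl(((const struct in_addr *)ip2)->s_addr);

	/* a - b does not fit in an int for all inputs, so compare explicitly. */
	if (a > b)
		return (1);
	if (a < b)
		return (-1);
	return (0);
}

/* Sort every address except the first, which stays in place as primary. */
static void
sort_addrs(struct in_addr *ip4, size_t n)
{
	assert(ip4 != NULL || n == 0);
	if (n > 1)
		qsort(ip4 + 1, n - 1, sizeof(*ip4), cmp_v4_hbo);
}
```

With 240.0.0.1 and 10.0.0.2 the difference 0xF0000001 - 0x0A000002 has its top bit set, so a comparator that returned it as an int would order the pair the wrong way round.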
Re: svn commit: r316676 - in head/sys/netinet: . tcp_stacks

2017-04-10 Thread Steven Hartland

I don't tend to MFC to 10.x now, but I do agree that, given the impact, 
it should be done for this one.


The fix is a little different, due to code restructuring in 11 / head, 
but I do have a 10.x version already.


Regards
Steve

On 10/04/2017 15:51, Julian Elischer wrote:

If possible MFC to 10 too would be nice..
thanks


On 10/4/17 4:19 pm, Steven Hartland wrote:

Author: smh
Date: Mon Apr 10 08:19:35 2017
New Revision: 316676
URL: https://svnweb.freebsd.org/changeset/base/316676

Log:
   Use estimated RTT for receive buffer auto resizing instead of timestamps

   Switched from using timestamps to RTT estimates when performing TCP receive
   buffer auto resizing, as not all hosts support / enable TCP timestamps.

   Disabled reset of receive buffer auto scaling when not in bulk receive mode,
   which gives an extra 20% performance increase.

   Also extracted auto resizing to a common method shared between standard and
   fastpath modules.

   With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from
   ~3MB/s to ~100MB/s using the default settings.

   Reviewed by:    lstewart, gnn
   MFC after:  2 weeks
   Relnotes:   Yes
   Sponsored by:   Multiplay
   Differential Revision:  https://reviews.freebsd.org/D9668

Modified:
   head/sys/netinet/in_kdtrace.c
   head/sys/netinet/in_kdtrace.h
   head/sys/netinet/tcp_input.c
   head/sys/netinet/tcp_output.c
   head/sys/netinet/tcp_stacks/fastpath.c
   head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/in_kdtrace.c
== 


--- head/sys/netinet/in_kdtrace.cMon Apr 10 06:19:09 2017 (r316675)
+++ head/sys/netinet/in_kdtrace.cMon Apr 10 08:19:35 2017 (r316676)
@@ -132,6 +132,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
  "void *", "void *",
  "int", "tcplsinfo_t *");
  +SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
+"void *", "void *",
+"struct tcpcb *", "csinfo_t *",
+"struct mbuf *", "ipinfo_t *",
+"struct tcpcb *", "tcpsinfo_t *" ,
+"struct tcphdr *", "tcpinfoh_t *",
+"int", "int");
+
  SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
  "void *", "pktinfo_t *",
  "struct inpcb *", "csinfo_t *",

Modified: head/sys/netinet/in_kdtrace.h
== 


--- head/sys/netinet/in_kdtrace.hMon Apr 10 06:19:09 2017 (r316675)
+++ head/sys/netinet/in_kdtrace.hMon Apr 10 08:19:35 2017 (r316676)
@@ -65,6 +65,7 @@ SDT_PROBE_DECLARE(tcp, , , debug__input)
  SDT_PROBE_DECLARE(tcp, , , debug__output);
  SDT_PROBE_DECLARE(tcp, , , debug__user);
  SDT_PROBE_DECLARE(tcp, , , debug__drop);
+SDT_PROBE_DECLARE(tcp, , , receive__autoresize);
SDT_PROBE_DECLARE(udp, , , receive);
  SDT_PROBE_DECLARE(udp, , , send);

Modified: head/sys/netinet/tcp_input.c
== 


--- head/sys/netinet/tcp_input.cMon Apr 10 06:19:09 2017 (r316675)
+++ head/sys/netinet/tcp_input.cMon Apr 10 08:19:35 2017 (r316676)
@@ -1486,6 +1486,68 @@ drop:
  return (IPPROTO_DONE);
  }
  +/*
+ * Automatic sizing of receive socket buffer.  Often the send
+ * buffer size is not optimally adjusted to the actual network
+ * conditions at hand (delay bandwidth product).  Setting the
+ * buffer size too small limits throughput on links with high
+ * bandwidth and high delay (eg. trans-continental/oceanic links).
+ *
+ * On the receive side the socket buffer memory is only rarely
+ * used to any significant extent.  This allows us to be much
+ * more aggressive in scaling the receive socket buffer.  For
+ * the case that the buffer space is actually used to a large
+ * extent and we run out of kernel memory we can simply drop
+ * the new segments; TCP on the sender will just retransmit it
+ * later.  Setting the buffer size too big may only consume too
+ * much kernel memory if the application doesn't read() from
+ * the socket or packet loss or reordering makes use of the
+ * reassembly queue.
+ *
+ * The criteria to step up the receive buffer one notch are:
+ *  1. Application has not set receive buffer size with
+ * SO_RCVBUF. Setting SO_RCVBUF clears SB_AUTOSIZE.
+ *  2. the number of bytes received during the time it takes
+ * one timestamp to be reflected back to us (the RTT);
+ *  3. received bytes per RTT is within seven eighth of the
+ * current socket buffer size;
+ *  4. receive buffer size has not hit maximal automatic size;
+ *
+ * This algorithm does one step per RTT at most and only if
+ * we receive a bulk stream w/o packet losses or reorderings.
+ * Shrinking the buffer during idle times is not necessary as
+ * it doesn't consume any memory when idle.
+ *
+

svn commit: r316676 - in head/sys/netinet: . tcp_stacks

2017-04-10 Thread Steven Hartland

Author: smh
Date: Mon Apr 10 08:19:35 2017
New Revision: 316676
URL: https://svnweb.freebsd.org/changeset/base/316676

Log:
  Use estimated RTT for receive buffer auto resizing instead of timestamps
  
  Switched from using timestamps to RTT estimates when performing TCP receive
  buffer auto resizing, as not all hosts support / enable TCP timestamps.
  
  Disabled reset of receive buffer auto scaling when not in bulk receive mode,
  which gives an extra 20% performance increase.
  
  Also extracted auto resizing to a common method shared between standard and
  fastpath modules.
  
  With this AWS S3 downloads at ~17ms latency on a 1Gbps connection jump from
  ~3MB/s to ~100MB/s using the default settings.
  
  Reviewed by:lstewart, gnn
  MFC after:  2 weeks
  Relnotes:   Yes
  Sponsored by:   Multiplay
  Differential Revision:  https://reviews.freebsd.org/D9668

Modified:
  head/sys/netinet/in_kdtrace.c
  head/sys/netinet/in_kdtrace.h
  head/sys/netinet/tcp_input.c
  head/sys/netinet/tcp_output.c
  head/sys/netinet/tcp_stacks/fastpath.c
  head/sys/netinet/tcp_var.h

Modified: head/sys/netinet/in_kdtrace.c
==
--- head/sys/netinet/in_kdtrace.c   Mon Apr 10 06:19:09 2017
(r316675)
+++ head/sys/netinet/in_kdtrace.c   Mon Apr 10 08:19:35 2017
(r316676)
@@ -132,6 +132,14 @@ SDT_PROBE_DEFINE6_XLATE(tcp, , , state__
 "void *", "void *",
 "int", "tcplsinfo_t *");
 
+SDT_PROBE_DEFINE6_XLATE(tcp, , , receive__autoresize,
+"void *", "void *",
+"struct tcpcb *", "csinfo_t *",
+"struct mbuf *", "ipinfo_t *",
+"struct tcpcb *", "tcpsinfo_t *" ,
+"struct tcphdr *", "tcpinfoh_t *",
+"int", "int");
+
 SDT_PROBE_DEFINE5_XLATE(udp, , , receive,
 "void *", "pktinfo_t *",
 "struct inpcb *", "csinfo_t *",

Modified: head/sys/netinet/in_kdtrace.h
==
--- head/sys/netinet/in_kdtrace.h   Mon Apr 10 06:19:09 2017
(r316675)
+++ head/sys/netinet/in_kdtrace.h   Mon Apr 10 08:19:35 2017
(r316676)
@@ -65,6 +65,7 @@ SDT_PROBE_DECLARE(tcp, , , debug__input)
 SDT_PROBE_DECLARE(tcp, , , debug__output);
 SDT_PROBE_DECLARE(tcp, , , debug__user);
 SDT_PROBE_DECLARE(tcp, , , debug__drop);
+SDT_PROBE_DECLARE(tcp, , , receive__autoresize);
 
 SDT_PROBE_DECLARE(udp, , , receive);
 SDT_PROBE_DECLARE(udp, , , send);

Modified: head/sys/netinet/tcp_input.c
==
--- head/sys/netinet/tcp_input.cMon Apr 10 06:19:09 2017
(r316675)
+++ head/sys/netinet/tcp_input.cMon Apr 10 08:19:35 2017
(r316676)
@@ -1486,6 +1486,68 @@ drop:
return (IPPROTO_DONE);
 }
 
+/*
+ * Automatic sizing of receive socket buffer.  Often the send
+ * buffer size is not optimally adjusted to the actual network
+ * conditions at hand (delay bandwidth product).  Setting the
+ * buffer size too small limits throughput on links with high
+ * bandwidth and high delay (eg. trans-continental/oceanic links).
+ *
+ * On the receive side the socket buffer memory is only rarely
+ * used to any significant extent.  This allows us to be much
+ * more aggressive in scaling the receive socket buffer.  For
+ * the case that the buffer space is actually used to a large
+ * extent and we run out of kernel memory we can simply drop
+ * the new segments; TCP on the sender will just retransmit it
+ * later.  Setting the buffer size too big may only consume too
+ * much kernel memory if the application doesn't read() from
+ * the socket or packet loss or reordering makes use of the
+ * reassembly queue.
+ *
+ * The criteria to step up the receive buffer one notch are:
+ *  1. Application has not set receive buffer size with
+ * SO_RCVBUF. Setting SO_RCVBUF clears SB_AUTOSIZE.
+ *  2. the number of bytes received during the time it takes
+ * one timestamp to be reflected back to us (the RTT);
+ *  3. received bytes per RTT is within seven eighth of the
+ * current socket buffer size;
+ *  4. receive buffer size has not hit maximal automatic size;
+ *
+ * This algorithm does one step per RTT at most and only if
+ * we receive a bulk stream w/o packet losses or reorderings.
+ * Shrinking the buffer during idle times is not necessary as
+ * it doesn't consume any memory when idle.
+ *
+ * TODO: Only step up if the application is actually serving
+ * the buffer to better manage the socket buffer resources.
+ */
+int
+tcp_autorcvbuf(struct mbuf *m, struct tcphdr *th, struct socket *so,
+struct tcpcb *tp, int tlen)
+{
+   int newsize = 0;
+
+   if (V_tcp_do_autorcvbuf && (so->so_rcv.sb_flags & SB_AUTOSIZE) &&
+   tp->t_srtt != 0 && tp->rfbuf_ts != 0 &&
+   TCP_TS_TO_TICKS(tcp_ts_getticks() - tp->rfbuf_ts) >
+   (tp->t_srtt >> TCP_RTT_SHIFT)) {
+   if (tp->rfbuf_cnt > (so->so_rcv.sb_hiwat / 8 *

svn commit: r316460 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2017-04-03 Thread Steven Hartland

Author: smh
Date: Mon Apr  3 13:11:28 2017
New Revision: 316460
URL: https://svnweb.freebsd.org/changeset/base/316460

Log:
  Fix expandsz 16.0E vals and vdev_min_asize of RAIDZ children
  
  When a member of a RAIDZ has been replaced with a device smaller than the
  original, then the top level vdev can report its expand size as 16.0E.
  
  The reduced child asize causes the RAIDZ to have a vdev_asize lower than its
  vdev_max_asize which then results in an underflow during the calculation of
  the parent's expand size.
  
  Fix this by updating the vdev_asize if it shrinks, which is already
  protected by a check against vdev_min_asize so should always be safe.
  
  Also for RAIDZ vdevs, ensure that the sum of their child vdev_min_asize is
  always greater than the parent's vdev_min_asize.
  
  Fixes: https://www.illumos.org/issues/7885
  
  MFC after:2 weeks
  Sponsored by: Multiplay
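
The two numeric effects in this log can be sketched in isolation. The helper names below are hypothetical and the arithmetic is deliberately simplified; this is not the vdev.c code. The first pair contrasts truncating and round-up division for the per-child minimum size, and the last helper shows how an unsigned 64-bit subtraction wraps to a value near 2^64 bytes (16 EiB), which would be displayed as 16.0E.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating division: children * result may fall short of the parent. */
static uint64_t
child_min_asize_floor(uint64_t parent_min_asize, uint64_t children)
{
	assert(children > 0);
	return (parent_min_asize / children);
}

/* Round-up division: children * result is never below the parent. */
static uint64_t
child_min_asize_ceil(uint64_t parent_min_asize, uint64_t children)
{
	assert(children > 0);
	return ((parent_min_asize + children - 1) / children);
}

/* Unsigned subtraction wraps modulo 2^64 when b is larger than a. */
static uint64_t
u64_sub(uint64_t a, uint64_t b)
{
	return (a - b);
}
```

For a parent minimum of 1000 split over 3 children, the truncating form gives 333 each (999 in total), while the round-up form gives 334 each (1002 in total).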

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c  Mon Apr  3 
13:06:28 2017(r316459)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev.c  Mon Apr  3 
13:11:28 2017(r316460)
@@ -229,7 +229,8 @@ vdev_get_min_asize(vdev_t *vd)
 * so each child must provide at least 1/Nth of its asize.
 */
if (pvd->vdev_ops == &vdev_raidz_ops)
-   return (pvd->vdev_min_asize / pvd->vdev_children);
+   return ((pvd->vdev_min_asize + pvd->vdev_children - 1) /
+   pvd->vdev_children);
 
return (pvd->vdev_min_asize);
 }
@@ -1377,7 +1378,7 @@ vdev_open(vdev_t *vd)
vd->vdev_psize = psize;
 
/*
-* Make sure the allocatable size hasn't shrunk.
+* Make sure the allocatable size hasn't shrunk too much.
 */
if (asize < vd->vdev_min_asize) {
vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
@@ -1417,12 +1418,21 @@ vdev_open(vdev_t *vd)
}
 
/*
-* If all children are healthy and the asize has increased,
-* then we've experienced dynamic LUN growth.  If automatic
-* expansion is enabled then use the additional space.
-*/
-   if (vd->vdev_state == VDEV_STATE_HEALTHY && asize > vd->vdev_asize &&
-   (vd->vdev_expanding || spa->spa_autoexpand))
+* If all children are healthy we update asize if either:
+* The asize has increased, due to a device expansion caused by dynamic
+* LUN growth or vdev replacement, and automatic expansion is enabled;
+* making the additional space available.
+*
+* The asize has decreased, due to a device shrink usually caused by a
+* vdev replace with a smaller device. This ensures that calculations
+* based of max_asize and asize e.g. esize are always valid. It's safe
+* to do this as we've already validated that asize is greater than
+* vdev_min_asize.
+*/
+   if (vd->vdev_state == VDEV_STATE_HEALTHY &&
+   ((asize > vd->vdev_asize &&
+   (vd->vdev_expanding || spa->spa_autoexpand)) ||
+   (asize < vd->vdev_asize)))
vd->vdev_asize = asize;
 
vdev_set_min_asize(vd);
___
svn-src-all@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/svn-src-all
To unsubscribe, send any mail to "svn-src-all-unsubscr...@freebsd.org"

Re: svn commit: r316311 - in head: lib/libstand sys/boot/geli sys/boot/i386/gptboot sys/boot/i386/loader sys/boot/i386/zfsboot

2017-03-31 Thread Steven Hartland



On 31/03/2017 16:16, Ian Lepore wrote:

On Fri, 2017-03-31 at 00:04 +, Allan Jude wrote:

   Add explicit_bzero() to libstand, and switch GELIBoot to using it

 revolution > man explicit_bzero
 No manual entry for explicit_bzero

 revolution > svn log -v explicit_bzero.c
 ...
 r272673 | delphij | 2014-10-06 22:54:11 -0600 (Mon, 06 Oct 2014) | 5 lines

 Add explicit_bzero(3) and its kernel counterpart.

 Obtained from:  OpenBSD

So... can anyone provide a clue what's "explicit" (or different in any
way) between explicit_bzero() and normal bzero()?

Not sure why your system doesn't find the man page, as it works on my 
11 box; however, does this help:

https://www.freebsd.org/cgi/man.cgi?query=explicit_bzero=0=3=FreeBSD+11-current=html

Regards
Steve

svn commit: r316328 - in head: . sys/netinet6

2017-03-31 Thread Steven Hartland

Author: smh
Date: Fri Mar 31 09:10:05 2017
New Revision: 316328
URL: https://svnweb.freebsd.org/changeset/base/316328

Log:
  Allow explicitly assigned IPv6 loopback address to be used in jails
  
  If a jail has an explicitly assigned IPv6 loopback address then allow it
  to be used instead of remapping requests for the loopback address to the
  first IPv6 address assigned to the jail.
  
  This fixes issues where applications attempt to detect their bound port
  where they requested a loopback address, which was available, but instead
  the kernel remapped it to the jail's first address.
  
  This is the same fix that was applied to IPv4 by r316313.
  
  Also:
  * Correct the description of prison_check_ip6_locked to match the code.
  
  MFC after:2 weeks
  Relnotes: Yes
  Sponsored by: Multiplay

Modified:
  head/UPDATING
  head/sys/netinet6/in6_jail.c

Modified: head/UPDATING
==
--- head/UPDATING   Fri Mar 31 08:43:07 2017(r316327)
+++ head/UPDATING   Fri Mar 31 09:10:05 2017(r316328)
@@ -52,9 +52,9 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 12
 ** SPECIAL WARNING: **
 
 20170331:
-   Binds and sends to the IPv4 loopback address (127.0.0.1) will now
+   Binds and sends to the loopback addresses, IPv6 and IPv4, will now
use any explicitly assigned loopback address available in the jail
-   instead of using the first assigned IPv4 address of the jail.
+   instead of using the first assigned address of the jail.
 
 20170329:
The ctl.ko module no longer implements the iSCSI target frontend:

Modified: head/sys/netinet6/in6_jail.c
==
--- head/sys/netinet6/in6_jail.cFri Mar 31 08:43:07 2017
(r316327)
+++ head/sys/netinet6/in6_jail.cFri Mar 31 09:10:05 2017
(r316328)
@@ -293,12 +293,6 @@ prison_local_ip6(struct ucred *cred, str
return (EAFNOSUPPORT);
}
 
-   if (IN6_IS_ADDR_LOOPBACK(ia6)) {
-   bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
-   mtx_unlock(&pr->pr_mtx);
-   return (0);
-   }
-
if (IN6_IS_ADDR_UNSPECIFIED(ia6)) {
/*
 * In case there is only 1 IPv6 address, and v6only is true,
@@ -311,6 +305,11 @@ prison_local_ip6(struct ucred *cred, str
}
 
error = prison_check_ip6_locked(pr, ia6);
+   if (error == EADDRNOTAVAIL && IN6_IS_ADDR_LOOPBACK(ia6)) {
+   bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
+   error = 0;
+   }
+
mtx_unlock(&pr->pr_mtx);
return (error);
 }
@@ -341,7 +340,8 @@ prison_remote_ip6(struct ucred *cred, st
return (EAFNOSUPPORT);
}
 
-   if (IN6_IS_ADDR_LOOPBACK(ia6)) {
+   if (IN6_IS_ADDR_LOOPBACK(ia6) &&
+prison_check_ip6_locked(pr, ia6) == EADDRNOTAVAIL) {
bcopy(&pr->pr_ip6[0], ia6, sizeof(struct in6_addr));
mtx_unlock(&pr->pr_mtx);
return (0);
@@ -357,9 +357,8 @@ prison_remote_ip6(struct ucred *cred, st
 /*
  * Check if given address belongs to the jail referenced by cred/prison.
  *
- * Returns 0 if jail doesn't restrict IPv6 or if address belongs to jail,
- * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail
- * doesn't allow IPv6.
+ * Returns 0 if address belongs to jail,
+ * EADDRNOTAVAIL if the address doesn't belong to the jail.
  */
 int
 prison_check_ip6_locked(const struct prison *pr, const struct in6_addr *ia6)

svn commit: r316313 - in head: . sys/netinet

2017-03-30 Thread Steven Hartland

Author: smh
Date: Fri Mar 31 00:41:54 2017
New Revision: 316313
URL: https://svnweb.freebsd.org/changeset/base/316313

Log:
  Allow explicitly assigned IPv4 loopback address to be used in jails
  
  If a jail has an explicitly assigned loopback address then allow it to be
  used instead of remapping requests for the loopback address to the first
  IPv4 address assigned to the jail.
  
  This fixes issues where applications attempt to detect their bound port
  where they requested a loopback address, which was available, but instead
  the kernel remapped it to the jail's first address.
  
  An example of this is binding nginx to 127.0.0.1 and then running "service
  nginx upgrade" which before this change would cause nginx to fail.
  
  Also:
  * Correct the description of prison_check_ip4_locked to match the code.
  
  MFC after:2 weeks
  Relnotes: Yes
  Sponsored by: Multiplay
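
The ordering change described in this log (check the jail's address list first, and fall back to remapping loopback only when it is not assigned) can be sketched with a simplified model. Everything below is hypothetical illustration: struct jail_model stands in for struct prison, addresses are kept in host byte order, and the locking and INADDR_ANY handling of the real prison_local_ip4 are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define LOOPBACK_V4 0x7f000001u	/* 127.0.0.1 in host byte order */

/* Hypothetical, simplified stand-in for struct prison. */
struct jail_model {
	const uint32_t *ip4;	/* assigned addresses, host byte order */
	size_t ip4s;		/* number of assigned addresses */
};

/* Returns 0 if addr is assigned to the jail, EADDRNOTAVAIL otherwise. */
static int
jail_check_ip4(const struct jail_model *j, uint32_t addr)
{
	for (size_t i = 0; i < j->ip4s; i++)
		if (j->ip4[i] == addr)
			return (0);
	return (EADDRNOTAVAIL);
}

/* Check first; remap loopback to the first address only as a fallback. */
static int
jail_local_ip4(const struct jail_model *j, uint32_t *addr)
{
	int error;

	assert(j->ip4s > 0);
	error = jail_check_ip4(j, *addr);
	if (error == EADDRNOTAVAIL && *addr == LOOPBACK_V4) {
		*addr = j->ip4[0];
		error = 0;
	}
	return (error);
}
```

In this model a jail that lists 127.0.0.1 keeps it unchanged, while a jail that does not list it still has loopback requests redirected to its first address.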

Modified:
  head/UPDATING
  head/sys/netinet/in_jail.c

Modified: head/UPDATING
==
--- head/UPDATING   Fri Mar 31 00:07:03 2017(r316312)
+++ head/UPDATING   Fri Mar 31 00:41:54 2017(r316313)
@@ -51,6 +51,11 @@ NOTE TO PEOPLE WHO THINK THAT FreeBSD 12
 
 ** SPECIAL WARNING: **
 
+20170331:
+   Binds and sends to the IPv4 loopback address (127.0.0.1) will now
+   use any explicitly assigned loopback address available in the jail
+   instead of using the first assigned IPv4 address of the jail.
+
 20170329:
The ctl.ko module no longer implements the iSCSI target frontend:
cfiscsi.ko does instead.

Modified: head/sys/netinet/in_jail.c
==
--- head/sys/netinet/in_jail.c  Fri Mar 31 00:07:03 2017(r316312)
+++ head/sys/netinet/in_jail.c  Fri Mar 31 00:41:54 2017(r316313)
@@ -306,11 +306,6 @@ prison_local_ip4(struct ucred *cred, str
}
 
ia0.s_addr = ntohl(ia->s_addr);
-   if (ia0.s_addr == INADDR_LOOPBACK) {
-   ia->s_addr = pr->pr_ip4[0].s_addr;
-   mtx_unlock(&pr->pr_mtx);
-   return (0);
-   }
 
if (ia0.s_addr == INADDR_ANY) {
/*
@@ -323,6 +318,11 @@ prison_local_ip4(struct ucred *cred, str
}
 
error = prison_check_ip4_locked(pr, ia);
+   if (error == EADDRNOTAVAIL && ia0.s_addr == INADDR_LOOPBACK) {
+   ia->s_addr = pr->pr_ip4[0].s_addr;
+   error = 0;
+   }
+
mtx_unlock(&pr->pr_mtx);
return (error);
 }
@@ -354,7 +354,8 @@ prison_remote_ip4(struct ucred *cred, st
return (EAFNOSUPPORT);
}
 
-   if (ntohl(ia->s_addr) == INADDR_LOOPBACK) {
+   if (ntohl(ia->s_addr) == INADDR_LOOPBACK &&
+   prison_check_ip4_locked(pr, ia) == EADDRNOTAVAIL) {
ia->s_addr = pr->pr_ip4[0].s_addr;
mtx_unlock(&pr->pr_mtx);
return (0);
@@ -370,9 +371,8 @@ prison_remote_ip4(struct ucred *cred, st
 /*
  * Check if given address belongs to the jail referenced by cred/prison.
  *
- * Returns 0 if jail doesn't restrict IPv4 or if address belongs to jail,
- * EADDRNOTAVAIL if the address doesn't belong, or EAFNOSUPPORT if the jail
- * doesn't allow IPv4.  Address passed in in NBO.
+ * Returns 0 if address belongs to jail,
+ * EADDRNOTAVAIL if the address doesn't belong to the jail.
  */
 int
 prison_check_ip4_locked(const struct prison *pr, const struct in_addr *ia)

svn commit: r315855 - stable/11/lib/libsysdecode

2017-03-23 Thread Steven Hartland

Author: smh
Date: Thu Mar 23 10:43:29 2017
New Revision: 315855
URL: https://svnweb.freebsd.org/changeset/base/315855

Log:
  MFC r315423:
  
  Fix libsysdecode vmprot flag decoding
  
  Sponsored by: Multiplay

Modified:
  stable/11/lib/libsysdecode/flags.c
  stable/11/lib/libsysdecode/mktables
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/lib/libsysdecode/flags.c
==
--- stable/11/lib/libsysdecode/flags.c  Thu Mar 23 10:22:06 2017
(r315854)
+++ stable/11/lib/libsysdecode/flags.c  Thu Mar 23 10:43:29 2017
(r315855)
@@ -51,6 +51,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

Modified: stable/11/lib/libsysdecode/mktables
==
--- stable/11/lib/libsysdecode/mktables Thu Mar 23 10:22:06 2017
(r315854)
+++ stable/11/lib/libsysdecode/mktables Thu Mar 23 10:43:29 2017
(r315855)
@@ -135,7 +135,7 @@ gen_table "sockoptudp"  "UDP_[[:alnu
 gen_table "socktype""SOCK_[A-Z]+[[:space:]]+[1-9]+[0-9]*"  
"sys/socket.h"
 gen_table "thrcreateflags"  "THR_[A-Z]+[[:space:]]+0x[0-9]+"   
"sys/thr.h"
 gen_table "umtxop"  "UMTX_OP_[[:alnum:]_]+[[:space:]]+[0-9]+"  
"sys/umtx.h"
-gen_table "vmprot"  "VM_PROT_[A-Z]+[[:space:]]+\(\(vm_prot_t\)\)"  
"vm/vm.h"
+gen_table "vmprot"  
"VM_PROT_[A-Z]+[[:space:]]+\(\(vm_prot_t\)[[:space:]]+0x[0-9]+\)"  "vm/vm.h"
 gen_table "vmresult""KERN_[A-Z]+[[:space:]]+[0-9]+"
"vm/vm_param.h"
 gen_table "wait6opt""W[A-Z]+[[:space:]]+[0-9]+"
"sys/wait.h"
 gen_table "seekwhence"  "SEEK_[A-Z]+[[:space:]]+[0-9]+"
"sys/unistd.h"

svn commit: r315449 - head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs

2017-03-17 Thread Steven Hartland

Author: smh
Date: Fri Mar 17 12:34:57 2017
New Revision: 315449
URL: https://svnweb.freebsd.org/changeset/base/315449

Log:
  Reduce ARC fragmentation threshold
  
  As ZFS can request memory blocks of up to SPA_MAXBLOCKSIZE, e.g. during
  zfs recv, update the threshold at which we start aggressive reclamation to
  use SPA_MAXBLOCKSIZE (16M) instead of the lower zfs_max_recordsize which
  defaults to 1M.
  
  PR:   194513
  Reviewed by:  avg, mav
  MFC after:1 month
  Sponsored by: Multiplay
  Differential Revision:https://reviews.freebsd.org/D10012

Modified:
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Fri Mar 17 
12:34:56 2017(r315448)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Fri Mar 17 
12:34:57 2017(r315449)
@@ -3978,7 +3978,7 @@ arc_available_memory(void)
 * Start aggressive reclamation if too little sequential KVA left.
 */
if (lowest > 0) {
-   n = (vmem_size(heap_arena, VMEM_MAXFREE) < zfs_max_recordsize) ?
+   n = (vmem_size(heap_arena, VMEM_MAXFREE) < SPA_MAXBLOCKSIZE) ?
-((int64_t)vmem_size(heap_arena, VMEM_ALLOC) >> 4) :
INT64_MAX;
if (n < lowest) {

svn commit: r315423 - head/lib/libsysdecode

2017-03-16 Thread Steven Hartland

Author: smh
Date: Thu Mar 16 20:55:00 2017
New Revision: 315423
URL: https://svnweb.freebsd.org/changeset/base/315423

Log:
  Fix libsysdecode vmprot flag decoding
  
  Fix the regex used to find vmprot table entries and add the missing include.
  
  This fixes kdump's output of PFLT arguments which would previously look like:
  5202 101546 ktrace   PFLT  0x5ae000 0x2<>2
  
  They now display correctly:
  5202 101546 ktrace   PFLT  0x5ac000 0x2
  
  MFC after:1 week
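
The effect of the regex change can be checked outside mktables with POSIX regcomp(3). The sketch below is only an illustration: ere_matches is a hypothetical helper, and sample_line is an assumed shape for a vm/vm.h definition rather than a line copied from the header. The two patterns are the old and new ones from the diff, with backslashes doubled for C string syntax.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the POSIX extended regex matches somewhere in line. */
static bool
ere_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool matched;
	int rc;

	rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB);
	assert(rc == 0);
	matched = (regexec(&re, line, 0, NULL, 0) == 0);
	regfree(&re);
	return (matched);
}

/* Old pattern: expects "((vm_prot_t))" with nothing after the cast. */
static const char old_vmprot[] =
    "VM_PROT_[A-Z]+[[:space:]]+\\(\\(vm_prot_t\\)\\)";

/* New pattern: allows whitespace and a 0x value after the cast. */
static const char new_vmprot[] =
    "VM_PROT_[A-Z]+[[:space:]]+\\(\\(vm_prot_t\\)[[:space:]]+0x[0-9]+\\)";

/* Assumed example definition, not copied from vm/vm.h. */
static const char sample_line[] = "#define VM_PROT_READ ((vm_prot_t) 0x01)";
```

Against the assumed sample line the old pattern finds nothing, because it requires the closing parenthesis straight after the cast, while the new pattern matches.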

Modified:
  head/lib/libsysdecode/flags.c
  head/lib/libsysdecode/mktables

Modified: head/lib/libsysdecode/flags.c
==
--- head/lib/libsysdecode/flags.c   Thu Mar 16 20:39:31 2017
(r315422)
+++ head/lib/libsysdecode/flags.c   Thu Mar 16 20:55:00 2017
(r315423)
@@ -51,6 +51,7 @@ __FBSDID("$FreeBSD$");
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

Modified: head/lib/libsysdecode/mktables
==
--- head/lib/libsysdecode/mktables  Thu Mar 16 20:39:31 2017
(r315422)
+++ head/lib/libsysdecode/mktables  Thu Mar 16 20:55:00 2017
(r315423)
@@ -135,7 +135,7 @@ gen_table "sockoptudp"  "UDP_[[:alnu
 gen_table "socktype""SOCK_[A-Z]+[[:space:]]+[1-9]+[0-9]*"  
"sys/socket.h"
 gen_table "thrcreateflags"  "THR_[A-Z]+[[:space:]]+0x[0-9]+"   
"sys/thr.h"
 gen_table "umtxop"  "UMTX_OP_[[:alnum:]_]+[[:space:]]+[0-9]+"  
"sys/umtx.h"
-gen_table "vmprot"  "VM_PROT_[A-Z]+[[:space:]]+\(\(vm_prot_t\)\)"  
"vm/vm.h"
+gen_table "vmprot"  
"VM_PROT_[A-Z]+[[:space:]]+\(\(vm_prot_t\)[[:space:]]+0x[0-9]+\)"  "vm/vm.h"
 gen_table "vmresult""KERN_[A-Z]+[[:space:]]+[0-9]+"
"vm/vm_param.h"
 gen_table "wait6opt""W[A-Z]+[[:space:]]+[0-9]+"
"sys/wait.h"
 gen_table "seekwhence"  "SEEK_[A-Z]+[[:space:]]+[0-9]+"
"sys/unistd.h"

Re: svn commit: r314155 - head/sys/netinet

2017-02-23 Thread Steven Hartland

You might also be interested in reviewing my fix for TCP buffer scaling 
too Michael.

https://reviews.freebsd.org/D9668

This fixes slow transfers due to no receive buffer scaling if TCP 
timestamps aren't negotiated.


It's still got debug stuff in it ATM and I'm toying with removing the 
different cases between estimated RTT and timestamps as there appears to 
be no difference in practice.


Tests here show a jump from ~3MB/s @ 1Gbps and 17ms latency to 100MB/s, 
pretty much line rate, which is in line with Linux results.
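
The figures above are in line with plain window arithmetic: a receiver that never grows its window caps throughput at window / RTT, and filling the link needs roughly rate * RTT bytes in flight. A small sketch of that calculation follows. The function names are hypothetical, and the 64 KiB window used in the note below is an assumption for illustration, not a value taken from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes that must be in flight to keep the link busy: rate * RTT. */
static uint64_t
bdp_bytes(uint64_t bits_per_sec, uint64_t rtt_usec)
{
	return (bits_per_sec / 8 * rtt_usec / 1000000);
}

/* Upper bound on throughput in bytes/s with a fixed receive window. */
static uint64_t
window_limited_rate(uint64_t window_bytes, uint64_t rtt_usec)
{
	assert(rtt_usec > 0);
	return (window_bytes * 1000000 / rtt_usec);
}
```

At 1Gbps and 17ms, bdp_bytes() gives 2125000 bytes, so the buffer has to grow to about 2MB to reach line rate. An assumed fixed 64 KiB window gives window_limited_rate(65536, 17000) = 3855058 bytes/s, which is the same order as the ~3MB/s seen before the fix.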


Any feedback welcome.

Regards
Steve

On 23/02/2017 18:14, Michael Tuexen wrote:

Author: tuexen
Date: Thu Feb 23 18:14:36 2017
New Revision: 314155
URL: https://svnweb.freebsd.org/changeset/base/314155

Log:
   TCP window updates are only sent if the window can be increased by at
   least 2 * MSS. However, if the receive buffer size is small, this might
   be impossible. Add back a criterion to send a TCP window update if
   the window can be increased by at least half of the receive buffer size.
   This condition was removed in r242252. This patch simply brings it back.
   PR:  211003
   Reviewed by: gnn
   MFC after:   1 week
   Sponsored by:Netflix, Inc.
   Differential Revision:   https://reviews.freebsd.org/D9475
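
The rule in this log can be reduced to a small predicate. This is a simplified sketch only: should_send_window_update is a hypothetical name, and the extra sub-conditions attached to the 2 * MSS test in tcp_output.c are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * adv is how far the advertised window could grow, mss the maximum
 * segment size, and rcvbuf_hiwat the receive buffer size.
 */
static bool
should_send_window_update(int32_t adv, int32_t mss, uint32_t rcvbuf_hiwat)
{
	assert(mss > 0);
	/* Existing rule: the window can grow by at least two segments. */
	if (adv >= 2 * mss)
		return (true);
	/* Restored rule: it can grow by at least half the receive buffer. */
	if (2 * adv >= (int32_t)rcvbuf_hiwat)
		return (true);
	return (false);
}
```

With a 2048 byte receive buffer and an MSS of 1460, the window can never grow by 2920 bytes, so only the second test can trigger an update.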

Modified:
   head/sys/netinet/tcp_output.c

Modified: head/sys/netinet/tcp_output.c
==
--- head/sys/netinet/tcp_output.c   Thu Feb 23 17:56:24 2017
(r314154)
+++ head/sys/netinet/tcp_output.c   Thu Feb 23 18:14:36 2017
(r314155)
@@ -696,6 +696,8 @@ after_sack_rexmit:
 recwin <= (so->so_rcv.sb_hiwat / 8) ||
 so->so_rcv.sb_hiwat <= 8 * tp->t_maxseg))
goto send;
+   if (2 * adv >= (int32_t)so->so_rcv.sb_hiwat)
+   goto send;
}
  dontupdate:
  




Re: svn commit: r313260 - head/sys/kern

2017-02-07 Thread Steven Hartland


On 07/02/2017 20:34, Ed Maste wrote:

On 7 February 2017 at 10:30, Steven Hartland
<steven.hartl...@multiplay.co.uk> wrote:

All I'm suggesting is, while one could guess this may be a performance or
possibly a compatibility thing, the reason is not obvious, so a small piece
of detail on why the change was done should always be included.

For this one something like the following would be nice:

Switch fget_unlocked to atomic_fcmpset

Improve performance under contention by switching fget_unlocked to
use atomic_fcmpset.

I agree, and one of the key reasons to do this is so that there's this
tiny bit of context if someone later runs "git blame" or "svn
annotate" and discovers this change for the line containing
atomic_fcmpset. Comments containing "eliminate memory leak" or "remove
unused variable" have a self-evident reason, but I don't believe
that's true for "switch to atomic_fcmpset."

Repeating the "switch fget_unlocked to..." in the proposed commit
message above feels redundant to me though, and I would suggest:

| Switch fget_unlocked to atomic_fcmpset
|
| Improves performance under contention.

or just:

| Use atomic_fcmpset to improve performance under contention

All those work for me as they clearly state why the change was made, so 
I hope this is something we can try to improve moving forward :)


Re: svn commit: r313260 - head/sys/kern

2017-02-07 Thread Steven Hartland


On 07/02/2017 14:57, Mateusz Guzik wrote:

On Sun, Feb 05, 2017 at 03:17:46PM +, Alexey Dokuchaev wrote:

On Sun, Feb 05, 2017 at 04:00:06AM +0100, Mateusz Guzik wrote:

For instance, plugging an unused variable, a memory leak, doing a
lockless check first etc. are all pretty standard and unless there is
something unusual going on (e.g. complicated circumstances leading to a
leak) there is not much to explain. In particular, I don't see why
anyone would explain why leaks are bad on each commit plugging one.

Right; these (unused variable, resource leaks) usually do not warrant
elaborate explanation.

[ Some linefeeds below were trimmed for brevity ]

The gist is as follows: there are plenty of cases where the kernel wants
to atomically replace the value of a particular variable. Sometimes,
like in this commit, we want to bump the counter by 1, but only if the
current value is not 0. For that we need to read the value, see if it is
0 and if not, try to replace what we read with what we read + 1. We
cannot just increment as the value could have changed to 0 in the
meantime.
But this also means that multiple cpus doing the same operation on the
same variable will trip on each other - one will succeed while the rest
will have to retry.
Prior to this commit, each retry attempt would explicitly re-read the
value. This induces cache coherency traffic slowing everyone down.
amd64 has the nice property of giving us the value it found, eliminating
the need to explicitly re-read it. There is a similar story on i386 and
sparc.
Other architectures may also benefit from this, but that I did not
benchmark.

In short[,] under contention atomic_fcmpset is going to be faster than
atomic_cmpset.
I did not benchmark this particular change, but a switch of the sort
easily gives 10%+ in microbenchmarks on amd64.
That said, while one can argue this optimizes the code, it really
depessimizes it as something of the sort should have been already
employed.
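
The difference described above can be sketched in portable C. This is only an illustration under stated assumptions: it uses C11 <stdatomic.h> as a stand-in for the kernel's atomic(9) primitives, and the names ref_acquire_reread and ref_acquire_fetch are hypothetical, not taken from the commit. The first function re-reads the counter explicitly on every retry; the second relies on the compare-exchange writing the observed value back into `old` on failure, which is the behaviour the explanation attributes to atomic_fcmpset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Old shape: bump the counter only if it is non-zero, and go back to an
 * explicit load of the counter on every failed attempt.
 */
static bool
ref_acquire_reread(atomic_uint *cnt)
{
	unsigned int old, expected;

	assert(cnt != NULL);
	for (;;) {
		old = atomic_load_explicit(cnt, memory_order_relaxed);
		if (old == 0)
			return (false);
		/* Scratch copy: the value reported on failure is discarded. */
		expected = old;
		if (atomic_compare_exchange_strong_explicit(cnt, &expected,
		    old + 1, memory_order_acquire, memory_order_relaxed))
			return (true);
	}
}

/*
 * New shape: load once up front; a failed compare-exchange stores the
 * value it found into `old`, so the loop never re-reads explicitly.
 */
static bool
ref_acquire_fetch(atomic_uint *cnt)
{
	unsigned int old;

	assert(cnt != NULL);
	old = atomic_load_explicit(cnt, memory_order_relaxed);
	while (old != 0) {
		if (atomic_compare_exchange_strong_explicit(cnt, &old,
		    old + 1, memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);
}
```

Both functions return true after taking a reference and false if the counter was already 0; only the retry path differs.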

Given the above, IMHO it's quite far from an obvious or of manpage-lookup
thing, and thus requires proper explanation in the commit log.


If the aforementioned explanation is necessary, the place for it is in
the man page. There are already several commits with fcmpset and there
will be more to come. I don't see why any of them would convey the
information.

For the details of why atomic_fcmpset performs better than atomic_cmpset 
under contention, a manpage would be nice.


All I'm suggesting is, while one could guess this may be a performance 
or possibly a compatibility thing, the reason is not obvious, so a small 
piece of detail on why the change was done should always be included.


For this one something like the following would be nice:

Switch fget_unlocked to atomic_fcmpset

Improve performance under contention by switching fget_unlocked to
use atomic_fcmpset.

With a small piece of additional information, it's clear that the reason 
for the change (why) was to improve performance, and anyone who wants more 
detail on why this would be the case can research it via a manpage or 
other resources, wouldn't you agree?


Regards
Steve

Re: svn commit: r313355 - in stable/11: lib/libstand sys/boot/common sys/boot/efi/libefi sys/boot/i386/libfirewire sys/boot/i386/libi386 sys/boot/mips/beri/loader sys/boot/ofw/libofw sys/boot/pc98/lib

2017-02-06 Thread Steven Hartland

IMO combining fixes from different areas (yes, mainly libstand, but also 
dosfs and nandfs) with style cleanups is not ideal, as it makes it much 
harder to find and identify problems.


I appreciate these have been in head for a while, but it's not unheard of 
to only identify issues once they are MFC'ed, so keeping the separation 
is better.


You also missed the description from the 3rd commit e.g.
loader: nandfs calls strategy with one extra argument.

Finally, keep the "r" prefix when mentioning the revisions being MFC'ed, 
as that ensures they are linked in svnweb properly, e.g.


MFC r309369, r310850, r310853:



On 06/02/2017 22:03, Toomas Soome wrote:

Author: tsoome
Date: Mon Feb  6 22:03:07 2017
New Revision: 313355
URL: https://svnweb.freebsd.org/changeset/base/313355

Log:
   MFC r309369,310850,310853:
   
   libstand: dosfs cstyle cleanup for return keyword.

   dosfs support in libstand is broken since r298230
   
   PR:		214423

   Submitted by:Mikhail Kupchik
   Reported by: Mikhail Kupchik
   Approved by: imp (mentor)

Modified:
   stable/11/lib/libstand/cd9660.c
   stable/11/lib/libstand/dosfs.c
   stable/11/lib/libstand/ext2fs.c
   stable/11/lib/libstand/nandfs.c
   stable/11/lib/libstand/read.c
   stable/11/lib/libstand/stand.h
   stable/11/lib/libstand/ufs.c
   stable/11/lib/libstand/write.c
   stable/11/sys/boot/common/bcache.c
   stable/11/sys/boot/common/bootstrap.h
   stable/11/sys/boot/common/disk.c
   stable/11/sys/boot/common/md.c
   stable/11/sys/boot/efi/libefi/efipart.c
   stable/11/sys/boot/i386/libfirewire/firewire.c
   stable/11/sys/boot/i386/libi386/bioscd.c
   stable/11/sys/boot/i386/libi386/biosdisk.c
   stable/11/sys/boot/i386/libi386/pxe.c
   stable/11/sys/boot/mips/beri/loader/beri_disk_cfi.c
   stable/11/sys/boot/mips/beri/loader/beri_disk_sdcard.c
   stable/11/sys/boot/ofw/libofw/ofw_disk.c
   stable/11/sys/boot/pc98/libpc98/bioscd.c
   stable/11/sys/boot/pc98/libpc98/biosdisk.c
   stable/11/sys/boot/powerpc/kboot/hostdisk.c
   stable/11/sys/boot/powerpc/ps3/ps3cdrom.c
   stable/11/sys/boot/powerpc/ps3/ps3disk.c
   stable/11/sys/boot/uboot/lib/disk.c
   stable/11/sys/boot/usb/storage/umass_loader.c
   stable/11/sys/boot/userboot/userboot/host.c
   stable/11/sys/boot/userboot/userboot/userboot_disk.c
   stable/11/sys/boot/zfs/zfs.c
Directory Properties:
   stable/11/   (props changed)

Modified: stable/11/lib/libstand/cd9660.c
==
--- stable/11/lib/libstand/cd9660.c Mon Feb  6 21:02:26 2017
(r313354)
+++ stable/11/lib/libstand/cd9660.c Mon Feb  6 22:03:07 2017
(r313355)
@@ -143,7 +143,7 @@ susp_lookup_record(struct open_file *f,
if (bcmp(sh->type, SUSP_CONTINUATION, 2) == 0) {
shc = (ISO_RRIP_CONT *)sh;
error = f->f_dev->dv_strategy(f->f_devdata, F_READ,
-   cdb2devb(isonum_733(shc->location)), 0,
+   cdb2devb(isonum_733(shc->location)),
ISO_DEFAULT_BLOCK_SIZE, susp_buffer, );
  
  			/* Bail if it fails. */

@@ -288,7 +288,7 @@ cd9660_open(const char *path, struct ope
for (bno = 16;; bno++) {
twiddle(1);
rc = f->f_dev->dv_strategy(f->f_devdata, F_READ, cdb2devb(bno),
-   0, ISO_DEFAULT_BLOCK_SIZE, buf, );
+   ISO_DEFAULT_BLOCK_SIZE, buf, );
if (rc)
goto out;
if (read != ISO_DEFAULT_BLOCK_SIZE) {
@@ -322,7 +322,7 @@ cd9660_open(const char *path, struct ope
twiddle(1);
rc = f->f_dev->dv_strategy
(f->f_devdata, F_READ,
-cdb2devb(bno + boff), 0,
+cdb2devb(bno + boff),
 ISO_DEFAULT_BLOCK_SIZE,
 buf, );
if (rc)
@@ -381,7 +381,7 @@ cd9660_open(const char *path, struct ope
bno = isonum_733(rec.extent) + isonum_711(rec.ext_attr_length);
twiddle(1);
rc = f->f_dev->dv_strategy(f->f_devdata, F_READ, cdb2devb(bno),
-   0, ISO_DEFAULT_BLOCK_SIZE, buf, );
+   ISO_DEFAULT_BLOCK_SIZE, buf, );
if (rc)
goto out;
if (read != ISO_DEFAULT_BLOCK_SIZE) {
@@ -438,7 +438,7 @@ buf_read_file(struct open_file *f, char
  
  		twiddle(16);

rc = f->f_dev->dv_strategy(f->f_devdata, F_READ,
-   cdb2devb(blkno), 0, ISO_DEFAULT_BLOCK_SIZE,
+   cdb2devb(blkno), ISO_DEFAULT_BLOCK_SIZE,
fp->f_buf, );
if (rc)
return (rc);

Modified: stable/11/lib/libstand/dosfs.c

Re: svn commit: r313260 - head/sys/kern

2017-02-05 Thread Steven Hartland


On 05/02/2017 15:17, Alexey Dokuchaev wrote:

On Sun, Feb 05, 2017 at 04:00:06AM +0100, Mateusz Guzik wrote:

For instance, plugging an unused variable, a memory leak, doing a
lockless check first etc. are all pretty standard and unless there is
something unusual going on (e.g. complicated circumstances leading to a
leak) there is not much to explain. In particular, I don't see why
anyone would explain why leaks are bad on each commit plugging one.

Right; these (unused variable, resource leaks) usually do not warrant
elaborate explanation.

Indeed these are self explanatory

The gist is as follows: there are plenty of cases where the kernel wants
to atomically replace the value of a particular variable. Sometimes,
like in this commit, we want to bump the counter by 1, but only if the
current value is not 0. For that we need to read the value, see if it is
0 and if not, try to replace what we read with what we read + 1. We
cannot just increment as the value could have changed to 0 in the
meantime.
But this also means that multiple cpus doing the same operation on the
same variable will trip on each other - one will succeed while the rest
will have to retry.
Prior to this commit, each retry attempt would explicitly re-read the
value. This induces cache coherency traffic slowing everyone down.
amd64 has the nice property of giving us the value it found, eliminating
the need to explicitly re-read it. There is similar story on i386 and
sparc.
Other architectures may also benefit from this, but that I did not
benchmark.

In short[,] under contention atomic_fcmpset is going to be faster than
atomic_cmpset.
I did not benchmark this particular change, but a switch of the sort
easily gives 10%+ in microbenchmarks on amd64.
That said, while one can argue this optimizes the code, it really
depessimizes it as something of the sort should have been already
employed.

Given the above, IMHO it's quite far from an obvious or of manpage-lookup
thing, and thus requires proper explanation in the commit log.
Absolutely, I would encourage everyone to think not only about others 
making similar changes but also about educating those who may use 
similar code in other areas.


If said changes were using older code as an example, then without knowing 
otherwise they may not use the updated methodologies.


Sharing the detail you have above is fantastic, allowing others to 
take note without having to do the research that they may well not have 
time for, with the result being improved code quality moving forward; so 
thanks for that :)





While on this subject, are there any official guidelines for writing
commit messages? If not, should we create some?

I'm unaware of any.

We might not have official guidelines, but 30%-what/70%-why rule would
apply perfectly here. ;-)


Sounds like a good guide.

Regards
Steve

Re: svn commit: r313260 - head/sys/kern

2017-02-04 Thread Steven Hartland

Hi Mateusz, could you improve on the commit message? It currently 
describes what is changed, which can be obtained from the diff, but not why.


I hope no one feels like I'm trying to teach them to suck eggs, as I 
know everyone here has a wealth of experience, but I strongly believe 
commit messages are a very important way of improving the overall 
quality of the code base by sharing with others the reason for changes, 
which they can then learn from. I know I for one love picking up new 
nuggets of knowledge from others in this way.


Also I believe this is an area the project as a whole can improve on, so I 
don't mean to single out anyone here.


Anyway I hope people find this useful:

When I write a commit message I try to stick to the following rules, 
which I believe help to bring clarity for others about my actions.

1. First line is a brief summary of the outcome of the change, e.g.
   Fixed compiler warnings in nvmecontrol on 32bit platforms
2. Follow-up paragraphs expand on #1 if needed, including details about
   not just what but why the change was made, e.g.
   Use ssize_t instead of uint32_t to prevent warnings about a comparison
   with different signs. Due to the promotion rules, this would only
   happen on 32-bit platforms.
3. When writing #2, include details that would not be obvious to
   non-experts in the particular area.
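
As an aside on the example in #2, the "promotion rules" remark can be illustrated with a short sketch. It is not the nvmecontrol code: the function names are hypothetical, int32_t and int64_t stand in for ssize_t on 32-bit and 64-bit platforms respectively, and it assumes int is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for a 32-bit platform, where ssize_t is 32 bits wide. The signed
 * operand is converted to unsigned for the comparison, so -1 compares as a
 * huge value. This is the mixed-sign comparison compilers warn about.
 */
static int
less_than_32(int32_t s, uint32_t u)
{
	return (s < u);
}

/*
 * Stand-in for a 64-bit platform, where ssize_t is 64 bits wide. The
 * uint32_t operand is converted to the wider signed type instead, so the
 * comparison is done with matching signs and nothing is lost.
 */
static int
less_than_64(int64_t s, uint32_t u)
{
	return (s < u);
}

/* Hypothetical self-check showing the two results side by side. */
static void
sign_compare_demo(void)
{
	assert(less_than_32(-1, 1) == 0);	/* -1 became 0xffffffff */
	assert(less_than_64(-1, 1) == 1);	/* -1 stayed -1 */
}
```

The same source line therefore behaves, and warns, differently depending on the width of the signed type, which is the kind of non-obvious detail a commit message can usefully spell out.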


#2 and #3 are really important for sharing knowledge that others may not 
know. It's quite relevant to this commit msg: while it may be obvious 
to you and others familiar with the atomic ops, the rest of us are 
just wondering why this change was made.


N.B. The example is based on Warner's recent commit purely as an 
example, which had a good why, just missing the brief summary.


While on this subject, are there any official guidelines for writing 
commit messages? If not, should we create some?


On 05/02/2017 01:40, Mateusz Guzik wrote:

Author: mjg
Date: Sun Feb  5 01:40:27 2017
New Revision: 313260
URL: https://svnweb.freebsd.org/changeset/base/313260

Log:
   fd: switch fget_unlocked to atomic_fcmpset

Modified:
   head/sys/kern/kern_descrip.c

Modified: head/sys/kern/kern_descrip.c
==
--- head/sys/kern/kern_descrip.cSun Feb  5 01:20:39 2017
(r313259)
+++ head/sys/kern/kern_descrip.cSun Feb  5 01:40:27 2017
(r313260)
@@ -2569,8 +2569,8 @@ fget_unlocked(struct filedesc *fdp, int
if (error != 0)
return (error);
  #endif
-   retry:
count = fp->f_count;
+   retry:
if (count == 0) {
/*
 * Force a reload. Other thread could reallocate the
@@ -2584,7 +2584,7 @@ fget_unlocked(struct filedesc *fdp, int
 * Use an acquire barrier to force re-reading of fdt so it is
 * refreshed for verification.
 */
-   if (atomic_cmpset_acq_int(&fp->f_count, count, count + 1) == 0)
+   if (atomic_fcmpset_acq_int(&fp->f_count, &count, count + 1) == 0)
goto retry;
fdt = fdp->fd_files;
  #ifdefCAPABILITIES




svn commit: r312279 - stable/11/usr.bin/netstat

2017-01-16 Thread Steven Hartland

Author: smh
Date: Mon Jan 16 09:16:11 2017
New Revision: 312279
URL: https://svnweb.freebsd.org/changeset/base/312279

Log:
  MFC r311769:
  
  Fix rstat: symbol not in namelist from netstat
  
  Sponsored by: Multiplay

Modified:
  stable/11/usr.bin/netstat/main.c
Directory Properties:
  stable/11/   (props changed)

Modified: stable/11/usr.bin/netstat/main.c
==
--- stable/11/usr.bin/netstat/main.cMon Jan 16 09:12:40 2017
(r312278)
+++ stable/11/usr.bin/netstat/main.cMon Jan 16 09:16:11 2017
(r312279)
@@ -427,6 +427,9 @@ main(int argc, char *argv[])
if (xflag && Tflag)
xo_errx(1, "-x and -T are incompatible, pick one.");
 
+   /* Load all necessary kvm symbols */
+   kresolve_list(nl);
+
if (Bflag) {
if (!live)
usage();
@@ -507,9 +510,6 @@ main(int argc, char *argv[])
exit(0);
}
 
-   /* Load all necessary kvm symbols */
-   kresolve_list(nl);
-
if (tp) {
xo_open_container("statistics");
printproto(tp, tp->pr_name, );

svn commit: r312278 - stable/10/usr.bin/netstat

2017-01-16 Thread Steven Hartland

Author: smh
Date: Mon Jan 16 09:12:40 2017
New Revision: 312278
URL: https://svnweb.freebsd.org/changeset/base/312278

Log:
  MFC r311769:
  
  Fix rstat: symbol not in namelist from netstat
  
  Sponsored by: Multiplay

Modified:
  stable/10/usr.bin/netstat/main.c
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/usr.bin/netstat/main.c
==
--- stable/10/usr.bin/netstat/main.cMon Jan 16 08:25:33 2017
(r312277)
+++ stable/10/usr.bin/netstat/main.cMon Jan 16 09:12:40 2017
(r312278)
@@ -535,6 +535,9 @@ main(int argc, char *argv[])
if (xflag && Tflag) 
errx(1, "-x and -T are incompatible, pick one.");
 
+   /* Load all necessary kvm symbols */
+   kresolve_list(nl);
+
if (Bflag) {
if (!live)
usage();
@@ -603,9 +606,6 @@ main(int argc, char *argv[])
exit(0);
}
 
-   /* Load all necessary kvm symbols */
-   kresolve_list(nl);
-
if (tp) {
printproto(tp, tp->pr_name);
exit(0);

svn commit: r311769 - head/usr.bin/netstat

2017-01-09 Thread Steven Hartland

Author: smh
Date: Mon Jan  9 09:28:03 2017
New Revision: 311769
URL: https://svnweb.freebsd.org/changeset/base/311769

Log:
  Fix rstat: symbol not in namelist from netstat
  
  Load kvm symbols earlier to prevent rstat: symbol not in namelist
  error when running netstat -rs.
  
  Submitted by: Sebastian Huber 
  MFC after:1 week
  Sponsored by: Multiplay

Modified:
  head/usr.bin/netstat/main.c

Modified: head/usr.bin/netstat/main.c
==
--- head/usr.bin/netstat/main.c Mon Jan  9 08:12:22 2017(r311768)
+++ head/usr.bin/netstat/main.c Mon Jan  9 09:28:03 2017(r311769)
@@ -427,6 +427,9 @@ main(int argc, char *argv[])
if (xflag && Tflag)
xo_errx(1, "-x and -T are incompatible, pick one.");
 
+   /* Load all necessary kvm symbols */
+   kresolve_list(nl);
+
if (Bflag) {
if (!live)
usage();
@@ -507,9 +510,6 @@ main(int argc, char *argv[])
exit(0);
}
 
-   /* Load all necessary kvm symbols */
-   kresolve_list(nl);
-
if (tp) {
xo_open_container("statistics");
printproto(tp, tp->pr_name, );

Re: svn commit: r311346 - in head/sys: kern sys vm

2017-01-05 Thread Steven Hartland

Given the use of the number of CPUs for sizing, would this play nicely 
with hot-plug CPUs?


Regards
Steve

On 05/01/2017 01:44, Mark Johnston wrote:

Author: markj
Date: Thu Jan  5 01:44:12 2017
New Revision: 311346
URL: https://svnweb.freebsd.org/changeset/base/311346

Log:
   Add a small allocator for exec_map entries.
   
   Upon each execve, we allocate a KVA range for use in copying data to the

   new image. Pages must be faulted into the range, and when the range is
   freed, the backing pages are freed and their mappings are destroyed. This
   is a lot of needless overhead, and the exec_map management becomes a
   bottleneck when many CPUs are executing execve concurrently. Moreover, the
   number of available ranges is fixed at 16, which is insufficient on large
   systems and potentially excessive on 32-bit systems.
   
   The new allocator reduces overhead by making exec_map allocations

   persistent. When a range is freed, pages backing the range are marked clean
   and made easy to reclaim. With this change, the exec_map is sized based on
   the number of CPUs.
   
   Reviewed by:	kib

   MFC after:   1 month
   Differential Revision:   https://reviews.freebsd.org/D8921
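
The shape of the allocator described in the log, a lock-free cached entry in front of a locked free list, can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the kernel code: a single `cached_slot` stands in for the per-CPU pointer, a C11 atomic_flag spin lock stands in for the mutex, the names kva_entry, entry_alloc and entry_free are hypothetical, and entry_alloc returns NULL when nothing is available instead of sleeping.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct kva_entry {
	uintptr_t addr;			/* start of the reserved range */
	struct kva_entry *next;		/* free-list linkage */
};

/* One cached entry; stands in for the per-CPU pointer. */
static _Atomic(struct kva_entry *) cached_slot;

/* Shared fallback list, protected by a simple spin lock. */
static struct kva_entry *freelist;
static atomic_flag freelist_lock = ATOMIC_FLAG_INIT;

static void
freelist_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&freelist_lock,
	    memory_order_acquire))
		;	/* spin until the lock is released */
}

static void
freelist_release(void)
{
	atomic_flag_clear_explicit(&freelist_lock, memory_order_release);
}

/* Fast path: take the cached entry. Slow path: pop from the list. */
static struct kva_entry *
entry_alloc(void)
{
	struct kva_entry *e;

	e = atomic_exchange(&cached_slot, (struct kva_entry *)NULL);
	if (e == NULL) {
		freelist_acquire();
		e = freelist;
		if (e != NULL)
			freelist = e->next;
		freelist_release();
	}
	return (e);
}

/* Fast path: park the entry in the empty slot. Slow path: push it. */
static void
entry_free(struct kva_entry *e)
{
	struct kva_entry *expected = NULL;

	assert(e != NULL);
	if (!atomic_compare_exchange_strong(&cached_slot, &expected, e)) {
		freelist_acquire();
		e->next = freelist;
		freelist = e;
		freelist_release();
	}
}
```

The point of the design is that a free followed by an alloc on the same slot touches no lock at all; the shared list is only used when the slot is already occupied or empty.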

Modified:
   head/sys/kern/kern_exec.c
   head/sys/sys/imgact.h
   head/sys/vm/vm_init.c
   head/sys/vm/vm_kern.c
   head/sys/vm/vm_kern.h

Modified: head/sys/kern/kern_exec.c
==
--- head/sys/kern/kern_exec.c   Thu Jan  5 01:28:08 2017(r311345)
+++ head/sys/kern/kern_exec.c   Thu Jan  5 01:44:12 2017(r311346)
@@ -45,6 +45,7 @@ __FBSDID("$FreeBSD$");
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -59,6 +60,7 @@ __FBSDID("$FreeBSD$");
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1315,17 +1317,80 @@ err_exit:
return (error);
  }
  
+struct exec_args_kva {

+   vm_offset_t addr;
+   SLIST_ENTRY(exec_args_kva) next;
+};
+
+static DPCPU_DEFINE(struct exec_args_kva *, exec_args_kva);
+
+static SLIST_HEAD(, exec_args_kva) exec_args_kva_freelist;
+static struct mtx exec_args_kva_mtx;
+
+static void
+exec_prealloc_args_kva(void *arg __unused)
+{
+   struct exec_args_kva *argkva;
+   u_int i;
+
+   SLIST_INIT(&exec_args_kva_freelist);
+   mtx_init(&exec_args_kva_mtx, "exec args kva", NULL, MTX_DEF);
+   for (i = 0; i < exec_map_entries; i++) {
+   argkva = malloc(sizeof(*argkva), M_PARGS, M_WAITOK);
+   argkva->addr = kmap_alloc_wait(exec_map, exec_map_entry_size);
+   SLIST_INSERT_HEAD(&exec_args_kva_freelist, argkva, next);
+   }
+}
+SYSINIT(exec_args_kva, SI_SUB_EXEC, SI_ORDER_ANY, exec_prealloc_args_kva, 
NULL);
+
+static vm_offset_t
+exec_alloc_args_kva(void **cookie)
+{
+   struct exec_args_kva *argkva;
+
+   argkva = (void *)atomic_readandclear_ptr(
+   (uintptr_t *)DPCPU_PTR(exec_args_kva));
+   if (argkva == NULL) {
+   mtx_lock(&exec_args_kva_mtx);
+   while ((argkva = SLIST_FIRST(&exec_args_kva_freelist)) == NULL)
+   (void)mtx_sleep(&exec_args_kva_freelist,
+   &exec_args_kva_mtx, 0, "execkva", 0);
+   SLIST_REMOVE_HEAD(&exec_args_kva_freelist, next);
+   mtx_unlock(&exec_args_kva_mtx);
+   }
+   *(struct exec_args_kva **)cookie = argkva;
+   return (argkva->addr);
+}
+
+static void
+exec_free_args_kva(void *cookie)
+{
+   struct exec_args_kva *argkva;
+   vm_offset_t base;
+
+   argkva = cookie;
+   base = argkva->addr;
+
+   vm_map_madvise(exec_map, base, base + exec_map_entry_size, MADV_FREE);
+   if (!atomic_cmpset_ptr((uintptr_t *)DPCPU_PTR(exec_args_kva),
+   (uintptr_t)NULL, (uintptr_t)argkva)) {
+   mtx_lock(&exec_args_kva_mtx);
+   SLIST_INSERT_HEAD(&exec_args_kva_freelist, argkva, next);
+   wakeup_one(&exec_args_kva_freelist);
+   mtx_unlock(&exec_args_kva_mtx);
+   }
+}
+
  /*
   * Allocate temporary demand-paged, zero-filled memory for the file name,
- * argument, and environment strings.  Returns zero if the allocation succeeds
- * and ENOMEM otherwise.
+ * argument, and environment strings.
   */
  int
  exec_alloc_args(struct image_args *args)
  {
  
-	args->buf = (char *)kmap_alloc_wait(exec_map, PATH_MAX + ARG_MAX);

-   return (args->buf != NULL ? 0 : ENOMEM);
+   args->buf = (char *)exec_alloc_args_kva(&args->bufkva);
+   return (0);
  }
  
  void

@@ -1333,8 +1398,7 @@ exec_free_args(struct image_args *args)
  {
  
  	if (args->buf != NULL) {

-   kmap_free_wakeup(exec_map, (vm_offset_t)args->buf,
-   PATH_MAX + ARG_MAX);
+   exec_free_args_kva(args->bufkva);
args->buf = NULL;
}
if (args->fname_buf != NULL) {

Modified: head/sys/sys/imgact.h
==
---

Re: svn commit: r310112 - head/sys/conf

2016-12-15 Thread Steven Hartland


Thanks for doing this :)

On 15/12/2016 12:57, Ed Maste wrote:

Author: emaste
Date: Thu Dec 15 12:57:03 2016
New Revision: 310112
URL: https://svnweb.freebsd.org/changeset/base/310112

Log:
   newvers.sh: add option to eliminate kernel build metadata
   
   Build metadata (username, hostname, etc.) prevents the FreeBSD kernel

   from building reproducibly. Add an option to disable inclusion of that
   metadata but retain the release information and SVN/git VCS details.
   See https://reproducible-builds.org/ for additional background.
   
   Reviewed by:	bapt

   Obtained from:   NetBSD
   MFC after:   1 month
   Sponsored by:Reproducible Builds World Summit 2, Berlin
   Differential Revision:   https://reviews.freebsd.org/D4347

Modified:
   head/sys/conf/newvers.sh

Modified: head/sys/conf/newvers.sh
==
--- head/sys/conf/newvers.shThu Dec 15 10:51:35 2016(r310111)
+++ head/sys/conf/newvers.shThu Dec 15 12:57:03 2016(r310112)
@@ -30,6 +30,14 @@
  # @(#)newvers.sh  8.1 (Berkeley) 4/20/94
  # $FreeBSD$
  
+# Command line options:

+#
+# -r   Reproducible build.  Do not embed directory names, user
+#  names, time stamps or other dynamic information into
+#  the outuput file.  This is intended to allow two builds
+#  done at different times and even by different people on
+#  different hosts to produce identical output.
+
  TYPE="FreeBSD"
  REVISION="12.0"
  BRANCH="CURRENT"
@@ -250,10 +258,28 @@ if [ -n "$hg_cmd" ] ; then
fi
  fi
  
+include_metadata=true

+while getopts r opt; do
+   case "$opt" in
+   r)
+   include_metadata=
+   ;;
+   esac
+done
+shift $((OPTIND - 1))
+
+if [ -z "${include_metadata}" ]; then
+   VERINFO="${VERSION} ${svn}${git}${hg}${p4version}"
+   VERSTR="${VERINFO}\\n"
+else
+   VERINFO="${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}"
+   VERSTR="${VERINFO}\\n${u}@${h}:${d}\\n"
+fi
+
  cat << EOF > vers.c
  $COPYRIGHT
-#define SCCSSTR "@(#)${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}"
-#define VERSTR "${VERSION} #${v}${svn}${git}${hg}${p4version}: ${t}\\n
${u}@${h}:${d}\\n"
+#define SCCSSTR "@(#)${VERINFO}"
+#define VERSTR "${VERSTR}"
  #define RELSTR "${RELEASE}"
  
  char sccs[sizeof(SCCSSTR) > 128 ? sizeof(SCCSSTR) : 128] = SCCSSTR;





Re: svn commit: r308782 - in head: cddl/contrib/opensolaris/cmd/ztest sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys

2016-11-17 Thread Steven Hartland


Thanks, looks like the PR needs a rebase before it can be merged.

On 17/11/2016 22:11, Alexander Motin wrote:

It is in OpenZFS review queue now:
https://github.com/openzfs/openzfs/pull/219  Welcome to comment there to
speed up the process.

On 17.11.2016 13:43, Steven Hartland wrote:

Is this something that should be upstreamed?

On 17/11/2016 21:01, Alexander Motin wrote:

Author: mav
Date: Thu Nov 17 21:01:27 2016
New Revision: 308782
URL: https://svnweb.freebsd.org/changeset/base/308782

Log:
   After some ZIL changes 6 years ago zil_slog_limit got partially broken
   due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
   Actually because of other changes about that time zl_itx_list_sz is not
   really required to implement the functionality, so this patch removes
   some unneeded broken code and variables.
   
   Original idea of zil_slog_limit was to reduce chance of SLOG abuse by

   single heavy logger, that increased latency for other (more latency critical)
   loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
   huge latency increase for heavy writers, this implementation caused double
   write of all data, since the log records were explicitly prepared for SLOG.
   Since we now have I/O scheduler, I've found it can be much more efficient
   to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
   to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
   
   Existing ZIL implementation had problem with space efficiency when it

   has to write large chunks of data into log blocks of limited size. In some
   cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
   spinning rust, that also reduced log write speed in half, since head had to
   uselessly fly over allocated but not written areas. This change improves
   the situation by offloading problematic operations from z*_log_write() to
   zil_lwb_commit(), which knows real situation of log blocks allocation and
   can split large requests into pieces much more efficiently. Also as side
   effect it removes one of two data copy operations done by ZIL code WR_COPIED
   case.
   
   While there, untangle and unify code of z*_log_write() functions.

   Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
   block boundary, that may also improve efficiency if ZPL is made to do that.
   
   Sponsored by:	iXsystems, Inc.


Modified:
   head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_log.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c

Modified: head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
==
--- head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Thu Nov 17 20:44:51 
2016(r308781)
+++ head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Thu Nov 17 21:01:27 
2016(r308782)
@@ -1371,7 +1371,6 @@ ztest_log_write(ztest_ds_t *zd, dmu_tx_t
itx->itx_private = zd;
itx->itx_wr_state = write_state;
itx->itx_sync = (ztest_random(8) == 0);
-   itx->itx_sod += (write_state == WR_NEED_COPY ? lr->lr_length : 0);
  
  	bcopy(>lr_common + 1, >itx_lr + 1,

sizeof (*lr) - sizeof (lr_t));

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h   Thu Nov 
17 20:44:51 2016(r308781)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h   Thu Nov 
17 21:01:27 2016(r308782)
@@ -369,7 +369,6 @@ typedef struct itx {
void*itx_private;   /* type-specific opaque data */
itx_wr_state_t  itx_wr_state;   /* write state */
uint8_t itx_sync;   /* synchronous transaction */
-   uint64_titx_sod;/* record size on disk */
uint64_titx_oid;/* object id */
lr_titx_lr; /* common part of log record */
/* followed by type-specific part of lr_xx_t and its immediate data */

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h  Thu Nov 
17 20:44:51 2016(r308781)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h  Thu Nov 
17 21:01:27 2016(r308782)
@@ -42,6 +42,7 @@ extern "C" {
  typed

Re: svn commit: r308782 - in head: cddl/contrib/opensolaris/cmd/ztest sys/cddl/contrib/opensolaris/uts/common/fs/zfs sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys

2016-11-17 Thread Steven Hartland


Is this something that should be upstreamed?

On 17/11/2016 21:01, Alexander Motin wrote:

Author: mav
Date: Thu Nov 17 21:01:27 2016
New Revision: 308782
URL: https://svnweb.freebsd.org/changeset/base/308782

Log:
   After some ZIL changes 6 years ago zil_slog_limit got partially broken
   due to zl_itx_list_sz not updated when async itx'es upgraded to sync.
   Actually because of other changes about that time zl_itx_list_sz is not
   really required to implement the functionality, so this patch removes
   some unneeded broken code and variables.
   
   Original idea of zil_slog_limit was to reduce chance of SLOG abuse by

   single heavy logger, that increased latency for other (more latency critical)
   loggers, by pushing heavy log out into the main pool instead of SLOG. Beside
   huge latency increase for heavy writers, this implementation caused double
   write of all data, since the log records were explicitly prepared for SLOG.
   Since we now have I/O scheduler, I've found it can be much more efficient
   to reduce priority of heavy logger SLOG writes from ZIO_PRIORITY_SYNC_WRITE
   to ZIO_PRIORITY_ASYNC_WRITE, while still leave them on SLOG.
   
   Existing ZIL implementation had problem with space efficiency when it

   has to write large chunks of data into log blocks of limited size. In some
   cases efficiency stopped to almost as low as 50%. In case of ZIL stored on
   spinning rust, that also reduced log write speed in half, since head had to
   uselessly fly over allocated but not written areas. This change improves
   the situation by offloading problematic operations from z*_log_write() to
   zil_lwb_commit(), which knows real situation of log blocks allocation and
   can split large requests into pieces much more efficiently. Also as side
   effect it removes one of two data copy operations done by ZIL code WR_COPIED
   case.
   
   While there, untangle and unify code of z*_log_write() functions.

   Also zfs_log_write() alike to zvol_log_write() can now handle writes crossing
   block boundary, that may also improve efficiency if ZPL is made to do that.
   
   Sponsored by:	iXsystems, Inc.


Modified:
   head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zio.h
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_log.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zil.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zio.c
   head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zvol.c

Modified: head/cddl/contrib/opensolaris/cmd/ztest/ztest.c
==
--- head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Thu Nov 17 20:44:51 
2016(r308781)
+++ head/cddl/contrib/opensolaris/cmd/ztest/ztest.c Thu Nov 17 21:01:27 
2016(r308782)
@@ -1371,7 +1371,6 @@ ztest_log_write(ztest_ds_t *zd, dmu_tx_t
itx->itx_private = zd;
itx->itx_wr_state = write_state;
itx->itx_sync = (ztest_random(8) == 0);
-   itx->itx_sod += (write_state == WR_NEED_COPY ? lr->lr_length : 0);
  
  	bcopy(>lr_common + 1, >itx_lr + 1,

sizeof (*lr) - sizeof (lr_t));

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h   Thu Nov 
17 20:44:51 2016(r308781)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil.h   Thu Nov 
17 21:01:27 2016(r308782)
@@ -369,7 +369,6 @@ typedef struct itx {
void*itx_private;   /* type-specific opaque data */
itx_wr_state_t  itx_wr_state;   /* write state */
uint8_t itx_sync;   /* synchronous transaction */
-   uint64_titx_sod;/* record size on disk */
uint64_titx_oid;/* object id */
lr_titx_lr; /* common part of log record */
/* followed by type-specific part of lr_xx_t and its immediate data */

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h  Thu Nov 
17 20:44:51 2016(r308781)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sys/zil_impl.h  Thu Nov 
17 21:01:27 2016(r308782)
@@ -42,6 +42,7 @@ extern "C" {
  typedef struct lwb {
zilog_t *lwb_zilog; /* back pointer to log struct */
blkptr_tlwb_blk;/* on disk address of this log blk */
+   boolean_t   lwb_slog;   /* lwb_blk is on SLOG device */
int lwb_nused;  /* # used bytes in buffer */
int

Re: svn commit: r307507 - head/sys/cam/scsi

2016-10-17 Thread Steven Hartland



On 17/10/2016 09:51, Alexander Motin wrote:

On 17.10.2016 11:45, Steven Hartland wrote:

IIRC the timeout for this was intentionally lower than the default,
might be worth just checking.

I did trace back the commit history, and it was hardcoded to that value
since the beginning 18 years ago.  Theoretically SYNCHRONIZE CACHE may
require even more time than WRITE, since nobody knows how big write
caches can be and how many writes are sitting there.

Cool, must be thinking about something else that was added recently 
then, thanks for checking :)
