Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?

2013-10-08 Thread Davide Italiano
On Tue, Oct 8, 2013 at 1:25 PM, Adrian Chadd adr...@freebsd.org wrote:
 Hi,


Hi Adrian,

 Please try it out on a -10 VM with something RAM limited - say, 128mb w/
 GENERIC. See how it behaves.

 I've successfully done buildworlds on 10-i386 with 128mb RAM. Let's try not
 to break that before releng/10 is cut.

 thanks,



This is not supposed to hit 10-STABLE, at least not before proper
review, which is why it was proposed on the mailing lists. Also, if
you read the patch you'll realize it should behave better in a
low-memory environment, because it unconditionally cleans 10% of the
cache every time. Previous changes in this area did just the opposite,
keeping a lot more memory around. I hope this makes sense to you.

Thanks,

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?

2013-10-08 Thread Davide Italiano
On Tue, Oct 8, 2013 at 1:32 PM, Davide Italiano dav...@freebsd.org wrote:
 On Tue, Oct 8, 2013 at 1:25 PM, Adrian Chadd adr...@freebsd.org wrote:
 Hi,


 Hi Adrian,

 Please try it out on a -10 VM with something RAM limited - say, 128mb w/
 GENERIC. See how it behaves.

 I've successfully done buildworlds on 10-i386 with 128mb RAM. Let's try not
 to break that before releng/10 is cut.

 thanks,



 This is not supposed to hit 10-STABLE, at least not before proper
 review. This is the reason why it was proposed on mailing lists. Also,
 if you read the patch you'll end up with realizing this should behave
 better on low memory environment because it unconditionally cleans 10%
 of the cache every time. Previous changes in this area just did the
 opposite keeping a lot more of memory around. I hope this makes sense
 to you.

 Thanks,

 --
 Davide

 There are no solved problems; there are only problems that are more
 or less solved -- Henri Poincare

Also, right now you can set a value that indicates the percentage
of memory to reclaim. It's 10% by default, but we can discuss whether
this should be adjusted to a more reasonable default. Feel free to give
your opinion.

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?

2013-10-08 Thread Davide Italiano
On Tue, Oct 8, 2013 at 3:38 PM, RW rwmailli...@googlemail.com wrote:
 On Tue, 8 Oct 2013 13:32:58 +0200
 Davide Italiano wrote:

 On Tue, Oct 8, 2013 at 1:25 PM, Adrian Chadd adr...@freebsd.org
 wrote:
  Hi,
 

 Hi Adrian,

  Please try it out on a -10 VM with something RAM limited - say,
  128mb w/ GENERIC. See how it behaves.

 Be aware that any test that doesn't cause vfs.ufs.dirhash_lowmemcount
 to increment isn't testing the change at all.


 This is not supposed to hit 10-STABLE, at least not before proper
 review. This is the reason why it was proposed on mailing lists. Also,
 if you read the patch you'll end up with realizing this should behave
 better on low memory environment because it unconditionally cleans 10%
 of the cache every time.

 The current version deletes anything older than the time limit and if
 this doesn't reclaim 10% it makes a second pass through the list until
 the target is met.
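
For readers following along, the age-based policy described in the quoted paragraph above can be sketched in user space roughly as follows. This is not the ufs_dirhash.c code: `struct entry`, its fields and `reclaim_by_age()` are hypothetical names, sizes are plain byte counts, and the second pass is simplified to a single scan in list order.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dirhash cache entry. */
struct entry {
    int    age;     /* seconds since the entry was last used */
    size_t size;    /* bytes held by the entry */
    int    locked;  /* nonzero: in use, cannot be freed right now */
    int    freed;   /* nonzero: already reclaimed */
};

/*
 * Pass 1 frees every unlocked entry older than reclaim_age.
 * Pass 2 runs only while fewer than `target` bytes have been freed and
 * takes the remaining unlocked entries in list order.
 * Returns the number of bytes freed.
 */
static size_t
reclaim_by_age(struct entry *e, size_t n, int reclaim_age, size_t target)
{
    size_t freed = 0, i;

    for (i = 0; i < n; i++) {
        if (!e[i].locked && !e[i].freed && e[i].age > reclaim_age) {
            e[i].freed = 1;
            freed += e[i].size;
        }
    }
    for (i = 0; i < n && freed < target; i++) {
        if (!e[i].locked && !e[i].freed) {
            e[i].freed = 1;
            freed += e[i].size;
        }
    }
    return freed;
}
```

Note that pass 1 has no upper bound: if every entry is older than the limit, the whole cache is freed in one call, which is the behaviour discussed elsewhere in this thread.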

 Your version tries to delete 10% (or whatever) by looping around the
 list until this is achieved. And if there aren't enough unlocked
 entries it will loop  indefinitely until there are. I hope you know
 what you are doing because that looks very wrong.


Hi (RW..?),

This could probably be changed. From what I see this wasn't a problem
even under high memory pressure, but all in all I agree with you
that we shouldn't loop forever; we should limit the number of passes
over the list to a small constant, to prevent pathological cases.

 The original version looks better to me given that it already tries to
 free a minimum of 10%. The low-memory handler is called under very low
 memory conditions when the system is probably thrashing or even on the
 verge of killing processes. Holding on to entries that haven't been
 used for a minute seems like a luxury.

 Previous changes in this area just did the
 opposite keeping a lot more of memory around.

 I don't believe that's true. Under most circumstances the existing
 scheme frees more memory. The only case when yours frees more is when 90%
 of the entries are locked.

Well, no. Here you can set the threshold to a more aggressive value so
that you reclaim more memory every time. This was not possible in the
previous version: if all your cache entries had gone untouched for 15
or even 30 seconds, they would all have been kept around, whereas now
you can destroy up to 10% of them every time the lowmem event handler
is called.

Thanks,

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?

2013-10-07 Thread Davide Italiano
On Wed, Aug 28, 2013 at 3:56 PM, Ivan Voras ivo...@freebsd.org wrote:
 Hi,

 Prodded by davide@, I'd like to collect opinions about raising the
 vfs.ufs.dirhash_reclaimage sysctl from 5 to 60, committed at:

 http://svnweb.freebsd.org/changeset/base/254986

 What it does:

 Used in lowmem handler at
 http://fxr.watson.org/fxr/source/ufs/ufs/ufs_dirhash.c#L1247 when
 determining which cache entries to evict; it skips (keeps in the cache)
 entries which are younger than this number of seconds. This lowmem
 handler only frees up to 10% of the dirhash cache at a time.


I don't think this is correct. The first loop scans over the whole
ufsdirhash_list and there's nothing that sets the cap to 10%.
It might happen that UFS is unused for some minutes and you'll end up
with all the cache free'd.

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: Call for comments - raising vfs.ufs.dirhash_reclaimage?

2013-10-07 Thread Davide Italiano
 What would perhaps be better than a hardcoded reclaim age would be to use
 an LRU-type approach and perhaps set a target percent to reclaim.  That is,
 suppose you were to reclaim the oldest 10% of hashes on each lowmem call
 (and make the '10%' the tunable value).  Then you will always make some amount
 of progress in a low memory situation (and if the situation remains dire you
 will eventually empty the entire cache), but the effective maximum age will
 be more dynamic.  Right now if you haven't touched UFS in 5 seconds it
 throws the entire thing out on the first lowmem event.  The LRU-approach would
 only throw the oldest 10% out on the first call, but eventually throw it all
 out if the situation remains dire.

 --
 John Baldwin
 ___
 freebsd...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-fs
 To unsubscribe, send any mail to freebsd-fs-unsubscr...@freebsd.org

I liked your idea more than what's available in HEAD right now, so I
implemented it:
http://people.freebsd.org/~davide/review/ufs_direclaimage.diff
I was unsure which heuristic to use to select the (10% of) entries to
evict, so I just remove the first 10% from the head of the
ufs_dirhash list (which should be the oldest).
The code keeps rescanning the cache until 10% (or the percentage set
via sysctl) of the entries are freed, but we can discuss whether
this limit could be relaxed to a single scan over the list.
Unfortunately I don't have a testcase to prove the effectiveness (or
non-effectiveness) of the approach, but I think either Ivan or Peter
may be able to give it a spin.
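
As a rough illustration of the approach described above (and not the code in the linked diff), evicting a fixed percentage from the head of the list with a cap on the number of scans might look like the sketch below. `struct hash_entry`, `reclaim_percent()` and the `max_passes` parameter are hypothetical; the array index stands in for list order, with index 0 as the oldest entry.

```c
#include <stddef.h>

/* Hypothetical stand-in for an entry on the dirhash list. */
struct hash_entry {
    int locked;   /* nonzero: busy, skip it for now */
    int freed;    /* nonzero: already reclaimed */
};

/*
 * Free up to `percent` percent of the n entries, oldest (index 0) first.
 * Locked entries are skipped. The scan is repeated at most max_passes
 * times, so the function terminates even if every entry stays locked.
 * Returns the number of entries freed.
 */
static size_t
reclaim_percent(struct hash_entry *list, size_t n, unsigned percent,
    unsigned max_passes)
{
    /* Round up so a small cache still makes some progress. */
    size_t target = (n * percent + 99) / 100;
    size_t freed = 0, i;
    unsigned pass;

    for (pass = 0; pass < max_passes && freed < target; pass++) {
        for (i = 0; i < n && freed < target; i++) {
            if (list[i].locked || list[i].freed)
                continue;
            list[i].freed = 1;
            freed++;
        }
    }
    return freed;
}
```

With `max_passes` set to 1 this degenerates to the single scan mentioned above. In a real kernel the extra passes only help if other threads drop their entry locks in the meantime, which this single-threaded sketch cannot show.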

-- 
Davide

There are no solved problems; there are only problems that are more
or less solved -- Henri Poincare


Re: memory allocation in spinlock context

2013-03-01 Thread Davide Italiano
On Fri, Mar 1, 2013 at 2:50 PM, Andriy Gapon a...@freebsd.org wrote:

 I am trying to understand if it is possible to allow memory allocations
 (M_NOWAIT, of course) in a spinlock context.
 I do not see any obvious architectural obstacles.
 But the fact that all of the uma locks, system map lock, object locks,
 page queue locks and so on are regular mutexes makes it impossible to
 allocate memory without violating the fundamental lock ordering rules.

 Could all of the above mentioned locks potentially be converted to spin
 mutexes?
 (And that seems to be a large nasty change)

AFAIK they're suitable for particular uses and not in general.
For example, if the critical section is short, spinning rather than
sleeping can avoid a potential context switch, improving
performance. OTOH, spinning has the disadvantage of wasting time that
could be used for other activities. So, IMHO, if such a change needs
to be adopted, it should be pondered/profiled more than a bit, and I
doubt it could be used for the wide class of locks you mentioned.

 Are there any alternative possibilities?


Is there anything that prevents you from dropping the lock, performing
the allocation in a reliable fashion (M_WAITOK), and trying to
reacquire the lock later on?
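
The drop, allocate, reacquire pattern suggested here can be sketched in user space as below. This is only an analogy: a pthread mutex stands in for the kernel lock, plain malloc() stands in for malloc(9) with M_WAITOK, and `struct slot` and `slot_fill()` are hypothetical names. The important part is re-checking the protected state after the lock is retaken, since it may have changed while unlocked.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object whose buffer is protected by its lock. */
struct slot {
    pthread_mutex_t lock;
    void *buf;              /* NULL until populated */
};

/*
 * Make sure s->buf is allocated.
 * Called with s->lock held; returns with s->lock held.
 * Returns 0 on success, -1 if the allocation failed.
 */
static int
slot_fill(struct slot *s, size_t len)
{
    void *p;

    if (s->buf != NULL)
        return 0;

    pthread_mutex_unlock(&s->lock);  /* never allocate under the lock */
    p = malloc(len);                 /* stand-in for a sleeping allocation */
    pthread_mutex_lock(&s->lock);

    if (s->buf != NULL) {
        /* Somebody else filled the slot while we were unlocked. */
        free(p);
        return 0;
    }
    if (p == NULL)
        return -1;
    s->buf = p;
    return 0;
}
```

Whether this is usable in a given code path depends on whether the lock can be dropped at that point without breaking the caller's invariants.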

 BTW, currently we have at least one place where a memory allocation of
 this kind is done stealthily (and thus dangerously?).  ACPI resume code
 must execute AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA
 code performs memory allocations in that code path.  Since the interrupts
 are disabled by means of intr_disable(), witness(9) and similar are
 completely oblivious of the fact.

 --
 Andriy Gapon

Thanks,

-- 
Davide


Re: huge ktr buffer

2012-12-06 Thread Davide Italiano
On Thu, Dec 6, 2012 at 4:18 PM, Andriy Gapon a...@freebsd.org wrote:

 So I configured a kernel with the following option:
 options   KTR_ENTRIES=(1024UL*1024)
 then booted the kernel and did
 $ sysctl debug.ktr.clear=1
 and got an insta-reboot.

 No panic, nothing, just a reset.

 I suspect that the huge static buffer resulting from the above option
 could be a cause.  But I would like to understand the details, if possible.

 Also, perhaps ktr could be a little bit more sophisticated with its
 buffer than just using a static array.

 --
 Andriy Gapon

It was a while ago, but I found a similar issue running r238886 built
with the following kernel configuration file:
http://people.freebsd.org/~davide/DEBUG
The machine panicked in early boot, even before the appearance of the
Berkeley logo: fatal trap 12 with interrupts disabled.
Basically, my configuration file is just GENERIC with slight
modifications, in particular debugging options (WITNESS, INVARIANTS,
etc.) turned on and the following KTR options enabled:

options KTR
options KTR_COMPILE=(KTR_CALLOUT|KTR_PROC)
options KTR_MASK=(KTR_CALLOUT|KTR_PROC)
options KTR_ENTRIES=524288

The issue seems to be related to KTR itself, and in particular to the
value of KTR_ENTRIES. As long as this value is small (e.g. 2048)
everything works fine and the boot sequence completes. If I choose 524288
(the value you can also see in the kernel conf file) the fatal trap
occurs.

Even though it was really difficult for me to get any information,
because the failure happens so early, I put some printf() calls in the
code and isolated the point at which the kernel dies
(sys/amd64/amd64/machdep.c, in getmemsize()):

/*
 * map page into kernel: valid, read/write, non-cacheable
 */
*pte = pa | PG_V | PG_RW | PG_N;


As Alan also suggested, one way to work around the problem is to increase
the NKPT value (e.g. from 32 to 64). Obviously, this is not a proper fix.
For a proper fix the kernel needs to be able to set NKPT dynamically.
In this particular case that wouldn't be too hard, but there is a
different case, where people preload a large memory disk image at boot
time, that isn't so easy to fix.
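
A back-of-the-envelope calculation may help show why KTR_ENTRIES and NKPT interact. The sketch below assumes amd64 with 4 KiB pages, so that one page-table page holds 512 entries and maps 2 MiB, and it takes the per-entry size as a parameter because the real size depends on the layout of struct ktr_entry in the kernel being built. The helper names are hypothetical and the 88-byte figure in the note that follows is only an assumed value.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES   4096ULL
#define PTES_PER_PT_PAGE  512ULL  /* assumed: 4 KiB page of 8-byte PTEs */

/* Bytes of address space that one page-table page can map (2 MiB). */
static uint64_t
bytes_per_pt_page(void)
{
    return PAGE_SIZE_BYTES * PTES_PER_PT_PAGE;
}

/* Total bytes mapped by nkpt bootstrap page-table pages. */
static uint64_t
bootstrap_mapped_bytes(uint64_t nkpt)
{
    return nkpt * bytes_per_pt_page();
}

/* Size of a static KTR buffer: entry count times per-entry size. */
static uint64_t
ktr_buffer_bytes(uint64_t entries, uint64_t entry_size)
{
    return entries * entry_size;
}
```

Under these assumptions, NKPT=32 covers 64 MiB and NKPT=64 covers 128 MiB. With an assumed 88-byte entry, 2048 entries need about 176 KiB, 524288 entries need about 44 MiB, and 1024*1024 entries need about 88 MiB, which would not fit in 64 MiB at all. The exact numbers will differ with the real entry size and with how much of the bootstrap mapping the rest of the kernel uses.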

Thanks,

Davide


Re: On cooperative work [Was: Re: newbus' ivar's limitation..]

2012-08-02 Thread Davide Italiano
On Wed, Aug 1, 2012 at 7:05 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Aug 1, 2012 at 12:40 PM, Attilio Rao atti...@freebsd.org wrote:
 On Wed, Aug 1, 2012 at 5:32 PM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Tue, Jul 31, 2012 at 4:14 PM, Attilio Rao atti...@freebsd.org wrote:

 You don't want to work cooperatively.

 Why is it that mbuf's refactoring consultation is being held in
 internal, private, committers-and-invite-only-restricted meeting at
 BSDCan ?

 Why is it that so much review and discussion on changes are held privately ?

 Arnaud,
 believe me, to date I don't recall a single major technical decision
 that has been settled exclusively in private (not subjected to peer
 review) and in particular in person (e-mail help you focus on a lot of
 different details that you may not have under control when talking to
 people, etc).

 Whose call is it to declare something worth public discussion ? No one.

 Every time I see a "Suggested by:", "Submitted by:", "Reported by:",
 and especially "Approved by:", there should be a public reference
 of the mentioned communication.

 Sometimes it is useful that a limited number of developers is involved
 in initial brainstorming of some works,

 Never.

 but after this period
 constructive people usually ask for peer review publishing their plans
 on the mailing lists or other media.

 Again, never. By doing so, you merely put the community in a situation
 where, well, "We, committers, have come with this, you can either
 accept or STFU, but no major changes will be made because we decided
 so."

 The callout-ng conference at BSDCan was just beautiful, it was basically:

 Speaker: we will do this
 Audience: how about this situation ? What you will do will not work.
 Speaker: thank you for listening, end of the conference

 It was beautiful to witness.


Well, my talk was mainly there to collect some opinions on how to
continue my work.
IIRC, the only objection was about supporting callout execution from
hw interrupt context. Mainly, the objection raised was that there were
no practical applications for that. It turned out I found some, and in
any case it wasn't "it will not work" but "probably it's not an effort
you want to put in, because the consumers that can exploit the
functionality are few". I wasn't really so familiar with that, so I
hesitated in answering. In any case, I liked the objection raised by
Attilio a lot, because it gave me the possibility to investigate and
find out the right direction. As you may see, there's a branch in
projects/ in which the feature that "won't work" is implemented, so
maybe you're missing something.
If you had some concerns about it you could have raised your hand and
said: "hey, that sucks". It would have been better than getting this
feedback after 3 months of work, honestly. I have nothing against
getting feedback (negative or positive). But probably you belong to
that kind of people who are able to speak up only from behind a
monitor, so this is my last word on the topic.
Get a life.


 If you don't see any public further discussion this may be meaning:
 a) the BSDCan meetings have been fruitless and there is no precise
 plan/roadmap/etc.

 so not only you make it private, but it shamelessly failed...

 b) there is still not consensus on details

 Then the discussion should stop, public records are kept for reference
 in the future. There is no problem with this.

 and you can always publicly ask what was decided and what not.
 Just send a mail to interested recipients and CC any FreeBSD mailing
 list.

 This is not the way openness should be about.

  - Arnaud

 Attilio


 --
 Peace can only be achieved by understanding - A. Einstein

Davide


Re: kqueue periodic timer confusion

2012-07-19 Thread Davide Italiano
On Thu, Jul 19, 2012 at 5:06 PM, Paul Albrecht albre...@glccom.com wrote:
 On Fri, 2012-07-13 at 07:22 -0500, Davide Italiano wrote:
 On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
  On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
   On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
   On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
 On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
  Hi,
 
  Sorry about this repost but I'm confused about the responses I
  received in my last post so I'm looking for some clarification.

  Specifically, I thought I could use the kqueue timer as essentially
  a drop in replacement for linuxfd_create/read, but was surprised
  that the accuracy of the kqueue timer is much less than what I need
  for my application.

  So my confusion at this point is whether this is considered to be a
  bug or feature?
 
  Here's some test code if you want to verify the problem:
 
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <errno.h>
  #include <sys/types.h>
  #include <sys/event.h>
  #include <sys/time.h>

  int
  main(void)
  {
      int i, msec;
      int kq, nev;
      struct kevent inqueue;
      struct kevent outqueue;
      struct timeval start, end;

      if ((kq = kqueue()) == -1) {
          fprintf(stderr, "kqueue error!? errno = %s",
              strerror(errno));
          exit(EXIT_FAILURE);
      }
      /* Periodic timer with ident 1 and a 20 ms period. */
      EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

      gettimeofday(&start, 0);
      for (i = 0; i < 50; i++) {
          if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
              NULL)) == -1) {
              fprintf(stderr, "kevent error!? errno = %s",
                  strerror(errno));
              exit(EXIT_FAILURE);
          } else if (outqueue.flags & EV_ERROR) {
              fprintf(stderr, "EV_ERROR: %s\n",
                  strerror((int)outqueue.data));
              exit(EXIT_FAILURE);
          }
      }
      gettimeofday(&end, 0);

      /* Elapsed wall-clock time in milliseconds. */
      msec = ((end.tv_sec - start.tv_sec) * 1000) +
          (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

      printf("msec = %d\n", msec);

      close(kq);
      return EXIT_SUCCESS;
  }
 
 

 What you are seeing is just the way FreeBSD currently works.

  Sleeping (in most all of its various forms, and I've just looked at
  the kevent code to verify this is true there) is handled by
  converting the amount of time to sleep (usually specified in a
  timeval or timespec struct) to a count of timer ticks, using an
  internal routine called tvtohz() in kern/kern_time.c.  That routine
  rounds up by one tick to account for the current tick.  Whether
  that's a good idea or not (it probably was once, and probably not
  anymore) it's how things currently work, and could explain the
  fairly consistent +1ms you're seeing.

  This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
  installs a periodic callout that executes KNOTE() and then resets
  itself (via callout_reset()) each time it runs.  This should
  generally be closer to regularly spaced intervals than something
  that does:
   
  
   In what way is it irrelevant?  That is, what did I miss?  It appears to
   me that the next callout is scheduled by calling timertoticks() passing
   a count of milliseconds, that count is converted to a struct timeval
   and passed to tvtohz() which is where the +1 adjustment happens.  If
   you ask for 20ms and each tick is 1ms, then you'd get regular spacing
   of 21ms.  There is some time, likely a small number of microseconds,
   that you've consumed of the current tick, and that's what the +1 in
   tvtohz() is supposed to account for according to the comments.

   The tvtohz() routine both rounds up in the usual way
   (value+tick-1)/tick and then adds one tick on top of that.  That seems
   not quite right to me, except that it is a way to g'tee that you don't
   return early, and that is the one promise made by sleep routines on any
   OS; those magical "at least" words always appear in the docs.
  
   Actually what I'm missing (that I know of) is how the scheduler works.
   Maybe the +1 adjustment to account for the fraction of the current tick
   you've already consumed is the right thing to do, even when that
   fraction is 1uS or less of a 1mS tick.  That would depend on scheduler

Re: kqueue periodic timer confusion

2012-07-13 Thread Davide Italiano
On Thu, Jul 12, 2012 at 5:25 PM, John Baldwin j...@freebsd.org wrote:
 On Thursday, July 12, 2012 11:08:47 am Davide Italiano wrote:
 On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
  On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
  On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
   On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
 Hi,

 Sorry about this repost but I'm confused about the responses I
 received in my last post so I'm looking for some clarification.

 Specifically, I thought I could use the kqueue timer as essentially a
 drop in replacement for linuxfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.

 So my confusion at this point is whether this is considered to be a
 bug or feature?

 Here's some test code if you want to verify the problem:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>

 int
 main(void)
 {
     int i, msec;
     int kq, nev;
     struct kevent inqueue;
     struct kevent outqueue;
     struct timeval start, end;

     if ((kq = kqueue()) == -1) {
         fprintf(stderr, "kqueue error!? errno = %s",
             strerror(errno));
         exit(EXIT_FAILURE);
     }
     /* Periodic timer with ident 1 and a 20 ms period. */
     EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

     gettimeofday(&start, 0);
     for (i = 0; i < 50; i++) {
         if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
             NULL)) == -1) {
             fprintf(stderr, "kevent error!? errno = %s",
                 strerror(errno));
             exit(EXIT_FAILURE);
         } else if (outqueue.flags & EV_ERROR) {
             fprintf(stderr, "EV_ERROR: %s\n",
                 strerror((int)outqueue.data));
             exit(EXIT_FAILURE);
         }
     }
     gettimeofday(&end, 0);

     /* Elapsed wall-clock time in milliseconds. */
     msec = ((end.tv_sec - start.tv_sec) * 1000) +
         (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

     printf("msec = %d\n", msec);

     close(kq);
     return EXIT_SUCCESS;
 }


   
What you are seeing is just the way FreeBSD currently works.
   
Sleeping (in most all of its various forms, and I've just looked at 
the
kevent code to verify this is true there) is handled by converting the
amount of time to sleep (usually specified in a timeval or timespec
struct) to a count of timer ticks, using an internal routine called
tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
account for the current tick.  Whether that's a good idea or not (it
probably was once, and probably not anymore) it's how things currently
work, and could explain the fairly consistent +1ms you're seeing.
  
   This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
   installs a periodic callout that executes KNOTE() and then resets 
   itself (via
   callout_reset()) each time it runs.  This should generally be closer to
   regularly spaced intervals than something that does:
  
 
  In what way is it irrelevant?  That is, what did I miss?  It appears to
  me that the next callout is scheduled by calling timertoticks() passing
  a count of milliseconds, that count is converted to a struct timeval and
  passed to tvtohz() which is where the +1 adjustment happens.  If you ask
  for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
  There is some time, likely a small number of microseconds, that you've
  consumed of the current tick, and that's what the +1 in tvtohz() is
  supposed to account for according to the comments.
 
  The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
  and then adds one tick on top of that.  That seems not quite right to
  me, except that it is a way to g'tee that you don't return early, and
  that is the one promise made by sleep routines on any OS; those magical
  "at least" words always appear in the docs.
 
  Actually what I'm missing (that I know of) is how the scheduler works.
  Maybe the +1 adjustment to account for the fraction of the current tick
  you've already consumed is the right thing to do, even when that
  fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
  behavior that I know nothing about.
 
  Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
  case.  That is, the +1 makes sense when you are computing a one-time delta
  for things like nanosleep().  It is incorrect when computing a periodic
  delta such as for computing the interval

Re: kqueue periodic timer confusion

2012-07-12 Thread Davide Italiano
On Thu, Jul 12, 2012 at 4:26 PM, John Baldwin j...@freebsd.org wrote:
 On Thursday, July 12, 2012 9:57:16 am Ian Lepore wrote:
 On Thu, 2012-07-12 at 08:34 -0400, John Baldwin wrote:
  On Wednesday, July 11, 2012 5:00:47 pm Ian Lepore wrote:
   On Wed, 2012-07-11 at 14:52 -0500, Paul Albrecht wrote:
Hi,
   
Sorry about this repost but I'm confused about the responses I received
in my last post so I'm looking for some clarification.
   
Specifically, I thought I could use the kqueue timer as essentially a
drop in replacement for linuxfd_create/read, but was surprised that
the accuracy of the kqueue timer is much less than what I need for my
application.

So my confusion at this point is whether this is considered to be a
bug or feature?
   
Here's some test code if you want to verify the problem:
   
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int
main(void)
{
    int i, msec;
    int kq, nev;
    struct kevent inqueue;
    struct kevent outqueue;
    struct timeval start, end;

    if ((kq = kqueue()) == -1) {
        fprintf(stderr, "kqueue error!? errno = %s",
            strerror(errno));
        exit(EXIT_FAILURE);
    }
    /* Periodic timer with ident 1 and a 20 ms period. */
    EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

    gettimeofday(&start, 0);
    for (i = 0; i < 50; i++) {
        if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1,
            NULL)) == -1) {
            fprintf(stderr, "kevent error!? errno = %s",
                strerror(errno));
            exit(EXIT_FAILURE);
        } else if (outqueue.flags & EV_ERROR) {
            fprintf(stderr, "EV_ERROR: %s\n",
                strerror((int)outqueue.data));
            exit(EXIT_FAILURE);
        }
    }
    gettimeofday(&end, 0);

    /* Elapsed wall-clock time in milliseconds. */
    msec = ((end.tv_sec - start.tv_sec) * 1000) +
        (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

    printf("msec = %d\n", msec);

    close(kq);
    return EXIT_SUCCESS;
}
   
   
  
   What you are seeing is just the way FreeBSD currently works.
  
   Sleeping (in most all of its various forms, and I've just looked at the
   kevent code to verify this is true there) is handled by converting the
   amount of time to sleep (usually specified in a timeval or timespec
   struct) to a count of timer ticks, using an internal routine called
   tvtohz() in kern/kern_time.c.  That routine rounds up by one tick to
   account for the current tick.  Whether that's a good idea or not (it
   probably was once, and probably not anymore) it's how things currently
   work, and could explain the fairly consistent +1ms you're seeing.
 
  This is all true, but mostly irrelevant for his case.  EVFILT_TIMER
  installs a periodic callout that executes KNOTE() and then resets itself 
  (via
  callout_reset()) each time it runs.  This should generally be closer to
  regularly spaced intervals than something that does:
 

 In what way is it irrelevant?  That is, what did I miss?  It appears to
 me that the next callout is scheduled by calling timertoticks() passing
 a count of milliseconds, that count is converted to a struct timeval and
 passed to tvtohz() which is where the +1 adjustment happens.  If you ask
 for 20ms and each tick is 1ms, then you'd get regular spacing of 21ms.
 There is some time, likely a small number of microseconds, that you've
 consumed of the current tick, and that's what the +1 in tvtohz() is
 supposed to account for according to the comments.

 The tvtohz() routine both rounds up in the usual way (value+tick-1)/tick
 and then adds one tick on top of that.  That seems not quite right to
 me, except that it is a way to g'tee that you don't return early, and
 that is the one promise made by sleep routines on any OS; those magical
 "at least" words always appear in the docs.

 Actually what I'm missing (that I know of) is how the scheduler works.
 Maybe the +1 adjustment to account for the fraction of the current tick
 you've already consumed is the right thing to do, even when that
 fraction is 1uS or less of a 1mS tick.  That would depend on scheduler
 behavior that I know nothing about.

 Oh.  My bad, sorry.  You are correct.  It is a bug to use +1 in this
 case.  That is, the +1 makes sense when you are computing a one-time delta
 for things like nanosleep().  It is incorrect when computing a periodic
 delta such as for computing the interval for an itimer (setitimer) or
 EVFILT_TIMER().
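
The arithmetic being discussed can be written out as a small sketch. These two functions are hypothetical illustrations of the rounding described in the thread, not the kernel's tvtohz(): both take an interval and a tick length in microseconds, and they differ only in the extra tick.

```c
/*
 * One-shot conversion as described above: round up to whole ticks,
 * then add one more tick to cover the partly used current tick.
 */
static long
oneshot_ticks(long usec, long tick_usec)
{
    return (usec + tick_usec - 1) / tick_usec + 1;
}

/* Periodic conversion: round up to whole ticks, with no extra tick. */
static long
periodic_ticks(long usec, long tick_usec)
{
    return (usec + tick_usec - 1) / tick_usec;
}
```

For a 20 ms request with 1 ms ticks the first form gives 21 ticks and the second gives 20, so 50 periods take about 1050 ms with the extra tick and about 1000 ms without it.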

 Hah, setitimer()'s callout (realitexpire) uses tvtohz - 1:

 sys/kern/kern_time.c:

 /*
  * Real interval timer expired:
  * send process whose timer expired an alarm signal.
  * If time is not set up to reload, then just return.
  * 

Re: kqueue periodic timer confusion

2012-07-11 Thread Davide Italiano
On Wed, Jul 11, 2012 at 9:52 PM, Paul Albrecht albre...@glccom.com wrote:

 Hi,

 Sorry about this repost but I'm confused about the responses I received
 in my last post so I'm looking for some clarification.

 Specifically, I thought I could use the kqueue timer as essentially a
 drop in replacement for linuxfd_create/read, but was surprised that
 the accuracy of the kqueue timer is much less than what I need for my
 application.

 So my confusion at this point is whether this is considered to be a
 bug or feature?

 Here's some test code if you want to verify the problem:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/types.h>
 #include <sys/event.h>
 #include <sys/time.h>

 int
 main(void)
 {
     int i, msec;
     int kq, nev;
     struct kevent inqueue;
     struct kevent outqueue;
     struct timeval start, end;

     if ((kq = kqueue()) == -1) {
         fprintf(stderr, "kqueue error!? errno = %s", strerror(errno));
         exit(EXIT_FAILURE);
     }
     /* Periodic timer with ident 1 and a 20 ms period. */
     EV_SET(&inqueue, 1, EVFILT_TIMER, EV_ADD | EV_ENABLE, 0, 20, 0);

     gettimeofday(&start, 0);
     for (i = 0; i < 50; i++) {
         if ((nev = kevent(kq, &inqueue, 1, &outqueue, 1, NULL)) == -1) {
             fprintf(stderr, "kevent error!? errno = %s",
                 strerror(errno));
             exit(EXIT_FAILURE);
         } else if (outqueue.flags & EV_ERROR) {
             fprintf(stderr, "EV_ERROR: %s\n",
                 strerror((int)outqueue.data));
             exit(EXIT_FAILURE);
         }
     }
     gettimeofday(&end, 0);

     /* Elapsed wall-clock time in milliseconds. */
     msec = ((end.tv_sec - start.tv_sec) * 1000) +
         (((1000000 + end.tv_usec - start.tv_usec) / 1000) - 1000);

     printf("msec = %d\n", msec);

     close(kq);
     return EXIT_SUCCESS;
 }


 --
 Paul Albrecht

 ___
 freebsd-hackers@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
 To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Hi.
As I told you before I'm currently working on this problem.
I wrote a testcase myself, you can find it here:
http://people.freebsd.org/~davide/kqueue/kevent_test.c
As part of my callout(9) rewrite I've recently converted
kqueue(9) to exploit the precision offered by the new backend,
which is exposed to consumers via the new interface
(callout_reset_bt_on()).
I ran my testcase and these are the results over 100 iterations:
http://people.freebsd.org/~davide/kqueue/kqueue_res.png
(red line: old, green line: new). There seems to be some
improvement, at least for now.

If you want to give it a try, check out the projects/calloutng branch
and apply the following patch:
http://people.freebsd.org/~davide/kqueue/kqueue_calloutng.diff (it is
still at an early stage; if you hit any issues, feel free to report
them).

Davide
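
For anyone repeating the measurement, the elapsed-time arithmetic in the
test program quoted above can be written as a small helper. This is only a
sketch; the name timeval_diff_ms is an assumption and does not come from
either test case.

```c
#include <sys/time.h>

/* Elapsed milliseconds between two gettimeofday() samples, truncated. */
long
timeval_diff_ms(const struct timeval *start, const struct timeval *end)
{
	long sec = (long)(end->tv_sec - start->tv_sec);
	long usec = (long)(end->tv_usec - start->tv_usec);

	/* Borrow one second when the microsecond field went backwards. */
	if (usec < 0) {
		sec -= 1;
		usec += 1000000L;
	}
	return sec * 1000L + usec / 1000L;
}
```

Note that gettimeofday() reports wall-clock time, which can be stepped by
clock adjustments; clock_gettime(CLOCK_MONOTONIC, ...) is usually a steadier
source for interval measurements.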


Re: FreeBSD has serious problems with focus, longevity, and lifecycle

2012-01-25 Thread Davide Italiano
On Wed, Jan 25, 2012 at 9:58 AM, Arnaud Lacombe lacom...@gmail.com wrote:
 Hi,

 On Wed, Jan 25, 2012 at 2:28 AM, Mark Linimon lini...@lonesome.com wrote:
 I might just be also interested to review/comment code, discuss
 regressions, and architecture, for a change ;-)
 Unfortunately, such threads rarely ever happen. Most of the time, the
 only food provided is a really indigestible +5000/-3000 patch, where
 all the thinking, architectural design and code has been done behind
 closed door of a limited few committers, research lab or company.

 That's odd.  What the src committers usually tell me, when I have my
 bugmeister-advocate hat on, is that they post patches and then no one
 comments until after they check them in, at which time they complain.
 This discourages them from going through this the next time.

 exactly my point, huge patches are impossible to review.

 You will also note that some of the large commits say MFp4 or MF:
 projectname.  That means that either our Perforce repository, or
 SVN project/ directory, were used as staging areas.  It's possible to
 subscribe to these email messages.  (Exactly how is left as an exercise
 for the reader; the hour is getting late.)

 that is indeed a good source for having a look at early-alpha-WIP stuff.

 As for the research lab/company commits, I'm sure you'd complain equally
 if the code that these groups develop in-house and then release when it's
 in some kind of stable state, instead didn't get released at all.

 I see company contributed code as ad-hoc solution to the company's
 problem, not general solution for the whole FreeBSD userbase. To make
 a comparison with Linux, it is just as if Google got all the Android
 code merged in mainline as-is, without re-working anything. It did not
 happen that way. Much of their code had (and still has) to be
 re-designed, abstracted, and adapted to fit the general-purpose
 mainline.

 But, of course, I'm wasting my time trying to give you reasoned arguments
 about why FreeBSD does one thing or another.  AFAICT you're only interested
 in spreading FUD about what we do, how we do it, and what we say about it
 before, during, and afterwards.

 is this FUD: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/160992 ?
 is this FUD: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/156540 ?
 is this FUD: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/156799 ?
 is this FUD: 
 http://lists.freebsd.org/pipermail/freebsd-current/2011-September/027400.html
 ?
 is this FUD: 
 http://lists.freebsd.org/pipermail/freebsd-current/2011-December/030076.html
 ?

 answer to all the above: no, this is bugs, regressions, and mis-design
 you folks introduced, not me. Don't blame me to point it out.

 You seem to be obsessed by picking over
 semantics and finding shortcomings to be aggrieved over.

 Semantics and proper, independent, API are crucial.

 There is tons of ad-hoc code in FreeBSD which should be properly
 generalized. The silliest example I can think of is
 `time_after()', defined in net80211/ieee80211_freebsd.h. This has
 _nothing_ to do specifically with IEEE 802.11; it's about time
 manipulation. Feel free to search the tree: there are tons of
 potentially unsafe, open-coded versions of this macro. Call it
 nit-picking if you want, but when I write code, I want an API to use;
 I'm fed up with always having to re-invent the wheel.
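
 For readers unfamiliar with the idiom being discussed, here is a minimal
 sketch of a wraparound-safe "is tick a later than tick b" comparison. The
 name ticks_after and the 32-bit counter width are assumptions made for
 illustration; this is not the net80211 definition.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when tick value a is later than b, treating the counter as a 32-bit
 * value that wraps around.  The answer is only meaningful when the two
 * values are less than 2^31 ticks apart.
 */
bool
ticks_after(uint32_t a, uint32_t b)
{
	uint32_t d = a - b;	/* modular distance from b forward to a */

	return d != 0 && d < UINT32_C(0x80000000);
}
```

 A naive open-coded `a > b' gives the wrong answer once the counter wraps,
 which is the usual reason a shared helper is preferable.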

 Btw, I do not even speak about some functions in the kernel
 re-implementing the exact same logic +10 times in a row, one after the
 other, within the same function body...

 For the story, I've been hacking tonight in Linux... a pure pleasure,
 real tough to get to, but really enjoyable.

 Whatever patches or review you've contributed to date, to my mind, are
 like the last tiny little bits of onion that are left over after one peels
 off all the outer layers.  There may be something to it, but the effort
 to get down to that point is so painful that it's not worth it.

 tl;dr: your drama outweighs your contributions.

 I already commented on this. I'm no longer interested in getting my
 stuff integrated in FreeBSD. I put it on github, eventually send
 patches on MLs, then you do what you want with it, I no longer
 particularly care. I know some patches are used around, that's enough.
 I did my time fighting committers to fix their not-so-bugfree code and
 won those battles, that's enough for me.


What I'm completely missing is the reason why you keep repeating "this
is my last word" or "that's enough for me" or $THATSALLFOLKS_SENTENCE,
but continue adding Gaussian noise to the MLs without a valid
reason.
If you enjoy other projects, go there. But please, don't piss people off.

  - Arnaud

 ps: I have a particular appreciation for this PR, a feature praised by
 users, and no committer dares to care:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/161553 ... silly.

Re: dup3 syscall - atomic set O_CLOEXEC with dup2

2012-01-12 Thread Davide Italiano
On Thu, Jan 12, 2012 at 6:01 AM, Eitan Adler li...@eitanadler.com wrote:
 This is an implementation of dup3 for FreeBSD:
 man page here (with a FreeBSD patch coming soon):
 https://www.kernel.org/doc/man-pages/online/pages/man2/dup.2.html

 Is this implementation correct? If so any objection to adding this as
 a supported syscall?


 Index: sys/kern/kern_descrip.c
 ===
 --- sys/kern/kern_descrip.c     (revision 229830)
 +++ sys/kern/kern_descrip.c     (working copy)
 @@ -110,6 +110,7 @@
  /* Flags for do_dup() */
  #define DUP_FIXED      0x1     /* Force fixed allocation */
  #define DUP_FCNTL      0x2     /* fcntl()-style errors */
 +#define DUP_CLOEXEC    0x4     /* Enable O_CLOEXEC on the new fs */

  static int do_dup(struct thread *td, int flags, int old, int new,
     register_t *retval);
 @@ -307,7 +308,39 @@
        return (0);
  }

 +#ifndef _SYS_SYSPROTO_H_
 +struct dup3_args {
 +       u_int   from;
 +       u_int   to;
 +       int     flags;
 +};
 +#endif
 +
  /*
 + * Duplicate a file descriptor and allow for O_CLOEXEC
 + */
 +
 +/* ARGSUSED */
 +int
 +sys_dup3(struct thread * const td, struct dup3_args * const uap) {
 +
 +       KASSERT(td != NULL, ("%s: td == NULL", __func__));
 +       KASSERT(uap != NULL, ("%s: uap == NULL", __func__));
 +
 +       if (uap->from == uap->to)
 +               return EINVAL;
 +
 +       if (uap->flags & ~O_CLOEXEC)
 +               return EINVAL;
 +
 +       const int dupflags = (uap->flags == O_CLOEXEC) ? DUP_FIXED|DUP_CLOEXEC : DUP_FIXED;
 +
 +       return (do_dup(td, dupflags, (int)uap->from, (int)uap->to,
 +                   td->td_retval));
 +       return (0);
 +}
 +
 +/*
  * Duplicate a file descriptor to a particular value.
  *
  * Note: keep in mind that a potential race condition exists when closing
 @@ -912,6 +945,9 @@
                 fdp->fd_lastfile = new;
         *retval = new;

 +       if (flags & DUP_CLOEXEC)
 +               fdp->fd_ofileflags[new] |= UF_EXCLOSE;
 +
        /*
         * If we dup'd over a valid file, we now own the reference to it
         * and must dispose of it using closef() semantics (as if a
 Index: sys/kern/syscalls.master
 ===
 --- sys/kern/syscalls.master    (revision 229830)
 +++ sys/kern/syscalls.master    (working copy)
 @@ -951,5 +951,6 @@
                                    off_t offset, off_t len); }
  531    AUE_NULL        STD     { int posix_fadvise(int fd, off_t offset, \
                                    off_t len, int advice); }
 +532    AUE_NULL        STD     { int dup3(u_int from, u_int to, int flags); }
  ; Please copy any additions and changes to the following compatability 
 tables:
  ; sys/compat/freebsd32/syscalls.master
 Index: sys/compat/freebsd32/syscalls.master
 ===
 --- sys/compat/freebsd32/syscalls.master        (revision 229830)
 +++ sys/compat/freebsd32/syscalls.master        (working copy)
 @@ -997,3 +997,4 @@
                                    uint32_t offset1, uint32_t offset2,\
                                    uint32_t len1, uint32_t len2, \
                                    int advice); }
 +532    AUE_NULL        STD     { int dup3(u_int from, u_int to, int flags); }


 --
 Eitan Adler

From what I can see it seems that dup3() is Linux specific and not
POSIX, so maybe there are some issues in adding it, but I may be
wrong.

Davide
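
For context on what the proposed syscall buys, here is a sketch of the
closest userland approximation built from dup2() and fcntl(). The name
dup3_emulated is hypothetical, and the argument checks simply mirror the
ones in the patch above. Unlike a real dup3(), this version is not atomic:
another thread can fork and exec between the two calls.

```c
#define _POSIX_C_SOURCE 200809L	/* for O_CLOEXEC on strict compilers */

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Non-atomic approximation of dup3(from, to, flags). */
int
dup3_emulated(int from, int to, int flags)
{
	/* Same argument checks as the patch: distinct fds, only O_CLOEXEC. */
	if (from == to || (flags & ~O_CLOEXEC) != 0) {
		errno = EINVAL;
		return -1;
	}
	if (dup2(from, to) == -1)
		return -1;
	/* Race window: the new descriptor exists without close-on-exec here. */
	if ((flags & O_CLOEXEC) != 0 && fcntl(to, F_SETFD, FD_CLOEXEC) == -1) {
		int saved = errno;

		close(to);
		errno = saved;
		return -1;
	}
	return to;
}
```

The race window marked in the comment is the motivation for doing the flag
update inside the kernel, whatever the standardization status of the
interface.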


Re: how to debug RB_TREE for memory corruption?

2011-10-06 Thread Davide Italiano
Was the node you're removing actually part of the tree?
I had a similar issue some time ago because I tried to insert two
nodes with the same key and then remove them.
In practice, the second INSERT operation failed (keys in a BST must be
unique), and so I was trying to remove a node that had never
actually been inserted.
Can you provide a code snippet or a minimal test case that
reproduces the error?

Regards

Davide
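
The failure mode described above can be sketched with a plain, unbalanced
binary search tree. This is deliberately not the RB_TREE macros from
<sys/tree.h>; the names bst_insert and bst_contains_node are made up for
illustration. The return convention does mirror what tree(3) documents for
RB_INSERT(): NULL on success, or the element that already holds the key.

```c
#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/*
 * Insert n into the tree rooted at *rootp.  Returns NULL on success, or
 * the node that already holds the same key, in which case n is NOT linked.
 */
struct node *
bst_insert(struct node **rootp, struct node *n)
{
	struct node **link = rootp;

	while (*link != NULL) {
		if (n->key == (*link)->key)
			return *link;	/* collision: n stays outside the tree */
		link = (n->key < (*link)->key) ? &(*link)->left : &(*link)->right;
	}
	n->left = n->right = NULL;
	*link = n;
	return NULL;
}

/* True only if this exact node, not merely its key, is linked in the tree. */
int
bst_contains_node(const struct node *root, const struct node *n)
{
	while (root != NULL) {
		if (root == n)
			return 1;
		if (n->key == root->key)
			return 0;	/* same key, but a different node */
		root = (n->key < root->key) ? root->left : root->right;
	}
	return 0;
}
```

Checking the insert's return value, or checking membership before a
remove, catches the case where a remove would otherwise be handed a node
whose link fields were never initialized by an insert.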


Re: Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

2011-10-02 Thread Davide Italiano
2011/10/2 Lev Serebryakov l...@freebsd.org:
 Hello, Freebsd-hackers.

  There are several memory-allocation mechanisms in the kernel. The two
 I'm aware of are MALLOC_DEFINE()/malloc()/free() and uma_* (zone(9)).

  As far as I understand, malloc() is general-purpose, but it has a
 fixed "transaction cost" (in terms of memory consumption) for each
 block allocated, and is not very suitable for allocating many small
 blocks, as lots of memory will be wasted on bookkeeping.

  The zone(9) allocator, on the other hand, has a very low cost per
 allocated block, but can allocate only pre-configured fixed-size
 blocks, and is ideal for allocating tons of small objects (and provides
 an API for reusing them, too!).

  Am I right?

   But what if I need to allocate a lot (say, 16K-32K) of page-sized
 blocks? Not in one chunk, for sure, but over the lifetime of my kernel
 module. Which allocator should I use? It seems the best one would be a
 very low-level, page-sized-only allocator. Is there one in the kernel?

 --
 // Black Lion AKA Lev Serebryakov l...@freebsd.org



My 2 cents:
Every time you request more than 4KB of memory using
kernel malloc(), it results in a direct call to uma_large_malloc().
Right now, uma_large_malloc() calls kmem_malloc() (i.e. the memory is
requested from the VM directly).
This approach has two main drawbacks:
1) it heavily fragments the kernel heap
2) when free() is called on these multipage chunks, it in turn calls
uma_large_free(), which immediately calls the VM system to unmap and
free the chunk of memory.  The unmapping requires a system-wide TLB
shootdown, i.e. a global action by every processor in the system.

I'm currently working, supervised by alc@, on an intermediate layer that
sits between UMA and the VM, whose goal is to efficiently satisfy
requests > 4KB (so, the kind you want, considering you're asking for
16KB-32KB), but the work is at an early stage.

Best,

Davide


Re: Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

2011-10-02 Thread Davide Italiano
2011/10/2 Lev Serebryakov l...@freebsd.org:
 Hello, Davide.
 You wrote on October 2, 2011, 16:57:48:

   But what if I need to allocate a lot (say, 16K-32K) of page-sized
 blocks? Not in one chunk, for sure, but in lifetime of my kernel
 module. Which allocator should I use? It seems, the best one will be
 very low-level only-page-sized allocator. Is here any in kernel?

 My 2 cents:
 Every time you request more than 4KB of memory using
 kernel malloc(), it results in a direct call to uma_large_malloc().
 Right now, uma_large_malloc() calls kmem_malloc() (i.e. the memory is
 requested from the VM directly).
 This approach has two main drawbacks:
 1) it heavily fragments the kernel heap
 2) when free() is called on these multipage chunks, it in turn calls
 uma_large_free(), which immediately calls the VM system to unmap and
 free the chunk of memory.  The unmapping requires a system-wide TLB
 shootdown, i.e. a global action by every processor in the system.

 I'm currently working, supervised by alc@, on an intermediate layer that
 sits between UMA and the VM, whose goal is to efficiently satisfy
 requests > 4KB (so, the kind you want, considering you're asking for
 16KB-32KB), but the work is at an early stage.
  I was not very clear here. I'm talking about page-sized blocks, but
  many of them. 16K-32K is not a size in bytes, but the count of page-sized
  blocks my code needs :)

ok.

  BTW, I/O often requires big buffers, up to MAXPHYS (128KiB for
  now). Do you mean that any allocation of such memory carries a
  considerable performance penalty, especially on multi-core and
  multi-CPU systems?


In fact, the main client of this kind of allocation is the ZFS
filesystem (due to its adaptive replacement cache, ARC). AFAIK, at the
time UMA was written, the kind of allocations you describe were so
infrequent that no initial effort was made to optimize them.
People tried to address this issue by having ZFS create a large number
of UMA zones for large allocations of different sizes. Unfortunately,
one of the side effects of this approach was increased
fragmentation, so we're still investigating.

 --
 // Black Lion AKA Lev Serebryakov l...@freebsd.org




Re: Memory allocation in kernel -- what to use in which situation? What is the best for page-sized allocations?

2011-10-02 Thread Davide Italiano
On Sun, Oct 2, 2011 at 4:37 PM, Lev Serebryakov l...@freebsd.org wrote:
 Hello, Davide.
 You wrote on October 2, 2011, 18:00:26:

  BTW, I/O often requires big buffers, up to MAXPHYS (128KiB for
  now). Do you mean that any allocation of such memory carries a
  considerable performance penalty, especially on multi-core and
  multi-CPU systems?

 In fact, the main client of this kind of allocation is the ZFS
 filesystem (due to its adaptive replacement cache, ARC). AFAIK, at the
 time UMA was written, the kind of allocations you describe were so
 infrequent that no initial effort was made to optimize them.
 People tried to address this issue by having ZFS create a large number
 of UMA zones for large allocations of different sizes. Unfortunately,
 one of the side effects of this approach was increased
 fragmentation, so we're still investigating.
   What about the geom modules that allocate buffers because they need
  to read more than was requested by the upper layer? geom_cache and
  geom_raid3, for example?

I wasn't aware of that; thanks a lot for pointing it out.
I'll surely look at them.

   And my geom_raid5 -- I begin to understand why the original author
  of geom_raid5 (which needs MAXPHYS-sized buffers regularly) wrote his
  own memory management layer...


If you're interested in what we're doing, contact me or poke me on efnet.org.

 --
 // Black Lion AKA Lev Serebryakov l...@freebsd.org




UMA large allocations issues

2011-07-26 Thread Davide Italiano
It seems that today I have some good news: I've made some progress.
First I tried things out in userspace
(http://davit.altervista.org/malloc_new.c), then I improved my patch a
bit (http://davit.altervista.org/uma_large_allocations.patch).

So, the situation is as follows. The system now boots (before, it
didn't), but I get a fatal trap 9. That's not great, and I guess it's
not my fault: it's caused by buf_hash_find() in arc.c.
I've taken a look at the arc.c code, but that function doesn't seem to
perform large allocations.

I took two screenshots (one of the panic, the other of the
backtrace):
http://dl.dropbox.com/u/3530969/zfs_fail.png
http://dl.dropbox.com/u/3530969/zfs_fail2.png

At first I thought the trouble was that
there weren't enough chunks, so I increased the CHUNKS_SIZE variable
(now it's set to 200), but the issue remains the same.

Ideas?

Cheers

Davide


Re: UMA large allocations issues

2011-07-24 Thread Davide Italiano
Oh, sorry, I noticed that there's a typo.
In mtx_init(&uma_mtx, "Bitmap Lock", NULL, MTX_DEF);
you should replace uma_mtx with bitmap_mtx.

Davide


UMA large allocations issues

2011-07-22 Thread Davide Italiano
Hi.
I'm a student, and some time ago I started investigating
the performance/fragmentation issue of large allocations within the
UMA allocator.
Benchmarks showed that the performance problems are mainly
related to the fact that every call to uma_large_malloc() results in a
call to kmem_malloc(), and this behaviour is really inefficient.

I started doing some work. Here's what I have so far.
As a first step, I tried to define larger zones and let UMA do all the
work.
UMA can allocate slabs of more than one page, so I tried to define
zones of 1, 2, 4 and 8 pages, moving ZMEM_KMAX up.
I tested the solution with raidtest. Here are some numbers.

Here's the workload characterization:


set mediasize=`diskinfo /dev/zvol/tank/vol | awk '{print $3}'`
set sectorsize=`diskinfo /dev/zvol/tank/vol | awk '{print $2}'`
raidtest genfile -s $mediasize -S $sectorsize -n 50000

# $mediasize = 10737418240
# $sectorsize = 512

Number of READ requests: 24924
Number of WRITE requests: 25076
Numbers of bytes to transmit: 3305292800


raidtest test -d /dev/zvol/tank/vol -n 4
## tested using 4 cores, 1.5 GB Ram

Results:
Number of processes: 4
Bytes per second: 10146896
Requests per second: 153

Results: (4* PAGE_SIZE)
Number of processes: 4
Bytes per second: 14793969
Requests per second: 223

Results: (8* PAGE_SIZE)
Number of processes: 4
Bytes per second: 6855779
Requests per second: 103


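These tests suggest that defining larger zones helps only up to a
point. Compared with the first run above (10146896 bytes/s), 4-page
zones raised throughput by roughly 46% (14793969 bytes/s), while 8-page
zones dropped it roughly 32% below that run (6855779 bytes/s).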

As a second step, alc@ proposed creating a new layer that sits between
UMA and the VM subsystem. This layer can manage a pool of chunks that
can be used to satisfy requests from uma_large_malloc(), thus avoiding
the overhead of kmem_malloc() calls.
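
To illustrate the idea only, here is a user-space sketch of such a chunk
pool, with a bitmap tracking which chunks are handed out. It is not the
patch linked below: the names (chunk_pool, pool_init, pool_alloc,
pool_free), the pool size and the chunk size are all assumptions, malloc()
stands in for the one-time request to the VM, and locking is omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define POOL_CHUNKS	32		/* hypothetical number of chunks */
#define CHUNK_BYTES	(4 * 4096)	/* hypothetical chunk size: 4 pages */

struct chunk_pool {
	uint32_t used;			/* bit i set: chunk i is handed out */
	unsigned char *base;		/* backing memory, obtained once */
};

/* Obtain the backing memory once, up front.  Returns 0 on success. */
int
pool_init(struct chunk_pool *p)
{
	p->used = 0;
	p->base = malloc((size_t)POOL_CHUNKS * CHUNK_BYTES);
	return p->base != NULL ? 0 : -1;
}

/* Hand out the first free chunk, or NULL when the pool is exhausted. */
void *
pool_alloc(struct chunk_pool *p)
{
	for (int i = 0; i < POOL_CHUNKS; i++) {
		if ((p->used & (UINT32_C(1) << i)) == 0) {
			p->used |= UINT32_C(1) << i;
			return p->base + (size_t)i * CHUNK_BYTES;
		}
	}
	return NULL;	/* a real layer would fall back to the VM here */
}

/* Return a chunk to the pool; its memory stays allocated for reuse. */
void
pool_free(struct chunk_pool *p, void *ptr)
{
	size_t i = (size_t)((unsigned char *)ptr - p->base) / CHUNK_BYTES;

	p->used &= ~(UINT32_C(1) << i);
}
```

The point of the design is that alloc and free become bit operations on
memory that stays mapped, so the expensive path is taken once at
initialization rather than on every request.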

I've recently started developing a patch (not yet fully working) that
implements this layer. For now I'd like to concentrate
on the performance problem rather than the fragmentation
one, so the patch as it currently stands doesn't address
fragmentation.

http://davit.altervista.org/uma_large_allocations.patch

There are some questions that I wasn't able to answer (for
example, when it's safe to call kmem_malloc() to request memory).

So, at the end of the day, I'm asking for your opinion on this issue,
and I'm looking for a mentor (some kind of guidance) to continue
this project. If someone is interested in helping, it would be very
much appreciated.

Best

Davide Italiano


Re: System freezes unexpectly

2010-08-31 Thread Davide Italiano
On 31/08/10 07:53, John Baldwin wrote:
 On Monday, August 30, 2010 12:45:40 pm Garrett Cooper wrote:
  On Mon, Aug 30, 2010 at 9:24 AM, Davide Italiano
  davide.itali...@gmail.com wrote:
   Removing ~/.mozilla fixes it. I think the problem is related to the
   Xmarks add-on I installed or to the Restore Session
   functionality.
  
  It would have been interesting to capture what `froze' the machine, in
  particular because it could have been a valuable bug for either
  Mozilla to capture and fix, or for us to capture and fix. Unless your
  machine doesn't meet the hardware requirements, I don't see a reason
  why a userland application should lock up a system.
  
  There are other ways you can debug this further, using -safe-mode as a
  next step, then choose to not restore the last session (which is
  available from within the javascript settings file -- nsPrefs.js?).
 
 If only firefox is frozen, then you can always ssh in from another machine 
 and 
 use top/ps, etc., or even gdb on the firefox process itself.
 
 -- 
 John Baldwin

I tried to ssh in from another machine, and to ping it, but neither works
(hostname lookup failure).
I also noticed that the cause of the problem is almost certainly Xmarks:
if I remove ~/.mozilla, firefox3 works again, and when I reinstall Xmarks
the system freezes.
Attilio Rao (rookie), an Italian kernel developer, suggested that I
recompile the kernel with the options KDB, DDB, GDB and KDB_UNATTENDED (in
particular the last one, which reboots the machine if a panic occurs), but
I didn't get anything useful: it isn't a panic (the machine doesn't
reboot), and dmesg is no more verbose about the problem. I also tried
recompiling firefox from ports with the DEBUG flag enabled, but I don't see
anything useful when launching firefox from xterm.

Regards 

Davide 



Re: System freezes unexpectly

2010-08-30 Thread Davide Italiano
Removing ~/.mozilla fixes it. I think the problem is related to the
Xmarks add-on I installed or to the Restore Session
functionality.


System freezes unexpectly

2010-08-29 Thread Davide Italiano
Hi.
I'm running 8.1 on my Sony Vaio laptop, with dwm as window manager on
the latest Xorg from ports.
When I try to run firefox3, the system freezes unexpectedly. I
know that "freezes" is a bit generic, but I can't find a more specific
term to describe the situation. dmesg doesn't give any useful information.

I installed firefox using pkg_add -r; the only add-on/plugin
installed is Xmarks.

I'm ready to debug if needed; any suggestion is appreciated.

Thanks

Davide


Re: System freezes unexpectly

2010-08-29 Thread Davide Italiano
On Sun, Aug 29, 2010 at 6:29 PM, Glen Barber glen.j.bar...@gmail.com wrote:
 On 8/29/10 10:18 AM, Davide Italiano wrote:
 Hi.
 I'm running 8.1 on my Sony Vaio laptop, with dwm as window manager on
 the latest Xorg from ports.
 When I try to run firefox3, the system freezes unexpectedly. I
 know that "freezes" is a bit generic, but I can't find a more specific
 term to describe the situation. dmesg doesn't give any useful information.

 I installed firefox using pkg_add -r; the only add-on/plugin
 installed is Xmarks.

 I'm ready to debug if needed; any suggestion is appreciated.


 Hi Davide,

 Can you run firefox from xterm and check for any errors that might be
 generated?

 Cheers,

 --
 Glen Barber


Tried doing this. But no output.


Re: System freezes unexpectly

2010-08-29 Thread Davide Italiano
On Sun, Aug 29, 2010 at 10:06 PM, Garrett Cooper gcoo...@freebsd.org wrote:
 On Sun, Aug 29, 2010 at 9:33 AM, Davide Italiano
 davide.itali...@gmail.com wrote:
 On Sun, Aug 29, 2010 at 6:29 PM, Glen Barber glen.j.bar...@gmail.com wrote:
 On 8/29/10 10:18 AM, Davide Italiano wrote:
 Hi.
 I'm running 8.1 on my Sony Vaio laptop, with dwm as window manager on
 the latest Xorg from ports.
 When I try to run firefox3, the system freezes unexpectedly. I
 know that "freezes" is a bit generic, but I can't find a more specific
 term to describe the situation. dmesg doesn't give any useful information.

 I installed firefox using pkg_add -r; the only add-on/plugin
 installed is Xmarks.

 I'm ready to debug if needed; any suggestion is appreciated.


 Hi Davide,

 Can you run firefox from xterm and check for any errors that might be
 generated?

 What video driver are you using, can you stimulate this issue with
 other applications in X11, etc?
 -Garrett


I tried to reproduce the issue with other applications and other window
managers, but without success (I also tried epiphany,
which uses the same rendering engine as firefox). Firefox freezes
the system (again, I don't know if that's the right term). The driver
I'm using is radeon.

Regards.

Davide Italiano


Re: System freezes unexpectly

2010-08-29 Thread Davide Italiano

 Ok. How about this?

 Firefox items:
 1. What version of Firefox are you using?

Firefox 3.6.4, the latest from ports, compiled just now.

 2. Are you using any Firefox plugins?

Yes, Xmarks.

 3. When you try to bring up Firefox, does it start to render the GTK
 window and then freeze, or does it freeze without rendering the GTK
 window?

It starts to render the gtk window.

 4. Are you starting from `scratch' with Firefox, or are you restoring
 your last session?

I'm restoring my last session

 5. If you compiled from ports, what do your
 CPUTYPE/CC/CXX/CFLAGS/CXXFLAGS look like?


Default one from a fresh install.

 Radeon items:
 1. What Radeon card do you have?

Ati mobility radeon 9700

 2. Are you using the drm code in the kernel, or not?
 3. Have you tried with the vesa driver instead of the radeon driver?

Yeah, same behavior.

 I would focus on the Firefox items for now because they're easier to
 track down, but definitely try one spin with the vesa driver instead
 of the radeon driver because it's quick and dirty.

 Thanks!
 -Garrett
