[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-02 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

k...@web.de changed:

   What|Removed |Added

 CC||k...@web.de

--- Comment #5 from k...@web.de ---
> Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your 
> work. It uses a quality database backend in the form of LMDB. But any which 
> way you and I might spin it, Baloo has a serious problem with I/O: it simply 
> causes too much of it, too frequently. Numerous users have complained about 
> this, and several currently open and closed bugs are traceable directly to 
> this behaviour. Several users' impressions of Baloo, and KDE writ large, are 
> tainted by Baloo's abusive disk activity.

https://linux.die.net/man/1/ionice
[...]
Idle
A program running with idle io priority will only get disk time when no
other program has asked for disk io for a defined grace period. The impact of
idle io processes on normal system activity should be zero. This scheduling
class does not take a priority argument. Presently, this scheduling class is
permitted for an ordinary user (since kernel 2.6.25). 
Best effort
[...]

Baloo's priority is set to Idle, which means it should not cause "abusive" disk
activity, unless your I/O scheduler does not support ionice. Could this be your
problem?
https://blogs.kde.org/2014/10/15/ubuntus-linux-scheduler-or-why-baloo-might-be-slowing-your-system-1404
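
One way to check the scheduler question is to look at the sysfs scheduler file for the disk, where the kernel marks the active scheduler with square brackets. The sketch below only parses that text. The set of schedulers treated as honouring I/O priorities is an assumption made for illustration (CFQ and BFQ are the usual examples), and the function names are hypothetical.

```python
# Assumption: schedulers that honour ionice priority classes.
# Adjust this set for the kernel in use.
PRIORITY_AWARE = {"cfq", "bfq"}


def active_scheduler(sysfs_text: str) -> str:
    """Return the scheduler marked active (in square brackets) in sysfs text."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    # Some devices expose a single name without brackets, e.g. "none".
    return sysfs_text.strip()


def honours_ionice(sysfs_text: str) -> bool:
    """True if the active scheduler is in the assumed PRIORITY_AWARE set."""
    return active_scheduler(sysfs_text) in PRIORITY_AWARE
```

Feeding it the contents of a file such as /sys/block/sda/queue/scheduler (the device name is an example) tells you which scheduler is active for that disk.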

> Here's a relatively simple proposal: The indexer operates on a configurable 
> *duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds 
> per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively* 
> performs data/metadata reads from the filesystem, keeping an eye on 
> wall-clock time. Once D*T seconds of work have elapsed, make a *single 
> transaction* containing all of the stuff that the indexer read in the 
> previous duty cycle. Then go back to sleep again. In this way, exactly one 
> mdb_txn_commit() and fdatasync()/msync() occurs per time period, they are 
> likely to have accumulated far more than 40 files worth of information, and 
> 50-99% of I/O bandwidth is available for other uses, such as satisfying the 
> desktop UI's needs.

You are suggesting rate limiting the IO. Rate limiting is inferior to
scheduling (which is already being done) because:

1. The rate limit wastes (1-D) of the available bandwidth in idle
situations (when baloo is the only application using IO).

2. If the user needs at least a fraction Bmin < 1 of the bandwidth, (1-D) might
still be smaller than Bmin. Scheduling with Idle priority leaves 1 instead of
(1-D) to the user, which is enough in any case. Also, we no longer need to find
Bmin.
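
The two points above can be checked with a little arithmetic. The sketch below is an idealised model, not a measurement: it assumes that Idle priority really leaves the whole bandwidth to other programs, and all function names are made up for illustration.

```python
def left_for_user_rate_limited(duty: float) -> float:
    """Fraction of bandwidth left to the user when the indexer runs at duty cycle D."""
    if not 0.0 <= duty <= 1.0:
        raise ValueError("duty must be within [0, 1]")
    return 1.0 - duty


def left_for_user_idle_priority() -> float:
    """Idealised assumption: an Idle-class process yields all bandwidth on demand."""
    return 1.0


def wasted_when_alone(duty: float) -> float:
    """Bandwidth nobody uses when the rate-limited indexer is the only I/O client."""
    return 1.0 - duty


def meets_requirement(available: float, b_min: float) -> bool:
    """True if the available fraction covers the user's minimum need Bmin."""
    return available >= b_min
```

With D = 0.5 and Bmin = 0.8, rate limiting leaves 0.5 and fails the requirement, while the idealised Idle scheduling leaves 1.0 and passes for any Bmin.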


Anyway, minimizing the caused IO is still useful.

-- 
You are receiving this mail because:
You are watching all bug changes.

[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread Riku Voipio
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #6 from Riku Voipio  ---
(In reply to kdeu from comment #5)
> https://linux.die.net/man/1/ionice

> Baloo's priority is set to Idle, which means it should not cause "abusive"
> disk activity.

ionice is being used, and it does a good job of making sure the crawling
activity happens at a lower priority than other use.

The effect of ionice is ruined by aggressive fdatasync usage when writing the
large LMDB database. It appears fdatasync causes disk writes from a kernel
thread that has collected all buffered disk writes. Buffers don't carry the
iopriority info on them. Kernel thread just sees the red flag "please commit
this data ASAP" and then thinks "to keep FS consistent, I should also commit
lots of other unwritten pages just to be sure".

Try the patch I made. The disk light still flashes like mad but it doesn't ruin
interactive use anymore. Iopriority works as expected until you ask the kernel
to be sure writes get to disk too. 

(In reply to kdeu from comment #3)
> I'm not sure if I should keep this bug open or what. Specially since this is 
> probably only a problem during first run.

It also appears when doing operations like switching branches in huge git trees
(linux, chromium), copying directories etc.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #7 from k...@web.de ---
> I think it would better to work on detecting and recovering corruption.

> MDB_NOSYNC Don't flush system buffers to disk
> when committing a transaction. This optimization
> means a system crash can corrupt the database or 
> lose the last transactions if buffers are not yet
> flushed to disk.

Are there any experiences with using LMDB and NOSYNC?

Are there tools/ways to recover a corrupted database or lost transactions with
LMDB? They might not even exist yet, if people don't use NOSYNC.

I think that developing such a tool is out of the scope of this project.
It seems that LMDB was built around the idea of not needing recovery.

> How safe is your DB? LMDB is crash-proof on all current filesystem designs.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread Riku Voipio
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #8 from Riku Voipio  ---
(In reply to kdeu from comment #7)
> Are there any experiences with using LMDB and NOSYNC?

Personally I've happily used that, with the attached patch, since filing this
bug - except for a handful of upgrades when I forgot the patch, only to notice
that suddenly the flashing disk light meant a jittery UI again.

Baloo makes finding files on a local HD almost as easy as finding public files
with Google. It's really sad if people disable baloo because it's causing the
desktop to freeze and stutter.

> Are there tools/ways to recover a corrupted database or lost transactions
> with LMDB? They might not even exist yet, if people don't use NOSYNC.

You are assuming that under current configuration LMDB can't get corrupted.
File systems are nasty and even with fdatasync there are caveats. But for most
users, sudden crashes (especially in the middle of transactions) are really
rare events.

Lost transactions are not a problem, entries would be just regenerated in next
index scanning. Recovering the DB is somewhat pointless - you can just
regenerate it from scratch, if under idle iopriority the indexing really has no
user impact.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #9 from k...@web.de ---
(In reply to Riku Voipio from comment #8)
> (In reply to kdeu from comment #7)
> > Are there any experiences with using LMDB and NOSYNC?
> 
> Personally I've happily used that with patch attached since filing this bug
> - except for a handful of upgrades when I forgot the patch, only to notice
> that suddenly the disk light flashing means jittery UI again.

I mean experiences with many users, and with crashes. If you don't test for
crashes that's not a test for data corruption. You probably didn't crash your
computer on purpose.

> You are assuming that under current configuration LMDB can't get corrupted.
> File systems are nasty and even with fdatasync there are caveats.

That's not true. Without NOSYNC, LMDB is safe and does not get corrupted. See:

http://openldap-devel.openldap.narkive.com/k1bbhN5H/lmdb-crash-consistency-again#post7
> All in all a bunch of bogus reporting; claiming that all DBs are broken when
> in fact LMDB is perfectly correct

> But for
> most users, sudden crashes (especially in the middle of transactions) are
> really rare events.

There are Linux users who suffer from frequent power outages.
> we used to get lots of bug reports which were because of corrupted databases

> Recovering the DB is somewhat pointless - you can just
> regenerate it from scratch, if under idle iopriority the indexing really has
> no user impact.

Yes, you can regenerate from scratch, but how do you detect when you have to do
that? This concern was already mentioned in comment #2:

> I'm a little conflicted about this approach since when the index does get 
> corrupted,
> it will be impossible for us to detect it. With our previous backend 
> (xapian), we
> used to get lots of bug reports which were because of corrupted databases :(


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2016-03-11 Thread Alexander Potashev via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356357

Alexander Potashev  changed:

   What|Removed |Added

 CC||aspotas...@gmail.com
  Component|Baloo File Daemon   |Baloo File Daemon
Product|Baloo   |frameworks-baloo



[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2016-10-13 Thread Idonotexist via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356357

Idonotexist  changed:

   What|Removed |Added

 CC||obila...@yahoo.com

--- Comment #4 from Idonotexist  ---
(In reply to Vishesh Handa from comment #2)
> Perhaps the correct approach would be to refactor `baloo_file_extractor` so
> as to not perform a commit so frequently. We currently do it after a fixed
> 40 files. Perhaps it would make sense to try and estimate the amount of
> changes, and then do a commit when we reach the threshold.
> 
> I'm not sure if I should keep this bug open or what. Specially since this is
> probably only a problem during first run.

As I write this, Baloo is hammering my very modern system's HDD to a pulp. The
disk activity LED is furiously lit. KDE's UI periodically freezes because of
heavy disk I/O.

My typical solution is to
1) Pause indexing
2) Mount a 10GB ramdisk
3) Move ~/.local/share/baloo to said ramdisk
4) Symlink ~/.local/share/baloo to the ramdisk baloo
5) Resume indexing
6) When indexing is done, undo the above. 

I definitely do not think this bug should be closed. It is most certainly not
caused only on first runs. The current Baloo hyperactivity was caused by my
copying of a large number of small files from another system.

Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your
work. It uses a quality database backend in the form of LMDB. But any which way
you and I might spin it, Baloo has a serious problem with I/O: it simply causes
too much of it, too frequently. Numerous users have complained about this, and
several currently open and closed bugs are traceable directly to this
behaviour. Several users' impressions of Baloo, and KDE writ large, are tainted
by Baloo's abusive disk activity.

As for how to fix this problem: 40 files per transaction commit, as you said,
is not a good enough solution. At the very least, the criterion should be based
on LMDB's page size and the disk block size. I also propose that this criterion
not be based purely on the number of files; it should have a time component, and
should not commit transactions more often than once per second. A human user
couldn't care less whether newly appeared files were indexed this second or
next, and a file indexer is after all primarily, though not exclusively, for
human use.
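
A commit criterion with both a count and a time component could look roughly like the sketch below. The class and its parameter names are hypothetical and not taken from Baloo. The default of 40 files and the one-second floor come from the discussion above, while the 30-second maximum age is an assumption added for illustration.

```python
import time
from typing import Callable


class CommitPolicy:
    """Decide when a batch of indexed files should be committed.

    Commit once `max_files` have accumulated or `max_age` seconds have
    passed since the last commit, but never more often than once per
    `min_interval` seconds.
    """

    def __init__(self, max_files: int = 40, min_interval: float = 1.0,
                 max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_files = max_files
        self.min_interval = min_interval
        self.max_age = max_age
        self._clock = clock
        self._pending = 0
        self._last_commit = clock()

    def file_indexed(self) -> bool:
        """Record one indexed file; return True if the caller should commit now."""
        self._pending += 1
        return self.should_commit()

    def should_commit(self) -> bool:
        if self._pending == 0:
            return False
        elapsed = self._clock() - self._last_commit
        if elapsed < self.min_interval:
            return False  # too soon after the previous commit
        return self._pending >= self.max_files or elapsed >= self.max_age

    def committed(self) -> None:
        """Call after the transaction has been committed."""
        self._pending = 0
        self._last_commit = self._clock()
```

The clock is injectable so the policy can be exercised without waiting for real time to pass.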

Here's a relatively simple proposal: The indexer operates on a configurable
*duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds
per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively*
performs data/metadata reads from the filesystem, keeping an eye on wall-clock
time. Once D*T seconds of work have elapsed, make a *single transaction*
containing all of the stuff that the indexer read in the previous duty cycle.
Then go back to sleep again. In this way, exactly one mdb_txn_commit() and
fdatasync()/msync() occurs per time period, they are likely to have accumulated
far more than 40 files worth of information, and 50-99% of I/O bandwidth is
available for other uses, such as satisfying the desktop UI's needs.
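
The duty-cycle proposal can be sketched as follows. This is an illustration of the idea only, not Baloo code: `read_next` and `commit` are hypothetical callbacks standing in for the filesystem reads and the single transaction commit, and the clock and sleep functions are injectable so the loop can be tried without real delays.

```python
import time
from typing import Callable, List, Optional, Tuple


def duty_cycle_plan(duty: float, period_s: float) -> Tuple[float, float]:
    """Return (work_seconds, sleep_seconds) for one period of length T."""
    if not 0.01 <= duty <= 0.50:
        raise ValueError("duty cycle D must be between 1% and 50%")
    if not 1.0 <= period_s <= 3600.0:
        raise ValueError("period T must be between 1s and 3600s")
    work = duty * period_s
    return work, period_s - work


def run_one_period(duty: float, period_s: float,
                   read_next: Callable[[], Optional[object]],
                   commit: Callable[[List[object]], None],
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Read for D*T seconds, commit once, then sleep for (1-D)*T seconds."""
    work_s, sleep_s = duty_cycle_plan(duty, period_s)
    deadline = clock() + work_s
    batch: List[object] = []
    while clock() < deadline:
        item = read_next()
        if item is None:  # nothing left to index
            break
        batch.append(item)
    if batch:
        commit(batch)  # at most one transaction per period
    sleep(sleep_s)
    return len(batch)
```

For D = 25% and T = 20s the plan is 5 seconds of reading followed by 15 seconds of sleep, with one commit in between.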



[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2016-07-04 Thread Vishesh Handa via KDE Bugzilla
https://bugs.kde.org/show_bug.cgi?id=356357

Vishesh Handa  changed:

   What|Removed |Added

   Assignee|m...@vhanda.in|pinak.ah...@gmail.com



[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-06-11 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=356357

Nate Graham  changed:

   What|Removed |Added

   Priority|NOR |HI


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-05-12 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=356357

Nate Graham  changed:

   What|Removed |Added

 CC||mou...@mail.com

--- Comment #13 from Nate Graham  ---
*** Bug 393741 has been marked as a duplicate of this bug. ***


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-09-29 Thread Kai Krakow
https://bugs.kde.org/show_bug.cgi?id=356357

Kai Krakow  changed:

   What|Removed |Added

 CC||k...@kaishome.de

--- Comment #14 from Kai Krakow  ---
I've added some patches before finding this bug. My findings are that disabling
read-ahead on the database helps somewhat in low-memory situations, but the
biggest problem is fsync: that call will actually sync the whole filesystem and
not just the database file, and doing that constantly is toxic to performance.
It's as simple as that. Here are the links:
https://bugs.kde.org/show_bug.cgi?id=404057 and
https://github.com/kakra/baloo/commits/fixes/bko-404057. Some of these patches
may not be needed at all, some optimize for corner cases. But we should really
turn off fsync at the very least.

If you don't want to disable fsync, then LMDB is probably the wrong tool to do
the job. You'd then need some append-only database with garbage collection
(LMDB is already acting a lot like this). I'm pretty sure LMDB is actually a
bad choice for baloo, if, and only if, you expect it to be the only software
needing to do IO. But after some research, I think LMDB is not the wrong tool,
thus we need to adjust how Baloo uses it.

The devs of LMDB say that it is safe to use without fsync on any current Linux
filesystem (it can lose transactions but it won't corrupt). It is not safe to
use on some hypothetical filesystems (it could corrupt).

Can we please at least let the user decide and allow him to shoot his own foot?
Maybe a config option or env variable?

Baloo already has some sort of recovery: If it fails to open the database it
will simply purge and recreate it. Maybe it could detect corruption during use
somehow and act similarly? I'm not sure whether LMDB functions would return
errors or simply crash. In the first case, it should be easy.
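
The purge-and-recreate recovery described here can be sketched generically. Nothing below is Baloo's or LMDB's actual API: `CorruptIndexError`, `open_db` and the directory layout are hypothetical stand-ins, and real code would catch whatever error the database binding actually reports.

```python
import shutil
from pathlib import Path
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


class CorruptIndexError(Exception):
    """Hypothetical error raised by the opener when the index is unusable."""


def open_or_rebuild(db_dir: Path,
                    open_db: Callable[[Path], T]) -> Tuple[T, bool]:
    """Open the index; if that fails, purge the directory and start over.

    Returns (handle, rebuilt), where rebuilt tells the caller that a full
    re-index is needed because the old data was discarded.
    """
    db_dir.mkdir(parents=True, exist_ok=True)
    try:
        return open_db(db_dir), False
    except CorruptIndexError:
        shutil.rmtree(db_dir)          # throw away the damaged index
        db_dir.mkdir(parents=True)     # recreate an empty directory
        return open_db(db_dir), True   # a second failure propagates
```

The same wrapper could be reused at runtime if the database calls report errors instead of crashing, which is the open question raised above.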

I also like the time-based instead of count-based approach much more: Linux
already flushes data after no more than 30s, why not just use the same amount?

Regarding fsync: I'm not sure if LMDB uses fsync or fdatasync, or if this is
even a choice. The developers say in their documentation it's fsync, the strace
by Riku says fdatasync. Whichever is used, it's a problem: you cannot expect
users to use the software if it totally destroys their user experience.

Baloo should be designed around the idea that corruption can occur and luckily
it's easy to recover from it: Just rebuild the database.

So the proposed solution is really about: How do we properly detect database
corruption?


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-10-08 Thread Kai Krakow
https://bugs.kde.org/show_bug.cgi?id=356357

Kai Krakow  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=404057


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-10-12 Thread Martin Steigerwald
https://bugs.kde.org/show_bug.cgi?id=356357

Martin Steigerwald  changed:

   What|Removed |Added

 CC||mar...@lichtvoll.de


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-01-07 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=356357

soredake  changed:

   What|Removed |Added

 CC||fds...@krutt.org


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2021-02-16 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=356357

soredake  changed:

   What|Removed |Added

 CC||ndrzj1...@relay.firefox.com


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2021-08-02 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

tagwer...@innerjoin.org changed:

   What|Removed |Added

 CC||tagwer...@innerjoin.org


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2021-08-04 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #15 from tagwer...@innerjoin.org ---
Is this still an issue ... ?

... To the extent that it can be pinned down to syncing writes
to the index?

I think there are still things to look at, for example batching up the initial
indexing when there are *very* *many* new files to index (Bug 394750),
adjusting the number of files "indexed in a batch" when content indexing (Bug
373021), and dealing with many deleted items (possibly Bug 437754 or Bug
353874; I'm not sure there's a bug specifically about clearing up deleted items
being slow).

I think these are more related to making sure you commit before you "risk" using
swap, while also making maximum use of RAM so you don't commit too often.

I don't think the attached patch

https://bugs.kde.org/attachment.cgi?id=95923

that avoids the "sync" after each transaction was applied. This was also
proposed (2019/09) here:

https://bugs.kde.org/show_bug.cgi?id=404057#c12

It may be that "batching up" the indexing, implying fewer, larger
transactions, reduced the advantage.

For Bug 400704, most of the reports date from 2017/2018.

I am tempted to flag this as "needs info" to see if there are other test cases
that need to be looked at...


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2024-07-02 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

tagwer...@innerjoin.org changed:

   What|Removed |Added

 Resolution|--- |WAITINGFORINFO
 Status|CONFIRMED   |NEEDSINFO

--- Comment #16 from tagwer...@innerjoin.org ---
(In reply to tagwerk19 from comment #15)
> ... I am tempted to flag this as "needs info" to see if there are other test
> cases that need to be looked at ...
As per:
https://bugs.kde.org/show_bug.cgi?id=404057#c43

I think the dust has probably settled after:
https://invent.kde.org/frameworks/baloo/-/merge_requests/131
and cherrypicked for KF5
https://invent.kde.org/frameworks/baloo/-/merge_requests/169

There's also been
 https://invent.kde.org/frameworks/baloo/-/merge_requests/121
and
 https://invent.kde.org/frameworks/baloo/-/merge_requests/148

I think lots more has happened but those seem to be the big recent changes

I reran the "torture test" suggested in Bug 404057 and Baloo indexed the data
without issues. I think we are in far better shape (and have SSDs rather than
HDDs).

Do we need to keep this issue open or is it possible to close?

Flagging a "waiting for info" for this, shout if you think there are still
issues...


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2024-07-02 Thread soredake
https://bugs.kde.org/show_bug.cgi?id=356357

soredake  changed:

   What|Removed |Added

 CC|katyaberezy...@gmail.com|


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2024-07-16 Thread Bug Janitor Service
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #17 from Bug Janitor Service  ---
Dear Bug Submitter,

This bug has been in NEEDSINFO status with no change for at least
15 days. Please provide the requested information as soon as
possible and set the bug status as REPORTED. Due to regular bug
tracker maintenance, if the bug is still in NEEDSINFO status with
no change in 30 days the bug will be closed as RESOLVED > WORKSFORME
due to lack of needed information.

For more information about our bug triaging procedures please read the
wiki located here:
https://community.kde.org/Guidelines_and_HOWTOs/Bug_triaging

If you have already provided the requested information, please
mark the bug as REPORTED so that the KDE team knows that the bug is
ready to be confirmed.

Thank you for helping us make KDE software even better for everyone!


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2024-07-31 Thread Bug Janitor Service
https://bugs.kde.org/show_bug.cgi?id=356357

Bug Janitor Service  changed:

   What|Removed |Added

 Status|NEEDSINFO   |RESOLVED
 Resolution|WAITINGFORINFO  |WORKSFORME

--- Comment #18 from Bug Janitor Service  ---
🐛🧹 This bug has been in NEEDSINFO status with no change for at least 30 days.
Closing as RESOLVED WORKSFORME.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2022-11-30 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=356357

Nate Graham  changed:

   What|Removed |Added

  Component|Baloo File Daemon   |general


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2021-12-08 Thread Micah Shennum
https://bugs.kde.org/show_bug.cgi?id=356357

Micah Shennum  changed:

   What|Removed |Added

 CC||jimt...@gmail.com


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2019-03-01 Thread Patrick Silva
https://bugs.kde.org/show_bug.cgi?id=356357

Patrick Silva  changed:

   What|Removed |Added

 CC||bugsefor...@gmx.com


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=356357

Nate Graham  changed:

   What|Removed |Added

   See Also||https://bugs.kde.org/show_bug.cgi?id=400704


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2018-11-26 Thread Nate Graham
https://bugs.kde.org/show_bug.cgi?id=356357

Nate Graham  changed:

   What|Removed |Added

 CC||igor.pobo...@gmail.com, n...@kde.org, stefan.bruens@rwth-aachen.de
 Status|REPORTED|CONFIRMED
 Ever confirmed|0   |1

--- Comment #10 from Nate Graham  ---
40 files per sync seems reasonable for incremental additions after the DB has
already been populated during the initial indexing operation. It seems
like the place where this really gets people is during that initial indexing,
where the system's responsiveness can be degraded due to the heavy IO. If we
have a way to detect the initial indexing operation, maybe we could use a less
aggressive sync policy there, either increasing the number of files before each
sync, or switching to a time-based sync or something.

Stefan and/or Igor, does this idea make any sense?


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2018-11-26 Thread Jack
https://bugs.kde.org/show_bug.cgi?id=356357

Jack  changed:

   What|Removed |Added

 CC||ostroffjh@users.sourceforge.net


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2018-12-01 Thread Feng
https://bugs.kde.org/show_bug.cgi?id=356357

Feng  changed:

   What|Removed |Added

 CC||wang_f...@live.com

--- Comment #11 from Feng  ---
(In reply to Nate Graham from comment #10)
> 40 files per sync seems reasonable for incremental additions after the DB
> has already been populated during the initial indexing operation. It
> seems like the place where this really gets people is during that initial
> indexing, where the system's responsiveness can be degraded due to the heavy
> IO. If we have a way to detect the initial indexing operation, maybe we
> could use a less aggressive sync policy there, either increasing the number
> of files before each sync, or switching to a time-based sync or something.
> 
> Stefan and/or Igor, does this idea make any sense?

My laptop has 32GB of memory and an SSD drive. But when baloo is indexing, I
have to manually reboot it, as everything is frozen except the power button.

Is it possible for baloo to do its indexing cached in memory instead of doing
instant I/O on disk?


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2018-12-01 Thread Stefan Brüns
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #12 from Stefan Brüns  ---
Eventually the data has to be flushed to disk. The flushing has to be done in a
specific order, to guarantee the on-disk data is consistent.

You can of course delay the flush, but then you are just shifting the stutters
from one time instant to a different one.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-02 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

k...@web.de changed:

   What|Removed |Added

 CC||k...@web.de

--- Comment #5 from k...@web.de ---
> Vishesh, Baloo is a worthy attempt at an indexing system, and I commend your 
> work. It uses a quality database backend in the form of LMDB. But any which 
> way you and I might spin it, Baloo has a serious problem with I/O: it simply 
> causes too much of it, too frequently. Numerous users have complained about 
> this, and several currently open and closed bugs are traceable directly to 
> this behaviour. Several users' impressions of Baloo, and KDE writ large, are 
> tainted by Baloo's abusive disk activity.

https://linux.die.net/man/1/ionice
[...]
Idle
A program running with idle io priority will only get disk time when no
other program has asked for disk io for a defined grace period. The impact of
idle io processes on normal system activity should be zero. This scheduling
class does not take a priority argument. Presently, this scheduling class is
permitted for an ordinary user (since kernel 2.6.25). 
Best effort
[...]

Baloo's I/O priority is set to Idle, which means it should not cause "abusive"
disk activity, unless your I/O scheduler does not support ionice. Could this be
your problem?
https://blogs.kde.org/2014/10/15/ubuntus-linux-scheduler-or-why-baloo-might-be-slowing-your-system-1404

> Here's a relatively simple proposal: The indexer operates on a configurable 
> *duty cycle* D of 1%-50% and a time period T of 1s-3600s. For (1-D)*T seconds 
> per period, Baloo sleeps. For D*T seconds per period, Baloo *exclusively* 
> performs data/metadata reads from the filesystem, keeping an eye on 
> wall-clock time. Once D*T seconds of work have elapsed, make a *single 
> transaction* containing all of the stuff that the indexer read in the 
> previous duty cycle. Then go back to sleep again. In this way, exactly one 
> mdb_txn_commit() and fdatasync()/msync() occurs per time period, they are 
> likely to have accumulated far more than 40 files worth of information, and 
> 50-99% of I/O bandwidth is available for other uses, such as satisfying the 
> desktop UI's needs.

You are suggesting rate limiting the IO. Rate limiting is inferior to
scheduling (which is already being done) because:

1. The rate limit wastes (1-D) of the available bandwidth in idle
situations (when Baloo is the only application using IO).

2. If the user needs a fraction Bmin < 1 of the bandwidth, (1-D) might still be
smaller than Bmin. Scheduling with Idle priority leaves 1 instead of (1-D) to
the user, which is enough in any case. Also, we no longer need to find Bmin.
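
Both points can be illustrated with a small idealised model in Python. The
function names are hypothetical, bandwidth is expressed as a fraction of the
disk's capacity, and the model assumes that Idle scheduling works perfectly; it
ignores effects such as forced flushes.

```python
def rate_limited(user_demand: float, duty: float) -> tuple:
    """(user, indexer) bandwidth shares under a fixed duty cycle D."""
    indexer = duty                          # never more than D, even when idle
    user = min(user_demand, 1.0 - duty)     # capped at (1-D)
    return user, indexer


def idle_scheduled(user_demand: float) -> tuple:
    """(user, indexer) bandwidth shares with Idle I/O priority."""
    user = min(user_demand, 1.0)            # user is served first
    indexer = 1.0 - user                    # indexer gets the leftover
    return user, indexer


# Point 1: nobody else uses the disk, D = 25%.
print(rate_limited(0.0, 0.25))    # (0.0, 0.25): 75% of the bandwidth is unused
print(idle_scheduled(0.0))        # (0.0, 1.0)

# Point 2: the user needs Bmin = 87.5%, which is more than (1-D) = 75%.
print(rate_limited(0.875, 0.25))  # (0.75, 0.25): the user is short-changed
print(idle_scheduled(0.875))      # (0.875, 0.125)
```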


Anyway, minimizing the IO that Baloo causes is still useful.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread Riku Voipio
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #6 from Riku Voipio  ---
(In reply to kdeu from comment #5)
> https://linux.die.net/man/1/ionice

> Baloo's priority is set to Idle, which means it should not cause "abusive"
> disk activity.

ionice is being used, and it does a good job of making sure the crawling
activity happens at a lower priority than other use.

The effect of ionice is ruined by aggressive fdatasync usage when writing the
large LMDB database. It appears fdatasync causes disk writes from a kernel
thread that has collected all buffered disk writes. Buffers don't carry the
iopriority info with them. The kernel thread just sees the red flag "please
commit this data ASAP" and then thinks "to keep the FS consistent, I should also
commit lots of other unwritten pages just to be sure".

Try the patch I made. The disk light still flashes like mad but it doesn't ruin
interactive use anymore. Iopriority works as expected until you ask the kernel
to be sure writes get to disk too. 

(In reply to kdeu from comment #3)
> I'm not sure if I should keep this bug open or what. Specially since this is 
> probably only a problem during first run.

It also appears when doing operations like switching branches in huge git trees
(linux, chromium), copying directories etc.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #7 from k...@web.de ---
> I think it would better to work on detecting and recovering corruption.

> MDB_NOSYNC Don't flush system buffers to disk
> when committing a transaction. This optimization
> means a system crash can corrupt the database or 
> lose the last transactions if buffers are not yet
> flushed to disk.

Are there any experiences with using LMDB and NOSYNC?

Are there tools/ways to recover a corrupted database or lost transactions with
LMDB? They might not even exist yet, if people don't use NOSYNC.

I think that developing such a tool is out of the scope of this project.
It seems that LMDB was built around the idea of not needing recovery.

> How safe is your DB? LMDB is crash-proof on all current filesystem designs.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread Riku Voipio
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #8 from Riku Voipio  ---
(In reply to kdeu from comment #7)
> Are there any experiences with using LMDB and NOSYNC?

Personally I've happily used that with the attached patch since filing this bug -
except for a handful of upgrades when I forgot the patch, only to notice that
suddenly the disk light flashing means a jittery UI again.

Baloo makes finding files on the local HD almost as easy as finding public files
with Google. It's really sad if people disable Baloo because it's causing the
desktop to freeze and stutter.

> Are there tools/ways to recover a corrupted database or lost transactions
> with LMDB? They might not even exist yet, if people don't use NOSYNC.

You are assuming that under the current configuration LMDB can't get corrupted.
File systems are nasty and even with fdatasync there are caveats. But for most
users, sudden crashes (especially in the middle of transactions) are really rare
events.

Lost transactions are not a problem; the entries would just be regenerated in
the next index scan. Recovering the DB is somewhat pointless - you can just
regenerate it from scratch, if under idle iopriority the indexing really has no
user impact.


[frameworks-baloo] [Bug 356357] Continous index flushing with fdatasync degrades interactive performance

2017-01-03 Thread bugzilla_noreply
https://bugs.kde.org/show_bug.cgi?id=356357

--- Comment #9 from k...@web.de ---
(In reply to Riku Voipio from comment #8)
> (In reply to kdeu from comment #7)
> > Are there any experiences with using LMDB and NOSYNC?
> 
> Personally I've happily used that with patch attached since filing this bug
> - except for a handful of upgrades when I forgot the patch, only to notice
> that suddenly the disk light flashing means jittery UI again.

I mean experiences with many users, and with crashes. If you don't test for
crashes that's not a test for data corruption. You probably didn't crash your
computer on purpose.

> You are assuming that under current configuration LMDB can't get corrupted.
> File systems are nasty and even with fdatasync there are caveats.

That's not true. Without NOSYNC, LMDB is safe and does not get corrupted. See:

http://openldap-devel.openldap.narkive.com/k1bbhN5H/lmdb-crash-consistency-again#post7
> All in all a bunch of bogus reporting; claiming that all DBs are broken when
> in fact LMDB is perfectly correct

> But for
> most users, sudden crashes (especially in middle of transactions) is really
> rare events.

There are Linux users who suffer from frequent power outages.
> we used to get lots of bug reports which were because of corrupted databases

> Recovering the DB is somewhat pointless - you can just
> regenerate it from scratch, if under idle iopriority the indexing really has
> no user impact.

Yes, you can regenerate from scratch, but how do you detect when you have to do
that? This concern was already mentioned in comment #2:

> I'm a little conflicted about this approach since when the index does get
> corrupted, it will be impossible for us to detect it. With our previous
> backend (xapian), we used to get lots of bug reports which were because of
> corrupted databases :(
