Re: Very various speed of grep operation on btrfs partition

2015-12-13 Thread Михаил Гаврилов
OK, I ran another experiment. I bought a new HDD and formatted it with
the btrfs file system. I also increased the amount of data to grep and
wrote a bash script that automates the testing:

#!/bin/bash

# For testing on the Windows machine
#grep_path='/cygdrive/e/Sources/inside'
# For testing on the new HDD
#grep_path='/run/media/mikhail/eaa531cd-25f4-4e00-b31f-22665faa9768/sources/inside'
# For testing on the real-life data
grep_path='/home/mikhail/sources/inside'
# The search pattern must stay on one line (the mail client wrapped it)
command="grep -rn 'float:left;display: block;height: 24px;line-height: 1.2em;position: relative;text-align: center;white-space: nowrap;width: 80px;' '$grep_path'"
log_file='res.log'

# Keep the terminal on fd 3; append stdout and stderr to the log file
exec 3>&1 1>>"${log_file}" 2>&1
count=0
while true
do

   (( count++ ))
   echo "PASS: $count at $(date +"%T")" | tee /dev/fd/3
   echo "$command" | tee /dev/fd/3
   # Time the search itself; its matches are discarded
   eval "{ time $command > /dev/null; } |& tee /dev/fd/3"
done


And I got very interesting results:

Linux btrfs on the NEW HDD: 6.441s (the same result as in the synthetic tests)
Linux btrfs on the real-data HDD (94% used): 16m52.036s. Very bad, why???
The data is the same as in the first variant.
Windows NTFS on the NEW HDD: 1m27.643s

I am really disappointed that the results are so bad in real life (my
home folder). Is it possible to make an HDD that is 94% full perform as
fast as an empty one? Both hard disks are the same model: ST4000NM0033-9ZM170.


--
Best Regards,
Mike Gavrilov.


Re: Very various speed of grep operation on btrfs partition

2015-12-07 Thread Austin S Hemmelgarn

On 2015-12-06 22:32, Duncan wrote:

> FWIW, I build kde without the semantic-desktop stuff even enabled at
> build-time (gentoo offers that option) here.  All the kdepim stuff (kmail,
> etc) uses it, so I dumped the several kdepim related apps (kmail,
> akregator, kaddressbook) I used here and found alternatives.  I don't
> normally need the indexing, which only takes space for the index and
> lowers performance, so it's all turned off at build-time.

Personally, I just avoid both KDE and GNOME.  For me efficiency is most
important, and XFCE beats both at that (I've done testing between the
three on my laptop with equivalent settings, and XFCE gives me about
50-70% more battery life than either KDE or GNOME, and a much smaller
memory footprint), followed closely by lack of lock-in (both GNOME and
KDE have a very 'all or nothing' feel to them, and in the case of GNOME,
it really is all or nothing).  I do have to say though, Okular is by far
the best document viewer available for Linux.



>> A day later I noticed that the effect of the cache was gone:
>> real 4m33.940s user 0m0.862s sys 0m1.711s


> That's probably due to something knocking it out of cache overnite.  If
> you have a cronjob running nitely to update the locate-variant database,
> for example, as many distros do by default, that'd do it, as that scans
> pretty much the entire filesystem, typically many times the size of RAM,
> thus trashing cache.
>
> The indexer could potentially wipe out cache too, particularly on lower
> memory machines, if it's actively indexing files, as that would normally
> pull what it's indexing into cache, throwing something else that hasn't
> been used for awhile away, unless the indexer is smart enough to do
> direct access and thus not disturb cache, since it's single-time access
> and caching it isn't going to do anything but force stuff from cache you
> use more frequently.

Somehow I doubt that the indexer is using O_DIRECT, while that bypasses
cache, it also makes things really slow, which is directly counter to
their intent to finish indexing as fast as possible (supposedly to
minimize performance impact, but that's at odds with their behavior of
trashing the cache).  That, and the slowest part of indexing is calling
stat() on everything (stat is one of the slowest filesystem related
system calls, and is in general usually near the top of the list of
calls to avoid when doing real-time work, or for that matter anything
that needs to be fast).



>> As I understand it, to solve my problem I just need to make the cache
>> always stay effective, even when memory is occupied by other applications.
>>
>> Is it possible to specify a minimum size for the disk cache?


> AFAIK, not directly.  What happens is that rather than leave the memory
> empty, the kernel caches stuff as it reads it.  If the memory is needed
> for apps, it's reclaimed from cache and used for apps.  So Linux systems
> tend to run close to zero really free memory, unless you just dropped
> caches or rebooted, or you just used some memory hog and it's done and
> just freed its memory, and you haven't read enough files since then to
> fill that memory back up with cache.

Yep, there's no direct way to force it to a fixed size.  Personally I
would love to be able to reserve some fixed amount of RAM for disk
caching, but I also usually run with _a lot_ of swap (usually on the
order of 4 to 16 times physical RAM, I do work sometimes on really big
images).


> However, if you're running swap, there's an adjustment: the file
> /proc/sys/vm/swappiness (on most distros it would be set via the
> sysctl config, /etc/sysctl.conf and/or /etc/sysctl.d/*).  It takes a
> value from 0 to 100 and normally controls the balance preference between
> swapping apps out to keep cache (nearer 100) vs. dumping cache to keep
> more apps in RAM instead of swapped out (nearer 0). IIRC the default is 60.

You may also want to look into /proc/sys/vm/vfs_cache_pressure as well:
the lower that is, the less likely the kernel is to reclaim the memory
used for cached directory entries and inodes.  This tends to have a
bigger impact if all you care about is the filesystem metadata cache.
Don't set it below about 50 though, otherwise it becomes very easy to
run the system out of memory.
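
For reference, a minimal sketch of how the two knobs above could be
inspected from a shell. The helper name read_vm_knob is made up for this
example, and its optional second argument exists only so the function can
be pointed at a test directory. The value 80 in the comments is purely an
illustration, not a recommendation.

```shell
#!/bin/bash
# Sketch: print the current value of a VM tunable under /proc/sys/vm.
# read_vm_knob is a hypothetical helper name, not an existing tool.
read_vm_knob() {
    local name=$1
    local root=${2:-/proc/sys/vm}
    if [ -r "$root/$name" ]; then
        printf '%s = %s\n' "$name" "$(cat "$root/$name")"
    else
        printf '%s = (not available)\n' "$name"
    fi
}

read_vm_knob swappiness
read_vm_knob vfs_cache_pressure

# Changing a value until the next reboot needs root, for example:
#   sysctl -w vm.swappiness=80
# To keep it across reboots, a line such as "vm.swappiness = 80" can be
# placed in a file under /etc/sysctl.d/ (80 is only an example value).
```

Reading the values needs no special privileges; only changing them does.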


> Obviously if you're not running swap, all app memory must be kept in
> physical RAM as it can't be swapped out, and cache simply uses what's
> left.


>> It's a pity that I can't do 'echo 3 > /proc/sys/vm/drop_caches' on the
>> Windows machine. It would be interesting to see how fast grep would work
>> without the cache.


> FWIW, I jumped off of MS when they started shipping malware[1] as part of
> the OS, with eXPrivacy.  So I've no idea if they've something similar,
> tho I'd be somewhat surprised if they didn't, at least as some obscure
> and possibly undocumented system call, so you'd have to call it from a
> program written for that purpose, instead of having it exposed such that
> any admin with suitable privs can do it with a single line command using
> only shell builtins, as Linux does.

Windows has no way that I know of outside of kernel mode to force the
filesystem cache to

Re: Very various speed of grep operation on btrfs partition

2015-12-06 Thread Duncan
Михаил Гаврилов posted on Mon, 07 Dec 2015 02:16:08 +0500
as excerpted:

> 2015-12-04 17:59 GMT+05:00 Austin S Hemmelgarn :
>> Well, what other things are accessing the filesystem at the same time?
>> If you've got something like KDE running with the 'semantic desktop'
>> stuff turned on, then that will seriously impact the performance of
>> other things using that filesystem.
>>
>> The other thing to keep in mind, is that caching may be impacting
>> things somewhat.  To really get a good idea of performance for
>> something like this,
>> you should run 'sync' followed by 'echo 3 > /proc/sys/vm/drop_caches'
>> (you'll need to be root for the second one) prior to each run, and
>> ideally have nothing else running on that filesystem.
> 
> Thanks for clarifying.
> 
> I was able to narrow things down further:
> 
> After dropping the cache on a clean machine right after a reboot, the
> grep operation took:
> real 2m54.549s user 0m0.662s sys 0m1.062s
> 
> After turning off the indexing service (tracker), the result improved:
> real 2m12.182s user 0m0.657s sys 0m1.021s
> 
> 
> If the cache is not dropped:
> real 0m0.575s user 0m0.467s sys 0m0.108s
> 
> 
> And this result is stable across all subsequent runs, even when the
> indexing service is enabled.

FWIW, I build kde without the semantic-desktop stuff even enabled at 
build-time (gentoo offers that option) here.  All the kdepim stuff (kmail, 
etc) uses it, so I dumped the several kdepim related apps (kmail, 
akregator, kaddressbook) I used here and found alternatives.  I don't 
normally need the indexing, which only takes space for the index and 
lowers performance, so it's all turned off at build-time.

> A day later I noticed that the effect of the cache was gone:
> real 4m33.940s user 0m0.862s sys 0m1.711s

That's probably due to something knocking it out of cache overnite.  If 
you have a cronjob running nitely to update the locate-variant database, 
for example, as many distros do by default, that'd do it, as that scans 
pretty much the entire filesystem, typically many times the size of RAM, 
thus trashing cache.

The indexer could potentially wipe out cache too, particularly on lower 
memory machines, if it's actively indexing files, as that would normally 
pull what it's indexing into cache, throwing something else that hasn't 
been used for awhile away, unless the indexer is smart enough to do 
direct access and thus not disturb cache, since it's single-time access 
and caching it isn't going to do anything but force stuff from cache you 
use more frequently.

> As I understand it, to solve my problem I just need to make the cache
> always stay effective, even when memory is occupied by other applications.
> 
> Is it possible to specify a minimum size for the disk cache?

AFAIK, not directly.  What happens is that rather than leave the memory 
empty, the kernel caches stuff as it reads it.  If the memory is needed 
for apps, it's reclaimed from cache and used for apps.  So Linux systems 
tend to run close to zero really free memory, unless you just dropped 
caches or rebooted, or you just used some memory hog and it's done and 
just freed its memory, and you haven't read enough files since then to 
fill that memory back up with cache.
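
A small sketch of how this can be observed on a running system: it pulls
the MemFree and Cached fields out of /proc/meminfo, so a nearly full
"Cached" next to a small "MemFree" is the normal state described above.
The helper name meminfo_kib is invented for the example, and its optional
second argument is there only to allow testing against a sample file.

```shell
#!/bin/bash
# Sketch: print one field of /proc/meminfo (value is in kB).
# meminfo_kib is a hypothetical helper name.
meminfo_kib() {
    local field=$1
    local file=${2:-/proc/meminfo}
    # Match the field name exactly, so "Cached" does not match "SwapCached".
    awk -v f="$field:" '$1 == f { print $2 }' "$file"
}

echo "MemFree: $(meminfo_kib MemFree) kB"
echo "Cached:  $(meminfo_kib Cached) kB"
```

Running it before and after a large recursive grep should show Cached
growing while MemFree shrinks, as long as nothing else is competing for
memory.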

However, if you're running swap, there's an adjustment: the file
/proc/sys/vm/swappiness (on most distros it would be set via the
sysctl config, /etc/sysctl.conf and/or /etc/sysctl.d/*).  It takes a
value from 0 to 100 and normally controls the balance preference between
swapping apps out to keep cache (nearer 100) vs. dumping cache to keep
more apps in RAM instead of swapped out (nearer 0). IIRC the default is 60.

Obviously if you're not running swap, all app memory must be kept in 
physical RAM as it can't be swapped out, and cache simply uses what's 
left.

> It's a pity that I can't do 'echo 3 > /proc/sys/vm/drop_caches' on the
> Windows machine. It would be interesting to see how fast grep would work
> without the cache.

FWIW, I jumped off of MS when they started shipping malware[1] as part of 
the OS, with eXPrivacy.  So I've no idea if they've something similar, 
tho I'd be somewhat surprised if they didn't, at least as some obscure 
and possibly undocumented system call, so you'd have to call it from a 
program written for that purpose, instead of having it exposed such that 
any admin with suitable privs can do it with a single line command using 
only shell builtins, as Linux does.

>> Additionally, do you have some particular reason that you absolutely
>> _need_ nodatacow to be enabled for the FS?  It usually has no impact on
>> performance, but it removes any kind of error correction for file data
>> (checksums can't be used safely without COW semantics).  It probably
>> has no direct impact on what you're seeing here, but it is something
>> that really shouldn't be used in most cases at the filesystem level (it
>> can be done on given subvolumes or directories, and that's the
>> recommended way to do it if you don't want to go down to the per-file
>> level).
>>
>>
> I see 

Re: Very various speed of grep operation on btrfs partition

2015-12-06 Thread Михаил Гаврилов
2015-12-04 17:59 GMT+05:00 Austin S Hemmelgarn :
> Well, what other things are accessing the filesystem at the same time? If
> you've got something like KDE running with the 'semantic desktop' stuff
> turned on, then that will seriously impact the performance of other things
> using that filesystem.
>
> The other thing to keep in mind, is that caching may be impacting things
> somewhat.  To really get a good idea of performance for something like this,
> you should run 'sync' followed by 'echo 3 > /proc/sys/vm/drop_caches'
> (you'll need to be root for the second one) prior to each run, and ideally
> have nothing else running on that filesystem.

Thanks for clarifying.

I was able to narrow things down further:

After dropping the cache on a clean machine right after a reboot, the
grep operation took:
real 2m54.549s
user 0m0.662s
sys 0m1.062s

After turning off the indexing service (tracker), the result improved:
real 2m12.182s
user 0m0.657s
sys 0m1.021s


If the cache is not dropped:
real 0m0.575s
user 0m0.467s
sys 0m0.108s


And this result is stable across all subsequent runs, even when the
indexing service is enabled.

A day later I noticed that the effect of the cache was gone:
real 4m33.940s
user 0m0.862s
sys 0m1.711s


As I understand it, to solve my problem I just need to make the cache
always stay effective, even when memory is occupied by other applications.

Is it possible to specify a minimum size for the disk cache?

It's a pity that I can't do 'echo 3 > /proc/sys/vm/drop_caches' on the
Windows machine. It would be interesting to see how fast grep would work
without the cache.

> On a separate note, if you're either running on a 64-bit system, or have
> less than about 2^31 files on the FS, inode_cache will slow things down.
> It's intended for stuff like mail spools where you have billions of files
> being created and deleted over a few weeks, and quickly use up the inode
> numbers.  On almost all systems, it will make things run slower, and
> possibly result in non-deterministic filesystem performance like what you
> are seeing here.

Hmm, fewer than about 2^31? Did you perhaps mean more?

> Additionally, do you have some particular reason that you absolutely _need_
> nodatacow to be enabled for the FS?  It usually has no impact on
> performance, but it removes any kind of error correction for file data
> (checksums can't be used safely without COW semantics).  It probably has no
> direct impact on what you're seeing here, but it is something that really
> shouldn't be used in most cases at the filesystem level (it can be done on
> given subvolumes or directories, and that's the recommended way to do it if
> you don't want to go down to the per-file level).
>

I see that an issue involving btrfs is still not closed:
https://code.google.com/p/chromium/issues/detail?id=284738
And gnome-boxes is still very slow when COW is enabled.



--
Best Regards,
Mike Gavrilov.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Very various speed of grep operation on btrfs partition

2015-12-04 Thread Austin S Hemmelgarn

On 2015-12-03 14:36, Михаил Гаврилов wrote:

> Today at work I needed to search for some strings in a repository. Only
> a machine with Windows was available. I used grep from Cygwin for this
> task and was surprised by the speed of the NTFS partition. I decided
> to repeat this task on my home Linux workstation.


[...snip...]

> From the results we see that the search sometimes finishes instantly, in
> less than a second, and sometimes takes 4 minutes. The /home partition
> is formatted with the BTRFS filesystem. I would be interested in
> investigating what the search speed depends on, and in making the search
> always finish in less than a second.
>
> Here are my mount options:
> UUID=82df2d84-bf54-46cb-84ba-c88e93677948 /home btrfs
> subvolid=5,autodefrag,noatime,space_cache,inode_cache,nodatacow 0 0
>
> # uname -a
> Linux localhost.localdomain 4.2.6-301.fc23.x86_64+debug #1 SMP Fri Nov
> 20 22:07:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
>
> How do I start investigating?

Well, what other things are accessing the filesystem at the same time? 
If you've got something like KDE running with the 'semantic desktop' 
stuff turned on, then that will seriously impact the performance of
other things using that filesystem.


The other thing to keep in mind, is that caching may be impacting things 
somewhat.  To really get a good idea of performance for something like 
this, you should run 'sync' followed by 'echo 3 > 
/proc/sys/vm/drop_caches' (you'll need to be root for the second one) 
prior to each run, and ideally have nothing else running on that filesystem.
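
The procedure above could be wrapped roughly as follows. This is only a
sketch: the function names drop_caches_if_root and time_cold are invented
for the example, the elapsed time is measured with 'date +%s%N' (which
assumes GNU date), and when it is not run as root it only warns and
measures a possibly warm cache instead.

```shell
#!/bin/bash
# Sketch: run a command with a cold cache and print the elapsed time in ms.
# Both function names are hypothetical helpers for this example.

drop_caches_if_root() {
    # Write out dirty pages first so that dropping caches can free them.
    sync
    if [ "$(id -u)" -eq 0 ] && [ -w /proc/sys/vm/drop_caches ]; then
        # 3 = drop the page cache plus dentries and inodes
        echo 3 > /proc/sys/vm/drop_caches && return 0
    fi
    echo "warning: caches were NOT dropped (root is required)" >&2
    return 1
}

time_cold() {
    local start end
    drop_caches_if_root || true
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null 2>&1        # output of the measured command is discarded
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example, as root (the pattern here is only a placeholder):
#   time_cold grep -rn 'some pattern' /home/mikhail/sources/inside
```

Calling time_cold several times in a row gives comparable cold-cache
numbers, because every run starts from the same state.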


On a separate note, if you're either running on a 64-bit system, or have 
less than about 2^31 files on the FS, inode_cache will slow things down. 
 It's intended for stuff like mail spools where you have billions of 
files being created and deleted over a few weeks, and quickly use up the 
inode numbers.  On almost all systems, it will make things run slower, 
and possibly result in non-deterministic filesystem performance like
what you are seeing here.
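
A quick way to check whether a filesystem is currently mounted with that
option might look like the sketch below. The helper has_mount_option is
made up for this example; findmnt comes from util-linux and is assumed to
be installed.

```shell
#!/bin/bash
# Sketch: test whether a comma-separated mount option string contains
# a given option. has_mount_option is a hypothetical helper name.
has_mount_option() {
    # $1 = option string such as "rw,noatime,space_cache", $2 = option name
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Options of the filesystem that contains /home (empty if findmnt fails).
opts=$(findmnt -n -o OPTIONS --target /home 2>/dev/null)

if has_mount_option "$opts" inode_cache; then
    echo "inode_cache is enabled on /home"
else
    echo "inode_cache is not enabled on /home"
fi
```

Wrapping the string in commas keeps the match exact, so space_cache is
not mistaken for a shorter option name.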


Additionally, do you have some particular reason that you absolutely 
_need_ nodatacow to be enabled for the FS?  It usually has no impact on 
performance, but it removes any kind of error correction for file data 
(checksums can't be used safely without COW semantics).  It probably has 
no direct impact on what you're seeing here, but it is something that 
really shouldn't be used in most cases at the filesystem level (it can 
be done on given subvolumes or directories, and that's the recommended 
way to do it if you don't want to go down to the per-file level).
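
For the per-directory variant, a minimal sketch using the 'C' (no
copy-on-write) file attribute. The function name make_nocow_dir and the
example path are hypothetical. The attribute is only reliable for files
created after it is set, so it is applied to a freshly created directory
that new files then inherit it from; on filesystems that do not support
the attribute, chattr simply fails.

```shell
#!/bin/bash
# Sketch: create a directory whose future files are created without COW.
# make_nocow_dir is a hypothetical helper name.
make_nocow_dir() {
    local dir=$1
    mkdir -p "$dir" || return 1
    # +C only affects files created afterwards; existing data stays COW.
    chattr +C "$dir" || return 1
    # Show the resulting attributes; a 'C' flag should be listed.
    lsattr -d "$dir"
}

# Example (the path is only an illustration):
#   make_nocow_dir /home/mikhail/vm-images
```

This keeps checksumming and COW for everything else on the filesystem and
limits the trade-off to the one directory that needs it.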






Very various speed of grep operation on btrfs partition

2015-12-03 Thread Михаил Гаврилов
Today at work I needed to search for some strings in a repository. Only
a machine with Windows was available. I used grep from Cygwin for this
task and was surprised by the speed of the NTFS partition. I decided
to repeat this task on my home Linux workstation.

[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 3m21.262s
user 0m0.914s
sys 0m2.288s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m49.921s
user 0m0.736s
sys 0m0.957s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m1.077s
user 0m0.667s
sys 0m0.409s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m1.062s
user 0m0.657s
sys 0m0.386s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m1.020s
user 0m0.641s
sys 0m0.373s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m0.941s
user 0m0.593s
sys 0m0.335s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m0.918s
user 0m0.594s
sys 0m0.322s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m1.053s
user 0m0.620s
sys 0m0.401s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 0m1.049s
user 0m0.625s
sys 0m0.411s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 2m51.858s
user 0m0.885s
sys 0m1.863s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 3m22.615s
user 0m0.876s
sys 0m2.160s
[mikhail@localhost ~]$ time grep -rn 'float:left;display:
block;height: 24px;line-height: 1.2em;position: relative;text-align:
center;white-space: nowrap;width: 80px;'
"/home/mikhail/sources/inside/Модули интерфейса/"
/home/mikhail/sources/inside/Модули интерфейса/Активные
продажи/Темы/Разводящая группы
тем/CRMGroupTheme/CRMGroupTheme.xhtml:38:  

real 1m16.100s
user 0m0.773s
sys