Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-25 Thread Marc Deop
On Sunday 24 July 2011 10:13:30 R P Herrold wrote:
 #!/bin/sh
 #
 CANDIDATES="pix1.jpg pix2.jpg pix3.jpg"
 for i in `echo ${CANDIDATES}`; do
  HASH=`echo $i | md5sum - | awk {'print $1'}`
  echo "$i    ${HASH}"
 done

I know this has absolutely nothing to do with databases or files in folders, but 
since we are talking about optimizing:

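#!/bin/bash
CANDIDATES=(pix1.jpg pix2.jpg pix3.jpg)
for i in "${CANDIDATES[@]}"; do
    # Process substitution feeds the file NAME (not its content) to md5sum.
    MD5SUM=$(md5sum <(echo "$i"))
    # Drop everything after the hash (md5sum appends the input name).
    echo "$i ${MD5SUM%% *}"
done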

It's more than twice as fast as the previous sh script.

[ willing to learn mode, feel free to ignore this]

Anyway, about the hashes and directories and so on... I assume we'd need a 
hash table in our application, right?

Would we proceed as follows (correct me if I'm wrong please)?

1- md5sum the file we need
2- look for the first letter of the hash
3- get into the directory
4- now we look for our file

Is this right? I understand this would improve the searching of files when 
there's a lot of them.
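
The four steps above could be sketched in shell roughly as follows. This is only a minimal sketch, assuming the hash is taken over the file NAME (not the content) and that the bins live under some root directory; `bucket_path` and `/srv/uploads` are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: map a file name to its one-character hash bin.
# bucket_path and the root directory are hypothetical names.
bucket_path() {
    root=$1
    name=$2
    # Steps 1 and 2: md5sum the NAME and keep the first hex digit.
    first=$(printf '%s\n' "$name" | md5sum | cut -c1)
    # Steps 3 and 4: the file is expected at root/<digit>/<name>.
    printf '%s/%s/%s\n' "$root" "$first" "$name"
}

bucket_path /srv/uploads pix1.jpg
```

The application never has to list the big directory: it recomputes the bin from the name each time, so no separate hash table needs to be stored.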

Thanks to anyone who replies, and sorry for the off-topic post.

Regards,

Marc Deop
___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-25 Thread Lamar Owen
On Sunday, July 24, 2011 05:29:23 AM yonatan pingle wrote:
...
 lately the server is under-performing and load averages are high,
 mysql service keeps crashing and the server is hitting max memory
 usage ( so i added ram .. ) ,
 after looking into the website folders, i have found one folder which
 from my point of view is one of the causes for the server loads.
...
 uploads]# ls | wc -l
 3123
...
 pros vs cons of having a large amount of small files in the same
 folder on Linux Centos?

3,123 files is not a large number.  From a CentOS 4 file server here.

[root@pachyderm sky_data]# ls|wc -l
13526
[root@pachyderm sky_data]# cd ../motse
[root@pachyderm motse]# ls |wc -l
28218
[root@pachyderm motse]# cd
[root@pachyderm ~]# du -s /var/lib/pgsql
556420596   /var/lib/pgsql
[root@pachyderm ~]# 

(Yeah, 556GB in PostgreSQL)  Pachyderm = 'The elephant never forgets' 
But I'm not looking forward to converting it to a post-C4 PostgreSQL

Performance on this box is pretty good, all things considered.

Large log files I have found can be performance problems; check to make sure 
log files are being rolled properly.

There are some specific MySQL tuning documents out there; I seem to remember a 
posting on a local LUG list about some serious MySQL performance issues that 
took a long time to ferret out, but I can't seem to find it quickly.


[CentOS] lots of small files in a folder on Linux centos

2011-07-25 Thread R P Herrold
On Mon, 25 Jul 2011, Marc Deop wrote:

 It's more than twice as fast than the previous sh script.

In part this is /bin/sh vs. /bin/bash, and using 'bashisms' 
matters, but yes, I did not seek to optimize a throwaway 
teaching example

 1- md5sum the file we need
  ... actually the NAME of the file, to make it explicit we are
not looking at content [also a reasonable approach if one is
looking to find and de-duplicate a filestore]

 2- look for the first letter of the hash
  ... actually this may be more than a single letter of the
hash --- with ca 3000 files, and 16 possible values for the
first hex digit, we should end up with about 200 files per
subdirectory.  The filesystem should be doing some sort of
index as well -- as I recall, a B-tree variant (HTree) in the
case of ext3/ext4 when dir_index is enabled, but I've not
expressly looked.  The php case was mentioned, however, and
its directory searching is less optimal

We have a customer with a similar problem with a naively 
written set of home brewed PHP code, and are helping them work 
through similar issues
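
The "about 200 files per subdirectory" figure is plain division, and the same arithmetic shows what a second hash character buys. A small sketch, assuming a perfectly even spread across bins; `files_per_bin` is a hypothetical helper name:

```shell
#!/bin/sh
# Average files per bin when binning by the first N hex digits of a hash.
# files_per_bin is a hypothetical helper name.
files_per_bin() {
    files=$1
    digits=$2
    bins=1
    i=0
    # Each extra hex digit multiplies the number of bins by 16.
    while [ "$i" -lt "$digits" ]; do
        bins=$((bins * 16))
        i=$((i + 1))
    done
    # Integer division, rounded down.
    echo $((files / bins))
}

files_per_bin 3123 1   # prints 195
files_per_bin 3123 2   # prints 12
```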

 3- get into the directory
 4- now we look for our file
  ... this is probably a single operation to suck the sub-directory
listing into an array in php, and use an associative
match
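
Steps 3 and 4 can also collapse into a direct existence test on the computed path, so that no directory listing is read at all. A minimal shell sketch, swapping in a plain `[ -e ]` test for the php array match described above; `find_binned` is a hypothetical helper name and the one-character bin layout is assumed:

```shell
#!/bin/sh
# find_binned: print the path of NAME under ROOT if it exists, else fail.
# Assumes files were binned by the first hex digit of md5(name).
find_binned() {
    root=$1
    name=$2
    first=$(printf '%s\n' "$name" | md5sum | cut -c1)
    candidate=$root/$first/$name
    if [ -e "$candidate" ]; then
        printf '%s\n' "$candidate"
    else
        return 1
    fi
}
```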

but you are right, we are moving increasingly away from a 
CentOS issue to a more general coding style issue

-- Russ herrold


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Eero Volotinen
2011/7/24 yonatan pingle yonatan.pin...@gmail.com:
 Hello,
 I have a rather annoying issue on going with one of my centos virtual servers.
 the server hosts a website using apache and mysql ,there are three
 persons involved with keeping the site up and running.
 and i am his root due to the fact he does not know anything about Linux.
 there is an php/sql coder , and the site owner which only knows to use
 the CMS and upload new articles to the website.

 the coder and the site owner work together for a long time already , i
 am their new admin ( as the last one was a major ISP which failed to
 host the site properly ).

 lately the server is under-performing and load averages are high,
 mysql service keeps crashing and the server is hitting max memory
 usage ( so i added ram .. ) ,
 after looking into the website folders, i have found one folder which
 from my point of view is one of the causes for the server loads.

 (sorry for piping ls ).

 uploads]# ls | wc -l
 3123

 I have talked with the site owner, which in turn showed this to the
 coder ,now he throws the ball back claiming: it has nothing to do with
 server performance.
 the folder is full of images, about 40K each, and i have good reason
 to believe this is the problem, as this is not the first time i see
 that a folder which includes a large amount of files causes a server
 to under-perform.

 the coder is not tech savvy as one might expect, so it's really hard
 for me to explain the issue of having lots of files in one folder to
 the site owner or to the coder.

 the hardware is a decent machine dual E5530 24RAM with six hard drives in 
 raid.
 the virtual server has 2GB of ram and it's own CPU share ( 4 cores 8 threads 
 ).
 the coder is arguing with facts sadly to say he has the site owner on
 his side.

 long story short, how should i explain in the most simple way in plain
 english that having that many files in a folder will cause a server to
 work slower?

 pros vs cons of having a large amount of small files in the same
 folder on Linux Centos?

I assume that you are using ext3 or ext4 filesystems? Both ext3 and
ext4 slow down if there are too many files in the same directory.
XFS is one solution to this problem.

--
Eero


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Alexander Dalloz
Am 24.07.2011 13:03, schrieb Eero Volotinen:
 2011/7/24 yonatan pingle yonatan.pin...@gmail.com:

 uploads]# ls | wc -l
 3123

 I assume that you are using ext3 or ext4 filesystems? Both ext3 and
 ext4 slows down, if there is too much files in same directory.
 XFS-fs is solution to fix this problem.

 Eero

Seriously, 3123 files in a single directory is not an issue for any of
the extX filesystems. Though ext2 probably performs the worst, ext3 and
in particular ext4 should not have any problem with that small a number of
file objects, provided that the filesystem is not already nearly 100% full.

An issue may be how the code deals with the directory content. Horrible
code can certainly impact the speed of the website, but should not affect
the system globally.

Yonatan, if you really are concerned about the uploads directory, then
use vmstat, iostat or sar to check system parameters while the directory
is accessed.

Your problem is something else, I am pretty sure.
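
One way to act on that advice is to capture the CPU report from iostat and pull out the %iowait figure for logging or alerting. A minimal sketch, assuming the sysstat `avg-cpu:` header layout; `iowait_pct` is a hypothetical helper name:

```shell
#!/bin/sh
# iowait_pct: read iostat-style CPU output on stdin and print %iowait.
# The column is located by header name rather than by fixed position.
# Usage (assuming sysstat is installed): iostat -c 5 3 | iowait_pct
iowait_pct() {
    awk '
        /avg-cpu:/ {
            # The header has one extra leading field ("avg-cpu:"),
            # so the value sits one column to the left.
            for (i = 2; i <= NF; i++) if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        want && NF { print $col; want = 0 }
    '
}
```

A value that stays high while the uploads directory is being read would point at the disks; a high value at other times points elsewhere.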

Alexander



Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle
On Sun, Jul 24, 2011 at 2:19 PM, Alexander Dalloz ad+li...@uni-x.org wrote:
 Am 24.07.2011 13:03, schrieb Eero Volotinen:
 2011/7/24 yonatan pingle yonatan.pin...@gmail.com:

 uploads]# ls | wc -l
 3123

 I assume that you are using ext3 or ext4 filesystems? Both ext3 and
 ext4 slows down, if there is too much files in same directory.
 XFS-fs is solution to fix this problem.

 Eero

 Seriously, 3123 files in a single directory is not an issue for any of
 the extX filesystems. Though ext2 probably performs the worst, ext3 and
 particular ext4 should not have any problem with that small amount of
 file objects. Given that the filesystem is not already filled nearby 100%.

 An issue may be, how the code deals with the directory content. Horrible
 code for sure can impact the speed of the website, but should not affect
 the system globally.

 Yonatan, if you really are concerned about the uploads directory, then
 use vmstat, iostat or sar to check system parameters while the directory
 is accessed.

 Your problem is something else, I am pretty sure.

 Alexander




Hi, Alexander
good suggestions, I'll monitor I/O and the MySQL code; it sounds like a
code-related issue and not a CentOS issue after all.

it runs on ext3. I can only guess how the code deals with the dir;
it seems the site builds the pages using php+mysql for each
visitor, and with about 40K unique visitors a day, that is a lot of I/O.

This looks like an issue with MySQL after all.
Queries: 48.0M  qps:   66 Slow:    65.0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.97    0.00    0.28   97.91    0.00    0.84

   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
         0       102      5.30      3.13      2.06
         2       120      3.14      2.77      2.22

we wait and see,
 tail -f log-slow-queries.log
/usr/sbin/mysqld, Version: 5.0.67-community-log (MySQL Community
Edition (GPL)). started with:
Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
Time                 Id Command    Argument



thank you



-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Ljubomir Ljubojevic
yonatan pingle wrote:
 On Sun, Jul 24, 2011 at 2:19 PM, Alexander Dalloz ad+li...@uni-x.org wrote:
 Am 24.07.2011 13:03, schrieb Eero Volotinen:
 2011/7/24 yonatan pingle yonatan.pin...@gmail.com:
 uploads]# ls | wc -l
 3123
 I assume that you are using ext3 or ext4 filesystems? Both ext3 and
 ext4 slows down, if there is too much files in same directory.
 XFS-fs is solution to fix this problem.
 Eero
 Seriously, 3123 files in a single directory is not an issue for any of
 the extX filesystems. Though ext2 probably performs the worst, ext3 and
 particular ext4 should not have any problem with that small amount of
 file objects. Given that the filesystem is not already filled nearby 100%.

 An issue may be, how the code deals with the directory content. Horrible
 code for sure can impact the speed of the website, but should not affect
 the system globally.

 Yonatan, if you really are concerned about the uploads directory, then
 use vmstat, iostat or sar to check system parameters while the directory
 is accessed.

 Your problem is something else, I am pretty sure.

 Alexander


 
 
 Hi, Alexander
 good suggestions, ill monitor I/O and mysql code, sounds like a code
 related issue and not a centos issue after all.
 
 it runs on ext3  ,i could only guess how to code deals with the dir,
 as it seems to be the site builds the pages using php+mysql for each
 visitor, with about 40K unique visitors a day, that is a lot of I/O.
 
 This looks like an issue with MySQL after all.
 Queries: 48.0M  qps:   66 Slow:    65.0
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            0.97    0.00    0.28   97.91    0.00    0.84
 
    runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
          0       102      5.30      3.13      2.06
          2       120      3.14      2.77      2.22
 
 we wait and see,
  tail -f log-slow-queries.log
 /usr/sbin/mysqld, Version: 5.0.67-community-log (MySQL Community
 Edition (GPL)). started with:
 Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
 Time                 Id Command    Argument
 
 
 
 thank you
 
 
 

Do you have caching turned on in the CMS? That could help.

-- 

Ljubomir Ljubojevic
(Love is in the Air)
PL Computers
Serbia, Europe

Google is the Mother, Google is the Father, and traceroute is your
trusty Spiderman...
StarOS, Mikrotik and CentOS/RHEL/Linux consultant


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Ryan Wagoner
On Sun, Jul 24, 2011 at 7:52 AM, yonatan pingle
yonatan.pin...@gmail.com wrote:
 Hi, Alexander
 good suggestions, ill monitor I/O and mysql code, sounds like a code
 related issue and not a centos issue after all.

 it runs on ext3  ,i could only guess how to code deals with the dir,
 as it seems to be the site builds the pages using php+mysql for each
 visitor, with about 40K unique visitors a day, that is a lot of I/O.

 This looks like an issue with MySQL after all.
 Queries: 48.0M  qps:   66 Slow:    65.0

 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
                     0.97    0.00    0.28   97.91    0.00    0.84

   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15

         0       102      5.30      3.13      2.06
         2       120      3.14      2.77      2.22

 we wait and see,
  tail -f log-slow-queries.log
 /usr/sbin/mysqld, Version: 5.0.67-community-log (MySQL Community
 Edition (GPL)). started with:
 Tcp port: 3306  Unix socket: /var/lib/mysql/mysql.sock
 Time                 Id Command    Argument



 thank you



 --
 Best Regards,
 Yonatan Pingle
 RHCT | RHCSA | CCNA1

If you are using phpMyAdmin, the status page will aid you in tuning
MySQL. Look for values in red. The description will usually tell you
what to adjust to improve performance.

Ryan


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle



 Do you have caching turned on in the CMS? That could help.

 --

 Ljubomir Ljubojevic
 (Love is in the Air)
 PL Computers
 Serbia, Europe

 Google is the Mother, Google is the Father, and traceroute is your
 trusty Spiderman...
 StarOS, Mikrotik and CentOS/RHEL/Linux consultant


there is no caching system, it's a home-made CMS.


-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle
 RHCT | RHCSA | CCNA1

 If you are using phpMyAdmin the status page will aid you in tuning
 mySQL. Look for values in red. The description will usually tell you
 what to adjust to improve performance.

 Ryan


I'm good with mysqltuner.pl;
it seems there are slow queries on MySQL, and I have adjusted all
values in my.cnf according to the application's needs.

looks like it's all in the code and the way the CMS handles the files
from that upload directory, so there is nothing wrong with the CentOS
machine after all; it's doing its job.

I'll point the coder to the status page and hope he gets a clue.

thank you everybody for the good advice, I am now sure it's not my fault :-)

/thread

-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Ryan Wagoner
On Sun, Jul 24, 2011 at 8:40 AM, yonatan pingle
yonatan.pin...@gmail.com wrote:
 im good with mysqltuner.pl,
 as it seems there are slow queries on mysql and i have adjusted all
 values in my.cnf according to the application needs.

 looks like it's all in the code and the way the CMS handles the files
 from that upload directory , so there is nothing wrong with the centos
 machine after all, it's doing it's job

 ill point the coder to the status page and hope he gets a clue.

 thank you everybody for the good advices, i am now sure it's not my fault 
 :-)

 /thread

 --
 Best Regards,
 Yonatan Pingle
 RHCT | RHCSA | CCNA1

Sounds like you need to enable logging in MySQL for slow queries. Give
your developer the log and let him know to either optimize the queries
or create indexes appropriately to improve the performance.
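
For the MySQL 5.0 series shown earlier in the thread, slow query logging is switched on in the [mysqld] section of my.cnf. The sketch below just prints such a fragment; the log path and the 2-second threshold are assumptions to adjust, and `slow_log_fragment` is a hypothetical helper name. Later MySQL series use different option names, so check the manual for the version in use.

```shell
#!/bin/sh
# Print a my.cnf fragment enabling the slow query log on MySQL 5.0.
# The log path and the 2-second threshold are assumptions to adjust.
slow_log_fragment() {
    cat <<'EOF'
[mysqld]
log-slow-queries = /var/log/mysql/log-slow-queries.log
long_query_time = 2
log-queries-not-using-indexes
EOF
}

slow_log_fragment
```

Once the log has collected some traffic, `mysqldumpslow -s t -t 10` on the log file should summarize the ten slowest query patterns, which is easier to hand to a developer than the raw log.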

Ryan


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle
On Sun, Jul 24, 2011 at 3:43 PM, Ryan Wagoner rswago...@gmail.com wrote:
 On Sun, Jul 24, 2011 at 8:40 AM, yonatan pingle
 yonatan.pin...@gmail.com wrote:
 im good with mysqltuner.pl,
 as it seems there are slow queries on mysql and i have adjusted all
 values in my.cnf according to the application needs.

 looks like it's all in the code and the way the CMS handles the files
 from that upload directory , so there is nothing wrong with the centos
 machine after all, it's doing it's job

 ill point the coder to the status page and hope he gets a clue.

 thank you everybody for the good advices, i am now sure it's not my fault 
 :-)

 /thread

 --
 Best Regards,
 Yonatan Pingle
 RHCT | RHCSA | CCNA1

 Sounds like you need to enable logging in mySQL for slow queries. Give
 your developer the log and let him know to either optimize the queries
 or create indexes appropriately to improve the performance.

 Ryan


Yes Ryan, that's exactly what I have done.
He will get the log shortly and I will get some not-free beer.
:-)


-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread John R. Dennison
On Sun, Jul 24, 2011 at 03:53:46PM +0300, yonatan pingle wrote:
 
 Yes Ryan, that exactly what i have done.
 he will get the log shortly and i will get some not free beer.

While I'm all for mysql optimization it's clearly evident from an
earlier posting that your disks are thrashing with insanely high iowait
figures; and while it's _possible_ for this to be caused by mysql you
really have to go out of your way to achieve that type of behavior.





John
-- 
The best argument against democracy is a five minute conversation
with the average voter.

-- Winston Churchill




Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Diego Sanchez
2011/7/24 yonatan pingle yonatan.pin...@gmail.com:

 there is no caching system, its a  home made CMS.



You can use an accelerator too.

http://en.wikipedia.org/wiki/PHP_accelerator
http://en.wikipedia.org/wiki/List_of_PHP_accelerators

Please make a full backup before this! (I have never had a problem,
but... why tempt the devil?)
-- 
Diego - Yo no soy paranoico! (pero que me siguen, me siguen)
http://about.me/diegors/bio


[CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread R P Herrold
On Sun, 24 Jul 2011, yonatan pingle wrote:

 the coder is not tech savvy as one might expect, so it's 
 really hard for me to explain the issue of having lots of 
 files in one folder to the site owner or to the coder.

I do not expect coders to remain 'not tech savvy'

If the coder is not willing to learn and to test, you are 
already doomed, and should walk away from the project

To show the problem, take a pile of pennies, and ask the coder 
to find one with a given year.  The coder will have to do a 
linear search, to even know if the target exists.  Then show an 
egg carton with another pile of pennies sorted and labelled by 
year in each section, and ask them to repeat the task -- in 
the latter case, it is a 'single seek' to solve the problem

Obviously, the target year may not even be present.  With a 
single pile (directory) the linear search is still required, 
but with 'binning' by years, that is obvious by inspection as 
well


One approach to lots of files in a single directory (which can 
cause problems in getting timely access to a specific file) is 
to build a permuted directory tree from the file names to 
spread the load around.  If the files are of a form where they 
have 'closely identical' names [pix1.jpg, pix2.jpg, 
etc], first build a 'hashed' version of the file name with 
md5sum, or such, to level the hash leading characters

[herrold@localhost ~]$ ./hashdemo.sh
pix1.jpg    fd8f49c6487588989cd764eb493251ec
pix2.jpg    12955d9587d99becf3b2ede46305624c
pix3.jpg    bfdc8f593676e4f1e878bb6959f14ce2
[herrold@localhost ~]$ cat hashdemo.sh
#!/bin/sh
#
# Hash each candidate file NAME (not its content) with md5sum.
CANDIDATES="pix1.jpg pix2.jpg pix3.jpg"
for i in ${CANDIDATES}; do
        HASH=`echo $i | md5sum - | awk '{print $1}'`
        echo "$i    ${HASH}"
done
[herrold@localhost ~]$

then, we look to the leading letter of the hash, to design our 
egg carton bins.  We place pix1.jpg in directory: ./f/ and 
pix2.jpg in directory ./1/ and pix3.jpg in directory 
./b/ and so forth -- if the directories get too full again, 
you might go to using the first two letters of the hash to 
perform the 'binning' process
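
The binning step itself can be sketched in shell as below. This is only an illustration under stated assumptions: `bin_file` is a hypothetical helper name, the bin is derived from md5sum of the file name plus a newline (as `echo` does in hashdemo.sh), and the third argument selects one or two leading hash characters.

```shell
#!/bin/sh
# bin_file: move FILE into ROOT/<first DEPTH hex digits of md5(name)>/.
# bin_file, ROOT and DEPTH are hypothetical names for this sketch.
bin_file() {
    file=$1
    root=$2
    depth=${3:-1}
    name=$(basename "$file")
    # Hash the NAME, not the content, and keep the leading digits.
    prefix=$(printf '%s\n' "$name" | md5sum | cut -c1-"$depth")
    mkdir -p "$root/$prefix" && mv "$file" "$root/$prefix/$name"
}
```

A loop such as `for f in uploads/*.jpg; do bin_file "$f" uploads 1; done` would then redistribute an existing flat directory; test it on a copy first, since the application must look files up with the same rule.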

An md5 function (md5()) is readily available in php, as are 
directory creation (mkdir()) and so forth, so positioning the files, and 
computing the indexes are straightforward there

This is all pretty basic stuff, covered in Knuth in TAOCP long 
ago

-- Russ herrold


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle
On Sun, Jul 24, 2011 at 4:02 PM, John R. Dennison j...@gerdesas.com wrote:
 On Sun, Jul 24, 2011 at 03:53:46PM +0300, yonatan pingle wrote:

 Yes Ryan, that exactly what i have done.
 he will get the log shortly and i will get some not free beer.

 While I'm all for mysql optimization it's clearly evident from an
 earlier posting that your disks are thrashing with insanely high iowait
 figures; and while it's _possible_ for this to be caused by mysql you
 really have to go out of your way to achieve that type of behavior.





                                                        John
 --
 The best argument against democracy is a five minute conversation
 with the average voter.

 -- Winston Churchill




this is exactly what I was thinking, that's an insane iowait value.
Taking into consideration that it's a VM, not the hardware machine, and
the fact that he fills up all his RAM, along with slow queries showing
in the log, it's simply bad code and wrong handling of files.

-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread yonatan pingle
On Sun, Jul 24, 2011 at 5:13 PM, R P Herrold herr...@owlriver.com wrote:
 On Sun, 24 Jul 2011, yonatan pingle wrote:

 the coder is not tech savvy as one might expect, so it's
 really hard for me to explain the issue of having lots of
 files in one folder to the site owner or to the coder.

 I do not expect coders to remain 'not tech savvy'

 If the coder is not willing to learn and to test, you are
 already doomed, and should walk away from the project

 To show the problem, take a pile of pennies, and ask the coder
 to find one with a given year.  The coder will have to do a
 linear search, to even know if the target exists.  Then show an
 egg carton with another pile of pennies sorted and labelled by
 year in each section, and ask them to repeat the task -- in
 the latter case, it is a 'single seek' to solve the problem

 Obviously, the target year may not even be present.  With a
 single pile (directory) the linear search is still required,
 but with 'binning' by years, that is obvious by inspection as
 well


 One approach to lots of files in a single directory (which can
 cause problems in getting timely access to a specific file) is
 to build a permuted directory tree from the file names to
 spread the load around.  If the files are of a form where they
 have 'closely identical' names [pix1.jpg, pix2.jpg,
 etc], first build a 'hashed' version of the file name with
 md5sum, or such, to level the hash leading characters

 [herrold@localhost ~]$ ./hashdemo.sh
 pix1.jpg    fd8f49c6487588989cd764eb493251ec
 pix2.jpg    12955d9587d99becf3b2ede46305624c
 pix3.jpg    bfdc8f593676e4f1e878bb6959f14ce2
 [herrold@localhost ~]$ cat hashdemo.sh
 #!/bin/sh
 #
 CANDIDATES="pix1.jpg pix2.jpg pix3.jpg"
 for i in `echo ${CANDIDATES}`; do
         HASH=`echo $i | md5sum - | awk {'print $1'}`
         echo $i        ${HASH}
 done
 [herrold@localhost ~]$

 then, we look to the leading letter of the hash, to design our
 egg carton bins.  We place pix1.jpg in directory: ./f/ and
 pix2.jpg in directory ./1/ and pix3.jpg in directory
 ./b/ and so forth -- if the directories get too full again,
 you might go to using the first two letters of the hash to
 perform the 'binning' process

 The md5sum function is readily available in php, as are
 directory creation and so forth, so positioning the files, and
 computing the indexes are straightforward there

 This is all pretty basic stuff, covered in Knuth in TAOCP long
 ago

 -- Russ herrold


Thank you for the excellent analogy , i will actually use it to
explain the matter.

I do hope he understands the simple logic behind a proper directory
tree; it's clearly a design flaw, bad planning or laziness which led
him to this state.

unfortunately, as bash is easier to read than English for you and me,
I'll spare him the hashdemo.sh code, simply put it in
words, and hope he figures out the proper way to create a tree.

I am strongly tempted to walk away on this one. Normally, when there is no
co-operation, and statements like "it's a problem with the server"
when clearly it's a code issue, it's just nerve-racking to try and
help these guys.

as I said earlier, he was hosted directly on a virtual server with
the largest ISP in my country, and they failed to help him (
just selling him more RAM and CPU, until it got to a breaking point).
I actually co-locate at the very same ISP and I know for a fact
they are awesome when it comes to support...

-- 
Best Regards,
Yonatan Pingle
RHCT | RHCSA | CCNA1


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Always Learning

 On Sun, Jul 24, 2011 at 5:13 PM, R P Herrold herr...@owlriver.com wrote:

  then, we look to the leading letter of the hash, to design our
  egg carton bins.  We place pix1.jpg in directory: ./f/ and
  pix2.jpg in directory ./1/ and pix3.jpg in directory
  ./b/ and so forth -- if the directories get too full again,
  you might go to using the first two letters of the hash to
  perform the 'binning' process

If the pictures are named sequentially, why not store them at 100 per
directory, in a structure something like this:

/pix/0/00/pix1.jpg

/pix/0/26/pix02614.jpg 

/pix/6/72/pix67255.jpg
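
The mapping behind those three example paths could be computed as below. This is a sketch under assumptions read off the examples: the picture number is zero-padded to five digits, the first digit names the top directory and the next two the second level. `seq_path` is a hypothetical helper name.

```shell
#!/bin/sh
# seq_path: map a numbered picture name to /pix/D/DD/<name>, where the
# number is zero-padded to five digits (an assumption read off the
# example paths).
seq_path() {
    name=$1
    # Keep only the digits, then drop leading zeros so printf sees decimal.
    num=$(printf '%s\n' "$name" | tr -cd '0-9' | sed 's/^0*//')
    [ -n "$num" ] || num=0
    padded=$(printf '%05d' "$num")
    d1=$(printf '%s\n' "$padded" | cut -c1)
    d2=$(printf '%s\n' "$padded" | cut -c2-3)
    printf '/pix/%s/%s/%s\n' "$d1" "$d2" "$name"
}

seq_path pix67255.jpg   # prints /pix/6/72/pix67255.jpg
```

Note that this only works while every name carries a number of at most five digits and no other digits.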




-- 
With best regards,

Paul.
England,
EU.




Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Marian Marinov
On Sunday 24 July 2011 22:48:23 Always Learning wrote:
  On Sun, Jul 24, 2011 at 5:13 PM, R P Herrold herr...@owlriver.com wrote:
   then, we look to the leading letter of the hash, to design our
   egg carton bins.  We place pix1.jpg in directory: ./f/ and
   pix2.jpg in directory ./1/ and pix3.jpg in directory
   ./b/ and so forth -- if the directories get too full again,
   you might go to using the first two letters of the hash to
   perform the 'binning' process
 
 If the pictures are named sequentially, why not store then at a 100 per
 directory structure something like this
 
 /pix/0/00/pix1.jpg
 
 /pix/0/26/pix02614.jpg
 
 /pix/6/72/pix67255.jpg

As I have worked on projects where the 'coder' is not willing to do any 
changes, I offer you another temporary solution:

If the pictures are in /home/site/public_html/images, you simply need to  
create a tmpfs, copy the pictures there and then bind mount the tmpfs in that 
directory:

# mkdir /home/site/ram
# mount -t tmpfs -o size=200M none /home/site/ram
# cp -a /home/site/public_html/images/* /home/site/ram
# mount --bind /home/site/ram /home/site/public_html/images

Instant performance gain, while you wait for the coder to actually fix the 
problem. 

However, you should make sure that you copy new images from RAM back to 
disk, perhaps triggered by inotifywait (from inotify-tools).

Keep in mind that this is only a temporary solution that should serve only as 
a proof that this is the problem and it needs to be fixed. Try to explain that 
this hack is not an actual solution.
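
The copy-back step might look like the sketch below, run from cron or from a loop around an inotify tool. Both directory arguments are hypothetical; the important assumption is that the second one is a path where the on-disk copy is still reachable, because the bind mount hides the original images directory. GNU cp is assumed for the -u option.

```shell
#!/bin/sh
# sync_back: copy files that are new or newer in RAM_DIR to DISK_DIR.
# Both names are hypothetical; DISK_DIR must be a path where the on-disk
# copy is still reachable (the bind mount hides the original directory).
sync_back() {
    ram_dir=$1
    disk_dir=$2
    # -a preserves attributes, -u skips files that are not newer.
    cp -au "$ram_dir"/. "$disk_dir"/
}
```

Anything uploaded after the last run is lost on a reboot or a crash, which is one more reason to treat the tmpfs as a demonstration only.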

-- 
Best regards,
Marian Marinov




[CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread R P Herrold
On Sun, 24 Jul 2011, Always Learning wrote:

 If the pictures are named sequentially, why not store then at a 100 per
 directory structure something like this

 /pix/0/00/pix1.jpg

 /pix/0/26/pix02614.jpg

 /pix/6/72/pix67255.jpg

Go read Knuth

One does not do that because then one is counting on the end 
user's data to conform to, and to continue to conform to your 
expectations [here you have added an invisible constraint of 
'pix' as the first part of the file name which you are 
hoping remains constant -- it will not, as a survey of naming 
schemes used by digital camera makers will reveal].  Your 
explicit constraint of a monotonically increasing image number 
is also not likely to be realized in a world where people will 
erase or for other reasons not submit all of a given photo 
shoot

By using a hash, we remove those constraints, and also gain 
the virtuous effect for free of self-organizing a relatively 
level dispersion of files to the destination directories
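
The levelling effect is easy to check empirically. The sketch below counts how many of N sequentially named files fall into each one-hex-digit bin; the pixN.jpg naming follows the example used in this thread, and `bin_counts` is a hypothetical helper name. No particular counts are claimed here; run it and look at the spread.

```shell
#!/bin/sh
# Count how many of N sequentially named files land in each
# one-hex-digit bin. The pixN.jpg naming follows the thread's example.
bin_counts() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # First hex digit of md5(name) is the bin for this file.
        printf 'pix%s.jpg\n' "$i" | md5sum | cut -c1
        i=$((i + 1))
    done | sort | uniq -c
}

bin_counts 320
```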

-- Russ herrold


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Keith Roberts
On Sun, 24 Jul 2011, R P Herrold wrote:

 By using a hash, we remove those constraints, and also gain
 the virtuous effect for free of self-organizing a relatively
 level dispersion of files to the destination directories

I have not followed the whole thread, but how about a SQL database 
index of the actual picture files, giving the path into the directory 
structure? Would that work?

Kind Regards,

Keith Roberts

-
Websites:
http://www.karsites.net
http://www.php-debuggers.net
http://www.raised-from-the-dead.org.uk

All email addresses are challenge-response protected with
TMDA [http://tmda.net]
-


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Always Learning

On Sun, 2011-07-24 at 16:33 -0400, R P Herrold wrote:

 On Sun, 24 Jul 2011, Always Learning wrote:
 
  If the pictures are named sequentially, why not store them at 100 per
  directory, in a structure something like this
 
  /pix/0/00/pix1.jpg
 
  /pix/0/26/pix02614.jpg
 
  /pix/6/72/pix67255.jpg
 
 Go read Knuth
 
 One does not do that because then one is counting on the end 
 user's data to conform to, and to continue to conform to your 
 expectations [here you have added an invisible constraint of 
 'pix' as the first part of the file name which you are 
 hoping remains constant -- it will not, as a survey of naming 
 schemes used by digital camera makers will reveal].  Your 
 explicit constraint of a monotonically increasing image number 
 is also not likely to be realized in a world where people will 
 erase or for other reasons not submit all of a given photo 
 shoot

I did begin with 'IF' :-)

Photo-shoot or whatever, using the 'rename' command means pictures can
adopt a uniform numbering system. There is no logical or genuine
practical reason to accept a disorganised mess. 
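
For comparison, a rough sketch of the sequential layout quoted above
(100 pictures per directory). The five-digit zero padding and the
function name seq_path are assumptions, inferred from the two longer
example paths rather than stated anywhere in the thread.

```sh
#!/bin/sh
# Sketch only: map a picture number to /pix/<d>/<dd>/pixNNNNN.jpg.
# Assumes numbers below 100000 and five-digit zero padding.

seq_path() {
    # $1 = picture number; strip leading zeros so the shell
    # arithmetic does not read the value as octal.
    n=$(printf '%s' "$1" | sed 's/^0*//')
    n=${n:-0}
    printf '/pix/%d/%02d/pix%05d.jpg\n' $((n / 10000)) $((n / 100 % 100)) "$n"
}

seq_path 2614     # prints /pix/0/26/pix02614.jpg
seq_path 67255    # prints /pix/6/72/pix67255.jpg
```

This only works while every file really does carry a number in the
expected range, which is the constraint being argued about.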

I have about 21,000+ pictures - all my own work. I can find and display
any of them within about 17 seconds (just timed myself) using basic
operating system commands.  (My database application is unfinished).


-- 
With best regards,

Paul.
England,
EU.




[CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread R P Herrold
On Sun, 24 Jul 2011, Keith Roberts wrote:

 By using a hash, we remove those constraints, and also gain
 the virtuous effect for free of self-organizing a relatively
 level dispersion of files to the destination directories

 Not followed the whole thread, but a SQL database index of
 the actual picture files, giving the path into the directory
 structure. Would that work?

Fortunately there is a full and freely accessible archive of 
all posts to this mailing list.  The link to that archive is in 
the header of every message sent through this list.  As such, 
you need not speculate.

As I read the post initially, the problem was as stated in the 
subject line, and the database issue was not in the forefront

Per the initial problem description, the files were all 
splatted into a single directory.  The fastest database I know 
of is the filesystem itself, used as a database; the addition 
of the hashing is just a pointer, and so also O(1).
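
A sketch of what "the filesystem as a database" could look like in 
practice: the hash is recomputed from the name on every store and 
lookup, so no separate index is kept.  The helper names bucket_for, 
store_file and find_file are hypothetical, as is the two-character 
bucket width.

```sh
#!/bin/sh
# Sketch only: store and look up files with no index beyond the hash.

bucket_for() {
    # $1 = file name; print the first two hex characters of its MD5
    printf '%s' "$1" | md5sum | cut -c1-2
}

store_file() {
    # $1 = root directory, $2 = source file; copy it into its bucket
    name=$(basename "$2")
    dir="$1/$(bucket_for "$name")"
    mkdir -p "$dir" && cp -- "$2" "$dir/$name"
}

find_file() {
    # $1 = root directory, $2 = file name
    # Print the path and return 0 if the file exists, else return non-zero.
    path="$1/$(bucket_for "$2")/$2"
    [ -f "$path" ] && printf '%s\n' "$path"
}
```

A lookup is then one hash computation plus one existence test in a 
small directory, e.g. find_file /pix IMG_0001.jpg.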

Adding a database engine, with the overhead that it brings, 
and, as the thread has already pointed out, in a domU as well 
(not usually the best place to add the overhead of a 
database), simply adds further points of mis-design.

“We should forget about small efficiencies, say about 97% of 
the time: premature optimization is the root of all evil. Yet 
we should not pass up our opportunities in that critical 3%. A 
good programmer will not be lulled into complacency by such 
reasoning, he will be wise to look carefully at the critical 
code; but only after that code has been identified”
   - Donald Knuth [1]

Once the implementation is 'correct', then it is time to do 
A/B testing to see where the real problem lies ... which 
testing was at the head of my initial post on this topic.

-- Russ herrold

[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

A person not willing to pony up $2.73 for a used copy of 'The 
Art of Computer Programming: Sorting and Searching. Volume 3', 
which discusses the specific problem space here, may wish to 
read and consider his rather nice lecture published by the 
ACM


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Always Learning

On Sun, 2011-07-24 at 17:50 -0400, R P Herrold wrote:

 On Sun, 24 Jul 2011, Keith Roberts wrote:
 
  By using a hash, we remove those constraints, and also gain
  the virtuous effect for free of self-organizing a relatively
  level dispersion of files to the destination directories
 
  Not followed the whole thread, but a SQL database index of
  the actual picture files, giving the path into the directory
  structure. Would that work?

The answer must be 'yes' to a normal problem of identifying (searching
for) then retrieving data. MySQL would be a good choice.

Russ' adoration(?) of Donald KNUTH made me read the first page of

[1] http://pplab.snu.ac.kr/courses/adv_pl05/papers/p261-knuth.pdf

which includes this

This study focuses largely on two issues: (a) improved syntax for
iterations and error exits, making it possible to write a larger class
of programs clearly and efficiently without go to statements; (b) a
methodology of program design, beginning with readable and correct,
but possibly inefficient programs that are systematically transformed if
necessary into efficient and correct, but possibly less readable code.


A computer programmer can not change the syntax of the language he or
she is writing in. The syntax of any programming language is determined
by the creator of that programming language.

Spaghetti-code is a trade-mark of confused programmers, usually of
little ability, who have certainly never spent days trying to debug
someone else's programme. Spaghetti-code can always be avoided by a
clear understanding of what the user wants coupled with the programmer's
in-depth understanding of how to implement the user's requirements in
the chosen programming language whilst remembering someone else may have
to maintain the programme.

Hashing file names is an interesting concept, but a simple MySQL db
application (and they are very simple to write) running as HTML pages,
with a dash of PHP, makes the application universally accessible and
easy to use. Oh, and on CentOS, amazingly quick to run :-)




-- 
With best regards,

Paul.
England,
EU.




Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Rajagopal Swaminathan
Greetings,


On Sun, Jul 24, 2011 at 2:59 PM, yonatan pingle
yonatan.pin...@gmail.com wrote:
 Hello,
 after looking into the website folders, i have found one folder which
 from my point of view is one of the causes for the server loads.


hmm... does mounting the dir with -o noatime,nodiratime help speed it up?


-- 
Regards,

Rajagopal


Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread John R. Dennison
On Mon, Jul 25, 2011 at 06:38:33AM +0530, Rajagopal Swaminathan wrote:
 
 hmm... does mounting the dir with -o noatime,nodiratime help speed it up?

Just an FYI:

noatime is a superset that includes nodiratime.
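
A small sketch for checking whether a filesystem is already mounted
with noatime before changing anything. The mount point "/" is only a
placeholder, and the helper names has_noatime and mount_options are
hypothetical; the thread does not say which filesystem holds the
uploads folder.

```sh
#!/bin/sh
# Sketch only: report whether a mount point uses the noatime option.
# /proc/mounts fields: device mountpoint fstype options dump pass

has_noatime() {
    # $1 = comma-separated mount options, e.g. "rw,noatime,data=ordered"
    case ",$1," in
        *,noatime,*) return 0 ;;
        *) return 1 ;;
    esac
}

mount_options() {
    # $1 = mount point; print its options field from /proc/mounts
    awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts | tail -n 1
}

if has_noatime "$(mount_options /)"; then
    echo "/ is mounted noatime"
else
    echo "/ is not mounted noatime"
    # To change it on a live system (as root), one would remount:
    #   mount -o remount,noatime /
fi
```

Since noatime already covers directories, adding nodiratime next to
it is harmless but redundant.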





John
-- 
You can safely assume you've created God in your own image when it turns
out that God hates all the same people you do.

-- Anne Lamott (10 April 1954-), American author, Bird by Bird




Re: [CentOS] lots of small files in a folder on Linux centos

2011-07-24 Thread Les Mikesell
On 7/24/11 4:08 PM, Keith Roberts wrote:
 On Sun, 24 Jul 2011, R P Herrold wrote:

 By using a hash, we remove those constraints, and also gain
 the virtuous effect for free of self-organizing a relatively
 level dispersion of files to the destination directories

 Not followed the whole thread, but a SQL database index of
 the actual picture files, giving the path into the directory
 structure. Would that work?

You introduce new issues that way, because the name in the database can't be
managed atomically with the name in the directory.  Consider what might happen
with concurrent operations trying to add different files with the same name,
or perhaps an add and a delete at the same time.

And it still doesn't help with the real problem unless you do something to
break up the large directory.   Unix-like filesystems guarantee atomic
operations in filename manipulation, so every time you try to create a file,
the system must check that the name does not already exist, find an empty slot
for the name and insert it with the directory locked against other changes
until that is complete.  Filesystems that index directories can help with the
lookup, with the tradeoff that additions require an index update.
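
A sketch of how a script can lean on that atomicity when two writers
might add a file with the same name. The helper add_file is
hypothetical; it assumes GNU coreutils and that the temporary file and
the final name are on the same filesystem, so that ln can make a hard
link.

```sh
#!/bin/sh
# Sketch only: publish a file under its final name without overwriting.
# Relies on link(2) failing with EEXIST if the name is already taken.

add_file() {
    # $1 = source file, $2 = destination directory
    name=$(basename "$1")
    # Copy to a temporary name first so readers never see a partial file.
    tmp=$(mktemp "$2/.incoming.XXXXXX") || return 1
    if cp -- "$1" "$tmp" && ln -- "$tmp" "$2/$name" 2>/dev/null; then
        rm -f -- "$tmp"
        return 0
    fi
    rm -f -- "$tmp"
    return 1        # the name already exists, or the copy failed
}
```

Of two concurrent add_file calls for the same name, only one ln can
succeed; the other returns non-zero and the existing file is left
untouched.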

-- 
Les Mikesell
 lesmikes...@gmail.com


