find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Bartolomeo Nicolotti
Hi,

I'm using the command:

/usr/bin/find . -type f -exec md5sum '{}' \;

to compare the contents of two subtrees (161 MB) on different systems,
one on Linux and the other on Windows with Cygwin.

The command takes a few seconds on Linux, but several minutes on
Windows+Cygwin.

Could someone help me speed things up on Windows+Cygwin?

Many thanks, best regards

B.Nicolotti

-- 
Bartolomeo Nicolotti
SIAP s.r.l.
www.siapcn.it
v.S.Albano 13 12049
Trinità(CN) Italy
ph:+39 0172 652553
centralino: +39 0172 652511
fax: +39 0172 652519


--
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple
Problem reports:   http://cygwin.com/problems.html
Documentation: http://cygwin.com/docs.html
FAQ:   http://cygwin.com/faq/



Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Marco Atzeri

--- Bartolomeo Nicolotti wrote:

 Hi,
 
 I'm using the command:
 
 /usr/bin/find . -type f -exec md5sum '{}' \;
 

Try this:
find . -type f | xargs md5sum


 to compare the content of two subtree(161Mbytes) on
 different systems,
 one linux, and the other on windows with cygwin.
 
 The command on linux takes some seconds, while on
 windows+cygwin takes
 some minutes.
 

-exec with ';' is very slow on Cygwin, because it starts one md5sum
process per file and fork is expensive there:

http://cygwin.com/faq/faq-nochunks.html#faq.api.fork


 Could some one help me to speed-up things on
 windows+cygwin?
 

 Many thanks, best regards
 
 B.Nicolotti
 

Regards
Marco



  




Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Spiro Trikaliotis
Hello Bartolomeo,

* On Mon, Nov 24, 2008 at 04:27:29PM +0100 Bartolomeo Nicolotti wrote:
 
 /usr/bin/find . -type f -exec md5sum '{}' \;
[...]
 Could some one help me to speed-up things on windows+cygwin?

Just a guess:

/usr/bin/find . -type f|xargs md5sum

(Background: I expect that starting md5sum is what takes most of the
time. With the xargs approach, md5sum is called fewer times, with more
than one argument per call, which will *hopefully* reduce the total
time.)
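
To make the batching effect visible, here is a minimal sketch that counts how often each variant starts the command. The temporary tree, the file names, and count.sh (a stand-in for md5sum) are made up for the demonstration.

```shell
# Hypothetical demo tree with ten small files.
demo=$(mktemp -d)
mkdir "$demo/tree"
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "data $i" > "$demo/tree/file$i.js"
done

# count.sh is a stand-in for md5sum: it appends one line to $COUNT_LOG
# every time it is started, and ignores its arguments.
printf '%s\n' 'echo started >> "$COUNT_LOG"' > "$demo/count.sh"

COUNT_LOG="$demo/log"
export COUNT_LOG

# Run the given command, then print how many times count.sh was started.
count_starts() {
    : > "$COUNT_LOG"
    "$@" > /dev/null
    wc -l < "$COUNT_LOG" | tr -d ' '
}

per_file=$(count_starts find "$demo/tree" -type f -exec sh "$demo/count.sh" '{}' \;)
batched=$(count_starts find "$demo/tree" -type f -exec sh "$demo/count.sh" '{}' +)
piped=$(count_starts sh -c 'find "$1" -type f -print0 | xargs -0 sh "$2"' \
    sh "$demo/tree" "$demo/count.sh")

printf '%s\n' "-exec with ';' started the command $per_file times"
printf '%s\n' "-exec with '+' started the command $batched times"
printf '%s\n' "xargs started the command $piped times"
```

In this ten-file tree the per-file form starts the command ten times, while the two batched forms start it once. On Cygwin each extra start pays the fork cost again.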

HTH,
Spiro.

-- 
Spiro R. Trikaliotis  http://opencbm.sf.net/
http://www.trikaliotis.net/ http://www.viceteam.org/




RE: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Bartolomeo Nicolotti
Hi,

but the command

find . -type f  | xargs md5sum

has problems with blanks in the file names:

md5sum: ./Pdf/1226503623_Offerta: No such file or directory
md5sum: Speciale: No such file or directory
md5sum: Vilnius.pdf: No such file or directory
md5sum: ./DynPkg/Fly/Old/Copy: No such file or directory
md5sum: of: No such file or directory
md5sum: volisearchstep3.php: No such file or directory
md5sum: ./DynPkg/Fly/Old/Copy: No such file or directory
md5sum: of: No such file or directory
md5sum: ./DynPkg/Fly/Old/Copy: No such file or directory
md5sum: of: No such file or directory
md5sum: flightconfirmandata.php: No such file or directory
md5sum: ./DynPkg/Fly/Old/Copy: No such file or directory
md5sum: of: No such file or directory
md5sum: volisearchstep3.php: No such file or directory
md5sum: ./Pdf/1226503623_Offerta: No such file or directory
md5sum: Speciale: No such file or directory
md5sum: Vilnius.pdf: No such file or directory

Many thanks, best regards.

B.Nicolotti






RE: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Bartolomeo Nicolotti
Great!

the command 

/usr/bin/find . -type f -exec md5sum '{}' \;

takes 3min 10s

the command

/usr/bin/find . -type f -exec md5sum \{} +

takes 25s.

the command

find . -type f  | xargs md5sum

takes 17s
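
Whichever variant is used, the stated goal is to compare two subtrees, and find may list files in a different order on each system. A minimal sketch of one way to compare the results is to sort each checksum list by path and diff them. The directory names, file names, and the manifest helper are hypothetical, and -print0 / -0 are used so that names containing blanks survive.

```shell
# Two hypothetical trees standing in for the Linux and Cygwin copies.
demo=$(mktemp -d)
mkdir -p "$demo/linux/sub" "$demo/cygwin/sub"
echo same > "$demo/linux/sub/a.js"
echo same > "$demo/cygwin/sub/a.js"
echo old > "$demo/linux/b c.js"
echo changed > "$demo/cygwin/b c.js"

# Print "checksum  path" lines for one tree, sorted by path so that
# traversal order does not matter.
manifest() {
    (cd "$1" && find . -type f -print0 | xargs -0 md5sum | sort -k 2)
}

manifest "$demo/linux" > "$demo/linux.md5"
manifest "$demo/cygwin" > "$demo/cygwin.md5"

# Lines starting with '>' come from the second manifest; the path starts
# after "> ", the 32-character checksum, and two separator characters.
changed=$(diff "$demo/linux.md5" "$demo/cygwin.md5" | grep '^>' | cut -c 37-)
printf 'differs: %s\n' "$changed"
```

In practice each manifest would be produced on its own machine and one of them copied over before running diff.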

Many thanks, best regards!

B.Nicolotti

 On Mon, 24/11/2008 at 16:35 +0100, Jörg Schaible wrote:
 [EMAIL PROTECTED] wrote:
  Hi,
  
  I'm using  the command:
  
  /usr/bin/find . -type f -exec md5sum '{}' \;
  
  to compare the content of two subtree(161Mbytes) on different systems,
  one linux, and the other on windows with cygwin.
  
  The command on linux takes some seconds, while on windows+cygwin
  takes some minutes. 
  
  Could some one help me to speed-up things on windows+cygwin?
 
 Don't run the command once per file; pass as many files as you can on
 one command line (use '+' instead of ';'):
 
 /usr/bin/find . -type f -exec md5sum \{} +
 
 Hint: A fork is expensive in Cygwin ...
 
 - Jörg





Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Mark J. Reed
On Mon, Nov 24, 2008 at 11:09 AM, Bartolomeo Nicolotti wrote:

 Hi,

 but the command

 find . -type f  | xargs md5sum

 has problems with blanks in the name of the files:

This isn't a general help list for UNIX tools; they work the same on
Cygwin as on UNIX.  I recommend you search for tutorials online;
http://www.softpanorama.org/Tools/Find/find_mini_tutorial.shtml looks
like it might be helpful for find.

The solution to your problem is the -print0 option to find, coupled
with the -0 option to xargs.

find . -type f -print0 | xargs -0 md5sum




Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Matthew Woehlke

Bartolomeo Nicolotti wrote:

 but the command

 find . -type f  | xargs md5sum

 has problems with blanks in the name of the files:
 [snip examples]


find . -type f -print0 | xargs -0 md5sum

...tells find to output \0-separated lines instead of \n-separated 
lines, and tells xargs to expect \0-separated args instead of 
whitespace-separated args.
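
A small reproduction of the difference, as a sketch: the file names below are modelled on the error messages quoted earlier in the thread, and the temporary directory is an assumption of the demo.

```shell
# Hypothetical tree with blanks in a directory name and in file names.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p Pdf "DynPkg/Fly/Old"
echo one > "Pdf/1226503623_Offerta Speciale Vilnius.pdf"
echo two > "DynPkg/Fly/Old/Copy of volisearchstep3.php"
echo three > plain.txt

# Default xargs splits its input on blanks and newlines, so md5sum receives
# fragments of names and reports an error for each one (counted here).
split_errors=$(find . -type f | xargs md5sum 2>&1 >/dev/null | wc -l | tr -d ' ')

# With NUL separators every path reaches md5sum intact.
safe_errors=$(find . -type f -print0 | xargs -0 md5sum 2>&1 >/dev/null | wc -l | tr -d ' ')
safe_sums=$(find . -type f -print0 | xargs -0 md5sum | wc -l | tr -d ' ')

printf 'plain pipe: %s errors\n' "$split_errors"
printf 'print0 / -0: %s errors, %s checksums\n' "$safe_errors" "$safe_sums"
```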


(That said, I've always rather wondered why xargs doesn't have a mode to 
expect \n-separated args. There is of course the problem that file names 
might also contain \n (maybe not on 'doze, but on POSIX filesystems they 
can), though for less typing it seems most xargs input tends to be 
line-delimited anyway.)
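
For what it is worth, GNU xargs does accept a delimiter option, -d, which covers the newline-separated case. It is a GNU extension, so the sketch below assumes GNU findutils; the input names are made up.

```shell
# Feed two newline-separated names, one containing blanks, to GNU xargs.
# -d '\n' makes newline the only separator, so blanks stay inside a name.
out=$(printf '%s\n' 'Copy of a.php' 'plain.txt' | xargs -d '\n' printf '[%s]\n')
printf '%s\n' "$out"
```

Each name arrives as a single argument, shown in its own pair of brackets. Names that themselves contain a newline would still be split, which is why -print0 with -0 remains the safer pairing.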


--
Matthew
Please do not quote my e-mail address unobfuscated in message bodies.
--
Is this thing on?





RE: Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Buchbinder, Barry (NIH/NIAID) [E]
The following may technically be off-topic.  If so, I apologize.

Matthew Woehlke wrote on Monday, November 24, 2008 12:46 PM:
 Bartolomeo Nicolotti wrote:
 but the command
 
 find . -type f | xargs md5sum
 
 has problems with blanks in the name of the files:
 [snip examples]
 
 find . -type f -print0 | xargs -0 md5sum

I've found that find is significantly slower than native Windows tools.  (Each 
command below was run several times first, so that the file system cache was warm.)

local hard disk (C:):

 time $(cygpath -u ${COMSPEC}) /c dir /s /b /a:-d | wc
  16085   16308  690388

real    0m0.343s
user    0m0.122s
sys     0m0.170s

networked drive:

 time $(cygpath -u ${COMSPEC}) /c dir /s /b /a:-d | wc
   1183    3093   66761

real    0m3.078s
user    0m0.075s
sys     0m0.108s

 time find . -type f | wc
   1183    3093   53748

real    1m0.813s
user    0m0.216s
sys     0m8.046s

Therefore, you might consider using something like this if there are no 
symbolic links* and it doesn't offend your sensibilities.  (* and other 
oddities.  I'm not sure how symbolic links work with find . -type f, so this 
might not be a problem.)

$(cygpath -u ${COMSPEC}) /c dir /s /b /a:-d | \
tr -s '\r\n' '\n' | \
cygpath -u -f - | \
tr '\n' '\0' | \
xargs -r0 md5sum
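
The text-conversion stages of that pipeline can be tried without Windows. This sketch feeds simulated CRLF-terminated `dir /s /b` output through the same tr and xargs steps; the input names are made up, and the cygpath stage is left out because it only exists under Cygwin.

```shell
# Simulated `dir /s /b` output: CRLF line endings, one name with blanks.
out=$(printf 'Copy of a.php\r\nplain.txt\r\n' |
    tr -s '\r\n' '\n' |   # map CR and LF to LF, then squeeze repeated LFs
    tr '\n' '\0' |        # switch to NUL separators for xargs -0
    xargs -r0 printf '[%s]\n')
printf '%s\n' "$out"
```

Each name reaches the final command as one argument with no trailing carriage return. The -r flag (a GNU extension) keeps xargs from running the command at all when the list is empty.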




Re: Re: find . -regex '.*js' -type f -exec md5sum '{}' \; really slow!

2008-11-24 Thread Mark J. Reed
On Mon, Nov 24, 2008 at 5:53 PM, Buchbinder, Barry (NIH/NIAID) [E]  wrote:
  I'm not sure how symbolic links work with find . -type f, so this might not 
 be a problem.

find does not follow symlinks by default, so a symlink is never matched
by -type f; but if you specify -follow (or the -L option), then it
will descend into symbolic links to directories and return symbolic
links to files as matching -type f.
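
A quick way to check this behaviour, as a sketch: the temporary directory and file names are made up, and -L is used here in place of -follow.

```shell
# Hypothetical directory with one regular file and one symlink to it.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo data > real.txt
ln -s real.txt link.txt

# Default: the symlink itself has type l, so -type f lists only the file.
default_hits=$(find . -type f | sort | tr '\n' ' ')

# With -L the link is followed and its target's type is tested instead.
follow_hits=$(find -L . -type f | sort | tr '\n' ' ')

# Without -L the link is still visible as type l.
link_hits=$(find . -type l | tr '\n' ' ')

printf 'default: %s\n' "$default_hits"
printf 'with -L: %s\n' "$follow_hits"
printf 'type l:  %s\n' "$link_hits"
```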


-- 
Mark J. Reed [EMAIL PROTECTED]
