Re: [9fans] du and find

2010-05-04 Thread Karljurgen Feuerherm
My impression as an undergraduate in CS was that most of my peers were
mechanics rather than artists. They could ape things, but only a few
could see past what was shown and apply the principles abstractly.

This may have to do with a failure in earlier education: I remember
that, again, peers could do 'picture frame problems' but without any
real comprehension of the underlying algebra.

On the other hand, it may just be a question of what human beings are,
and how few artists there are, proportionately speaking.

K

>>> Gabriel Díaz  04/05/2010 12:56 pm >>>
Hello


(about students/trainees and perl)

Being able to recognize what you've studied in your daily work is quite
difficult in most places. Also, your work objectives are rarely related
to correctness in the scientific sense: something that is correct, or
good enough, for the business may not be correct or good enough from
the science point of view.

Speaking of non-programming businesses: for me, it's enough if a
student is able to use, or ask for, a programming language to solve a
task: perl, vbscript, or whatever. A couple of times I've seen students
matching two lists of thousands of entries by hand, either on paper or
in the original Excel file, and I've seen mentors and managers agree
with the method. If they can write regexps, even ugly ones, that's
enough; you can then show them alternatives, suggest other ways, etc.
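The list-matching chore Gabriel describes is a one-liner with standard tools. A minimal sketch (the names in the lists are made up for the example): comm(1) on sorted input reports common and unique entries.

```shell
# Matching two lists with standard tools instead of by hand:
# comm(1) on sorted input; -12 suppresses the entries unique to
# each file, leaving only the common ones.
printf 'alice\nbob\ncarol\n' | sort > /tmp/list1
printf 'bob\ndave\n' | sort > /tmp/list2
comm -12 /tmp/list1 /tmp/list2
```

Dropping `-12` shows all three columns: unique to the first list, unique to the second, and common to both.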

The failure is not the school's, or not entirely. The tools are given to
you; it is unusual to be able to choose the tool you use to finish a
task. In nice places, you might be able to propose one. . .

regards,

gabi




Re: [9fans] du and find

2010-05-04 Thread Jorden M
On Tue, May 4, 2010 at 6:01 AM, Ethan Grammatikidis  wrote:
>
> On 3 May 2010, at 19:34, Jorden M wrote:
>
>> On Mon, May 3, 2010 at 10:53 AM, erik quanstrom 
>> wrote:

 It's always been easier for me to use python's/perl's regular
 expressions when I needed to process a text file than to use plan9's.
 For simple things, e.g. while editing an ordinary text in acme/sam,
 plan9's regexps are just fine.
>>>
>>> i find it hard to think of cases where i would need
>>> such sophistication and where tokenization or
>>> tokenization plus parsing wouldn't be a better idea.
>>
>> A lot of the `sophisticated' Perl I've seen uses some horrible regexes
>> when really the job would have been done better and faster by a
>> simple, job-specific parser.
>>
>> I've yet to find out why this happens so much, but I think I can
>> narrow it to a combination of ignorance, laziness, and perhaps that
>> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
>> I guess by the time you've written half a parser in line noise, it's
>> too late to quit while you're behind.
>
> I think it's ignorance and something. I'm not sure what that something is. I
> am sure if you tried to suggest writing a parser to many of the
> open-sourcers I've talked to you would be treated as if you were suggesting

I can attest that it's not just open-source folk.

> a big job rather than a small one. "Why Write a Parser,"  they would ask,
> "when I can just scribble a few little lines of perl?"

That phenomenon is real, and it goes further: once that person
is done writing their abominable perl, and you point out that they've
written a parser anyway, only poorly (not to mention one that would
have to be totally rewritten to be modified), they look at you
cross-eyed and say `whatever.'

>
> Maybe it's humans' natural tendencies toward hierarchy coming into play.
> Stuff known by Teachers and Masters easily takes on a bizarre kind of
> importance, rank is unconsciously attached, and the student naturally but
> unconsciously feels he is not of sufficient rank to attempt the Master's
> Way. That explanation does presuppose humans have a very strong natural
> tendency to hierarchy. I find sufficient evidence within myself to believe
> it's true, as unpopular as the idea may be. Perhaps some people are more
> strongly inclined that way than others. Anyway, it's the only explanation I
> can imagine for the phenomenon.
>

Pretty much. As Raschke mentioned, students are
conditioned to think that parsers are hard to write because they're a
piece of a compiler, the Dragon book is too big and scary,
only gods can write compilers and parsers, etc. Another consequence of
the `parsers are too hard' mentality is that people don't recognize
the difference between a language that's regular and one that's
context-free, and spend days scratching their heads wondering why
their regexes break all over the place. Situations often become
complicated when self-proclaimed perl experts drop in and say, `oh,
here, you just add this case and that case and you should be fine X%
of the time!', where X is a BS figure pulled out of you know where.
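The regular-vs-context-free point can be made concrete. A regular expression cannot count nesting depth, but a few lines of awk can; here is a toy balanced-parentheses checker (illustrative only, not anyone's production code):

```shell
# A regex cannot verify arbitrarily nested parentheses, because the
# language of balanced parens is context-free, not regular.  A tiny
# counter does the job: track depth, fail if it ever goes negative
# or ends nonzero.
check() {
    printf '%s\n' "$1" | awk '{
        d = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") d++
            if (c == ")") d--
            if (d < 0) { print "unbalanced"; exit }
        }
        print (d == 0 ? "balanced" : "unbalanced")
    }'
}
check '((a)(b))'
check '((a)'
```

The same counting idea, extended with a stack, is all a simple job-specific parser usually amounts to.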

I think what we have here can be construed as a failure of CS
education, which fits right in with the many failures of education at
large.

>>
>>>
>>> for example, you could write a re to parse the output
>>> of ls -l and or ps.  but awk '{print $field}' is so much
>>> easier to write and read.
>>>
>>> so in all, i view perl "regular" expressions as a tough sell.
>>> i think they're harder to write, harder to read, require more
>>> and more unstable code, and slower.
>>>
>>> one could speculate that perl, by encouraging a
>>> monolithic, rather than tools-based approach;
>>> and cleverness over clarity made perl expressions
>>> the logical next step.  if so, i question the assumptions.
>>>
>>> - erik
>>>
>>>
>>
>
> --
> Simplicity does not precede complexity, but follows it. -- Alan Perlis
>
>
>



Re: [9fans] du and find

2010-05-04 Thread Robert Raschke
On Tue, May 4, 2010 at 11:01 AM, Ethan Grammatikidis wrote:

> On 3 May 2010, at 19:34, Jorden M wrote:
>
>> I've yet to find out why this happens so much, but I think I can
>> narrow it to a combination of ignorance, laziness, and perhaps that
>> all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
>> I guess by the time you've written half a parser in line noise, it's
>> too late to quit while you're behind.
>>
>
> I think it's ignorance and something. I'm not sure what that something is.
> I am sure if you tried to suggest writing a parser to many of the
> open-sourcers I've talked to you would be treated as if you were suggesting
> a big job rather than a small one. "Why Write a Parser,"  they would ask,
> "when I can just scribble a few little lines of perl?"
>
>
I'd think it's simply not knowing that there are easier ways of doing it. It
is just not taught. Also, people learn about parsers in that really scary
module about compilers and never give them a second thought afterwards. And
anything else to do with strings is usually hopelessly complicated stuff
involving indices into character arrays.

Then there's the "kudos" of writing write-only code. Even the writer doesn't
understand it anymore, but nobody else knows that, so ...

I always found it a wee bit sad that Icon (http://www.cs.arizona.edu/icon/)
never really had much of an impact in the "let's take this string apart"
problem domain. If I need something quick and dirty, it's my "secret" tool
for "parsing" stuff quickly. String scanning is trivial.

Robby


Re: [9fans] du and find

2010-05-04 Thread Ethan Grammatikidis


On 3 May 2010, at 19:34, Jorden M wrote:

On Mon, May 3, 2010 at 10:53 AM, erik quanstrom  
 wrote:

It's always been easier for me to use python's/perl's regular
expressions when I needed to process a text file than to use  
plan9's.

For simple things, e.g. while editing an ordinary text in acme/sam,
plan9's regexps are just fine.


i find it hard to think of cases where i would need
such sophistication and where tokenization or
tokenization plus parsing wouldn't be a better idea.


A lot of the `sophisticated' Perl I've seen uses some horrible regexes
when really the job would have been done better and faster by a
simple, job-specific parser.

I've yet to find out why this happens so much, but I think I can
narrow it to a combination of ignorance, laziness, and perhaps that
all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
I guess by the time you've written half a parser in line noise, it's
too late to quit while you're behind.


I think it's ignorance and something else; I'm not sure what that
something is. I am sure that if you suggested writing a parser to many
of the open-sourcers I've talked to, you would be treated as if you
were suggesting a big job rather than a small one. "Why write a
parser," they would ask, "when I can just scribble a few little lines
of perl?"

Maybe it's humans' natural tendency toward hierarchy coming into
play. Stuff known by Teachers and Masters easily takes on a bizarre
kind of importance, rank is unconsciously attached, and the student
naturally but unconsciously feels he is not of sufficient rank to
attempt the Master's Way. That explanation does presuppose that humans
have a very strong natural tendency to hierarchy. I find sufficient
evidence within myself to believe it's true, as unpopular as the idea
may be. Perhaps some people are more strongly inclined that way than
others. Anyway, it's the only explanation I can imagine for the
phenomenon.






for example, you could write a re to parse the output
of ls -l and or ps.  but awk '{print $field}' is so much
easier to write and read.

so in all, i view perl "regular" expressions as a tough sell.
i think they're harder to write, harder to read, require more
and more unstable code, and slower.

one could speculate that perl, by encouraging a
monolithic, rather than tools-based approach;
and cleverness over clarity made perl expressions
the logical next step.  if so, i question the assumptions.

- erik






--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




Re: [9fans] du and find

2010-05-03 Thread Jorden M
On Mon, May 3, 2010 at 10:53 AM, erik quanstrom  wrote:
>> It's always been easier for me to use python's/perl's regular
>> expressions when I needed to process a text file than to use plan9's.
>> For simple things, e.g. while editing an ordinary text in acme/sam,
>> plan9's regexps are just fine.
>
> i find it hard to think of cases where i would need
> such sophistication and where tokenization or
> tokenization plus parsing wouldn't be a better idea.

A lot of the `sophisticated' Perl I've seen uses some horrible regexes
when really the job would have been done better and faster by a
simple, job-specific parser.

I've yet to find out why this happens so much, but I think I can
narrow it to a combination of ignorance, laziness, and perhaps that
all-too-frequent assumption `oh, I can do this in 10 lines with perl!'
I guess by the time you've written half a parser in line noise, it's
too late to quit while you're behind.

>
> for example, you could write a re to parse the output
> of ls -l and or ps.  but awk '{print $field}' is so much
> easier to write and read.
>
> so in all, i view perl "regular" expressions as a tough sell.
> i think they're harder to write, harder to read, require more
> and more unstable code, and slower.
>
> one could speculate that perl, by encouraging a
> monolithic, rather than tools-based approach;
> and cleverness over clarity made perl expressions
> the logical next step.  if so, i question the assumptions.
>
> - erik
>
>



Re: [9fans] du and find

2010-05-03 Thread Ethan Grammatikidis


On 3 May 2010, at 16:29, j...@9srv.net wrote:


On 3 May 2010, at 14:41, Steve Simon wrote:

Or just apply runs grep -r patch...


% man 1 grep | grep '\-r'
%


Key word being patch.


Oh right! Well, if the point of this thread is to talk about something  
better than grep -r in the first place...


eh, whateva.

--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




Re: [9fans] du and find

2010-05-03 Thread Steve Simon
> > Or just apply runs grep -r patch...
> % man 1 grep | grep '\-r'

s/runs/ron's/

see 9fans passim for the patch.

-Steve



Re: [9fans] du and find

2010-05-03 Thread jake
> On 3 May 2010, at 14:41, Steve Simon wrote:
>> Or just apply runs grep -r patch...
> 
> % man 1 grep | grep '\-r'
> %
> 
Key word being patch.




Re: [9fans] du and find

2010-05-03 Thread Ethan Grammatikidis


On 3 May 2010, at 14:41, Steve Simon wrote:

on Plan 9 you'd probably want to make a wrapper for grep anyway if  
you

do a lot of recursive searching.


Or just apply runs grep -r patch...


% man 1 grep | grep '\-r'
%



-Steve



--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




Re: [9fans] du and find

2010-05-03 Thread erik quanstrom
> It's always been easier for me to use python's/perl's regular
> expressions when I needed to process a text file than to use plan9's.
> For simple things, e.g. while editing an ordinary text in acme/sam,
> plan9's regexps are just fine.

i find it hard to think of cases where i would need
such sophistication and where tokenization or
tokenization plus parsing wouldn't be a better idea.

for example, you could write a re to parse the output
of ls -l and/or ps.  but awk '{print $field}' is so much
easier to write and read.
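Concretely, the awk idiom looks like this (the sample ls -l line and field numbers are made up for the example):

```shell
# Pulling a column out of ls -l style output with awk, instead of
# writing a regex for the whole line.  Field 5 is the size in this
# sample; $NF is the last field, the file name.
line='-rw-r--r-- 1 glenda glenda 2131 May  3 12:00 regexp.c'
echo "$line" | awk '{print $5}'
echo "$line" | awk '{print $NF}'
```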

so in all, i view perl "regular" expressions as a tough sell.
i think they're harder to write, harder to read, require more
and more unstable code, and slower.

one could speculate that perl, by encouraging a
monolithic, rather than tools-based approach;
and cleverness over clarity made perl expressions
the logical next step.  if so, i question the assumptions.

- erik



Re: [9fans] du and find

2010-05-03 Thread erik quanstrom
> http://betterthangrep.com/
> 
> it does not seem to work out of the box (expecting some unix paths), but
> since there's a perl port and that thing is supposed to be more or
> less self contained (for the standalone version), maybe it's not too
> much work for someone interested enough.

don't be silly.  russ wrote something like this in pure sh(1) for p9p,
g.  i reimplemented it in pure rc(1) and added gh (grep in headers)
and gf (grep function).  since g is an engine, you can add other
specialized search functions.

contrib quanstro/g

- erik
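The "g is an engine" idea can be sketched in a few lines. This is a minimal POSIX sh approximation, not erik's actual rc script (the file layout and pattern are invented for the example):

```shell
# A minimal recursive "grep the sources" wrapper in the spirit of g:
# walk the given trees (default: .), pick out C sources and headers,
# and grep them with line numbers.  The /dev/null argument forces
# grep to print the file name even for a single match.
g() {
    pat="$1"; shift
    [ $# -eq 0 ] && set -- .
    find "$@" -type f \( -name '*.c' -o -name '*.h' \) \
        -exec grep -n "$pat" /dev/null {} +
}
mkdir -p /tmp/gdemo/sub
echo 'int main(void){ return 0; }' > /tmp/gdemo/sub/main.c
g 'main' /tmp/gdemo
```

Specialized variants (headers only, function definitions) are just different find/grep arguments layered on the same engine.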



Re: [9fans] du and find

2010-05-03 Thread Steve Simon
> on Plan 9 you'd probably want to make a wrapper for grep anyway if you  
> do a lot of recursive searching.

Or just apply runs grep -r patch...

-Steve



Re: [9fans] du and find

2010-05-03 Thread Rudolf Sykora
On 3 May 2010 14:18, Akshat Kumar  wrote:
> Forgive my ignorance and irrelevance to this topic,
> but what are the advantages of Perl's regular
> expressions, over the implementation we have
> currently in Plan 9?

Regexps in Plan 9 are, on one hand, much less powerful than Perl's; on
the other hand, they are (thanks to their simplicity) much quicker.
Often one doesn't need Perl's power, and in such cases Plan 9's regexps
are better. But sometimes...

Just compare:
http://www.amk.ca/python/howto/regex/
to
regexp(7)

... particularly e.g. Lookahead Assertions, Non-capturing and Named Groups.
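For simple cases, what lookahead buys can often be had by composing plain greps instead. A rough sketch (not equivalent to lookahead in general): lines containing foo but not foobar.

```shell
# In perl this might be /foo(?!bar)/; with plain tools, two greps
# composed in a pipeline get the common case.
printf 'foo x\nfoobar\nfoo y\n' | grep foo | grep -v foobar
```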

It's always been easier for me to use python's/perl's regular
expressions when I needed to process a text file than to use plan9's.
For simple things, e.g. while editing an ordinary text in acme/sam,
plan9's regexps are just fine.

Also read Russ Cox's text:
http://swtch.com/~rsc/regexp/regexp1.html

Ruda



Re: [9fans] du and find

2010-05-03 Thread Ethan Grammatikidis


On 3 May 2010, at 13:26, Mathieu Lonjaret wrote:


No idea, probably none.

that would not be the interesting point, if any.  it's just that the
tool is already there and (should be) simpler to use than piping
various commands around, as they illustrate below.


Ack looks cute, but I think a fairly simple shell script could do all
of what ack does without requiring perl. I imagine it would be faster
still by not using Perl's broken regular expressions, and bear in mind
that on Plan 9 you'd probably want a wrapper for grep anyway if you
do a lot of recursive searching.


--
Simplicity does not precede complexity, but follows it. -- Alan Perlis




Re: [9fans] du and find

2010-05-03 Thread tlaronde
On Mon, May 03, 2010 at 02:26:07PM +0200, Mathieu Lonjaret wrote:
> No idea, probably none. 
> 
> that would not be the interesting point, if any.  it's just that the
> tool is already there and (should be) simpler to use than piping
> various commands around, as they illustrate below.

> Date: Mon, 3 May 2010 05:18:56 -0700
> From: Akshat Kumar 
> Subject: Re: [9fans] du and find
> To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
> 
> From the website:
> 
> "ack is written purely in Perl, and takes advantage
> of the power of Perl's regular expressions."
> 
> Forgive my ignorance and irrelevance to this topic,
> but what are the advantages of Perl's regular
> expressions, over the implementation we have
> currently in Plan 9?

I actually got the answer to this a long time ago: they simply do not
know that ed(1) exists, nor sed(1), etc.

A group providing an ISDN router based on Debian required a lot
of memory and disk space. I asked: why that much for _that_?! The
answer: we need perl(1) installed. But what for? Answer: to replace
@@GATEWAY@@ and so on with customized values in a file... (They hadn't
even thought of building the distribution on a vulcan and simply
installing it on the target.)
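The substitution in question is a one-line sed job. A sketch (the template line and gateway address are invented for the example):

```shell
# The @@GATEWAY@@ replacement that supposedly "needed perl",
# done with sed: substitute the placeholder with a concrete value.
echo 'route add default @@GATEWAY@@' > /tmp/net.tmpl
sed 's/@@GATEWAY@@/192.168.0.1/' /tmp/net.tmpl
```

The same edit can be scripted with ed(1) where sed is unavailable.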

They didn't know about ed(1). So I told them the regexps were ed(1)'s,
and that ed(1) was required by POSIX.2. And I tried to demonstrate
it... only to find that Debian didn't provide ed(1) by default. I asked
Debian: why the f...?!? Answer: GNU's Not Unix...

And that day I realized I was not GNU, and switched to *BSD before
asking myself some questions that led me to Plan 9...
-- 
Thierry Laronde 
  http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C



Re: [9fans] du and find

2010-05-03 Thread Mathieu Lonjaret
No idea, probably none. 

that would not be the interesting point, if any.  it's just that the
tool is already there and (should be) simpler to use than piping
various commands around, as they illustrate below.


Re: [9fans] du and find

2010-05-03 Thread Akshat Kumar
From the website:

"ack is written purely in Perl, and takes advantage
of the power of Perl's regular expressions."

Forgive my ignorance and irrelevance to this topic,
but what are the advantages of Perl's regular
expressions, over the implementation we have
currently in Plan 9?


Thanks,
ak

On Mon, May 3, 2010 at 5:13 AM, Mathieu Lonjaret
 wrote:
> Hello,
>
> just because reviving old threads is fun...
> I've just found out about this:
>
> http://betterthangrep.com/
>
> it does not seem to work out of the box (expecting some unix paths), but
> since there's a perl port and that thing is supposed to be more or
> less self contained (for the standalone version), maybe it's not too
> much work for someone interested enough.
>
> Cheers,
> Mathieu
>
>
>



Re: [9fans] du and find

2010-05-03 Thread Mathieu Lonjaret
Hello,

just because reviving old threads is fun...
I've just found out about this:

http://betterthangrep.com/

it does not seem to work out of the box (expecting some unix paths), but
since there's a perl port and that thing is supposed to be more or
less self contained (for the standalone version), maybe it's not too
much work for someone interested enough.

Cheers,
Mathieu




Re: [9fans] du and find

2010-01-02 Thread erik quanstrom
> Given the way Unix programs
> behave you can't replace arg list with an arg fd (I used to

didn't know this was "unixfans".  will keep that in mind.

- erik



Re: [9fans] du and find

2010-01-02 Thread Bakul Shah
On Sat, 02 Jan 2010 20:49:39 EST erik quanstrom   wrote:
> > And can eat up a lot of memory or even run out of it.  On a
> > 2+ year old MacBookPro "find -x /" takes 4.5 minutes for 1.6M
> > files and 155MB to hold paths.  My 11 old machine has 64MB
> > and over a million files on a rather slow disk. Your solution
> > would run out of space on it.
> 
> modern cat wouldn't fit in core on the early pdps unix was
> developed on!

No point in gratuitously obsoleting old machines.  I am
running FreeBSD 7.2 on my 11-year-old machine and so far it has
stood up well enough.

> just to be fair, could you fit your 1.6m files on your 11yu machine?
> i'm guessing you couldn't.

Yes. It's on its third disk. A 6yo 80G IDE disk.

> > Basically this is just streams programming for arguments
> > instead of data. 
> 
> that's fine.  but it's no excuse to hobble exec.  not unless
> you're prepared to be replace argument lists with an argument
> fd.

Not sure how exec is hobbled.  Given the way Unix programs
behave, you can't replace the arg list with an arg fd (I used to
carry around a library to do just that, but the problem is all
the standard programs). Anyway, I don't see how xargs can be
gotten rid of.



Re: [9fans] du and find

2010-01-02 Thread Anthony Sorace

Rog said:


that's why breadth-first might be useful, by putting
shallower files earlier in the search results - i often
do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
a similar result, but you have to guess the depth that way.


for what it's worth, dan's walk.c has a -d option for limiting search
depth. it's not breadth-first, but is still nice.




Re: [9fans] du and find

2010-01-02 Thread erik quanstrom
> And can eat up a lot of memory or even run out of it.  On a
> 2+ year old MacBookPro "find -x /" takes 4.5 minutes for 1.6M
> files and 155MB to hold paths.  My 11 old machine has 64MB
> and over a million files on a rather slow disk. Your solution
> would run out of space on it.

modern cat wouldn't fit in core on the early pdps unix was
developed on!

just to be fair, could you fit your 1.6m files on your 11 y.o. machine?
i'm guessing you couldn't.

> Basically this is just streams programming for arguments
> instead of data. 

that's fine.  but it's no excuse to hobble exec.  not unless
you're prepared to be replace argument lists with an argument
fd.

- erik



Re: [9fans] du and find

2010-01-02 Thread Bakul Shah
On Sat, 02 Jan 2010 14:47:26 EST erik quanstrom   wrote:
> 
> my beef with xargs is only that it is used as an excuse
> for not fixing exec in unix.  it's also used to bolster the
> "that's a rare case" argument.

I often do something like the following:

  find . -type f  | xargs grep -l  | xargs 

If by "fixing exec in unix" you mean allowing something like

   $(grep -l  $(find . -type f ))

then  would take far too long to even get started.
And it can eat up a lot of memory, or even run out of it.  On a
2+ year old MacBook Pro "find -x /" takes 4.5 minutes for 1.6M
files and 155MB to hold the paths.  My 11-year-old machine has 64MB
and over a million files on a rather slow disk. Your solution
would run out of space on it.  Now, granted, I should update it
to a more balanced system, but mechanisms should continue
working even if one doesn't have an optimal system.  At least
xargs gives me that choice.

Basically this is just streams programming for arguments
instead of data. Ideally all the args would be taken from a
stream (and specifying args on a command line would be just a
convenience), but it is too late for that.  Often Unix
commands have a -r option to walk a file tree, but it would've
been nicer to have the tree walk factored out. Then you could
do things like a breadth-first walk and have everyone
benefit.
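The streaming behavior described above is easy to see in miniature (the toy tree is made up for the example): find emits names as it walks, and xargs batches them into argument lists, so grep starts before the walk finishes and no single giant argv is ever built.

```shell
# find streams file names; xargs collects them into batches and
# runs grep on each batch, keeping memory use bounded regardless
# of how many files the walk produces.
mkdir -p /tmp/xdemo/sub
echo 'needle here' > /tmp/xdemo/a.txt
echo 'hay only' > /tmp/xdemo/sub/b.txt
find /tmp/xdemo -type f | xargs grep -l needle
```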



Re: [9fans] du and find

2010-01-02 Thread erik quanstrom
> i'm not saying it can't be passed in an argument list, just that
> xargs gives you a lazy evaluation of the walk
> of the file tree which can result in a faster result
> when the result is found earlier in the file list.

i have no problem with breadth-first.

my beef with xargs is only that it is used as an excuse
for not fixing exec in unix.  it's also used to bolster the
"that's a rare case" argument.

imho "rare case" arguments work best if the downside
is that the "rare case" is slow or awkward.  in this case
it's broken.

> that's why breadth-first might be useful, by putting
> shallower files earlier in the search results - i often
> do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
> a similar result, but you have to guess the depth that way.

clearly i'm not in your league.  my source trees are
smaller than that.  no more than two levels.  or, it
doesn't matter.  on a few machines lying around
the house (details on the poorly-chosen disks here:
http://www.quanstro.net/plan9/fs.html)

i7 2666 0.36u 0.42s 7.35r
Atom 1605   0.83u 1.88s 8.85r
AMD64 2007  0.94u 0.97s 12.51r

and on the fast 10gbe stuff at coraid
Xeon5000 1865   0.45u 0.83s 4.33r   10gbe myricom
PIV/Xeon 3003   0.66u 1.41s 7.32r   i82573

(it would be fun to put the i7 with 10gbe
together!)

it's easy to try the completely uncached case at
coraid since the working set is about 7gb and the
cache is only 3.5

Xeon5000 1874   0.50u 0.84s 27.63r

- erik



Re: [9fans] du and find

2010-01-02 Thread roger peppe
2010/1/2 erik quanstrom :
>> and /sys/src isn't by any means the largest tree i like to grep
>> (for instance, searching for lost files with a name i no longer remember,
>> i've been known to search through all the files in my home directory,
>> ~425000 files at last count)
>>
>> sometimes i think it would be nice if du had a breadth-first option.
>
> aren't you contradicting yourself?  at 128 characters/file,
> that's only 52mb -- 2% of memory on a typical system these days.
> why can't it be passed as an argument list?

i'm not saying it can't be passed in an argument list, just that
xargs gives you lazy evaluation of the walk
of the file tree, which can give a faster result
when the match is found early in the file list.

that's why breadth-first might be useful, by putting
shallower files earlier in the search results - i often
do grep foo *.[ch] */*.[ch] */*/*.[ch] to achieve
a similar result, but you have to guess the depth that way.
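The depth-guessing glob can be replaced by asking find for one level at a time, so shallow matches print first. A sketch (-mindepth/-maxdepth are GNU/BSD find extensions, not POSIX; the toy tree is made up for the example):

```shell
# Approximate a breadth-first grep: search depth 1, then 2, then 3,
# so files near the root appear before deeply nested ones.
mkdir -p /tmp/bfs/deep/deeper
echo 'foo' > /tmp/bfs/top.c
echo 'foo' > /tmp/bfs/deep/deeper/low.c
for d in 1 2 3; do
    find /tmp/bfs -mindepth "$d" -maxdepth "$d" -name '*.c' \
        -exec grep -l foo {} +
done
```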



Re: [9fans] du and find

2010-01-02 Thread anonymous
Yes, you are right. I had forgotten about the cache. But perhaps the
cache is the reason why du -a takes 25s?




Re: [9fans] du and find

2010-01-02 Thread erik quanstrom
> On Fri, Jan 01, 2010 at 09:02:28PM -0500, erik quanstrom wrote:
> > > you've got a fast system.
> > > in at least one system i use, du -a of /sys/src takes about 25s.
> > 
> > i have a humble 2y.o. single-core 35w celeron as a fileserver.
> > 
> Speed of `du' depends on I/O, not CPU.

really?  have you tested this?  i've always found the
two to be related.

first, most fileservers have an in-memory block cache.
unless your active set is really big, most directories should
be in the in-memory cache.

when in memory cache, the time it takes to acquire
block cache locks and copy data dominates.  for
fossil+venti this factor is multiplied by 2 plus 2 trips
through the kernel.  so for memory-cached blocks,
fileserver speed is entirely dependent on network+
cpu.  the proportion of memory cache is of course
proportional to one's cache sizes.

sure, you can take this to extremes where the size of
the memory cache is so small, that the memory cache
doesn't matter, or the speed of the network is so slow
(find /n/sources) that nothing other than disk io or
network speed matters.

second, each directory read requires a number of 9p messages.
in the current system, each incurs the full rtt penalty.
so the network latency is a really big factor in du
performance.

you can test to see how the du speed is related to
network performance very easily if you have the right
sort of network card.  just adjust the interrupt coalescing
values.  (Tidv/Tadv in ether82563, or 
echo coal $µs>/net/ether$n/clone for etherm10g)

and the cool thing is that especially for tcp, i've
found cpu speed and network latency to be pretty
important.

- erik



Re: [9fans] du and find

2010-01-01 Thread anonymous
On Fri, Jan 01, 2010 at 09:02:28PM -0500, erik quanstrom wrote:
> > you've got a fast system.
> > in at least one system i use, du -a of /sys/src takes about 25s.
> 
> i have a humble 2y.o. single-core 35w celeron as a fileserver.
> 
Speed of `du' depends on I/O, not CPU.




Re: [9fans] du and find

2010-01-01 Thread erik quanstrom
> because the limit is big enough that cases that break the
> limit almost never happen except in this case?

we can easily fit all the files in most any system in memory.
why shouldn't that be the limit?   see below.

> > i'm not sure i understand when and why this would be useful.  nobody
> > has a real worm anymore.  i can walk /sys/src in 0.5s.
> 
> you've got a fast system.
> in at least one system i use, du -a of /sys/src takes about 25s.

i have a humble 2y.o. single-core 35w celeron as a fileserver.

> and /sys/src isn't by any means the largest tree i like to grep
> (for instance, searching for lost files with a name i no longer remember,
> i've been known to search through all the files in my home directory,
> ~425000 files at last count)
> 
> sometimes i think it would be nice if du had a breadth-first option.

aren't you contradicting yourself?  at 128 characters/file,
that's only 52mb -- 2% of memory on a typical system these days.
why can't it be passed as an argument list?

- erik



Re: [9fans] du and find

2010-01-01 Thread roger peppe
2010/1/2 erik quanstrom :
> using xargs does work around the problem.  but then why not
> go all the way and remove ` from rc?  after all, ` only works some
> of the time?

because the limit is big enough that cases that break the
limit almost never happen except in this case?

> i'm not sure i understand when and why this would be useful.  nobody
> has a real worm anymore.  i can walk /sys/src in 0.5s.

you've got a fast system.
in at least one system i use, du -a of /sys/src takes about 25s.

and /sys/src isn't by any means the largest tree i like to grep
(for instance, searching for lost files with a name i no longer remember,
i've been known to search through all the files in my home directory,
~425000 files at last count)

sometimes i think it would be nice if du had a breadth-first option.



Re: [9fans] du and find

2010-01-01 Thread erik quanstrom
> i don't really see why xargs (the idea, not the usual unix implementations)
> is inherently such a bad idea. years ago i wrote an ultra simple version
> with no options, and it's about 80 lines of code, which i use to grep
> through all of /sys/src for example.

that's interesting.  my objection to xargs is all about the idea.
the usual way of doing things breaks if there are too many files.
using xargs does work around the problem.  but then why not
go all the way and remove ` from rc?  after all, ` only works some
of the time?

> if you always split on \n (which is banned in filenames in plan 9) and
> don't interpret any other metacharacters, what's the problem?
> 
> it's also nice because you often get some results before you've
> walked the entire file tree.

i'm not sure i understand when and why this would be useful.  nobody
has a real worm anymore.  i can walk /sys/src in 0.5s.  grepping takes
about 12s; saving 1/24th the time (best case) doesn't seem like a
big win.  (the ratio is 10:1 on coraid's fileserver.)

- erik



Re: [9fans] du and find

2010-01-01 Thread roger peppe
2009/12/29 erik quanstrom :
> what seems more important to me is a way to unlimit the size
> of argv.  otherwise we'll need to go down the hideous xargs path.
> (apologies to hideous functions everywhere for the slur.)

i don't really see why xargs (the idea, not the usual unix implementations)
is inherently such a bad idea. years ago i wrote an ultra simple version
with no options, and it's about 80 lines of code, which i use to grep
through all of /sys/src for example.

if you always split on \n (which is banned in filenames in plan 9) and
don't interpret any other metacharacters, what's the problem?

it's also nice because you often get some results before you've
walked the entire file tree.
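
For illustration, a minimal newline-splitting xargs along these lines can be sketched in a dozen lines of shell (roger's actual ~80-line C version is not shown in the thread; `nlxargs` and its batch-size parameter are invented for this sketch):

```shell
# Hypothetical minimal xargs in the spirit described: input is split
# on newlines only, no other metacharacters are interpreted, and the
# command is run in batches.  Single-word commands only, to keep the
# sketch short.
nlxargs() {
    # usage: producer | nlxargs cmd [batchsize]
    cmd=$1
    max=${2:-500}
    set --
    while IFS= read -r name; do
        set -- "$@" "$name"
        if [ $# -ge "$max" ]; then
            "$cmd" "$@"
            set --
        fi
    done
    if [ $# -gt 0 ]; then
        "$cmd" "$@"
    fi
}
```

Names containing spaces or quotes pass through untouched; only a newline ends a name, matching the observation that newlines are banned in Plan 9 filenames.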



Re: [9fans] du and find

2009-12-30 Thread anonymous
Ok, so it is better to use
du -a | sed 's/^.*	//'
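
An editorial demo of this pipeline on du(1)-style `size<tab>name` input; the separator in the sed pattern is a literal tab, constructed here with printf so it survives mail transport:

```shell
# Demo: strip everything up to the tab separator.  Note that with a
# greedy .* this eats up to the *last* tab on the line, so a filename
# that itself contains a tab would still be truncated.
tab=$(printf '\t')
printf '12%sfoo bar\n' "$tab" | sed "s/^.*${tab}//"
# prints: foo bar
```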




Re: [9fans] du and find

2009-12-29 Thread Rob Pike
The 'g' is unnecessary.

-rob

On Wed, Dec 30, 2009 at 4:59 AM, Tim Newsham  wrote:
>> It is suggested to use
>>   du -a | awk '{print $2}'
>> instead of find. But what if filename contains spaces? For example if
>> file is named "foo bar" then awk will output "foo" only.
>
> What about
>
>   du -a | sed 's/^[0-9]*//g'
>
> no loss on spaces in filenames.
> no loss on tabs in filenames.
>
> Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
>
>



Re: [9fans] du and find

2009-12-29 Thread Don Bailey
Chicken dinner!

On Tue, Dec 29, 2009 at 10:59 AM, Tim Newsham  wrote:

> It is suggested to use
>>   du -a | awk '{print $2}'
>> instead of find. But what if filename contains spaces? For example if
>> file is named "foo bar" then awk will output "foo" only.
>>
>
> What about
>
>   du -a | sed 's/^[0-9]*//g'
>
> no loss on spaces in filenames.
> no loss on tabs in filenames.
>
> Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
>
>


Re: [9fans] du and find

2009-12-29 Thread Tim Newsham

It is suggested to use
   du -a | awk '{print $2}'
instead of find. But what if filename contains spaces? For example if
file is named "foo bar" then awk will output "foo" only.


What about

   du -a | sed 's/^[0-9]*//g'

no loss on spaces in filenames.
no loss on tabs in filenames.

Tim Newsham | www.thenewsh.com/~newsham | thenewsh.blogspot.com
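
A small editorial check of what this leaves behind: the digits go, but the separator tab survives at the start of each line (harmless for display, relevant if the output is reparsed):

```shell
# Demo on du(1)-style "size<tab>name" input: sed removes the leading
# digits but keeps the tab that separated size from name.
tab=$(printf '\t')
out=$(printf '12%sfoo bar\n' "$tab" | sed 's/^[0-9]*//')
[ "$out" = "${tab}foo bar" ] && echo 'leading tab intact'
# prints: leading tab intact
```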



Re: [9fans] du and find

2009-12-28 Thread erik quanstrom
On Mon Dec 28 20:04:48 EST 2009, lyn...@orthanc.ca wrote:
> > what seems more important to me is a way to unlimit the size
> > of argv.  otherwise we'll need to go down the hideous xargs path.
> 
> How often have you run up against the current limit?  I've yet to hit
> it in anything other than contrived tests.  And even those took work.

several times a month; just often enough to be irritating.  since storage
is still going exponential, i expect this to get worse.

minooka; grep pattern `{find /sys}
grep: virtual memory allocation failed

- erik



Re: [9fans] du and find

2009-12-28 Thread Don Bailey
To be fair, the correct script on Plan 9 is academic. Just do what gets the
job done for you now. Don't go down an academic black hole. These guys have
been arguing about `find` since 2002.

D

On Mon, Dec 28, 2009 at 6:00 PM, anonymous  wrote:

> > While it's true that you'll have misses on tabs in filenames, it's much
> more
> > rare to have a tab in a filename than it is to have a space, yes?
> >
>
> > I don't have spaces either, but a correct script should not make any
> > assumptions.
> >
>
> There is an interesting date on http://swtch.com/plan9history/:
> March 23, 1999   allow spaces in file names
>
> I think it would be better to just disallow whitespace (spaces and
> tabs) in file names. It looks like the idea of using awk was from
> before spaces were allowed, so there was no problem.
>
>
>


Re: [9fans] du and find

2009-12-28 Thread anonymous
> While it's true that you'll have misses on tabs in filenames, it's much more
> rare to have a tab in a filename than it is to have a space, yes?
>

I don't have spaces either, but a correct script should not make any assumptions.

There is an interesting date on http://swtch.com/plan9history/:
March 23, 1999   allow spaces in file names

I think it would be better to just disallow whitespace (spaces and
tabs) in file names. It looks like the idea of using awk was from
before spaces were allowed, so there was no problem.




Re: [9fans] du and find

2009-12-28 Thread Lyndon Nerenberg (VE6BBM/VE7TFX)
> what seems more important to me is a way to unlimit the size
> of argv.  otherwise we'll need to go down the hideous xargs path.

How often have you run up against the current limit?  I've yet to hit
it in anything other than contrived tests.  And even those took work.

> find and walk are about the same program.

There are a few versions about.  Dan's has exactly the right lack of
options to meet my needs. Others might too, but his is the version I
found first.

--lyndon




[9fans] du and find

2009-12-28 Thread erik quanstrom
>> ; du -a | awk '-F\t' '{print $2}' -
>
>All this nonsense because the dogmatists refuse to accept 
>/n/sources/contrib/cross/walk.c into the distribution.

find and walk are about the same program.  my version of
find started with andrey's.  his find page (http://mirtchovski.com/p9/find/)
is dated 31-jul-2004, predating the given walk.c by ~18 months,
though i don't know which was written first.

the reason i started fiddling with find was to see if it couldn't
go a bit faster than du. (it did.)

my canonical examples of its use are
find | grep whereisthatfile
and
grep whereisthatfunction `{find /sys/src|grep '\.[chlsy]$'}

i don't think it's that important that it absolutely needs to
be in the distribution; it's a convenience.

what seems more important to me is a way to unlimit the size
of argv.  otherwise we'll need to go down the hideous xargs path.
(apologies to hideous functions everywhere for the slur.)

- erik



Re: [9fans] du and find

2009-12-28 Thread Lyndon Nerenberg (VE6BBM/VE7TFX)
> du -a | awk '-F\t' '{print $2}' -

All this nonsense because the dogmatists refuse to accept 
/n/sources/contrib/cross/walk.c into the distribution.




Re: [9fans] du and find

2009-12-28 Thread Don Bailey
While it's true that you'll have misses on tabs in filenames, it's much more
rare to have a tab in a filename than it is to have a space, yes? There is
no loss on a single quote character. You're quoting the command line
argument.

On Mon, Dec 28, 2009 at 4:35 PM, erik quanstrom wrote:

> On Mon Dec 28 18:32:36 EST 2009, don.bai...@gmail.com wrote:
>
> > du -a | awk '-F\t' '{print $2}' -
> >
>
> lossage on tabs and ' in filenames.
>
> - erik
>
>


Re: [9fans] du and find

2009-12-28 Thread Don Bailey
du -a | awk '-F\t' '{print $2}' -

On Mon, Dec 28, 2009 at 4:25 PM, erik quanstrom wrote:

> i agree that du -a has a few holes.  too bad whitespace
> is allowed in file names.  i use the attached find.c.
> it's also available as contrib quanstro/find.  by default
> the output is quoted so that it can be reparsed properly
> with rc or gettokens.
>
> - erik


Re: [9fans] du and find

2009-12-28 Thread erik quanstrom
On Mon Dec 28 18:32:36 EST 2009, don.bai...@gmail.com wrote:

> du -a | awk '-F\t' '{print $2}' -
> 

lossage on tabs and ' in filenames.

- erik
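
The tab case is easy to demonstrate editorially: a tab inside the name splits it into further fields, and only $2 is printed (the quote case is an rc-level reparsing problem, not an awk one):

```shell
# Demo of the lossage: a filename containing a tab splits into
# extra awk fields, so printing $2 drops everything after the
# embedded tab.
tab=$(printf '\t')
printf '12%sfoo%sbar\n' "$tab" "$tab" | awk -F'\t' '{print $2}'
# prints: foo   (the trailing "bar" is lost)
```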



Re: [9fans] du and find

2009-12-28 Thread erik quanstrom
i agree that du -a has a few holes.  too bad whitespace
is allowed in file names.  i use the attached find.c.
it's also available as contrib quanstro/find.  by default
the output is quoted so that it can be reparsed properly
with rc or gettokens.

- erik

#include <u.h>
#include <libc.h>
#include <bio.h>

char	*defargv[] = {".", 0};
char	*fmt = "%q\n";
int	flag[256];
uint	dev;
uint	type;
Biobuf	out;

void
warn(char *s)
{
	if(flag['f'] == 0)
		fprint(2, "find: %s: %r\n", s);
}

void
usage(void)
{
	fprint(2, "usage: find [-1dfq] [path ...]\n");
	exits("usage");
}

/* if you think this scales you'd be wrong.  this is 1/128th of a linear search. */

enum{
	Ncache		= 128,	/* must be power of two */
	Cachebits	= Ncache-1,
};

typedef struct{
	vlong	qpath;
	uint	dev;
	uchar	type;
} Fsig;

typedef struct	Cache	Cache;
struct Cache{
	Fsig	*cache;
	int	n;
	int	nalloc;
} cache[Ncache];

void
clearcache(void)
{
	int i;

	for(i = 0; i < nelem(cache); i++)
		free(cache[i].cache);
	memset(cache, 0, nelem(cache)*sizeof cache[0]);
}

int
seen(Dir *dir)
{
	Fsig	*f;
	Cache	*c;
	int	i;

	c = &cache[dir->qid.path&Cachebits];
	f = c->cache;
	for(i = 0; i < c->n; i++)
		if(dir->qid.path == f[i].qpath
		&& dir->type == f[i].type
		&& dir->dev == f[i].dev)
			return 1;
	if(i == c->nalloc){
		c->nalloc += 20;
		f = c->cache = realloc(c->cache, c->nalloc*sizeof *f);
	}
	f[c->n].qpath = dir->qid.path;
	f[c->n].type = dir->type;
	f[c->n].dev = dir->dev;
	c->n++;
	return 0;
}

int
dskip(Dir *d)
{
	if(flag['1']){
		if(dev == 0 && type == 0){
			dev = d->dev;
			type = d->type;
		}
		if(d->dev != dev || d->type != type)
			return 0;
	}
	return 1;
}

int
skip(Dir *d)
{
	if(strcmp(d->name, ".") == 0 || strcmp(d->name, "..") == 0 || seen(d))
		return 1;
	return 0;
}

void
find(char *name)
{
	int fd, n;
	Dir *buf, *p, *e;
	char file[256];

	if((fd = open(name, OREAD)) < 0) {
		warn(name);
		return;
	}
	Bprint(&out, fmt, name);
	for(; (n = dirread(fd, &buf)) > 0; free(buf))
		for(p = buf, e = p+n; p < e; p++){
			snprint(file, sizeof file, "%s/%s", name, p->name);
			if((p->qid.type&QTDIR) == 0 || !dskip(p)){
				if(!flag['d'])
					Bprint(&out, fmt, file);
			}else if(!skip(p))
				find(file);
		}
	close(fd);
}

void
main(int argc, char *argv[])
{
	doquote = needsrcquote;
	quotefmtinstall();

	ARGBEGIN{
	case 'd':
	case 'f':
	case '1':
		flag[ARGC()] = 1;
		break;
	case 'q':
		fmt = "%s\n";
		break;
	default:
		usage();
	}ARGEND

	Binit(&out, 1, OWRITE);
	if(argc == 0)
		argv = defargv;
	for(; *argv; argv++){
		find(*argv);
		clearcache();
	}
	Bterm(&out);
	exits(0);
}



Re: [9fans] du and find

2009-12-28 Thread Steve Simon
> It is suggested to use
>    du -a | awk '{print $2}'
> instead of find. But what if filename contains spaces?

how about

du -a | awk '{$1=""; print}'

This does print a leading space but is simple enough,
or perhaps

du -a | while(s=`{read}) echo $s(2-)

which is more accurate but arguably more complex.

-Steve
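
A quick editorial demo of the first variant's behaviour: assigning to $1 makes awk rebuild the record with its single-space OFS, so a leading separator remains and any whitespace runs inside the name collapse:

```shell
# Demo: the output keeps a leading space, and the double space
# inside the name is squeezed to one, because awk rejoins the
# fields with its default OFS when $1 is reassigned.
tab=$(printf '\t')
printf '12%sfoo  bar\n' "$tab" | awk '{$1=""; print}'
# prints: " foo bar" (leading space, inner run collapsed)
```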



Re: [9fans] du and find

2009-12-28 Thread lucio
> du -a | awk '{print $2}'
du -a | awk '{$1=""; print}'

will be a good approximation...

++L




[9fans] du and find

2009-12-28 Thread anonymous
It is suggested to use
du -a | awk '{print $2}'
instead of find. But what if filename contains spaces? For example if
file is named "foo bar" then awk will output "foo" only.