Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-07 Thread Chris Bannister
On Tue, Nov 06, 2012 at 07:05:43PM +0100, Michael Stapelberg wrote:
 Hi,
 
 I hereby announce a new Debian project: Debian Code Search.

[...]

 You can use the search engine at http://codesearch.debian.net/
 Here are a few sample queries:
 • http://codesearch.debian.net/search?q=workaround+package%3Alinux
 • http://codesearch.debian.net/search?q=XCreateWindow
 • http://codesearch.debian.net/search?q=AnyEvent%3A%3AI3+filetype%3Aperl
 
 The corresponding thesis (and source code, of course) will be released
 soon (2013-01-15 being the deadline, but I hope I can do it
 earlier).
 
 I hope you find it useful and would love to hear your feedback.

Just in case the correct use of English is important (and I hope it is)
then the lines:
amount of regexp results:
amount of source results:

should be altered to either:
Number of regexp results:
Number of source results:

or simply
regexp results:
source results:

See:
http://grammar.about.com/od/words/a/amount.htm
http://grammarist.com/usage/amount-number/


-- 
If you're not careful, the newspapers will have you hating the people
who are being oppressed, and loving the people who are doing the 
oppressing. --- Malcolm X


-- 
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20121107201157.GI24124@tal



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-07 Thread Michael Stapelberg
Hi Chris,

Chris Bannister cbannis...@slingshot.co.nz writes:
 See:
 http://grammar.about.com/od/words/a/amount.htm
 http://grammarist.com/usage/amount-number/
Thanks. I have heard about this rule but must have forgotten it.

I changed the text and will push an update soon.

-- 
Best regards,
Michael


-- 
Archive: http://lists.debian.org/x6vcdhcfyh@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-07 Thread Michael Stapelberg
Hi Neil,

Neil Williams codeh...@debian.org writes:
 That's just swamped by licences, as would be "received" and lots of other
 common words (which are, rightly or wrongly, used as variable names or
 as part of function names).
Well, of course searching for common words will result in a lot of
results. Asking the other way around: What is your expected result for
something like modify, even if comments were ignored?

 http://codesearch.debian.net/search?q=codehelp+filetype%3Aperl

 filetype:perl just doesn't seem to be working:
 http://codesearch.debian.net/search?q=QofBook+filetype%3Aperl
 ... lists a lot of .c files ...

 filetype:python does the same - some .py but then a lot more .c
Thanks, this is fixed now.

-- 
Best regards,
Michael


-- 
Archive: http://lists.debian.org/x6mwytcetq@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-07 Thread Neil Williams
On Wed, 07 Nov 2012 21:56:17 +0100
Michael Stapelberg stapelb...@debian.org wrote:

 Neil Williams codeh...@debian.org writes:
  That's just swamped by licences, as would be "received" and lots of other
  common words (which are, rightly or wrongly, used as variable names or
  as part of function names).
 Well, of course searching for common words will result in a lot of
 results. Asking the other way around: What is your expected result for
 something like modify, even if comments were ignored?

function names and variables which contain the word modify...

bytes_received could be a very common variable, but it could also be
bytesReceived or received_bytes depending on the convention. It's just
the kind of thing to search for when looking for buffer overflows.
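
To illustrate the point about naming conventions, a single regular expression can cover all three spellings. This is only a sketch: the pattern and the sample lines are made up, and Python's re module stands in for RE2 (the pattern sticks to alternation, optional characters and character classes, which both engines support).

```python
import re

# Hypothetical pattern for illustration; not taken from the search engine.
# Covers bytes_received, bytesReceived and received_bytes.
PATTERN = re.compile(r"bytes_?[Rr]eceived|received_?[Bb]ytes")

# Made-up sample lines.
samples = [
    "size_t bytes_received = 0;",
    "int bytesReceived = read(fd, buf, len);",
    "total += received_bytes;",
    "/* nothing relevant here */",
]

# Keep only the lines that mention one of the spellings.
matches = [line for line in samples if PATTERN.search(line)]
```

URL-encoded, such a pattern would go into the q= parameter in the same way as the sample queries.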

My own initial query was QofBook.

http://codesearch.debian.net/search?q=QofBook+filetype%3Ac
http://codesearch.debian.net/search?q=QofBook+filetype%3Ac+package%3Aqof

Any variable/class which is used as a base struct/class across a library
or which is contained within a lot of other structs in a library is
going to come up again and again in documentation comments and in
class/struct definitions.

  http://codesearch.debian.net/search?q=codehelp+filetype%3Aperl
 
  filetype:perl just doesn't seem to be working:
  http://codesearch.debian.net/search?q=QofBook+filetype%3Aperl
  ... lists a lot of .c files ...
 
  filetype:python does the same - some .py but then a lot more .c
 Thanks, this is fixed now.

Now it's missing known hits:

http://codesearch.debian.net/search?skip=17&q=noauth+filetype%3Aperl
Should find listings in multistrap, which this search finds:

http://codesearch.debian.net/search?q=noauth+package%3Amultistrap

Just because a file doesn't end in .pl, doesn't mean it isn't perl -
Policy mandates that perl in /usr/bin does not end in .pl. Is this only
finding perl modules and perl scripts in /usr/share?

That's a bigger problem than the extra listings for comments.

e.g. http://codesearch.debian.net/search?q=codehelp+filetype%3Aperl

Now lists lots of .pm and .pl files but nothing else. dpkg-cross is
listed as a .pm but not as the executable dpkg-cross. wrap-lintian.pl
is listed but not multistrap. Grip.pm is listed but not emgrip.

-- 


Neil Williams
=
http://www.linux.codehelp.co.uk/





Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-07 Thread Michael Stapelberg
Hi Neil,

Neil Williams codeh...@debian.org writes:
 Just because a file doesn't end in .pl, doesn't mean it isn't perl -
 Policy mandates that perl in /usr/bin does not end in .pl. Is this only
 finding perl modules and perl scripts in /usr/share?
As the FAQ¹ states, this is filtering by file extension. I will keep
recognizing the actual file type in mind, but cannot make any promises
as to whether this will be implemented or not.

¹ = http://codesearch.debian.net/faq
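
For illustration, recognizing the actual file type could start with the interpreter line. The following is a rough sketch only: looks_like_perl is a hypothetical helper, not part of Debian Code Search. It checks the extension first and falls back to the shebang line.

```python
import re

# Hypothetical helper, not part of Debian Code Search.
# Matches shebang lines such as "#!/usr/bin/perl -w" or "#!/usr/bin/env perl".
SHEBANG_PERL = re.compile(r"^#!.*\bperl[\d.]*\b")

def looks_like_perl(filename: str, first_line: str) -> bool:
    """Return True if the extension or the shebang line suggests Perl."""
    if filename.endswith((".pl", ".pm")):
        return True
    return bool(SHEBANG_PERL.match(first_line))
```

With a check like this, an extensionless script whose first line is "#!/usr/bin/perl" would be classified as Perl, at the cost of reading the first line of every file at indexing time.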

-- 
Best regards,
Michael


--
Archive: http://lists.debian.org/x6k3txcbtt@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Neil Williams
On Tue, 6 Nov 2012 19:05:43 +0100
Michael Stapelberg stapelb...@debian.org wrote:

 Debian Code Search is a search engine for program source code within
 Debian.
 
 It allows you to search all ≈ 17000 source packages,
 containing 130 GiB of FLOSS source code (including Debian
 packaging) with regular expressions.

It's pleasingly quick, which is always good. Might need to be able to
exclude the debian/ directory from searches.
 
 You can use the search engine at http://codesearch.debian.net/
 Here are a few sample queries:
 • http://codesearch.debian.net/search?q=workaround+package%3Alinux
 • http://codesearch.debian.net/search?q=XCreateWindow
 • http://codesearch.debian.net/search?q=AnyEvent%3A%3AI3+filetype%3Aperl
 
 The corresponding thesis (and source code, of course) will be released
 soon (2013-01-15 being the deadline, but I hope I can do it
 earlier).
 
 I hope you find it useful and would love to hear your feedback.

First thing which occurs to me is that I'd prefer a summary page as the
entry point for the search results - listing package, version and
possibly a link to the PTS, possibly the number of hits for that
package/package+version. First thing I've needed to do with every search
result so far is find a relevant package within the results. The search
results (and any summary page) should probably be sorted by package
name too - I'm getting results from packages starting with m before
package names starting with e.

Maybe extend the keywords to allow regexp matching on package names?

Another important step would be a way of excluding matches
within comments from the results.

The filetype seems a little confused in places too. Searching for
things in filetype:perl I get matches in debian/control and
debian/copyright.

-- 


Neil Williams
=
http://www.linux.codehelp.co.uk/





Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread alberto fuentes
2 words: Awe some

roughly speaking, how does it work internally?

On Tue, Nov 6, 2012 at 7:05 PM, Michael Stapelberg
stapelb...@debian.org wrote:
 Hi,

 I hereby announce a new Debian project: Debian Code Search.

 Debian Code Search is a search engine for program source code within
 Debian.

 It allows you to search all ≈ 17000 source packages,
 containing 130 GiB of FLOSS source code (including Debian
 packaging) with regular expressions.

 You can use the search engine at http://codesearch.debian.net/
 Here are a few sample queries:
 • http://codesearch.debian.net/search?q=workaround+package%3Alinux
 • http://codesearch.debian.net/search?q=XCreateWindow
 • http://codesearch.debian.net/search?q=AnyEvent%3A%3AI3+filetype%3Aperl

 The corresponding thesis (and source code, of course) will be released
 soon (2013-01-15 being the deadline, but I hope I can do it
 earlier).

 I hope you find it useful and would love to hear your feedback.

 --
 Best regards,
 Michael


--
Archive: 
http://lists.debian.org/calkubt69c2xtzf9qvn2shkfrkor+d0tz8t-kjgjsdetjkob...@mail.gmail.com



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Mike Dupont
LOVE IT!!! THANK YOU SO MUCH


On Tue, Nov 6, 2012 at 12:05 PM, Michael Stapelberg
stapelb...@debian.org wrote:

 Hi,

 I hereby announce a new Debian project: Debian Code Search.

 Debian Code Search is a search engine for program source code within
 Debian.

 It allows you to search all ≈ 17000 source packages,
 containing 130 GiB of FLOSS source code (including Debian
 packaging) with regular expressions.

 You can use the search engine at http://codesearch.debian.net/
 Here are a few sample queries:
 • http://codesearch.debian.net/search?q=workaround+package%3Alinux
 • http://codesearch.debian.net/search?q=XCreateWindow
 • http://codesearch.debian.net/search?q=AnyEvent%3A%3AI3+filetype%3Aperl

 The corresponding thesis (and source code, of course) will be released
 soon (2013-01-15 being the deadline, but I hope I can do it
 earlier).

 I hope you find it useful and would love to hear your feedback.

 --
 Best regards,
 Michael




-- 
James Michael DuPont
Member of Free Libre Open Source Software Kosova http://flossk.org
Saving wikipedia(tm) articles from deletion http://SpeedyDeletion.wikia.com
Contributor FOSM, the CC-BY-SA map of the world http://fosm.org
Mozilla Rep https://reps.mozilla.org/u/h4ck3rm1k3
Free Software Foundation Europe Fellow http://fsfe.org/support/?h4ck3rm1k3


Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Michael Stapelberg
Hi alberto,

alberto fuentes paj...@gmail.com writes:
 roughly speaking, how does it work internally?
It uses a trigram index and the RE2 regular expression engine.

My work is based on Russ Cox’s ideas and code published at
http://swtch.com/~rsc/regexp/regexp4.html

In case you are interested, I’m happy to send you (or anyone else) the
current draft of my thesis, which describes the system in much more
detail. In that case, just send me an email in private.
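
As a rough illustration of the trigram idea from the article linked above: the index maps every three-character substring to the set of files containing it, so the regular expression engine only has to run on files that contain all trigrams of the query. The toy sketch below uses Python's re module as a stand-in for RE2 and made-up file contents. It handles literal strings only, whereas the real approach derives trigram queries from arbitrary regular expressions.

```python
import re
from collections import defaultdict

def trigrams(text):
    """Return the set of all 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}

def build_index(files):
    """Map each trigram to the set of file names containing it."""
    index = defaultdict(set)
    for name, content in files.items():
        for tri in trigrams(content):
            index[tri].add(name)
    return index

def search_literal(index, files, literal):
    """Find files containing a literal string of 3 or more characters."""
    needed = trigrams(literal)
    if not needed:
        raise ValueError("need a literal of at least 3 characters")
    # Candidate files must contain every trigram of the literal.
    candidates = set.intersection(*(index.get(t, set()) for t in needed))
    # Trigram hits are necessary but not sufficient, so verify each candidate.
    pattern = re.compile(re.escape(literal))
    return sorted(n for n in candidates if pattern.search(files[n]))

# Made-up sample data for illustration.
FILES = {
    "a.c": "Window w = XCreateWindow(dpy, root);",
    "b.c": "/* create a window */ make_window();",
    "c.c": "XCreate(); CreateWindow();",
}
INDEX = build_index(FILES)
RESULT = search_literal(INDEX, FILES, "XCreateWindow")
```

In this sample, c.c contains every trigram of the query without containing the query itself, which is why the final verification pass with the regular expression engine is still needed.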

-- 
Best regards,
Michael


--
Archive: http://lists.debian.org/x6ehk6fqcw@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Michael Stapelberg
Hi Neil,

Neil Williams codeh...@debian.org writes:
 It's pleasingly quick, which is always good. Might need to be able to
 exclude the debian/ directory from searches.
File regular expressions and a minus operator are already on the TODO
list :-).

 First thing which occurs to me is that I'd prefer a summary page as the
 entry point for the search results - listing package, version and
 possibly a link to the PTS, possibly the number of hits for that
 package/package+version. First thing I've needed to do with every search
 result so far is find a relevant package within the results. The search
 results (and any summary page) should probably be sorted by package
 name too - I'm getting results from packages starting with m before
 package names starting with e.
Changing the entry point of the search is not going to happen — I quite
like the interface it currently has. However, adding a list of packages
which are present in the current page of search results would be
possible. Note that displaying the entire list of matching packages is
unfortunately not possible because it’d require searching through all
the files, which is — depending on the query — absolutely impossible
when still wanting to guarantee a timely response :-).

 Maybe extend the keywords to allow regexp matching on package names?
I have also considered this. Probably I will resort to making the
filename keyword (not yet implemented) use regular expressions and keep
the package keyword an exact match. Since the package is part of the
filename, complex things are possible while easy matches stay easy :-).
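
A sketch of how such keywords could be separated from the search term and applied. The package: and filetype: keywords appear in the sample queries; filename: is the not-yet-implemented keyword mentioned above. The function names and the path layout "package_version/path" are assumptions made purely for illustration.

```python
import re

KEYWORDS = ("package", "filetype", "filename")

def parse_query(query):
    """Split a query into keyword filters and the remaining search term."""
    filters, terms = {}, []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and value and key in KEYWORDS:
            filters[key] = value
        else:
            # Anything else, such as AnyEvent::I3, is part of the search term.
            terms.append(token)
    return filters, " ".join(terms)

def path_matches(path, filters):
    """Apply the package (exact) and filename (regexp) filters to one path.

    Assumes paths look like 'package_version/rest/of/path'.
    """
    package = path.split("/", 1)[0].split("_", 1)[0]
    if "package" in filters and package != filters["package"]:
        return False
    if "filename" in filters and not re.search(filters["filename"], path):
        return False
    return True
```

Under that assumed layout, a regular expression on the filename can also restrict the package, which is how complex matches stay possible while package: remains a simple exact comparison.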

 Another important step would be a way of excluding matches
 within comments from the results.
I have considered this, but when you think about it, identifiers
(variable names, function names, …) and comments are really all that is
searchable in source code. Could you give me a few convincing points on
why it would be useful to exclude comments (that is, examples)?

 The filetype seems a little confused in places too. Searching for
 things in filetype:perl I get matches in debian/control and
 debian/copyright.
Can you give me the exact query for which this happens, please?

-- 
Best regards,
Michael


--
Archive: http://lists.debian.org/x6390mfpmu@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread alberto fuentes
On Tue, Nov 6, 2012 at 9:06 PM, Michael Stapelberg
stapelb...@debian.org wrote:
 Hi alberto,

 alberto fuentes paj...@gmail.com writes:
 roughly speaking, how does it work internally?
 It uses a trigram index and the RE2 regular expression engine.

 My work is based on Russ Cox’s ideas and code published at
 http://swtch.com/~rsc/regexp/regexp4.html

That read was enough to satiate my questions on how it works. :)

Now some actual details would be appreciated,
like size of the database, size in memory, engine running, kind of
machine, number of nodes, etc.

Have you run any benchmark?

greets
aL


--
Archive: 
http://lists.debian.org/calkubt7iuvm8r0jgkajfzl4d-nrxalpinurfpdr4-t8cysa...@mail.gmail.com



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Joachim Breitner
Hi,

Am Dienstag, den 06.11.2012, 19:05 +0100 schrieb Michael Stapelberg:
 I hereby announce a new Debian project: Debian Code Search.

Great!

 I hope you find it useful and would love to hear your feedback.

Since you have all code extracted anyways, could you extend the page to
allow for easy code browsing? Might be faster than apt-get source;
less ... sometimes.

Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata




Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Domenico Andreoli
On Tue, Nov 06, 2012 at 07:05:43PM +0100, Michael Stapelberg wrote:
 Hi,

Hi!

 I hereby announce a new Debian project: Debian Code Search.
 
 Debian Code Search is a search engine for program source code within
 Debian.

 It allows you to search all ≈ 17000 source packages,
 containing 130 GiB of FLOSS source code (including Debian
 packaging) with regular expressions.

cool :)
 
 You can use the search engine at http://codesearch.debian.net/

nice

 I hope you find it useful and would love to hear your feedback.

yes, I think it is. It's an enabling kind of tool: people can study the
code in new ways, and it also has applications in the security field. If
you consider that Debian is one of the more extensive (and regularly used)
collections of software, I'm sure it will be a joy to many :)

cheers,
Domenico


-- 
Archive: http://lists.debian.org/20121106215213.GA20829@glitch



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Michael Stapelberg
Hi Joachim,

Joachim Breitner nome...@debian.org writes:
 Since you have all code extracted anyways, could you extend the page to
 allow for easy code browsing? Might be faster than apt-get source;
 less ... sometimes.
Very basic code browsing is on my agenda, but zack@ mentioned he wants
to build a new sources.debian.org. Maybe his project is what you are
looking for? :-)

-- 
Best regards,
Michael


-- 
Archive: http://lists.debian.org/x6fw4me621@midna.zekjur.net



Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Joachim Breitner
Hi,

Am Dienstag, den 06.11.2012, 23:10 +0100 schrieb Michael Stapelberg:
 Joachim Breitner nome...@debian.org writes:
  Since you have all code extracted anyways, could you extend the page to
  allow for easy code browsing? Might be faster than apt-get source;
  less ... sometimes.
 Very basic code browsing is on my agenda, but zack@ mentioned he wants
 to build a new sources.debian.org. Maybe his project is what you are
 looking for? :-)

either works. Or rather, both should be one (or at least appear as one,
e.g. search input field on sources.d.o; search results on codesearch.d.n
linking back to sources.d.o).

Greetings,
Joachim

-- 
Joachim nomeata Breitner
Debian Developer
  nome...@debian.org | ICQ# 74513189 | GPG-Keyid: 4743206C
  JID: nome...@joachim-breitner.de | http://people.debian.org/~nomeata




Re: Introducing codesearch.debian.net, a regexp code search engine

2012-11-06 Thread Neil Williams
On Tue, 06 Nov 2012 21:22:17 +0100
Michael Stapelberg stapelb...@debian.org wrote:

  Another important step would be a way of excluding matches
  within comments from the results.
 I have considered this, but when you think about it, identifiers
 (variable names, function names, …) and comments are really all that is
 searchable in source code. Could you give me a few convincing points on
 why it would be useful to exclude comments (that is, examples)?

Any search term which can be a variable name and frequently occurs in
licence headers or doxygen markup or email addresses (copyright).

(I dread to think what results come from searching just for 'debian',
even with filetype:c it's all licence headers / email addresses.)

http://codesearch.debian.net/search?q=QofBook+filetype%3Ac

Any similar term which is frequently used across doxygen-style API docs
will give a mix of comments and code.

e.g.
http://codesearch.debian.net/search?q=modify+filetype%3Ac
That's just swamped by licences, as would be "received" and lots of other
common words (which are, rightly or wrongly, used as variable names or
as part of function names).

Without exclusions on comments (and without fixes for filetype: matches
below) then any common word is going to be swamped.
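
One way to picture the comment exclusion being asked for is to blank out C comments before matching. The sketch below is deliberately naive: it treats comment markers inside string literals as real comments, the sample source is made up, and nothing here reflects how the search engine itself works.

```python
import re

# Matches /* ... */ block comments and // line comments.
# Naive: a "/*" inside a string literal is treated as a comment start.
C_COMMENT = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def strip_c_comments(source):
    """Replace comments with spaces, keeping newlines so line numbers hold."""
    def blank(match):
        return re.sub(r"[^\n]", " ", match.group(0))
    return C_COMMENT.sub(blank, source)

def matching_lines(source, regexp, skip_comments=True):
    """Return the 1-based numbers of the lines that match regexp."""
    text = strip_c_comments(source) if skip_comments else source
    pattern = re.compile(regexp)
    return [n for n, line in enumerate(text.splitlines(), 1)
            if pattern.search(line)]

# Made-up sample source.
SAMPLE = """/* You may modify this file under the GPL. */
int modify_entry(int id);  // modify in place
int x;
"""
```

Searching SAMPLE for "modify" with comments skipped reports only the declaration on line 2, while the plain search also reports the licence line.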

  The filetype seems a little confused in places too. Searching for
  things in filetype:perl I get matches in debian/control and
  debian/copyright.
 Can you give me the exact query for which this happens, please?

http://codesearch.debian.net/search?q=codehelp+filetype%3Aperl

filetype:perl just doesn't seem to be working:
http://codesearch.debian.net/search?q=QofBook+filetype%3Aperl
... lists a lot of .c files ...

filetype:python does the same - some .py but then a lot more .c

-- 


Neil Williams
=
http://www.linux.codehelp.co.uk/


