Re: How to start for newbie

2008-06-04 Thread Andrei Aiordachioaie

Hello,

I have recently started to work on the regular expression code. In
short, last summer two students implemented a new version of the
regular expression engine. However, it is not yet included in the main
source code of Vim. As part of Google Summer of Code, my project deals
with integrating the new regexp engine into Vim, testing it and making
sure it does not break anything.

If you're still interested, I have a couple of notes on the
implementation of regexps at
http://code.google.com/p/vim-soc2008-regexp/wiki/Regexp_Interface_Both_Engines
You are also more than welcome to browse through the sources in the
project (just go to the "Source" tab).

Also, a very good article about regular expressions and their
efficient implementation is by Russ Cox, at
http://swtch.com/~rsc/regexp/regexp1.html
I really recommend reading this article; it is very clear and
concise. It describes the non-deterministic finite automaton (NFA)
method of dealing with regexps.
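
For anyone who wants something concrete before opening the article, here
is a minimal sketch of the set-of-states simulation that the NFA method
relies on. It is only an illustration in Python, not code from Vim or from
the article: the transition table, the state numbers and the names
`closure` and `accepts` are assumptions made up for this example, and the
automaton is built by hand for the pattern a*ab rather than compiled from
a regexp.

```python
from typing import Dict, Optional, Set, Tuple

EPS = None  # label used for epsilon (empty) transitions

# (state, symbol or EPS) -> set of next states
Nfa = Dict[Tuple[int, Optional[str]], Set[int]]

# Hand-built automaton for the pattern a*ab (hypothetical state numbering):
#   0 --eps--> 1, 0 --eps--> 2   enter the a* loop or skip it
#   1 --a----> 0                 one iteration of a*
#   2 --a----> 3, 3 --b--> 4     the trailing "ab"; 4 is the accepting state
NFA_A_STAR_AB: Nfa = {
    (0, EPS): {1, 2},
    (1, "a"): {0},
    (2, "a"): {3},
    (3, "b"): {4},
}


def closure(nfa: Nfa, states: Set[int]) -> Set[int]:
    """Return every state reachable from `states` via epsilon edges only."""
    seen = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in nfa.get((state, EPS), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def accepts(nfa: Nfa, start: int, accept: int, text: str) -> bool:
    """Whole-string match, tracking all possible states at once."""
    current = closure(nfa, {start})
    for ch in text:
        stepped: Set[int] = set()
        for state in current:
            stepped |= nfa.get((state, ch), set())
        current = closure(nfa, stepped)
        if not current:  # no live state left, the match cannot succeed
            return False
    return accept in current


if __name__ == "__main__":
    for sample in ["ab", "aaab", "b", "aba"]:
        print(sample, accepts(NFA_A_STAR_AB, 0, 4, sample))
```

Because the simulation keeps the whole set of possible states, each input
character is read exactly once and nothing is ever backtracked.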

Cheers,
Andrei

On May 28, 6:33 pm, Guang Yang <[EMAIL PROTECTED]> wrote:
> I want to start reading the Vim source code with the regular expression
> code, but I found it really hard for a newbie. Do you have any suggestions
> about how to start? I think it would be good to make this post a guideline
> for everyone who wants to contribute to this great project.
--~--~-~--~~~---~--~~
You received this message from the "vim_dev" maillist.
For more information, visit http://www.vim.org/maillist.php
-~--~~~~--~~--~--~---



Re: Summer of Code: Regexp

2008-03-31 Thread Andrei Aiordachioaie



On Mar 31, 12:54 am, "Ian Young" <[EMAIL PROTECTED]> wrote:
> Sorry to get back to you so late - here's what I can offer:
>
> As far as I'm aware, the code in the vim71-ian branch of the
> repository contains almost all of the stable work done by both myself
> and Xiaozhou, so that's the best place to look.  There's a bunch of
> testing code in that branch as well, but it isn't all documented
> (sorry).  The tools I've been using are vgrep, regtest, and the
> run_tests shell script (found in reg_test/).  Xiaozhou also wrote up a
> test file for use with 'make test', but I'm not well acquainted with
> its contents.
>
> On Fri, Mar 28, 2008 at 5:53 AM, Andrei Aiordachioaie <[EMAIL PROTECTED]> wrote:
>
> >  From what I've looked at the test-cases, it seems that the NFA
> >  implementation is not greedy, as it should be. I will look more into
> >  it.
>
> It's greedy in its own way: IIRC, leftmost-first, with the exception
> of ordered alternation (see
> http://groups.google.com/group/vim_dev/browse_thread/thread/9db490f9c...
> for a discussion of that feature).
>
> >  So for the project, I want to extend the test-suite to compare the way
> >  regexps are handled in the old vs the new engine. Maybe this uncovers
> >  other bugs. Then, the largest portion of the project would be fixing
> >  the found bugs. And if that takes little time, I could work on the old
> >  regexp engine bugs.
>
> The largest batch of test cases is in reg_test/files/basic.dat, which
> can be run with "./regtest --engine=nfa reg_test/files/basic.dat".
>
> This file has been modified so all tests succeed with the old vim
> matching engine.  So the failures there represent the differences
> between the old and new engines.  The --engine=[nfa,bt] flag on
> regtest and vgrep controls which engine is used, so you can compare
> easily.  There are a few lingering bugs to be ironed out, but it seems
> like we're pretty close to a correct engine - more of the work will
> probably go into making it faster.
>
> >  Do you have any other ideas? Would this be enough
> >  for a 2.5 months project?
>
> Here's what I wrote to another student who enquired about the project:
>
> "The short answer is yes, there's more work to be done by another
> student.  I've been slowly working on fixing a few lingering problems
> in the code we wrote last summer (thus the commits you saw).  The code
> is very close to running correctly. However, it's not super fast at
> this point, so one big project might be optimizing the new code so
> that it is more comparable to the speed of the old engine on
> non-pathological cases.  There are also some more features that would
> be great to add (off the top of my head, a couple are multibyte
> characters and the \{n,m} construct).  And of course, there's a
> non-trivial amount of work in just preparing the code for inclusion in
> Vim's source.  I just haven't found the time this semester to do as
> much as I had hoped, so again, yes, I think another summer on this
> project would prove fruitful.  If you'd like a better idea of where
> development left off, I suggest poking through the archives of the
> group we used at <http://groups.google.com/group/vim-soc-regexp>.  The
> last couple commits I've made are not yet documented, so don't worry
> too much about those for the moment."
>
> Hope all this helps,
> Ian


Thanks a lot for your reply. Running regtest with the NFA engine
crashes for me right at the first test. The old engine passes all
tests though. I'll try to find a way to include them in the main
testing suite, along with the new engine.

Cheers,
Andrei



Re: Summer of Code: Regexp

2008-03-28 Thread Andrei Aiordachioaie



On Mar 20, 1:30 pm, Bram Moolenaar <[EMAIL PROTECTED]> wrote:
>
> Let's do the fast regexp work first.  It's easy to underestimate how
> much work this stuff is.

I looked at the updated regexp code that Xiaozhou Liu has maintained,
and it looks a lot closer to being included. The problems I see so far
with the new engine are:
- the three test cases that fail, but of course there may be more bugs
- compatibility with the old engine.

From what I've seen of the test cases, it seems that the NFA
implementation is not greedy, as it should be. I will look into it
further.
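
To make "greedy" concrete, here is a small sketch of greedy versus
non-greedy repetition, plus ordered alternation. It uses Python's `re`
module purely as a stand-in, so it shows the general idea and says nothing
about how either Vim engine actually behaves. `first_match` is just a
helper name for this example.

```python
import re
from typing import Optional


def first_match(pattern: str, text: str) -> Optional[str]:
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group(0) if m else None


# Greedy repetition takes as many characters as it can.
greedy = first_match(r"a+", "aaa")

# Non-greedy repetition (written +? in Python) takes as few as it can.
lazy = first_match(r"a+?", "aaa")

# Ordered alternation: the first alternative that matches wins,
# even when a later alternative would have matched more text.
first_alt = first_match(r"a|ab", "ab")
swapped_alt = first_match(r"ab|a", "ab")

if __name__ == "__main__":
    print(greedy, lazy, first_alt, swapped_alt)
```

A test case whose expected result is the longer match would therefore
separate a greedy engine from a non-greedy one right away.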

So for the project, I want to extend the test-suite to compare the way
regexps are handled in the old vs the new engine. Maybe this uncovers
other bugs. Then, the largest portion of the project would be fixing
the found bugs. And if that takes little time, I could work on the old
regexp engine bugs. Do you have any other ideas? Would this be enough
for a 2.5 months project?
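
Roughly, the comparison I have in mind looks like the sketch below. It is
only an outline in Python under stated assumptions: the two "engines" are
stand-ins built on Python's `re` module (one of them deliberately made
non-greedy to simulate a bug), and the names `compare_engines`,
`greedy_engine` and `lazy_engine` are invented for the example. None of
this is Vim code or the existing test tooling.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# An engine takes (pattern, text) and returns the matched text or None.
Engine = Callable[[str, str], Optional[str]]
Diff = Tuple[str, str, Optional[str], Optional[str]]


def compare_engines(cases: Iterable[Tuple[str, str]],
                    old: Engine, new: Engine) -> List[Diff]:
    """Run every (pattern, text) case on both engines; collect mismatches."""
    diffs: List[Diff] = []
    for pattern, text in cases:
        expected = old(pattern, text)
        actual = new(pattern, text)
        if expected != actual:
            diffs.append((pattern, text, expected, actual))
    return diffs


def greedy_engine(pattern: str, text: str) -> Optional[str]:
    """Stand-in for the reference engine (assumption: plain re.search)."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


def lazy_engine(pattern: str, text: str) -> Optional[str]:
    """Stand-in for an engine with a greediness bug: every + becomes +?."""
    m = re.search(pattern.replace("+", "+?"), text)
    return m.group(0) if m else None


CASES = [("a+", "xaaay"), ("ab", "xaby"), ("z", "abc")]

if __name__ == "__main__":
    for pattern, text, expected, actual in compare_engines(
            CASES, greedy_engine, lazy_engine):
        print(f"{pattern!r} on {text!r}: old={expected!r} new={actual!r}")
```

Only the cases where the two results differ are reported, which keeps the
output focused on real behavioural differences between the engines.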

The todo list mentions using regexp search in the gtk find&replace
dialog. That might also deserve some attention, though I imagine it's
pretty straightforward.

Cheers,
Andrei



Re: Summer of Code: Regexp

2008-03-28 Thread Andrei Aiordachioaie



On Mar 20, 1:30 pm, Bram Moolenaar <[EMAIL PROTECTED]> wrote:
> Andrei Aiordachioaie wrote:
> > I'm a final-year student from Jacobs University Bremen, studying
> > Computer Science. I am interested in improving the regular expression
> > code for Vim, as part of the Summer of Code, and maybe even
> > afterwards. The following are what I could do for the summer. Do you
> > think it's too little or too much?
>
> > I understand that a new regexp engine has been written during SoC last
> > year, but it is not officially included in vim. I would like to do the
> > necessary testing and integrate the new engine in the main code. So
> > far, I was only able to download the archives that the 2 students
> > submitted. Is there an existing svn repo that includes the vim code
> > and the new engine?
>
> Sounds good.  Look here for the work done so far:
> http://code.google.com/p/vim-soc-regexp/
>
> > An idea that sounds really interesting is using approximate regular
> > expressions to find similar words. There are a number of projects that
> > have already implemented it, such as Agrep, libTre, or lib-bitap. Do
> > you think that we could use one of the existing libraries for the
> > algorithm itself ? Or would we have to reimplement it?
>
> Let's do the fast regexp work first.  It's easy to underestimate how
> much work this stuff is.
>
> > Also, regexps in vim look hard to read, because of the many escapes
> > that have to be used by default. Maybe we should consider enabling the
> > "very magic" in the global configuration file, when vim gets
> > installed ? This would make  vim regexps friendlier for newbies :-)
>
> No, because this breaks Vim script portability.  Currently there is the
> 'magic' option, and that is a problem already.  Most scripts assume it's
> on, thus if you switch it off lots of things will fall down.
>
> > What exactly is the T-search algorithm, for searching strings ? I
> > tried googling for "C't article, August 1997", but I don't think there
> > was anything relevant. Can someone point me in the right direction?
>
> I don't have a reference at hand...
>
> > There are a number of small things or bugs related to regexps that I
> > could help with, such as
> >   Regexp: matchlist('12a4aaa', '^\(.\{-}\)\(\%5c\@<=a\+\)\(.\+\)\?')
> >   returns ['12a4', 'aaa', '4aaa'], but should be ['12a4', 'aaa', '']
> > or
> >   Recognize "[a-z]", "[0-9]", etc. and replace them with the faster
> >   "\l" and "\d".
> > or
> >   allowing ":23,45/pat/flags" to search for "pat" in lines 23 to 45?
>
> > Do you have any other ideas or suggestions?
>
> These are also nice, but the main work is to get the fast regexp engine
> included.  I'm sure this will take up the two months that a student has
> available.
>
> --
> hundred-and-one symptoms of being an internet addict:
> 161. You get up before the sun rises to check your e-mail, and you
>  find yourself in the very same chair long after the sun has set.
>
>  /// Bram Moolenaar -- [EMAIL PROTECTED] -- http://www.Moolenaar.net  \\\
> ///sponsor Vim, vote for features -- http://www.Vim.org/sponsor/\\\
> \\\download, build and distribute -- http://www.A-A-P.org   ///
>  \\\help me help AIDS victims -- http://ICCF-Holland.org   ///
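
On the character-class idea quoted above (recognizing "[a-z]" and "[0-9]"
and replacing them with "\l" and "\d"), a rough sketch of the rewrite
could look like this. It is a Python illustration only, not a patch
against Vim's regexp code: `SHORTHAND` and `rewrite_classes` are
hypothetical names, and the sketch handles only these two standalone
classes and skips backslash-escaped characters.

```python
# Bracket classes and the shorthand items named in the list above.
SHORTHAND = {
    "[a-z]": r"\l",
    "[0-9]": r"\d",
}


def rewrite_classes(pattern: str) -> str:
    """Replace standalone [a-z] / [0-9] with \\l / \\d, leaving escapes alone."""
    out = []
    i = 0
    while i < len(pattern):
        # Copy a backslash together with the character it escapes,
        # so that an escaped "\[" never starts a class.
        if pattern[i] == "\\" and i + 1 < len(pattern):
            out.append(pattern[i:i + 2])
            i += 2
            continue
        for cls, short in SHORTHAND.items():
            if pattern.startswith(cls, i):
                out.append(short)
                i += len(cls)
                break
        else:
            # No known class starts here: copy one character unchanged.
            out.append(pattern[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(rewrite_classes(r"[a-z]\+[0-9]"))
```

Larger classes such as "[a-z0-9]" are left untouched here, since they have
no single-item equivalent among the two shorthands handled.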



Re: test42 failure

2008-03-27 Thread Andrei Aiordachioaie



On Mar 27, 2:35 am, "Edward L. Fox" <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> Oh yes, it was corrupted.  I just copied the corresponding file from CVS
> and re-submitted it.  Could anyone please help me verify whether it is OK
> now?
>

Hi,

The SVN source still fails test 42 for me, but if I use the CVS
source, all tests pass. The problem is not test42.in but test42.ok:
the two copies are still different. Could you check that file too?

Cheers,
Andrei