Re: pro coding style

2012-12-03 Thread Toke Eskildsen
On Sat, 2012-12-01 at 17:18 +0100, Per Steffensen wrote:
 With change/merge-tracking in both systems, the important thing must be
 that you do not have to throw the tracked information away before you
 attempt to get your changes into the main repository.

People write commit messages in many different ways and have different
working habits. Inserting all commit messages from a patch would
probably be quite messy for some patches. For example, I have a tendency
to make many small commits, where a lot of those are just added TODOs,
cosmetic enhancements or spelling corrections.

Of course, git's rebase would mitigate this. As a non-committer and
newly converted git user, I'd much prefer to use git for working on
Lucene/Solr patches. Michael Sokolov's analysis is spot on.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: pro coding style

2012-12-01 Thread Per Steffensen

Robert Muir wrote:


Right, I'm positive this (pull requests) is github :)
Well, as I said, I don't KNOW exactly where the border between git and 
github is, but I have a very mature logical sense and a fair amount of 
knowledge about information theory. The rest of the content in this mail 
is based only on my logical sense. I would love to be proved wrong and 
told that my logical sense played a trick on me :-)


My thoughts are that SVN is not dumb (always think the best about 
others :-) ), it just hasn't got enough information to help merge in a 
better way than what we have all experienced with SVN. If SVN is not 
dumb, the only way you can be better is by having more information. And 
I would really be mistaken if this is not the reason that git is able to 
act in a smarter way when merging. It simply has more information than SVN!


So what kind of information does it have that SVN does not have? My 
guess is that it knows where code came from and how it developed 
throughout the life of a hierarchy of branches/forks, with a history 
going all the way back to the original common ancestor of the branches. 
If you have this kind of information, I would imagine you can do smarter 
merging, and most importantly you can maintain the detailed information 
about where code came from and how it developed even on the target side 
of the merge, so that you can also be smarter when that branch/fork has 
to be merged somewhere else. So when I said in a prior mail "...else I 
couldn't imagine how you get the advantages you get. Remember that when 
using git you actually run a repository on every developer's local 
machine. When you commit, you commit only to your local repository. 
You need to push in order to have it upstreamed (as they call it)", 
it was actually an attempt to make an argument that all the smartness 
must be in git - I understand it was a very unclear argument.


But the thing is that the local repository is NOT github - it is git. I 
don't have github on my local machine :-) And it is information about the 
commits I did to my local git repository that potentially can make the 
entire process smarter. Without information from my local git repository 
about how I changed the code, it is not possible (again, only based on my 
mature logical sense) for the receiving git repository (eventually 
github) to be smarter than SVN. Therefore, since the clever information 
tracking needs to go on on my local machine, where only git (not github) 
lives, I argue that the smart thing is in git and not in github. 
Actually I couldn't imagine that a pull request isn't just a convenient 
alternative to sending a mail to the owners/committers of the target 
git fork asking them to downstream this and that commit from my source 
git fork. I know it is a lot to claim, based alone on logical sense, but 
I trust my logical sense very much :-)


Below are some thoughts about what kind of information you will need to be 
smarter than SVN - again, just completely off the top of my own head (I really 
don't know anything about git :-) ):


Imagine a file in a repository - content of the file:

    abcd
    efgh
    ijkl
    mnop

Now we fork (branch) into three forks - fork1, fork2 and fork3.

On fork1 the content of the file is changed into

    abcd
    1234
    efgh
    ijkl
    mnop

A line was inserted between original lines #1 and #2 - a line with the 
content "1234".

On fork2 the content of the file is changed into

    abcd
    efxyz
    ijkl
    mnop

The original line #2 was changed.

Now you push the changes on fork1 into fork2. The net result on fork2 
obviously is

    abcd
    1234
    efxyz
    ijkl
    mnop

In the meanwhile, on fork3 the content of the file has changed into

    abcd
    efgh
    ij
    mnop

The original line #3 was changed.

Now you want to push the changes on fork2 (where some of them came from 
fork1) into fork3. Basically you have the problem of merging the 
following two versions of the file:

    abcd
    1234
    efxyz
    ijkl
    mnop

and

    abcd
    efgh
    ij
    mnop

In SVN you have no other information than the content of the two 
versions of the file about to be merged. With that amount of information 
it is impossible to make a solid decision about the net result -- 
merge conflict.

If you had information about the history of changes since the common 
original ("a line was inserted between original lines #1 and #2", "the 
original line #2 was changed" and "the original line #3 was changed"), 
there would be no doubt that the correct net result must be

    abcd
    1234
    efxyz
    ij
    mnop

No merge conflict! I believe the thing about git is that it has this 
information and therefore that it can be smart.
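
The example above can be played through with a toy three-way merge: once the common ancestor is available, the two sides' non-overlapping edits combine mechanically. Below is a minimal, illustrative Python sketch (not git's actual merge algorithm, which is far more careful; all names are invented):

```python
from difflib import SequenceMatcher

def edits(base, side):
    """List the regions of `base` that `side` changed, as
    (base_start, base_end, replacement_lines) triples."""
    sm = SequenceMatcher(None, base, side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def three_way_merge(base, ours, theirs):
    """Toy merge: apply both sides' edits against the common ancestor.
    Overlapping edits are a conflict; everything else merges cleanly.
    A plain two-way diff (no ancestor) could not decide this automatically."""
    merged, pos = [], 0
    for start, end, replacement in sorted(edits(base, ours) + edits(base, theirs)):
        if start < pos:
            raise ValueError("merge conflict: both sides edited the same region")
        merged.extend(base[pos:start])   # unchanged lines up to this edit
        merged.extend(replacement)       # one side's edited version
        pos = end
    merged.extend(base[pos:])
    return merged

base  = ["abcd", "efgh", "ijkl", "mnop"]           # common ancestor
fork2 = ["abcd", "1234", "efxyz", "ijkl", "mnop"]  # fork1's insert + fork2's edit
fork3 = ["abcd", "efgh", "ij", "mnop"]             # fork3's edit

print(three_way_merge(base, fork2, fork3))
# -> ['abcd', '1234', 'efxyz', 'ij', 'mnop']
```

Running it on the mail's example yields the conflict-free result above, precisely because the ancestor tells the merge which side changed which region.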


So back to my line of argument that it has to be git and not github:
* The information needed to be smart in the final step has to be picked 
up in the early steps

* The early steps are (potentially) going on outside github
* Therefore github cannot be the smart one

A very important thing for this to work is that everything is a fork. In 
git a developer does not check out a branch to do modifications on it. He 
forks the 

Re: pro coding style

2012-12-01 Thread Michael Sokolov

On 12/1/2012 7:59 AM, Per Steffensen wrote:


It is all about information - git has it, SVN doesn't. And my logical 
sense tells me that it has to be git and not github!


:-) Now tell me that I am stupid :-)
This kind of information (merge tracking) has been in svn since 1.5 (see 
http://subversion.apache.org/docs/release-notes/1.5.html#merge-tracking). I 
believe this perception of SVN dates from its early days, when merging 
was indeed much more difficult: you had to keep track of all the merges 
you had done, to avoid doing them again, and it was a huge mess.  That 
has pretty much been sorted out now.


Now it seems to me that the main advantage about git/github is that it 
doesn't create a strict boundary between committers and non-committers.  
As a committer, the two systems are basically the same up to differences 
in UI, convenience of tools, etc.


But for a non-committer, with SVN the situation is irritating if you 
submit patches that you continue to use, but are not accepted (or not in 
a timely way) into the main repository.  In such a case, you either have 
to abandon the use of source control (OUCH!), or you have to fork the 
entire project and maintain your own repo, with no tools for integrating 
with the main repo.


My understanding is that with git, you can maintain your own repo, and 
you have tools for taking changes from upstream repos, and also that the 
pull request mechanism may be more convenient than submitting 
patches.  So this sounds, on the whole, much more attractive for outside 
contributors.  I have to admit I've only fiddled with this a bit, so 
this is mostly based on what I've read and heard: please tell me that I 
am stupid!


-Mike


Re: pro coding style

2012-12-01 Thread Per Steffensen

Michael Sokolov skrev:
This kind of information (merge tracking) has been in svn since 1.5 
(see 
http://subversion.apache.org/docs/release-notes/1.5.html#merge-tracking).  
I believe this perception of SVN dates from its early days, when 
merging was indeed much more difficult: you had to keep track of all 
the merges you had done, to avoid doing them again, and it was a huge 
mess.  That has pretty much been sorted out now.
Ok, so the necessary tracking of data seems to be in both. One might be 
better than the other in some aspects and vice versa. I stand corrected.


Now it seems to me that the main advantage about git/github is that it 
doesn't create a strict boundary between committers and 
non-committers.  As a committer, the two systems are basically the 
same up to differences in UI, convenience of tools, etc. 

But for a non-committer, with SVN the situation is irritating if you 
submit patches that you continue to use, but are not accepted (or not 
in a timely way) into the main repository.  In such a case, you either 
have to abandon the use of source control (OUCH!), or you have to fork 
the entire project and maintain your own repo, with no tools for 
integrating with the main repo.


My understanding is that with git, you can maintain your own repo, and 
you have tools for taking changes from upstream repos, and also that 
the pull request mechanism may be more convenient than submitting 
patches.  So this sounds, on the whole, much more attractive for 
outside contributors.  I have to admit I've only fiddled with this a 
bit, so this is mostly based on what I've read and heard: please tell 
me that I am stupid!


-Mike


With change/merge-tracking in both systems, the important thing must be 
that you do not have to throw the tracked information away before you 
attempt to get your changes into the main repository. You certainly 
throw this information away when you create a dumb patch file. I guess 
that we could make it work if outsiders were just allowed to make 
branches in Apache's SVN - we are not :-) So I guess that is the main 
benefit of git. It allows forks from the main repository to live 
remote from the main repository - that is, I would be able to make a 
fork from the Apache git repository (github or not), a fork that lives 
entirely on my own system of servers. And when I want to forward changes 
into Apache they can go from my forked repository into the main repository 
(through upstreaming) without having to cross a border where the nice 
change/merge-tracking is lost. I am still pretty sure that the stuff for this 
is all in git, but whether or not Apache would need access 
to my local repository (containing my fork) in order for the upstream 
from my repository to the Apache repository to be possible, when the 
actual action of accepting the upstream has to happen on the Apache side, I 
don't know. With GitHub my repository would live in the same place as 
Apache's, and then it would certainly be possible. But why the discussion? 
Why not just GitHub?!


Regards, Per Steffensen



Re: pro coding style

2012-12-01 Thread Roman Chyla
On Fri, Nov 30, 2012 at 8:56 AM, Robert Muir rcm...@gmail.com wrote:



 On Fri, Nov 30, 2012 at 8:50 AM, Per Steffensen st...@designware.dk wrote:

 Robert Muir skrev:

  Is it really git? Because its my understanding pull requests aren't
 actually a git thing but a github thing.

 The distinction is important.


 Actually I'm not sure. I have never used git outside github, but at least
 part of it has to be git and not github (I think) - or else I couldn't
 imagine how you get the advantages you get. Remember that when using git
 you actually run a repository on every developer's local machine. When
 you commit, you commit only to your local repository. You need to push
 in order to have it upstreamed (as they call it)


 Right, I'm positive this (pull requests) is github :)

 I just wanted to make this point: when we have discussions about using git
 instead of svn, I'm not sure it makes things easier on anyone; actually it is
 probably worse and more complex.

 Its the github workflow that contributors want (I would +1 some scheme
 that supports this!), but git by itself, is pretty unusable.

 Github is like a nice front-end to this mess.


This is like medicine to me! With all the craze about git (and we use it
for our main project and also for solr development), it just confirms my
3-year-long experience. Git is pain. Github is great (too bad there is git
behind it ;))

And now the problem of forks - with git the fork is the natural evil - git
just makes it established practice. But it still doesn't save us from the
(slow) process of incorporating new patches. While it is inevitable, and we
cannot be more grateful to all the committers for their hard work (really,
thanks!), perhaps there is a way to make solr/lucene more sandbox-friendly?

In our organization we are doing something similar (using SOLR as a
library); the automated build/deployment goes like this:

- checkout our sources
- download & build solr sources
- compile our code
- merge with solr & test
- deploy

This avoids forking solr, and we always develop against the chosen branch.
The pain was in porting the solr build infrastructure - if this
infrastructure were inside solr, ready for developers to take advantage of,
others would be saved the pain of reinventing it. As far as I am aware, there
is only one hard problem - the confusing nature of the classloaders inside
webcontainers; I have really had a hard time understanding it to get it
right - but there are surely more knowledgeable people here. And if the
worst comes to the worst, the automated procedure could easily merge jars.
Sounds evil? Is forking Solr a better way?


roman


Re: pro coding style

2012-12-01 Thread Radim Kolar



or you will lose contributors
I think the type of people we are looking for tend to stick around ;)
I know several companies with forked SOLR. Why? The reason is that it is 
fucking difficult to get their patches into SOLR in time. You are losing 
the most valuable contributions that way. You need to work faster to keep 
them interested. Also, you mentioned that contributor patches are low 
priority for your project; this is why you are losing them.





Re: pro coding style

2012-12-01 Thread Radim Kolar



Instead of educating others about what's good and bad how about if you
take some more time studying the sources of Lucene/ Solr and its build
system?
i did, i had to figure out how to build that thing. In a standardized maven 
environment, "mvn package" is all you need to do. No need to spend 
minutes reading ant scripts.

 Your observations are superficial to say the least: POM files are generated 
dynamically

Yes, it is common sport in ant projects to generate POM files for uploaded 
artifacts. You guys took it to the next level? Generate poms with ant and then 
build with maven?





Re: pro coding style

2012-12-01 Thread Israel Tsadok
On Fri, Nov 30, 2012 at 3:56 PM, Robert Muir rcm...@gmail.com wrote:


 Right, I'm positive this (pull requests) is github :)


Just a note - the pull request was a git concept before github embraced
and extended it. However, almost nobody uses the old meaning, and it's
really only useful for projects like the Linux kernel, where everything is
done through the mailing list.

http://stackoverflow.com/a/6235394/7581


Re: pro coding style

2012-12-01 Thread Itamar Syn-Hershko
In the past git had bad tooling; that is not the case today. I've been
using git without the github screens too - and while they definitely add a
lot, it is still ten times more usable than SVN.

As I told the Lucene.NET mailing list, you should all watch the following
video and give git a few days of your time before continuing with this
discussion: http://www.youtube.com/watch?v=4XpnKHJAok8

Also, Apache mirrors to github, so basically you work against github all
the time


On Fri, Nov 30, 2012 at 4:15 PM, Robert Muir rcm...@gmail.com wrote:



 On Fri, Nov 30, 2012 at 9:10 AM, Mark Miller markrmil...@gmail.com wrote:


 On Nov 30, 2012, at 8:56 AM, Robert Muir rcm...@gmail.com wrote:

  but git by itself, is pretty unusable.

 Given the number of committers that eat some pain to use git when
 developing lucene/solr, and have no github or pull requests, I'm not sure
 that's a common thought :)


 Sure, some people might disagree with me.
 I'm more than willing to eat some pain if it makes contributions easier.

 I just feel like a lot of what makes github successful is
 unfortunately actually in github and not git.

 It's like if your development team is screaming for linux machines. You
 have to be careful how to interpret that. If you hand them a bunch of
 machines with just linux kernels, they probably won't be productive. When
 they scream for linux they want a userland with a shell, compiler,
 X-windows, editor and so on too.




Re: pro coding style

2012-12-01 Thread Robert Muir
On Sun, Dec 2, 2012 at 2:13 AM, Israel Tsadok itsa...@gmail.com wrote:

 On Fri, Nov 30, 2012 at 3:56 PM, Robert Muir rcm...@gmail.com wrote:


 Right, I'm positive this (pull requests) is github :)


 Just a note - pull request has been a git concept before github embraced
 and extended it. However, almost nobody uses the old meaning, and it's
 really only useful for projects like the Linux kernel, where everything is
 done through the mailing list.

 http://stackoverflow.com/a/6235394/7581


Dude, the old meaning ('git-request-pull') basically creates a patch at best.

That's not at all what's being discussed here.


Re: pro coding style

2012-12-01 Thread Robert Muir
It's also classic git brokenness to have confusing names like this, like a
command called git-request-pull that doesn't do anything like a pull
request.

These are the reasons why git is unusable!

On Sun, Dec 2, 2012 at 2:47 AM, Robert Muir rcm...@gmail.com wrote:



 On Sun, Dec 2, 2012 at 2:13 AM, Israel Tsadok itsa...@gmail.com wrote:

 On Fri, Nov 30, 2012 at 3:56 PM, Robert Muir rcm...@gmail.com wrote:


 Right, I'm positive this (pull requests) is github :)


 Just a note - pull request has been a git concept before github
 embraced and extended it. However, almost nobody uses the old meaning, and
 it's really only useful for projects like the Linux kernel, where
 everything is done through the mailing list.

 http://stackoverflow.com/a/6235394/7581


 Dude, the old meaning ('git-request-pull') basically creates a patch at
 best.

 That's not at all what's being discussed here.



Re: pro coding style

2012-11-30 Thread Per Steffensen

Everything below is my humble opinion and input - I DON'T MEAN TO OFFEND ANYONE

Radim Kolar wrote:



what you should do:
* stuff i do
I like people with confidence, but it is a balance :-) Every decent 
developer in the world believes that he is the best in the world. Chances 
are that he is not. Be humble.


+
* ant -> maven

Maven is a step forward, but it is still crap. I believe the original 
creator of ant has apologized in public for basing it on XML. Maven is 
also based on XML, besides being way too complex in infrastructure - 
goals, phases, environments, strange plugins with executions mapping to 
phases etc. XML is good for static data/config stuff, but a build process 
is not static data/config - it is a process. Go gradle!
I don't have either; if I decide to go with SOLR instead of EC, I will 
fork it. It will save me a lot of time.

We are basically handling our own version of Solr at my organization, 
because it is so hard to get contributions in - SOLR-3173, SOLR-3178, 
SOLR-3382, SOLR-3428, SOLR-3383 etc - and lately SOLR-4114 and 
SOLR-4120. It is really hard keeping up with the latest versions of 
Apache Solr, because it is a huge job to merge new stuff into our Solr. 
We are considering taking the consequence and forking our own public (to 
let others benefit and contribute) variant of Solr.


I understand that no committers are really assigned to focus on 
committing other people's stuff, but it is a shame. I would really, 
really not like Solr to end up in a situation where many organizations 
run their own little fork. Instead we should all collaborate on 
improving the one and only Solr! Maybe we should try to find a sponsor 
to pay for a full-time Solr committer with the main focus on verifying 
and committing contributions from the outside.

* svn -> git (way better tools)

I think we had this discussion already, and it seems that lots of 
folks are positive, yet there is still some barrier infrastructure-wise 
along the lines.

don't blame the infrastructure; other apache projects are using it.

Git is the way forward. It will also make committing outside 
contributions easier (especially if the commit is to be performed after 
the branch has developed a lot since the pull request was made). Merging 
among branches will also become easier. Why? Basically, since a pull 
request (request to merge) is an operation handled/known by git, it allows 
git to maintain more information about where merged code fits into 
the code-base considering revisions etc. That information can be used to 
ease future or late merges.



* split code into small manageable maven modules

see above - we have a fully functional maven build but ant is our 
primary build.

i don't see pom.xml in your source tree.

Have a look at the templates in dev-tools/maven. Run "ant 
-Dversion=$VERSION get-maven-poms" to get your maven stuff generated in 
the folder maven-build. The Maven build does not work 100% out of the box (at 
least on the lucene_solr_4_0 branch), but it is very close.



* use github to track patches

wait, why is github good for patches?

you can track patch revisions and apply/browse/comment on them easily. Also 
it is way easier to upload a patch and do a pull request than to attach it to 
a ticket in jira.

See comments under git above

Besides that, I have some additional input, now that we are talking.

Basically the code is a mess. Not blaming anyone in particular. It is 
probably to some extent the nature of open source. If someone honestly 
believes that the code-base is beautiful, they should find something else 
to do. Some of the major problems are

* Bad separation of concerns
** Very long classes/methods dealing with a lot of different concerns
*** Example: DistributedUpdateProcessor - dealing with 
cloud/standalone modes, phases, optimistic locking, calculating values for 
document fields (for add/inc/set requests), routing etc. This should all 
be separated into different classes, each dealing with a different 
concern
** Code dealing with a particular concern is spread all over the code - 
it makes it very hard to change strategy for this concern
*** Example: An obvious separate concern is routing (the decision 
about in which shard under a collection a particular document belongs 
(should be indexed and found) and where a particular request needs to go - 
leaders, replicas, all shards under the collection?). This concern is 
dealt with in a lot of places - DistributedUpdateProcessor, 
CloudSolrServer, RealTimeGetComponent, SearchHandler etc.
** In my patch for SOLR-3178 I have made a separate concern called 
UpdateSemantics. It deals with decisions on stuff related to how 
updates should be performed, depending on what update semantics you have 
chosen (classic, consistency or classic-consistency-hybrid). This class 
UpdateSemantics is used from the actual updating component, 
DirectUpdateHandler2 - instead of having a lot of if-else-if-else 
statements in DirectUpdateHandler2 itself
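
As an illustration of that refactoring idea (a sketch with invented names, written in Python rather than Solr's Java, and not the actual patch): the semantics decision moves behind a strategy interface, so the update handler delegates the choice instead of branching on a mode flag.

```python
from abc import ABC, abstractmethod

class UpdateSemantics(ABC):
    """Hypothetical strategy interface: one class per semantics mode,
    replacing if-else-if-else chains inside the update handler."""
    @abstractmethod
    def requires_version_check(self, doc) -> bool: ...

class ClassicSemantics(UpdateSemantics):
    def requires_version_check(self, doc):
        return False                     # classic mode: last write wins

class ConsistencySemantics(UpdateSemantics):
    def requires_version_check(self, doc):
        return "version" in doc          # optimistic locking when a version is sent

class UpdateHandler:
    """The handler asks the injected strategy instead of testing a mode flag."""
    def __init__(self, semantics: UpdateSemantics):
        self.semantics = semantics
    def add(self, doc):
        if self.semantics.requires_version_check(doc):
            return "checked-add"
        return "plain-add"

handler = UpdateHandler(ConsistencySemantics())
print(handler.add({"id": 1, "version": 7}))   # -> checked-add
```

Adding a new semantics mode then means adding one class, with no edits to the handler itself.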

* Copied code
** A lot of code is clearly just copied from another 

Re: pro coding style

2012-11-30 Thread Dawid Weiss
 I see your point about bringing up bugs nobody thought to cover manually, 
 but it also has cons - e.g. violating the principle that tests should be 
 (easily) repeatable (you will/can end up with tests that sometimes fail and 
 sometimes succeed, and you have to dig out the random values of the tests 
 that fail in order to be able to repeat/reconstruct the failure)

Randomized tests should be identical in their execution given the same
seed, it's the same principle as with regular tests but expands on
different code paths every time you execute with a different seed.
They are not a replacement for boundary condition tests, they're a
complementary thing that should allow picking things you haven't
thought of. Sure, in case of a failure you need to find the seed that
caused the problem but that doesn't seem like a lot of effort given
the potential profit.

If you want identical runs -- fix the initial seed.
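
To make the seed point concrete, here is a minimal sketch (a hypothetical Python test, not the actual Java test framework discussed here): the seed fully determines every generated input, so a failing run is reproduced simply by re-running with the same seed.

```python
import random

def invariant_holds(xs):
    # toy property under test: sorting is idempotent
    return sorted(sorted(xs)) == sorted(xs)

def randomized_test(seed, iterations=100):
    rng = random.Random(seed)  # the seed fully determines every "random" input
    for _ in range(iterations):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]
        # on failure, the message carries the seed needed to reproduce the run
        assert invariant_holds(xs), f"reproduce with seed={seed}, input={xs}"

def generated_inputs(seed, n=5):
    """Expose the inputs a given seed produces, to show determinism."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(n)]

# identical seed -> identical run; a fresh seed explores fresh code paths
assert generated_inputs(42) == generated_inputs(42)
randomized_test(42)
```

Fixing the seed gives the repeatable run; varying it per execution gives the coverage of unforeseen inputs.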

If you have a non-deterministic test for a given fixed seed, it'd be
equally non-deterministic if no randomization was used, it's just a
flawed test (or inherently non-deterministic by nature so assertions
should be relaxed).

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: pro coding style

2012-11-30 Thread Per Steffensen

Per Steffensen skrev:

Spot on! Good arguments.
When you just do not think of randomized tests as a replacement for 
boundary condition tests etc.

Thanks. I will consider randomized testing for my projects in the future - 
with limits :-)


Regards, Per Steffensen

Dawid Weiss skrev:

I see your point about bringing up bugs nobody thought to cover manually, but it 
also has cons - e.g. violating the principle that tests should be (easily) repeatable (you 
will/can end up with tests that sometimes fail and sometimes succeed, and you have to dig out 
the random values of the tests that fail in order to be able to repeat/reconstruct the 
failure)



Randomized tests should be identical in their execution given the same
seed, it's the same principle as with regular tests but expands on
different code paths every time you execute with a different seed.
They are not a replacement for boundary condition tests, they're a
complementary thing that should allow picking things you haven't
thought of. Sure, in case of a failure you need to find the seed that
caused the problem but that doesn't seem like a lot of effort given
the potential profit.

If you want identical runs -- fix the initial seed.

If you have a non-deterministic test for a given fixed seed, it'd be
equally non-deterministic if no randomization was used, it's just a
flawed test (or inherently non-deterministic by nature so assertions
should be relaxed).

Dawid

Re: pro coding style

2012-11-30 Thread Dawid Weiss
 When you just do not think of randomized tests as a replacement for
 boundary condition tests etc

I never claimed they were; in fact, I always make it very explicit
that it's just another tool for yet another type of problems. I
typically write the tests for the conditions I can think of and put a
randomized test as an addition. And guess what typically fails first
;)

Dawid




Re: pro coding style

2012-11-30 Thread Robert Muir
On Fri, Nov 30, 2012 at 8:50 AM, Per Steffensen st...@designware.dk wrote:

 Robert Muir skrev:

  Is it really git? Because its my understanding pull requests aren't
 actually a git thing but a github thing.

 The distinction is important.


 Actually I'm not sure. I have never used git outside github, but at least
 part of it has to be git and not github (I think) - or else I couldn't
 imagine how you get the advantages you get. Remember that when using git
 you actually run a repository on every developer's local machine. When
 you commit, you commit only to your local repository. You need to push
 in order to have it upstreamed (as they call it)


Right, I'm positive this (pull requests) is github :)

I just wanted to make this point: when we have discussions about using git
instead of svn, I'm not sure it makes things easier on anyone; actually it is
probably worse and more complex.

Its the github workflow that contributors want (I would +1 some scheme that
supports this!), but git by itself, is pretty unusable.

Github is like a nice front-end to this mess.


Re: pro coding style

2012-11-30 Thread Mark Miller

On Nov 30, 2012, at 8:56 AM, Robert Muir rcm...@gmail.com wrote:

 but git by itself, is pretty unusable.

Given the number of committers that eat some pain to use git when developing 
lucene/solr, and have no github or pull requests, I'm not sure that's a common 
thought :)

- Mark



Re: pro coding style

2012-11-30 Thread Robert Muir
On Fri, Nov 30, 2012 at 9:10 AM, Mark Miller markrmil...@gmail.com wrote:


 On Nov 30, 2012, at 8:56 AM, Robert Muir rcm...@gmail.com wrote:

  but git by itself, is pretty unusable.

 Given the number of committers that eat some pain to use git when
 developing lucene/solr, and have no github or pull requests, I'm not sure
 that's a common thought :)


Sure, some people might disagree with me.
I'm more than willing to eat some pain if it makes contributions easier.

I just feel like a lot of what makes github successful is unfortunately
actually in github and not git.

It's like if your development team is screaming for linux machines. You have
to be careful how to interpret that. If you hand them a bunch of machines
with just linux kernels, they probably won't be productive. When they
scream for linux they want a userland with a shell, compiler, X-windows,
editor and so on too.


Re: pro coding style

2012-11-30 Thread Adrien Grand
On Fri, Nov 30, 2012 at 3:48 PM, David Smiley (@MITRE.org) 
dsmi...@mitre.org wrote:

 RandomizedTesting for the win!  Thanks a ton Dawid.


+1

-- 
Adrien


Re: pro coding style

2012-11-30 Thread Yonik Seeley
On Fri, Nov 30, 2012 at 9:52 AM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 RandomizedTesting for the win!  Thanks a ton Dawid.

 I didn't invent this thing; I merely wrapped it up, cleaned up the
 rough edges and extracted it to a stand-alone package. Lucene/Solr
 contributors should be credited for introducing the concept. And
 there's also research literature dating waaay back, so I don't think
 the concept is entirely new -- it just never caught on.

Caught on slowly... I had been using it before I became a Lucene
committer in '05 and used it in Lucene/Solr for anything that had
enough complexity to warrant it.

https://issues.apache.org/jira/browse/LUCENE-395?focusedCommentId=12356746&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12356746

And one of my personal favorites, I think the first random indexing
test - TestStressIndexing2
https://issues.apache.org/jira/browse/LUCENE-1173?focusedCommentId=12567845&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12567845

But yeah, it's only become a religion here recently.
The support in the framework is certainly welcome!

-Yonik
http://lucidworks.com




Re: pro coding style

2012-11-30 Thread Dawid Weiss
 Caught on slowly... I had been using it before I became a Lucene

Yep, so did I, albeit in a slightly different flavor -- always starting
from a static seed and running a certain number of randomized
iterations of things, usually higher level. Kind of sanity checking, I
guess. I don't know why I hadn't thought of just picking a different
seed every time.

 But yeah, it's only become a religion here recently.

Come on, I don't think it's that bad :) We may differ in opinions on
certain things (like which tests to run and when) but I think everyone
shares the same overall goal of having well tested code.

Dawid




pro coding style

2012-11-29 Thread Radim Kolar
If you talk about my yesterday's work: no reformats were done because 
the code was already properly formatted. Also, all code was hand-written; 
no generated code was used. Generated code is not committed to git anyway.


my hard limits for code quality (checked at commit):
* no findbugs warnings with level 14+
* code coverage >= 80%
* code coverage in critical parts >= 95%
* list of PMD warnings to stop commit
* generation of call tree graph - check it for cycles, checking for 
calling same procedure from different levels (indicates bad code flow)

* all eclipse warnings turned into errors
* patched eclipse compiler to do better flow analysis
* code reformatted at commit
* javadoc everything, no warnings
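
The call-tree cycle check mentioned in the list above can be approximated with a depth-first search over a call graph. A generic sketch follows - the graph here is hand-built from made-up function names, not extracted from real bytecode as an actual tool would do:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CallGraphCycleCheck {
    // Detect a cycle in a directed call graph using DFS with three colors:
    // 0 = unvisited, 1 = on the current DFS stack, 2 = fully explored.
    static boolean hasCycle(Map<String, List<String>> calls) {
        Map<String, Integer> color = new HashMap<>();
        for (String fn : calls.keySet()) {
            if (color.getOrDefault(fn, 0) == 0 && dfs(fn, calls, color)) return true;
        }
        return false;
    }

    private static boolean dfs(String fn, Map<String, List<String>> calls,
                               Map<String, Integer> color) {
        color.put(fn, 1);
        for (String callee : calls.getOrDefault(fn, List.of())) {
            int c = color.getOrDefault(callee, 0);
            if (c == 1) return true; // back edge: fn indirectly calls itself
            if (c == 0 && dfs(callee, calls, color)) return true;
        }
        color.put(fn, 2);
        return false;
    }

    public static void main(String[] args) {
        // main() -> parse() -> lex() is fine; adding lex() -> parse() closes a cycle.
        Map<String, List<String>> acyclic = Map.of(
            "main", List.of("parse"), "parse", List.of("lex"), "lex", List.of());
        Map<String, List<String>> cyclic = Map.of(
            "main", List.of("parse"), "parse", List.of("lex"), "lex", List.of("parse"));
        System.out.println(hasCycle(acyclic)); // false
        System.out.println(hasCycle(cyclic));  // true
    }
}
```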

what you should do:
* stuff i do
   +
* ant -> maven
* svn -> git (way better tools)
* split code into small manageable maven modules
* get more people
* put trust into your testing, not into perfect people
* work faster
* use github to track patches
* use Spring for integration testing
* use jenkins to do tests on incoming patches
* do library checks for number of functions really used
* contributor patches should be high priority or you will lose contributors

I sometimes give lessons: about 1-2 sessions per year for 14 people, 
if I have spare time. But it's a waste of time; most people will not 
follow.


learn this:
SLOW CODING != BUG FREE CODE.
GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE
CODE STYLE != GAME WITH SPACES AND { }
GOOD TESTS >= 2x TIME NEEDED TO CODE STUFF UNDER TEST
GOOD TESTS ARE MORE VALUABLE THAN GOOD CODE

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: pro coding style

2012-11-29 Thread Simon Willnauer
hey,

some comments inline...

On Thu, Nov 29, 2012 at 7:48 PM, Radim Kolar h...@filez.com wrote:
 if you talk about my yesterday work then no reformats were done because code
 was already properly formatted. Also all code was hand written, no generated
 code was used. Generated code is not committed to git anyway.

 my hard limits for code quality (checked at commit):
 * no findbugs warnings with level 14+
 * code coverage >= 80%
 * code coverage in critical parts >= 95%
 * list of PMD warnings to stop commit
 * generation of call tree graph - check it for cycles, checking for calling
 same procedure from different levels (indicates bad code flow)
 * all eclipse warnings turned into errors
 * patched eclipse compiler to do better flow analysis
 * code reformatted at commit
 * javadoc everything, no warnings

 what you should do:
 * stuff i do
+
 * ant -> maven

I suggest you start with this, make sure you have enough time and
energy for the discussion.

 * svn -> git (way better tools)

I think we had this discussion already and it seems that lots of folks
are positive, yet there are still some barriers infrastructure-wise
along the way.
 * split code into small manageable maven modules
see above - we have a fully functional maven build but ant is our
primary build. My honest opinion: forget what I said above - don't try.
 * get more people
good point - can you refer us some? In my experience they are pretty
hard to find.

 * put trust into your testing, not into perfect people

ahh yeah testing, we should do that at some point

 * work faster

wow - I never thought about that though!
 * use github to track patches

wait why is github good for patches?

 * use Spring for integration testing

sorry we are a no-dependency library.

 * use jenkins to do tests on incoming patches

patches welcome

 * do library checks for number of functions really used

hmm - we are a library?

 * contributor patches should be high priority or you will lose contributors

that is good advice for such a young project.

 i am giving sometimes lessons: about 1-2 sessions per year for 14 people, if
 i have spare time. But its waste of time, most ppl will not follow.

 learn this:
 SLOW CODING != BUG FREE CODE.
 GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE
 CODE STYLE != GAME WITH SPACES AND { }
 GOOD TESTS =  2x TIME NEEDED TO CODE STUFF UNDER TEST
 GOOD TESTS ARE MORE VALUABLE THEN GOOD CODE

let's drop the code, it's a hassle to maintain anyway!

thanks man,

this mail made my day!

simon

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: pro coding style

2012-11-29 Thread Radim Kolar



 what you should do:
 * stuff i do
 +
 * ant -> maven

 I suggest you start with this, make sure you have enough time and
 energy for the discussion.

I don't have either. If I decide to go with Solr instead of EC, I will
fork it. It will save me a lot of time.





 * svn -> git (way better tools)

 I think we had this discussion already and it seems that lots of folks
 are positive, yet there are still some barriers infrastructure-wise
 along the way.

don't blame the infrastructure, other Apache projects are using it.


 * split code into small manageable maven modules

 see above - we have a fully functional maven build but ant is our
 primary build.

I don't see a pom.xml in your source tree.

 good point - can you refer us some? In my experience they are pretty
 hard to find.

I do not know people who believe that a process designed to be slow is a 
good process. We here believe that fast process = high salary.



 * use github to track patches

 wait why is github good for patches?

You can track patch revisions and apply/browse/comment on them easily.
Also it's way easier to upload a patch and do a pull request than to
attach it to a ticket in JIRA.



 * use Spring for integration testing

 sorry we are a no-dependency library.

<scope>test</scope>
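
The point here is Maven's dependency scope: a dependency declared with scope `test` is on the classpath for compiling and running tests only, so it never becomes a compile or runtime dependency of the library itself. A hedged pom.xml fragment showing the idea (the artifact and version are illustrative):

```xml
<!-- Spring pulled in for integration tests only; test scope keeps it off
     the library's own compile/runtime classpath. Version is illustrative. -->
<dependency>
  <groupId>org.springframework</groupId>
  <artifactId>spring-test</artifactId>
  <version>3.1.0.RELEASE</version>
  <scope>test</scope>
</dependency>
```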

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: pro coding style

2012-11-29 Thread Dawid Weiss
 I don't see a pom.xml in your source tree.

Instead of educating others about what's good and bad, how about
taking some more time to study the sources of Lucene/Solr and their
build system? Your observations are superficial, to say the least: POM
files are generated dynamically, the test infrastructure is among the
more sophisticated things to be found, multiple CI systems run the
code all the time, coverage is great across JVMs, and the
randomization really brings up bugs nobody thought to cover manually.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org