On 13.11.2012 11:18, Knut Anders Hatlen wrote:
Lubos Kosco <[email protected]> writes:
Hi guys, can you have a look please?
Thanks, Lubos. I had a quick look at the patch (skipped quickly past the
big changes to the NetBeans project files...), and it looks mostly fine
to me. I'm not a Lucene expert, though.
Thank you for the review kah ;)
Some questions:
I saw many changes like this one:
- }
- doc.add(new Field("date", date, Field.Store.YES, Field.Index.NOT_ANALYZED));
+ }
+ doc.add(new Field("date", date, StringField.TYPE_STORED));
According to http://lucene.apache.org/core/4_0_0/MIGRATE.html, one
should also call ft.setOmitNorms(false) to preserve the original
semantics for this particular combination of arguments. Is this
something we need to do?
Good catch. I had it there before, so this is my mistake from doing
merges (I had some fighting with the reusable components in Analyzers :( ,
eventually an ApacheCon visit solved it :-D ). I will check all field types used.
There are also many new instance variables in the analyzers (for the
TokenStreamComponents and tokenizers used in the new createComponents()
methods), but I don't see that they are ever read except in the local
scope where they are assigned a value. Could they be changed to local
variables?
I guess yes: once created, they are cached by the Analyzer APIs, and all
subsequent tokenStream calls either reuse the cached objects or create them.
I will change them to local variables where appropriate and where no other reuse occurs.
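To illustrate why a local variable should be enough here, below is a minimal, self-contained sketch of the reuse pattern. It is only a toy model: Components, SketchAnalyzer and PlainAnalyzer are hypothetical stand-ins written for this example, not Lucene's actual TokenStreamComponents or Analyzer classes, and the per-thread cache is an assumption about how such a cache could be built, not a copy of Lucene's code.

```java
import java.io.Reader;
import java.io.StringReader;

/**
 * Toy model of per-thread component reuse. Components and SketchAnalyzer are
 * hypothetical stand-ins, NOT Lucene's TokenStreamComponents or Analyzer.
 */
public class ReuseSketch {

    /** Stand-in for a tokenizer chain that can be pointed at new input. */
    static final class Components {
        private Reader input;

        Components(Reader input) {
            this.input = input;
        }

        void setReader(Reader input) {
            this.input = input;
        }

        Reader getReader() {
            return input;
        }
    }

    /** Stand-in for the base class that owns the per-thread cache. */
    abstract static class SketchAnalyzer {
        private final ThreadLocal<Components> cache = new ThreadLocal<Components>();

        /** Called only when the current thread has no cached components yet. */
        protected abstract Components createComponents(Reader reader);

        public final Components tokenStream(Reader reader) {
            Components components = cache.get();
            if (components == null) {
                // First call on this thread: build and remember the components.
                components = createComponents(reader);
                cache.set(components);
            } else {
                // Later calls: keep the cached object, only swap the input.
                components.setReader(reader);
            }
            return components;
        }
    }

    /** Subclass that keeps the new object in a local variable, not a field. */
    static final class PlainAnalyzer extends SketchAnalyzer {
        int created = 0; // only counts calls, for the demonstration below

        @Override
        protected Components createComponents(Reader reader) {
            Components components = new Components(reader); // a local is enough
            created++;
            return components;
        }
    }

    public static void main(String[] args) {
        PlainAnalyzer analyzer = new PlainAnalyzer();
        Components first = analyzer.tokenStream(new StringReader("first"));
        Components second = analyzer.tokenStream(new StringReader("second"));
        System.out.println("same components reused: " + (first == second));
        System.out.println("createComponents calls: " + analyzer.created);
    }
}
```

In this sketch the base class holds the only long-lived reference to the created object, so the subclass has no need for its own instance field unless something else reads it.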
I also forgot to do formatting, so I will run autoformat from NB on all
changed files
Why?
Lucene 4.0 is 300% faster when indexing, 100% faster for queries
Sounds promising! :)
We can easily add regexp queries now
We can easily take latest highlighting
index statistics (and everything that can be done with them: better
searching, grouping, categorization/classification)
possible Solr/Tika integrations
(for more, please search the web)
The items above should be even more promising; I actually hope the highlighter
and some index statistics will be used soon.
I also have the Tika integration for PDF and Open/Libre/MS Office documents
pending, once we're done with Lucene 4.0 ;)
Webrev
http://stargate.cnl.tuke.sk/~taz/webrev-2012-11-09-lucene_40/
not tested (in progress):
updating documents (I am not sure about uid document retrieval; it wasn't
obvious how to port)
tested: all junits, regression on term count numbers (same numbers of
tokenized terms as in 3.6.1)
what's missing: build system fixes so Lucene 4.0 will get autodownloaded,
Another build problem I had was that I had to set the
platforms.JDK_1.7.home variable when running Ant. I have never seen that
before. It works fine without the patch.
BUILD FAILED
/code/opengrok/trunk/nbproject/build-impl.xml:86: The J2SE Platform is not
correctly set up.
Your active platform is: JDK_1.7, but the corresponding property
"platforms.JDK_1.7.home" is not found in the project's properties files.
Either open the project in the IDE and setup the Platform with the same name
or add it manually.
For example like this:
ant -Duser.properties.file=<path_to_property_file> jar (where you put the property
"platforms.JDK_1.7.home" in a .properties file)
or ant -Dplatforms.JDK_1.7.home=<path_to_JDK_home> jar (where no properties
file is used)
Yes, I haven't played with the build system yet, and NetBeans was upgraded in
my environment in between too (which seems to be the cause),
so I won't push anything until I have tested Ant both from the CLI and from NetBeans.
(If I have spare time I might do the same for Eclipse and IDEA. I
saw on Bitbucket that J. Ryan Stinnett was developing OpenGrok in IDEA,
which is a very good idea ;) )
packaging fixes, and running the Lucene compatibility tests automatically if the
Lucene test-framework is on the classpath
(will add it once I get some time)
I have the Lucene compatibility tests running (in a very basic form, but
they run) when the classpath has the Lucene test-framework jars.
I will publish a new review today or tomorrow.
cheers
Lubos