Re: [opengrok-dev] webrev lucene 40 port for opengrok

Lubos Kosco Tue, 13 Nov 2012 04:17:06 -0800

On 13.11.2012 11:18, Knut Anders Hatlen wrote:

Lubos Kosco <[email protected]> writes:

Hi guys, can you have a look please?

Thanks, Lubos. I had a quick look at the patch (skipped quickly past the
big changes to the NetBeans project files...), and it looks mostly fine
to me. I'm not a Lucene expert, though.

Thank you for the review kah ;)

Some questions:

I saw many changes like this one:

-        }
-        doc.add(new Field("date", date, Field.Store.YES, Field.Index.NOT_ANALYZED));
+        }
+        doc.add(new Field("date", date, StringField.TYPE_STORED));

According to http://lucene.apache.org/core/4_0_0/MIGRATE.html, one
should also call ft.setOmitNorms(false) to preserve the original
semantics for this particular combination of arguments. Is this
something we need to do?

Good catch. I had it there before, so it's my mistake from doing merges (I had some fighting with the reusable components in Analyzers :(, eventually the ApacheCon visit solved it :-D ). I will check all field types used.


There are also many new instance variables in the analyzers (for the
TokenStreamComponents and tokenizers used in the new createComponents()
methods), but I don't see that they are ever read except in the local
scope where they are assigned a value. Could they be changed to local
variables?

I guess yes. They are cached by the Analyzer APIs once created, and all subsequent tokenStream calls either reuse the cache or create the object(s).

I will change them where appropriate and where no other reuse occurs.
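
To illustrate why the instance variables are not needed, here is a minimal sketch in plain Java of the caching idea described above. It is not Lucene's API: Components, ReusingAnalyzerSketch and PlainAnalyzerSketch are hypothetical names, and the per-thread cache is an assumption about how such reuse could be arranged. It only shows that createComponents() can keep everything in local variables when the base class holds the cached reference.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a tokenizer plus its filter chain; not a Lucene class.
final class Components {
    private String input;

    void setInput(String input) { this.input = input; }

    String input() { return input; }
}

// Hypothetical base class that mimics per-thread reuse of created components.
abstract class ReusingAnalyzerSketch {
    private final ThreadLocal<Components> cache = new ThreadLocal<>();

    // Subclasses build their chain here; called at most once per thread.
    protected abstract Components createComponents();

    public final Components tokenStream(String input) {
        Components c = cache.get();
        if (c == null) {
            c = createComponents();
            cache.set(c);
        }
        // Reused components are simply pointed at the new input.
        c.setInput(input);
        return c;
    }
}

final class PlainAnalyzerSketch extends ReusingAnalyzerSketch {
    // Counts constructions so the reuse can be observed.
    final AtomicInteger created = new AtomicInteger();

    @Override
    protected Components createComponents() {
        // A local variable is enough: the base class keeps the cached reference.
        Components components = new Components();
        created.incrementAndGet();
        return components;
    }
}
```

Calling tokenStream() twice on the same PlainAnalyzerSketch from one thread returns the same Components object and runs createComponents() only once, so nothing in the subclass needs to remember it.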

I also forgot to do formatting, so I will run autoformat from NB on all changed files.

Why?
Lucene 4.0 is 300% faster when indexing, 100% faster for queries

Sounds promising! :)

We can easily add regexp queries now
We can easily take latest highlighting
index statistics (and all what can be done from them, better
searching, grouping, categorization/classification)
possible Solr/Tika integrations
(for more search the web please)

The above should be more promising. I actually hope the highlighter and some index statistics will be used soon. I also have the Tika integration for PDF and Open/Libre/MS Office pending, once we're done with Lucene 4.0 ;)


Webrev
http://stargate.cnl.tuke.sk/~taz/webrev-2012-11-09-lucene_40/

not tested(in progress):
updating documents (I am not sure on uid document retrieval, wasn't
obvious to port)

tested: all junits, regression on term count numbers (same numbers of
tokenized terms as in 3.6.1)

what's missing: build system fixes so l40 will get autodownloaded,

Another build problem I had, was that I had to set the
platforms.JDK_1.7.home variable when running Ant. Never seen that
before. It works fine without the patch.

BUILD FAILED
/code/opengrok/trunk/nbproject/build-impl.xml:86: The J2SE Platform is not correctly set up.
  Your active platform is: JDK_1.7, but the corresponding property "platforms.JDK_1.7.home" is not found in the project's properties files.
  Either open the project in the IDE and setup the Platform with the same name or add it manually.
  For example like this:
      ant -Duser.properties.file=<path_to_property_file> jar (where you put the property "platforms.JDK_1.7.home" in a .properties file)
   or ant -Dplatforms.JDK_1.7.home=<path_to_JDK_home> jar (where no properties file is used)

Yes, I haven't played with the build system yet, and NetBeans was upgraded in my env in between too (which seems like the cause).

so I won't push anything until I test both ant in cli and from netbeans

(If I have spare time I might do the same for Eclipse and IDEA. I saw on Bitbucket that J. Ryan Stinnett was developing OpenGrok in IDEA, which is a very good idea ;) )

package-ing fixes, lucene compatibility test auto run if lucene
test-framework on classpath
(will add it once I get some time)

I have the Lucene compatibility tests running (in a very basic form, but they are) when the classpath has the Lucene test-framework jars.


will publish new review today/tomorrow

cheers
Lubos

_______________________________________________
opengrok-dev mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/opengrok-dev
