Re: [EasyHack] #44681 port to CLucene from java/Lucene

2012-02-21 Thread G.H.M.Valkenhoef, van
Great to hear that it works & that it will make it into master :-).

Any thoughts on this one:

 3) When creating the CLucene FileReader (HelpIndexer.cxx), the path
 is converted to plain ASCII, which is probably dangerous. There is
 probably a way to work around this, but I haven't gotten around to it
 yet.

Is that a problem?
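
To illustrate the concern, here is a minimal, self-contained sketch of why narrowing a path to plain ASCII loses information, next to a UTF-8 encoding that does not. Both helper names (toAsciiLossy, toUtf8) are hypothetical and are not taken from HelpIndexer.cxx; whether UTF-8 is the right target encoding for the CLucene FileReader on every platform is an assumption.

```cpp
#include <string>

// Hypothetical helper: models a lossy "to ASCII" conversion by replacing
// every code point outside 0..127 with '?'. Not the actual HelpIndexer code.
std::string toAsciiLossy(const std::u32string& path)
{
    std::string out;
    for (char32_t c : path)
        out += (c < 0x80) ? static_cast<char>(c) : '?';
    return out;
}

// Hypothetical helper: encodes the same code points as UTF-8, so no
// character is dropped. Input is assumed to be valid Unicode scalar values.
std::string toUtf8(const std::u32string& path)
{
    std::string out;
    for (char32_t c : path)
    {
        if (c < 0x80)
        {
            out += static_cast<char>(c);
        }
        else if (c < 0x800)
        {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
        else if (c < 0x10000)
        {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
        else
        {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With the lossy variant, two different directories whose names differ only in non-ASCII characters map to the same string, so the reader could open the wrong path or fail to open any.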

On 20-02-12, Caolán McNamara wrote:
> On Sun, 2012-02-19 at 18:49 +0100, Gert van Valkenhoef wrote:
> > Thanks again for the help. Attached a new series of patches (cumulative 
> > with the previously sent ones and Caolan's), in which (I think) all the 
> > Java invocations have been removed in favor of using the C++ components:
> 
> Attached is an additional patch to stick together the code to date with
> internal clucene and the "missing link" to use the OUString ctor that
> takes UCS-4 strings. So with this applied additionally you should now be
> able to type stuff into help's search and see a list of results, so the
> vital bit apparently works :-)
> 
> I'll integrate all of this to master in the next day or two.
> 
> C.
> 
>
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice


License for my contributions

2012-02-20 Thread G.H.M.Valkenhoef, van
Hi all,

I hereby declare that my past and future contributions to LibreOffice are 
licensed under the LGPL v3+ and the MPL v1.1.

Best wishes,

Gert van Valkenhoef


Re: [EasyHack] #44681 port to CLucene from java/Lucene

2012-02-15 Thread G.H.M.Valkenhoef, van
Great, thanks. That looks like it fills in most of the missing pieces from my 
patch.

On 15-02-12, Caolán McNamara wrote:
> On Tue, 2012-02-14 at 12:01 +, Caolán McNamara wrote:
> > On Tue, 2012-02-14 at 09:45 +0100, G.H.M.Valkenhoef, van wrote:
> > > > 
> > > Yes, I found that java code (the HelpIndexer I refer to). I'll work on
> > > a patch to replace the XInvocations of the Java code with calls to my
> > > code.
> > 
> > I can try and knock together a skeleton of a conversion of that Java
> > component to a C++ component for you to integrate the clucene stuff
> > into.
> 
> FWIW, here's a suggested skeleton for replacing the java help component,
> with this applied going to help and typing something into "search"
> should print out on stderr "implement me" and same for the other case of
> indexing help of an extension on-demand if it doesn't come with an index
> already.
> 
> C.
> 
>


Re: [EasyHack] #44681 port to CLucene from java/Lucene

2012-02-14 Thread G.H.M.Valkenhoef, van


On 14-02-12, Andras Timar wrote:
> 2012/2/14 Norbert Thiebaud:
> > Just a word of encouragement: thanks for working on that, I'm looking
> > forward to seeing the impact on the build with LANG=all :-)
> 
> Me too, and I also wonder whether it fixes
> https://bugs.freedesktop.org/show_bug.cgi?id=40665
> 
> 
I noticed that CJK-based indexing is only enabled for the Japanese 
language. Maybe this can be fixed by adding more languages to be 
CJK-indexed.
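
A minimal sketch of what "adding more languages" could look like: a single lookup that decides whether a help language should go through the CJK analyzer. The function name and the language list ("ja", "ko", "zh-CN", "zh-TW") are assumptions for illustration; the existing code is only known to enable this for Japanese.

```cpp
#include <set>
#include <string>

// Hypothetical helper: returns true if the given help language should be
// indexed with a CJK-aware analyzer. The language list is an assumption,
// not the list used by the existing indexer (which only handles Japanese).
bool useCjkAnalyzer(const std::string& lang)
{
    static const std::set<std::string> cjkLangs = {"ja", "ko", "zh-CN", "zh-TW"};
    return cjkLangs.count(lang) != 0;
}
```

Keeping the list in one place means the indexer and the search side can share the same decision, which matters because both have to analyze text the same way.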
> 
> Cheers,
> Andras
> 
> 


Re: [EasyHack] #44681 port to CLucene from java/Lucene

2012-02-14 Thread G.H.M.Valkenhoef, van
Thanks for the answers, responses below.

On 13-02-12, Caolán McNamara wrote:
> The xmlhelp/source/com/sun/star/help/*.java route is the one that sets
> the "bExtensionMode". I think this is the one through which third-party
> extensions can insert their help into our help system. The cxxhelp is the
> straightforward backend of the "search in help" dialogs.
> 
Yes, I found that java code (the HelpIndexer I refer to). I'll work on a patch 
to replace the XInvocations of the Java code with calls to my code.

> > couldn't the ZIP creation just always be replaced by this alternative
> > code path?
> 
> hmph, indeed, seems that way on the face of it. Let's try that.
> 
Great! I'll work on that too.

> >   * This implementation is using the master branch of CLucene's git, 
> > with clucene-contribs-lib enabled (for CJK support). The released 
> > version of CLucene is compatible with Lucene 1.9.x, whereas LibreOffice 
> > uses Lucene 2.3.
> 
> I don't *think* compatibility between java and c++ file formats matters
> to us, if that's what you're getting at here.
> 
Ok, that's good to know. I'm not sure if the 1.9.x-compatible version has all 
the required functionality (but I just sent another message to the list about 
this).

> >   * I'm not sure exactly how to make my code build as part of the LO 
> > build, but could probably figure it out as long as the previous point is 
> > addressed.
> 
> Presumably just editing l10ntools/source/help/makefile.mk and adding
> another target or so in there will do the trick. I can hook this up and
> see how it goes.
> 
Great, send me a patch if you get it going, then I can work on some of the 
other stuff.

> >   * CLucene (like Java) uses wide characters throughout, and defines 
> > its own TCHAR type for that. Can we make this play nice with how LO 
> > handles strings?
> > 
> >   * I'm using some Unix headers, are these available on windows or 
> > should I use some kind of LO equivalent of them?
> 
> Should be basically cross-platform stuff in sal/inc/ to handle any of
> that stuff.
> 
Ok, will check it out, thanks.
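
For the TCHAR question, here is a minimal sketch of the conversion involved when the wide character type is 32 bits (UCS-4, as wchar_t is on Linux) and the target string is UTF-16, which is what LO strings use. The function name ucs4ToUtf16 is hypothetical and the code uses only the standard library; inside LO one would presumably rely on the string classes rather than hand-rolling this. Input validation (lone surrogates, values above 0x10FFFF) is deliberately left out.

```cpp
#include <string>

// Hypothetical helper: converts UCS-4 code points to UTF-16 code units.
// Code points above the Basic Multilingual Plane become surrogate pairs.
// Assumes the input holds valid Unicode scalar values.
std::u16string ucs4ToUtf16(const std::u32string& in)
{
    std::u16string out;
    for (char32_t c : in)
    {
        if (c < 0x10000)
        {
            out += static_cast<char16_t>(c);
        }
        else
        {
            c -= 0x10000;
            out += static_cast<char16_t>(0xD800 | (c >> 10));   // high surrogate
            out += static_cast<char16_t>(0xDC00 | (c & 0x3FF)); // low surrogate
        }
    }
    return out;
}
```

On platforms where the wide character type is already 16 bits, no such conversion is needed, so any real glue code would have to branch on the character width.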

> >   * I tried replacing the HelpIndexerTool in 
> > helpcontent2/util/target.pmk, which seems to work fine, except that I'm 
> > returning an error code when the content/caption directory doesn't exist 
> > (unlike HelpIndexerTool), which breaks on "shared".
> 
> I'll see if I can hook up what you got to our build system, ignoring the
> lack of clucene in our tree and assuming availability of system
> clucene, and see how that goes.
> 
I've got an update on this: I managed to create all the indexes, and a few 
searches on both the Java-generated and the C++-generated indexes seem to give 
identical results (at least if I pipe the results through sort).
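
The "pipe through sort" comparison amounts to checking that both indexes return the same set of hits regardless of ranking order. A minimal sketch of that check, assuming each search result is represented as a string (for example a document path); the function name is hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: true if both result lists contain the same entries,
// ignoring the order in which the search returned them.
// The vectors are taken by value so the caller's copies stay unsorted.
bool sameResultsIgnoringOrder(std::vector<std::string> a,
                              std::vector<std::string> b)
{
    std::sort(a.begin(), a.end());
    std::sort(b.begin(), b.end());
    return a == b;
}
```

This says nothing about whether the two indexes rank hits identically, only that they find the same documents.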

Gert


Re: [EasyHack] #44681 port to CLucene from java/Lucene

2012-02-14 Thread G.H.M.Valkenhoef, van
On 13-02-12, Rene Engelhard wrote:
> Hi,
> 
> first of all: thanks for this effort, which should have been done when this
> lucene dependency was introduced in the first place in OOo times.
> 
> But...
> 
> On Mon, Feb 13, 2012 at 04:17:49PM +0100, Radek Doulik wrote:
> > >   * This implementation is using the master branch of CLucene's git, 
> > > with clucene-contribs-lib enabled (for CJK support). The released 
> > > version of CLucene is compatible with Lucene 1.9.x, whereas LibreOffice 
> > > uses Lucene 2.3.
> 
> This is bad. Will that get released at some point? And is the
> clucene-contribs-lib included? (And if separate, how hard is it to enable
> it? Patching it into "proper" clucene is a no-go.)
> 
> As Radek says, we should (if it was only me: must) support building
> against "standard" libclucene. And relying on a git snapshot is bad...
> 
I can't say if or when there will be a clucene release. However the 
released version is ancient, and the developers themselves say the git version 
is stable and should be used. On the other hand, I don't think distributions 
carry it. I could look into using the released 1.9.x-compatible version 
instead, but I expect there was a reason for using Lucene 2.3.x. There is a tag 
for clucene-src-2.3.3.4 for which there are also tarballs on the sourceforge 
page. We could depend on that version -- it is a small dependency and takes 
under a minute to build.

Regarding the contribs: it is part of their git repository, and can be enabled 
as part of the standard build.

> > >   * Can someone help to figure out how to make CLucene part of the LO 
> > > build process? CLucene is using CMake and there seems to be no way to 
> > > 'make install' the clucene-contribs-lib, so this might be tricky.
> > 
> > This is usually done like this: you either use system libraries if
> > available or build the package (CLucene in this case) inside the LO build
> > tree. Look into configure.in and search for cairo, for example. Cairo is
> > a graphics library where we link against the system one or build one
> > inside LO. Giving Cc to _rene_ and pmladek who know a lot about the
> > build process.
> 
> Exactly.
> 
> But he didn't post *any* makefile, so trying to write a configure check
> is moot right now anyways ;-)
> 
I'm planning to work on that later today. (But also see the post by Caolan, 
which you may have missed.)

> Regards,
> 
> Rene
> 
>