On Aug 24, 2010, at 8:03, Roman Chyla <roman.ch...@gmail.com> wrote:

I am trying to understand PyLucene better and to see whether it is
faster to retrieve result ids with Java than with Python. The use case
is retrieving millions of recids: with Python, 700K ids take about
1.5 s (even though the query itself takes just a fraction of that).

I wrote some simple Java code (it works in Java) that returns an array
of ints. I have wrapped it with jcc and it is visible from inside
Python, but calling the static method throws InvalidArgsError (an
example Python session is below).

JCC is version 2.4, built in shared mode. DumpUtils is in a different
package than lucene (i.e. not inside the lucene jars). Could this
problem be similar to passing jcc-wrapped objects between different
jcc packages? http://search-lucene.com/m/SPgeW1hDtAw1

The java class is very simple:

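import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;

public class DumpUtils {
   // Returns the Lucene doc id of every hit held in topdocs.scoreDocs.
   public static int[] GetDocIds(TopDocs topdocs) {
       ScoreDoc[] hits = topdocs.scoreDocs;
       // Size by hits.length: totalHits counts all matches and can be
       // larger than the number of ScoreDocs actually collected.
       int[] out = new int[hits.length];
       for (int i = 0; i < hits.length; i++) {
           out[i] = hits[i].doc;
       }
       return out;
   }
}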

Thanks for any help/pointers,

Ah yes, importing separately built extensions that share classes (or dependencies) didn't work until the --import parameter was added in jcc 2.6 to solve the problem of incompatible shared classes. To make this work:
  - first, build PyLucene as usual, with --shared
  - then, build your DumpUtils package with --import lucene and with --shared

That way, instead of generating code and wrapper classes for the lucene classes a second time, jcc imports them at build time, which makes for a much smaller library and a faster build. The resulting shared library is linked against the lucene one.
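
As a rough sketch of the second build step, the jcc invocation could be assembled like this. The jar name dumputils.jar is hypothetical, the module name pyjama is taken from the session below, and the exact set of flags a given setup needs (for instance --classpath or --install) is an assumption to check against the jcc docs:

```python
import sys


def jcc_build_command(jar, module, imported="lucene"):
    """Assemble a jcc command line for an extension that reuses the
    wrappers of an already built extension instead of regenerating them."""
    return [
        sys.executable, "-m", "jcc",
        "--jar", jar,           # hypothetical jar containing DumpUtils
        "--python", module,     # name of the Python extension to generate
        "--shared",             # build in shared mode, like PyLucene
        "--import", imported,   # reuse the wrappers from this extension
        "--build",
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(jcc_build_command("dumputils.jar", "pyjama")))
```

The command is only printed here; pass the list to subprocess.check_call to actually run the build once the flags look right for your setup.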

See the jcc docs and the list archives for more examples of --import. Then, at run time, import lucene first and your other package after it.
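
A minimal sketch of that run-time order, assuming the second extension is called pyjama as in the session below and was rebuilt with --import lucene. The initVM(CLASSPATH) form mirrors the call used in that session; starting the VM through lucene first is an assumption about how the two classpaths should be combined:

```python
def load_extensions():
    """Import lucene before the extension built against it, then start
    the JVM and return the wrapped DumpUtils class."""
    import lucene   # the extension whose wrappers are shared
    import pyjama   # assumed name of the package built with --import lucene

    # The first initVM() call starts the Java VM; the later call on the
    # other extension adds its classpath to the VM that is already running.
    lucene.initVM(lucene.CLASSPATH)
    pyjama.initVM(pyjama.CLASSPATH)
    return pyjama.DumpUtils
```

With both extensions sharing the same wrapper classes, a TopDocs returned by a lucene search should then be accepted by load_extensions().GetDocIds(hits).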

Andi..


  roman


Here is an example python session:

In [1]: import pyjama

In [2]: pyjama.initVM(pyjama.CLASSPATH)
Out[2]: <jcc.JCCEnv object at 0x00C0E1F0>

In [3]: import lucene as lu

In [4]: pyjama.DumpUtils
Out[4]: <type 'DumpUtils'>

In [5]: pyjama.DumpUtils.GetDocIds
Out[5]: <built-in method GetDocIds of type object at 0x0189E780>

In [6]:

In [7]: import newseman.pyjamic.slucene.searcher as se

In [8]: s = se.Searcher();s.open('/tmp/whisper/')

In [9]: hits = s._search(s._query('key:bo*',None), 50)

In [10]: hits
Out[10]: <TopDocs: org.apache.lucene.search.topd...@480457>

In [11]:

In [12]: pyjama.DumpUtils.GetDocIds(hits)
---------------------------------------------------------------------------
InvalidArgsError                          Traceback (most recent call last)

InvalidArgsError: (<type 'DumpUtils'>, 'GetDocIds', <TopDocs: org.apache.lucene.search.topd...@480457>)
