Thank you very much, Andi.
Best,

  roman

On Tue, Aug 24, 2010 at 5:36 PM, Andi Vajda <va...@apache.org> wrote:
>
> On Aug 24, 2010, at 8:03, Roman Chyla <roman.ch...@gmail.com> wrote:
>
>> I am trying to understand PyLucene better and to see whether it is
>> faster to retrieve result ids in Java than in Python. The use case is
>> retrieving millions of recids -- in Python, fetching 700K ids takes about
>> 1.5s (even though the query itself takes just a fraction of that).
>>
>> I wrote a simple Java class (it works in Java) that returns an array of
>> ints. I have wrapped it with jcc and it is visible from inside Python,
>> but calling the static method throws InvalidArgsError (an example
>> Python session is below).
>>
>> JCC is version 2.4, built in shared mode. The DumpUtils class is in a
>> different package from lucene (i.e. not inside the lucene jars). Could
>> this problem be similar to passing jcc-wrapped objects between different
>> jcc-packages? http://search-lucene.com/m/SPgeW1hDtAw1
>>
>> The java class is very simple:
>>
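>> import org.apache.lucene.search.ScoreDoc;
>> import org.apache.lucene.search.TopDocs;
>>
>> public class DumpUtils {
>>   // Returns the doc ids of the hits actually held in topdocs.
>>   public static int[] GetDocIds(TopDocs topdocs) {
>>       ScoreDoc[] hits = topdocs.scoreDocs;
>>       // scoreDocs holds at most the n hits requested from the search;
>>       // totalHits counts every match and can be larger, so size by hits.length.
>>       int[] out = new int[hits.length];
>>       for (int i = 0; i < hits.length; i++) {
>>           out[i] = hits[i].doc;
>>       }
>>       return out;
>>   }
>> }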
>>
>> Thanks for any help/pointers,
>
> Ah yes, importing separately built extensions that share classes (or
> dependencies) didn't work until support for the --import parameter was added
> in jcc 2.6 to solve the problem of incompatible shared classes. To make this
> work:
>  - first, build PyLucene as usual, with --shared
>  - then, build your DumpUtils package with --import lucene and with --shared
>
> That way, instead of generating code and wrapper classes again for the
> lucene classes, jcc will import them at build time thus making a much
> smaller library and faster build. The resulting shared library is linked
> against the lucene one.
>
> See docs and list archives about --import for more examples. Then, when
> running all this, you should also import lucene first, then your other
> package.
>
> Andi..
>
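
A minimal sketch of the import order described above, assuming both
extensions were built with --shared and that each one exposes initVM() and
CLASSPATH the way pyjama does in the session below. The helper name
init_in_order is hypothetical (it is not part of JCC or PyLucene); it only
makes the ordering explicit: lucene first, then the package built with
--import lucene.

```python
import importlib


def init_in_order(module_names):
    """Import the named modules in the given order.

    For modules that look like JCC-built extensions (they expose both
    initVM and CLASSPATH, as pyjama does in the session), initVM is called
    with the module's own CLASSPATH right after the import.
    """
    loaded = []
    for name in module_names:
        module = importlib.import_module(name)
        init_vm = getattr(module, "initVM", None)
        classpath = getattr(module, "CLASSPATH", None)
        if callable(init_vm) and classpath is not None:
            init_vm(classpath)
        loaded.append(module)
    return loaded


if __name__ == "__main__":
    try:
        # lucene must come first; pyjama is the package name from the session.
        lucene, pyjama = init_in_order(["lucene", "pyjama"])
    except ImportError as exc:
        print("extension not installed:", exc)
```

With both packages rebuilt this way, the intent is that the TopDocs object
returned by lucene is the same wrapper type that pyjama.DumpUtils.GetDocIds
expects.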
>>
>>  roman
>>
>>
>> Here is an example python session:
>>
>> In [1]: import pyjama
>>
>> In [2]: pyjama.initVM(pyjama.CLASSPATH)
>> Out[2]: <jcc.JCCEnv object at 0x00C0E1F0>
>>
>> In [3]: import lucene as lu
>>
>> In [4]: pyjama.DumpUtils
>> Out[4]: <type 'DumpUtils'>
>>
>> In [5]: pyjama.DumpUtils.GetDocIds
>> Out[5]: <built-in method GetDocIds of type object at 0x0189E780>
>>
>> In [6]:
>>
>> In [7]: import newseman.pyjamic.slucene.searcher as se
>>
>> In [8]: s = se.Searcher();s.open('/tmp/whisper/')
>>
>> In [9]: hits = s._search(s._query('key:bo*',None), 50)
>>
>> In [10]: hits
>> Out[10]: <TopDocs: org.apache.lucene.search.topd...@480457>
>>
>> In [11]:
>>
>> In [12]: pyjama.DumpUtils.GetDocIds(hits)
>>
>> ---------------------------------------------------------------------------
>> InvalidArgsError                          Traceback (most recent call last)
>>
>> InvalidArgsError: (<type 'DumpUtils'>, 'GetDocIds', <TopDocs: org.apache.lucene.search.topd...@480457>)
>
