Re: Changing Python class/module layout, dropping --rename ?

2012-07-19 Thread Roman Chyla
The script must have thought about it somehow :-) Have a great,
undisturbed vacation!

roman

On Thu, Jul 19, 2012 at 9:33 AM, Andi Vajda  wrote:
>
> On Fri, 13 Jul 2012, Roman Chyla wrote:
>
>> Hi,
>> I was playing with the idea of creating virtual packages, attached is a
>> working script that illustrates it. I am getting this output:
>>
>> Did it work?
>
>
> No, I haven't forgotten, I'm just on vacation.
>
> Andi..
>
>
>> ==
>> from org.apache.lucene.search import SearcherFactory; print
>> SearcherFactory
>> 
>> from org.apache.lucene.analysis import Analyzer as Banalyzer; print
>> Banalyzer
>> 
>> print sys.modules['org'] 
>> print sys.modules['org.apache'] 
>> print sys.modules['org.apache.lucene'] > (built-in)>
>> print sys.modules['org.apache.lucene.search'] > 'org.apache.lucene.search' (built-in)>
>>
>> Cheers,
>>
>>  roman
>>
>>
>> On Fri, Jul 13, 2012 at 1:34 PM, Andi Vajda  wrote:
>>
>>>
>>> On Jul 13, 2012, at 18:33, Roman Chyla  wrote:
>>>
 I think this would be great. Let me add a little bit more to your
 observations (the whole night yesterday was spent fighting with renames,
 because I was building a project which imports shared lucene and solr:
 there were thousands of same-named classes, and I am not sure it would be
 possible without some sort of flexible rename...)

 JCC is a great tool and is used by potentially many projects, so stripping
 "org.apache" seems right for pylucene, but looks arbitrary otherwise
>>>
>>>
>>> Yes, I forgot to say that there would be a way to declare one or more
>>> mappings  so that org.apache.lucene becomes lucene.
>>>
>>> Andi..
>>>
 (unless there is a flexible stripping mechanism). Also, if the full
 namespace remains original, then the code written in Python would also be
 executable by Jython, which is IMHO an advantage.

 But this being Python, the packages cannot be spread across different
 locations (i.e. there can be only one org.apache.lucene.analysis package),
 unless there exists (again) some flexible mechanism which populates the
 namespace with objects that belong there. It may seem overkill to you,
 because for single projects it would work, but it seems perfectly
 justifiable in the case of imported shared libraries.

 I don't know what your idea is for implementing the python packages, but
 your last email got me thinking as well - there might be a very simple way
 of getting to the java packages inside Python without too much work.

 Let's say the java "org.apache.lucene.search.IndexSearcher" is known to
 python as org_apache_lucene_search_IndexSearcher

 and users do:

 import lucene
 lucene.initVM()

 initVM() first starts the Java VM (and populates the lucene namespace with
 all objects), but then it will call jcc.register_module(self)

 A new piece of code inside JCC grabs the lucene module and creates (on the
 fly) python packages, using types.ModuleType (or new.module()); the new
 packages will be inserted into sys.modules

 so after lucene.initVM() returns

 users can do "from org.apache.lucene.search import IndexSearcher" and get
 the lucene.org_apache_lucene_search_IndexSearcher object

 and also, when shared libraries are present (let's say 'solr') users do:

 import solr
 solr.initVM()

 The JCC will just update the existing packages and create new ones if
 needed (and from this perspective, having the fully qualified name is safer
 than having lucene.search.IndexSearcher)

 I think this change is totally possible and will not change the way
 extensions are built. Does it have some serious flaw?
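
 The idea described above can be sketched roughly as follows. This is only
 an illustration, not JCC code: register_module is the name proposed in this
 message, but its body here is a guess, and it assumes that the last
 underscore in a flat name separates the package path from the class name
 (which breaks for class names that themselves contain underscores, and for
 unrelated module attributes that happen to contain one).

```python
import sys
import types


def register_module(flat_module):
    """Expose flat names such as org_apache_lucene_search_IndexSearcher
    as attributes of virtual packages registered in sys.modules.

    Hypothetical helper, not part of JCC. Assumes the last underscore
    separates the package path from the class name.
    """
    for flat_name, obj in list(vars(flat_module).items()):
        if flat_name.startswith('_') or '_' not in flat_name:
            continue  # private names, or names without a package part
        package_path, _, class_name = flat_name.rpartition('_')
        parts = package_path.split('_')
        if not class_name or '' in parts:
            continue  # malformed name, nothing sensible to register
        parent = None
        for depth in range(1, len(parts) + 1):
            dotted = '.'.join(parts[:depth])
            package = sys.modules.get(dotted)
            if package is None:
                # Create the package on the fly and make it importable.
                package = types.ModuleType(dotted)
                package.__path__ = []  # marks the module as a package
                sys.modules[dotted] = package
            if parent is not None:
                setattr(parent, parts[depth - 1], package)
            parent = package
        setattr(parent, class_name, obj)
```

 Because existing entries in sys.modules are reused, a second call for
 another extension module (say solr) would add its classes to the packages
 that already exist and only create the missing ones.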

 I would be of course more than happy to contribute and test.

 Best,

  roman


 On Fri, Jul 13, 2012 at 11:47 AM, Andi Vajda  wrote:

>
> On Tue, 10 Jul 2012, Andi Vajda wrote:
>
>>> I would also like to propose a change, to allow for more flexible
>>> mechanism of generating Python class names. The patch doesn't change
>>> the default pylucene behaviour, but it gives people a way to replace
>>> class names with patterns. I have noticed that there are more
>>> same-name classes from different packages in the new lucene (and it
>>> becomes worse when one has to deal with both lucene and solr).
>>
>> Another way to fix this is to reproduce the namespace hierarchy used in
>> Lucene, following along the Java packages, something I've been dreading to
>> do. Lucene just loves a really long deeply nested class structure.
>> I'm not convinced yet it is bad enough to go down that route, though.
>>
>> Your proposal to use patterns may in fact yield a much more convenient
>> sol

Changing Python class/module layout, dropping --rename ?

2012-07-13 Thread Andi Vajda


On Tue, 10 Jul 2012, Andi Vajda wrote:


I would also like to propose a change, to allow for more flexible
mechanism of generating Python class names. The patch doesn't change
the default pylucene behaviour, but it gives people a way to replace
class names with patterns. I have noticed that there are more
same-name classes from different packages in the new lucene (and it
becomes worse when one has to deal with both lucene and solr).


Another way to fix this is to reproduce the namespace hierarchy used in 
Lucene, following along the Java packages, something I've been dreading to 
do. Lucene just loves a really long deeply nested class structure.

I'm not convinced yet it is bad enough to go down that route, though.

Your proposal to use patterns may in fact yield a much more convenient 
solution. Thanks !


Rethinking this a bit, I'm prepared to change my mind on this. Your 
patterned rename patch shows that we're slowly but surely reaching the limit 
of the current setup that consists in throwing all wrapped classes under the 
one global 'lucene' namespace.


Lucene 4.0 has seen a large number of deeply nested classes with similar
names added since 3.x. Renaming these one by one (or excluding some) doesn't
scale. Using the proposed patterned rename scales better but makes it
difficult to know what got renamed and how.
Ultimately, the more classes that are like-named, the more classes would
have unstable names from one release to the next as more duplicated names
are encountered.
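
To make the scaling problem concrete, here is a small sketch of how clashes
in a single flat namespace could be listed. The helper name and the class
names are made up for illustration; they are not taken from Lucene or JCC.

```python
from collections import defaultdict


def flat_name_collisions(java_class_names):
    """Group fully qualified Java class names by the simple name they
    would get in one flat namespace, keeping only the clashes."""
    by_simple_name = defaultdict(list)
    for qualified in java_class_names:
        simple = qualified.rpartition('.')[2]
        by_simple_name[simple].append(qualified)
    return {simple: names
            for simple, names in by_simple_name.items()
            if len(names) > 1}


# Hypothetical class names, chosen only to show the clash.
classes = [
    'org.example.alpha.Builder',
    'org.example.beta.Builder',
    'org.example.beta.Reader',
]
print(flat_name_collisions(classes))
# → {'Builder': ['org.example.alpha.Builder', 'org.example.beta.Builder']}
```

Every entry in that result needs a rename or an exclusion under the flat
layout, and the set can grow with each release.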


What if instead JCC supported the original Java namespaces all the way to
the Python interface (still dropping the original 'org.apache' Java package
tree prefix) ?
The world-rooted style of naming Java classes isn't Pythonic but using the 
second half of the package structure feels right at home in the Python 
world.


JCC already re-creates the complete Java package structure in C++ as 
namespaces for all the C++ code it generates, for both the JNI wrapper 
classes and the C++/Python types. It's only the installation of the class 
names into the Python VM that is done in the flat 'lucene' namespace.


I think it shouldn't be too hard to change the code that installs classes to 
create sub-modules of the lucene module and install classes in these 
submodules instead (down to however many levels are in the original).


In other words:
  - from lucene import Document
would become
  - from lucene.document import Document

One could of course also say:
  - from lucene.document import Document as whateverOneLikes
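
For illustration only, the installation step could look something like the
sketch below when written in Python. JCC does this work in generated C++, so
install_class, its parameters and the default prefix are assumptions made
for the example, not existing JCC API.

```python
import sys
import types


def install_class(root, java_name, cls, prefix='org.apache.lucene'):
    """Install cls under sub-modules of `root` that mirror the Java
    package of `java_name`, with `prefix` mapped onto `root` itself.

    Hypothetical helper, not JCC code. Returns the module that
    received the class.
    """
    package, _, class_name = java_name.rpartition('.')
    if package != prefix and not package.startswith(prefix + '.'):
        raise ValueError('%s is outside %s' % (java_name, prefix))
    relative = package[len(prefix):].strip('.')
    module = root
    for part in (relative.split('.') if relative else []):
        dotted = module.__name__ + '.' + part
        child = sys.modules.get(dotted)
        if child is None:
            # Create the sub-module once; later classes reuse it.
            child = types.ModuleType(dotted)
            child.__path__ = []  # marks the module as a package
            sys.modules[dotted] = child
        setattr(module, part, child)
        module = child
    setattr(module, class_name, cls)
    return module
```

With root being the lucene module, installing
'org.apache.lucene.document.Document' this way would make
"from lucene.document import Document" resolve to the wrapped class.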

If that proposal isn't mortally flawed somewhere, I'm prepared to drop 
support for --rename and replace it with this new Python class/module 
layout.


Since this is being talked about in the context of a major PyLucene release,
version 4.0, and since all tests/samples have to be reworked anyway, this
backwards compat break shouldn't be too controversial, hopefully.


If it is, the old --rename could be preserved for sure, but I'd prefer
simplifying the JCC interface to accreting more onto it.


What do you think ?

Andi..



Andi..



I can confirm that test/test_BinaryDocument.py no longer crashes the JVM.

Roman


On Tue, Jul 10, 2012 at 8:54 AM, Andi Vajda  wrote:


 Hi Roman,


On Mon, 9 Jul 2012, Roman Chyla wrote:


Thanks, I am attaching a new patch that adds the missing test base.
Sorry for the tabs, I was probably messing around with a few editors
(some of them not configured properly)



I integrated your test class (renaming it to fit the naming scheme used).
Thanks !



So far, I found one serious problem that crashes the VM, see e.g.
test/test_BinaryDocument.py, when getting the document using:
reader.document(0)



test/test_BinaryDocument.py doesn't seem to crash the VM but fails because
of some API changes. I suspect the crash to be some issue related to using
an older jcc.

I see a comment saying: "couldn't find any combination with lucene4.0 where
it would raise errors". Most of these unit tests are straight ports from the
original Java version. If you're stumped about a change, check the original
Java test, it may have changed too.

Andi..