Re: Changing Python class/module layout, dropping --rename ?

Andi Vajda Fri, 13 Jul 2012 10:36:02 -0700

On Jul 13, 2012, at 18:33, Roman Chyla <roman.ch...@gmail.com> wrote:


> I think this would be great. Let me add a little more to your
> observations (I spent the whole night yesterday fighting with renames,
> because I was building a project that imports shared lucene and solr;
> there were thousands of same-named classes, and I am not sure it would be
> possible without some sort of flexible rename...)
> 
> JCC is a great tool and is potentially used by many projects, so stripping
> "org.apache" seems right for pylucene but looks arbitrary otherwise

Yes, I forgot to say that there would be a way to declare one or more mappings  
so that org.apache.lucene becomes lucene.
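Such declared mappings might be applied as in the following minimal sketch. It assumes a hypothetical `{java_prefix: python_prefix}` dict and a longest-prefix-wins rule; `map_package` is an illustrative name, not an existing JCC option:

```python
def map_package(java_package, mappings):
    """Rewrite a Java package name using declared prefix mappings.

    `mappings` is a hypothetical {java_prefix: python_prefix} dict. Only
    whole package segments match, the longest matching prefix wins, and an
    unmapped package is returned unchanged.
    """
    best = None
    for java_prefix in mappings:
        matches = (java_package == java_prefix
                   or java_package.startswith(java_prefix + "."))
        if matches and (best is None or len(java_prefix) > len(best)):
            best = java_prefix
    if best is None:
        return java_package
    # An empty target simply strips the prefix (e.g. {"org.apache": ""}).
    return (mappings[best] + java_package[len(best):]).lstrip(".")


print(map_package("org.apache.lucene.search", {"org.apache.lucene": "lucene"}))
# lucene.search
```

Matching on whole segments keeps a mapping for org.apache.lucene from also catching an unrelated package whose name merely starts with the same characters.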

Andi..

> (unless there is a flexible stripping mechanism). Also, if the full
> namespace is kept unchanged, then code written in Python would also be
> executable by Jython, which is IMHO an advantage.
> 
> But this being Python, packages cannot be spread across different locations
> (i.e. there can be only one org.apache.lucene.analysis package), unless
> there is (again) some flexible mechanism that populates the namespace
> with the objects that belong there. It may seem like overkill to you, since
> it would work for single projects, but it seems perfectly justifiable in the
> case of imported shared libraries.
> 
> I don't know what your idea is for implementing the Python packages, but
> your last email got me thinking as well: there might be a very simple way
> of getting to the Java packages inside Python without too much work.
> 
> Let's say the Java class "org.apache.lucene.search.IndexSearcher" is known
> to Python as org_apache_lucene_search_IndexSearcher
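Going from such a flat name back to a Java package is only reliable under an assumption. The sketch below assumes that no package segment and no class name contains an underscore, so the last `_` separates the package from the class; `split_flat_name` is a hypothetical helper, and a class name that itself contains an underscore would be mis-split:

```python
def split_flat_name(flat_name):
    """Split a flat name such as org_apache_lucene_search_IndexSearcher
    into its Java package and class name.

    Hypothetical helper: assumes no package segment and no class name
    contains an underscore, so the last "_" separates package from class.
    """
    package, _, class_name = flat_name.rpartition("_")
    return package.replace("_", "."), class_name


print(split_flat_name("org_apache_lucene_search_IndexSearcher"))
# ('org.apache.lucene.search', 'IndexSearcher')
```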
> 
> and users do:
> 
> import lucene
> lucene.initVM()
> 
> initVM() first starts the Java VM (and populates the lucene namespace with
> all objects), and then it calls jcc.register_module(self)
> 
> A new piece of code inside JCC grabs the lucene module and creates Python
> packages on the fly, using types.ModuleType (or new.module()), and inserts
> the new packages into sys.modules
> 
> so after lucene.initVM() returns, users can do
> "from org.apache.lucene.search import IndexSearcher" and get the
> lucene.org_apache_lucene_search_IndexSearcher object
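The on-the-fly package creation described above can be sketched in pure Python. This is a minimal sketch, not JCC code: `ensure_package` and `register_module` are hypothetical stand-ins for the proposed `jcc.register_module()`, the flat module is passed in explicitly, only names starting with a hypothetical `org_` prefix are treated as classes, and package segments are assumed to contain no underscores. It uses `types.ModuleType`, which, unlike `new.module()`, also exists in Python 3:

```python
import sys
import types


def ensure_package(dotted_name):
    """Return the package registered as dotted_name, creating it and any
    missing parent packages and inserting them into sys.modules."""
    parts = dotted_name.split(".")
    parent = None
    for i, part in enumerate(parts):
        name = ".".join(parts[:i + 1])
        module = sys.modules.get(name)
        if module is None:
            module = types.ModuleType(name)
            module.__path__ = []  # mark it as a package
            sys.modules[name] = module
        if parent is not None:
            setattr(parent, part, module)  # lets org.apache.lucene... resolve
        parent = module
    return parent


def register_module(flat_module, prefix="org_"):
    """Hypothetical stand-in for the proposed jcc.register_module().

    Re-exports every attribute named like org_apache_lucene_search_X as X
    in the package org.apache.lucene.search.
    """
    for flat_name, obj in list(vars(flat_module).items()):
        if not flat_name.startswith(prefix):
            continue  # skip initVM(), dunders and other non-class names
        package, _, class_name = flat_name.rpartition("_")
        setattr(ensure_package(package.replace("_", ".")), class_name, obj)


# Usage with a fake flat module standing in for the real lucene extension.
flat = types.ModuleType("flat_lucene")
flat.org_apache_lucene_search_IndexSearcher = type("IndexSearcher", (), {})
register_module(flat)

from org.apache.lucene.search import IndexSearcher
print(IndexSearcher is flat.org_apache_lucene_search_IndexSearcher)  # True
```

Because every created package is both registered in `sys.modules` and set as an attribute of its parent, both `from org.apache.lucene.search import IndexSearcher` and attribute access after `import org.apache.lucene.search` resolve.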
> 
> and also, when shared libraries are present (let's say 'solr'), users do:
> 
> import solr
> solr.initVM()
> 
> JCC will just update the existing packages and create new ones if
> needed (and from this perspective, having fully qualified names is safer
> than having lucene.search.IndexSearcher)
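When a second extension such as solr registers into packages that already exist, the update step needs a rule for names that are already bound. One possible rule is sketched below, assuming only the per-package update step matters here (`update_package` and the `demo.search` package name are hypothetical): create the package only if it is missing, treat re-registering the same object as a no-op, and report a name already bound to a different object instead of silently overwriting it:

```python
import sys
import types

_MISSING = object()  # sentinel: lets None be registered like any other value


def update_package(package_name, attrs):
    """Merge attrs into the package registered as package_name.

    Hypothetical merge rule: the package is created only if missing,
    re-registering the same object is a no-op, and a name already bound to
    a different object is reported instead of being overwritten.
    Returns the list of conflicting names.
    """
    module = sys.modules.get(package_name)
    if module is None:
        module = types.ModuleType(package_name)
        module.__path__ = []  # mark it as a package
        sys.modules[package_name] = module
    conflicts = []
    for name, obj in attrs.items():
        existing = getattr(module, name, _MISSING)
        if existing is _MISSING:
            setattr(module, name, obj)
        elif existing is not obj:
            conflicts.append(name)
    return conflicts


class Query(object):
    pass


class OtherQuery(object):
    pass


print(update_package("demo.search", {"Query": Query}))       # []
print(update_package("demo.search", {"Query": Query}))       # [] (same object again)
print(update_package("demo.search", {"Query": OtherQuery}))  # ['Query']
```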
> 
> I think this change is entirely possible and will not change how
> extensions are built. Does it have some serious flaw?
> 
> I would of course be more than happy to contribute and test.
> 
> Best,
> 
>  roman
> 
> 
> On Fri, Jul 13, 2012 at 11:47 AM, Andi Vajda <va...@apache.org> wrote:
> 
>> 
>> On Tue, 10 Jul 2012, Andi Vajda wrote:
>> 
>>>> I would also like to propose a change to allow for a more flexible
>>>> mechanism for generating Python class names. The patch doesn't change
>>>> the default pylucene behaviour, but it gives people a way to replace
>>>> class names with patterns. I have noticed that there are more
>>>> same-name classes from different packages in the new lucene (and it
>>>> becomes worse when one has to deal with both lucene and solr).
>>>> 
>>> 
>>> Another way to fix this is to reproduce the namespace hierarchy used in
>>> Lucene, following along the Java packages, something I've been dreading to
>>> do. Lucene just loooooves a really long deeply nested class structure.
>>> I'm not convinced yet it is bad enough to go down that route, though.
>>> 
>>> Your proposal to use patterns may in fact yield a much more convenient
>>> solution. Thanks !
>>> 
>> 
>> Rethinking this a bit, I'm prepared to change my mind. Your
>> patterned rename patch shows that we're slowly but surely reaching the
>> limit of the current setup, which consists of throwing all wrapped classes
>> into the one global 'lucene' namespace.
>> 
>> Lucene 4.0 has seen a large number of deeply nested classes with similar
>> names added since 3.x. Renaming these one by one (or excluding some)
>> doesn't scale. Using the proposed patterned rename scales better but makes
>> it difficult to know what got renamed and how.
>> Ultimately, the more like-named classes there are, the more classes would
>> have unstable names from one release to the next as more duplicate names
>> are encountered.
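The instability argument can be made concrete with a toy renaming rule. Assume a hypothetical policy that keeps a class's short name while it is unique and falls back to a package-qualified flat name once it clashes (an illustration only, not what the patterned rename patch does); the `org.example` class names are made up. Adding one like-named class in a later release then renames an existing class:

```python
from collections import Counter


def python_names(qualified_names):
    """Map each Java class name to a flat Python name.

    Hypothetical policy: keep the short class name while it is unique,
    otherwise prefix it with its package (dots replaced by underscores).
    """
    counts = Counter(q.rsplit(".", 1)[1] for q in qualified_names)
    names = {}
    for q in qualified_names:
        package, short = q.rsplit(".", 1)
        if counts[short] == 1:
            names[q] = short
        else:
            names[q] = package.replace(".", "_") + "_" + short
    return names


release_1 = ["org.example.index.Term", "org.example.search.Query"]
# A later release adds an unrelated class that is also called Term.
release_2 = release_1 + ["org.example.util.Term"]

print(python_names(release_1)["org.example.index.Term"])  # Term
print(python_names(release_2)["org.example.index.Term"])  # org_example_index_Term
```

With one Python package per Java package, the first class would instead keep the same importable path in both releases.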
>> 
>> What if instead JCC supported the original Java namespaces all the way to
>> the Python interface (still dropping the original 'org.apache' Java package
>> tree prefix)?
>> The world-rooted style of naming Java classes isn't Pythonic, but using the
>> second half of the package structure feels right at home in the Python
>> world.
>> 
>> JCC already re-creates the complete Java package structure in C++ as
>> namespaces for all the C++ code it generates, for both the JNI wrapper
>> classes and the C++/Python types. It's only the installation of the class
>> names into the Python VM that is done in the flat 'lucene' namespace.
>> 
>> I think it shouldn't be too hard to change the code that installs classes
>> to create sub-modules of the lucene module and install classes in these
>> submodules instead (down to however many levels are in the original).
>> 
>> In other words:
>>  - from lucene import Document
>> would become
>>  - from lucene.document import Document
>> 
>> One could of course also say:
>>  - from lucene.document import Document as whateverOneLikes
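Installing classes into submodules of the lucene module, with the 'org.apache' prefix dropped, might look roughly like the minimal sketch below. `install_class` and its `strip_prefix` parameter are hypothetical, and plain `types.ModuleType` objects registered in `sys.modules` stand in for whatever the generated extension code would actually create. The alias is taken with `from ... import ... as`, since a plain `import` statement can only name modules, not classes:

```python
import sys
import types


def install_class(root, java_name, cls, strip_prefix="org.apache."):
    """Install cls in a submodule of the root module.

    With the default strip_prefix, org.apache.lucene.document.Document is
    installed as lucene.document.Document, one submodule per remaining
    package level.
    """
    if java_name.startswith(strip_prefix):
        java_name = java_name[len(strip_prefix):]
    parts = java_name.split(".")
    if parts[0] != root.__name__:
        raise ValueError("%s is not under %s" % (java_name, root.__name__))
    module = root
    for part in parts[1:-1]:  # package levels between the root and the class
        name = module.__name__ + "." + part
        child = sys.modules.get(name)
        if child is None:
            child = types.ModuleType(name)
            child.__path__ = []  # mark it as a package
            sys.modules[name] = child
        setattr(module, part, child)
        module = child
    setattr(module, parts[-1], cls)


# Usage with a stand-in root module and class.
lucene = types.ModuleType("lucene")
sys.modules["lucene"] = lucene
Document = type("Document", (), {})
install_class(lucene, "org.apache.lucene.document.Document", Document)

from lucene.document import Document as whateverOneLikes
print(whateverOneLikes is Document)  # True
```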
>> 
>> If that proposal isn't mortally flawed somewhere, I'm prepared to drop
>> support for --rename and replace it with this new Python class/module
>> layout.
>> 
>> Since this is being discussed in the context of a major PyLucene
>> release, version 4.0, and since all tests/samples have to be reworked
>> anyway, this backwards-compatibility break hopefully shouldn't be too
>> controversial.
>> 
>> If it is, the old --rename could certainly be preserved, but I'd prefer
>> simplifying the JCC interface to accreting more to it.
>> 
>> What do you think?
>> 
>> Andi..
>> 
>> 
>>> Andi..
>>> 
>>> 
>>>> I can confirm that test/test_BinaryDocument.py no longer crashes the JVM.
>>>> 
>>>> Roman
>>>> 
>>>> 
>>>> On Tue, Jul 10, 2012 at 8:54 AM, Andi Vajda <va...@apache.org> wrote:
>>>> 
>>>>> 
>>>>> Hi Roman,
>>>>> 
>>>>> 
>>>>> On Mon, 9 Jul 2012, Roman Chyla wrote:
>>>>> 
>>>>>> Thanks, I am attaching a new patch that adds the missing test base.
>>>>>> Sorry for the tabs; I was probably messing around with a few editors
>>>>>> (some of them not configured properly).
>>>>>> 
>>>>> 
>>>>> 
>>>>> I integrated your test class (renaming it to fit the naming scheme
>>>>> used).
>>>>> Thanks !
>>>>> 
>>>>> 
>>>>>>>>>> So far I found one serious problem that crashes the VM; see e.g.
>>>>>>>>>> test/test_BinaryDocument.py, when getting the document using:
>>>>>>>>>> reader.document(0)
>>>>>>>>>> 
>>>>>>>>> 
>>>>> 
>>>>> test/test_BinaryDocument.py doesn't seem to crash the VM, but it fails
>>>>> because of some API changes. I suspect the crash is related to using
>>>>> an older jcc.
>>>>> 
>>>>> I see a comment saying: "couldn't find any combination with lucene4.0
>>>>> where it would raise errors". Most of these unit tests are straight ports
>>>>> from the original Java version. If you're stumped about a change, check
>>>>> the original Java test; it may have changed too.
>>>>> 
>>>>> Andi..
>>>>> 
>>>>> 
>>>> 
>>>
