On Jul 13, 2012, at 18:33, Roman Chyla <roman.ch...@gmail.com> wrote:
> I think this would be great. Let me add a little bit more to your
> observations (I spent the whole night yesterday fighting with renames -
> because I was building a project which imports shared lucene and solr --
> there were thousands of same-named classes, I am not sure it would be possible
> without some sort of a flexible rename...)
>
> JCC is a great tool and is used by potentially many projects - so stripping
> "org.apache" seems right for pylucene, but looks arbitrary otherwise

Yes, I forgot to say that there would be a way to declare one or more
mappings so that org.apache.lucene becomes lucene.

Andi..

> (unless there is a flexible stripping mechanism). Also, if the full
> namespace remains original, then the code written in Python would also be
> executable by Jython, which is IMHO an advantage.
>
> But this being Python, the packages cannot be spread across different locations
> (i.e. there can be only one org.apache.lucene.analysis package) - unless
> there exists (again) some flexible mechanism which populates the namespace
> with objects that belong there. It may seem like overkill to you, because for
> single projects it would work, but it seems perfectly justifiable in the case of
> imported shared libraries.
>
> I don't know what your idea is for implementing the python packages, but
> your last email got me thinking as well - there might be a very simple way
> of getting to the java packages inside Python without too much work.
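Andi's reply mentions declaring one or more mappings so that org.apache.lucene becomes lucene. The sketch below shows one way such a prefix mapping could behave. It assumes the mappings are a plain dict from Java package prefix to Python module prefix, and that the longest prefix matching on a whole package component wins. `map_java_name` is a hypothetical helper for illustration, not an existing JCC option or function.

```python
def map_java_name(java_name, mappings):
    """Rewrite the package prefix of a fully qualified Java class name.

    Hypothetical helper. `mappings` maps Java package prefixes to Python
    module prefixes, e.g. {"org.apache.lucene": "lucene"}. Only whole
    package components match, the longest matching prefix wins, and
    unmatched names are returned unchanged. An empty Python prefix drops
    the Java prefix entirely.
    """
    best = None
    for java_prefix in mappings:
        # Match "org.apache.lucene" or "org.apache.lucene.<more>",
        # but never "org.apache.lucenex".
        if java_name == java_prefix or java_name.startswith(java_prefix + "."):
            if best is None or len(java_prefix) > len(best):
                best = java_prefix
    if best is None:
        return java_name
    python_prefix = mappings[best]
    rest = java_name[len(best) + 1:]  # text after the prefix and its dot
    if not python_prefix:
        return rest
    return f"{python_prefix}.{rest}" if rest else python_prefix
```

With `{"org.apache": ""}` this produces the "drop the org.apache prefix" layout. With `{"org.apache.lucene": "lucene"}`, other org.apache projects stay fully qualified, which gives the flexible stripping Roman asks for.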
>
> Let's say the java "org.apache.lucene.search.IndexSearcher" is known to
> python as org_apache_lucene_search_IndexSearcher
>
> and users do:
>
> import lucene
> lucene.initVM()
>
> initVM() first starts the java VM (and populates the lucene namespace with
> all objects), but then it will call jcc.register_module(self)
>
> A new piece of code inside JCC grabs the lucene module and creates (on the
> fly) python packages -- using types.ModuleType (or new.module()) -- the new
> packages will be inserted into sys.modules
>
> so after lucene.initVM() returns,
>
> users can do "from org.apache.lucene.search import IndexSearcher" and get the
> lucene.org_apache_lucene_search_IndexSearcher object
>
> and also, when shared libraries are present (let's say 'solr') users do:
>
> import solr
> solr.initVM()
>
> JCC will just update the existing packages and create new ones if
> needed (and from this perspective, having the fully qualified name is safer
> than having lucene.search.IndexSearcher)
>
> I think this change is totally possible and will not change the way
> extensions are built. Does it have some serious flaw?
>
> I would of course be more than happy to contribute and test.
>
> Best,
>
> roman
>
>
> On Fri, Jul 13, 2012 at 11:47 AM, Andi Vajda <va...@apache.org> wrote:
>
>>
>> On Tue, 10 Jul 2012, Andi Vajda wrote:
>>
>> I would also like to propose a change, to allow for a more flexible
>>>> mechanism of generating Python class names. The patch doesn't change
>>>> the default pylucene behaviour, but it gives people a way to replace
>>>> class names with patterns. I have noticed that there are more
>>>> same-name classes from different packages in the new lucene (and it
>>>> becomes worse when one has to deal with both lucene and solr).
>>>>
>>>
>>> Another way to fix this is to reproduce the namespace hierarchy used in
>>> Lucene, following along the Java packages, something I've been dreading to
>>> do.
>>> Lucene just loooooves a really long, deeply nested class structure.
>>> I'm not convinced yet it is bad enough to go down that route, though.
>>>
>>> Your proposal to use patterns may in fact yield a much more convenient
>>> solution. Thanks !
>>>
>>
>> Rethinking this a bit, I'm prepared to change my mind on this. Your
>> patterned rename patch shows that we're slowly but surely reaching the
>> limit of the current setup, which consists of throwing all wrapped classes
>> under the one global 'lucene' namespace.
>>
>> Lucene 4.0 has seen a large number of deeply nested classes with similar
>> names added since 3.x. Renaming these one by one (or excluding some)
>> doesn't scale. Using the proposed patterned rename scales better but makes it
>> difficult to know what got renamed and how.
>> Ultimately, the more classes that are like-named, the more classes would
>> have unstable names from one release to the next as more duplicated names
>> are encountered.
>>
>> What if instead JCC supported the original Java namespaces all the way to
>> the Python interface (still dropping the original 'org.apache' Java package
>> tree prefix) ?
>> The world-rooted style of naming Java classes isn't Pythonic but using the
>> second half of the package structure feels right at home in the Python
>> world.
>>
>> JCC already re-creates the complete Java package structure in C++ as
>> namespaces for all the C++ code it generates, for both the JNI wrapper
>> classes and the C++/Python types. It's only the installation of the class
>> names into the Python VM that is done in the flat 'lucene' namespace.
>>
>> I think it shouldn't be too hard to change the code that installs classes
>> to create sub-modules of the lucene module and install classes in these
>> submodules instead (down to however many levels are in the original).
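Roman's idea (build Python packages on the fly with types.ModuleType and insert them into sys.modules) and Andi's idea (install classes into sub-modules of the lucene module) rest on the same mechanism. The sketch below illustrates that mechanism. It assumes the extension can hand over a dict from dotted names to wrapped classes. `register_classes` is a hypothetical helper: it is not JCC's implementation, nor the jcc.register_module proposed in this thread. Plain Python classes stand in for wrapped Java types.

```python
import sys
import types


def register_classes(classes):
    """Install classes into nested modules registered in sys.modules.

    Hypothetical helper. `classes` maps dotted names such as
    "org.apache.lucene.search.IndexSearcher" (or "lucene.document.Document")
    to class objects. Modules already in sys.modules are reused, so a second
    extension sharing a package prefix adds to them instead of replacing them.
    """
    for dotted_name, cls in classes.items():
        package_name, _, class_name = dotted_name.rpartition(".")
        if not package_name:
            raise ValueError(f"{dotted_name!r} has no package part")
        parts = package_name.split(".")
        parent = None
        for depth in range(1, len(parts) + 1):
            module_name = ".".join(parts[:depth])
            module = sys.modules.get(module_name)
            if module is None:
                module = types.ModuleType(module_name)
                module.__path__ = []  # mark it as a package
                sys.modules[module_name] = module
            if parent is not None:
                # Expose the child as an attribute so org.apache.lucene... works.
                setattr(parent, parts[depth - 1], module)
            parent = module
        setattr(parent, class_name, cls)


# Usage with stand-in classes (a real extension would pass wrapped Java types).
class IndexSearcher:
    pass


class SolrCore:
    pass


register_classes({"org.apache.lucene.search.IndexSearcher": IndexSearcher})
register_classes({"org.apache.solr.core.SolrCore": SolrCore})  # reuses org.apache

from org.apache.lucene.search import IndexSearcher as ImportedSearcher
from org.apache.solr.core import SolrCore as ImportedCore
```

The sketch splits fully qualified names on "." rather than reconstructing packages from flat names such as org_apache_lucene_search_IndexSearcher. Splitting on "_" would be ambiguous whenever a package or class name itself contains an underscore. Reusing entries already in sys.modules is what lets a second extension (the solr case) extend shared packages rather than shadow them.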
>>
>> In other words:
>> - from lucene import Document
>> would become
>> - from lucene.document import Document
>>
>> One could of course also say:
>> - from lucene.document import Document as whateverOneLikes
>>
>> If that proposal isn't mortally flawed somewhere, I'm prepared to drop
>> support for --rename and replace it with this new Python class/module
>> layout.
>>
>> Since this is being talked about in the context of a major PyLucene
>> release, version 4.0, and since all tests/samples have to be reworked
>> anyway, this backwards compat break shouldn't be too controversial,
>> hopefully.
>>
>> If it is, the old --rename could be preserved for sure, but I'd prefer
>> simplifying the JCC interface to accreting more to it.
>>
>> What do you think ?
>>
>> Andi..
>>
>>
>>> Andi..
>>>
>>>
>>>> I can confirm that test_BinaryDocument.py no longer crashes the JVM.
>>>>
>>>> Roman
>>>>
>>>>
>>>> On Tue, Jul 10, 2012 at 8:54 AM, Andi Vajda <va...@apache.org> wrote:
>>>>
>>>>>
>>>>> Hi Roman,
>>>>>
>>>>>
>>>>> On Mon, 9 Jul 2012, Roman Chyla wrote:
>>>>>
>>>>> Thanks, I am attaching a new patch that adds the missing test base.
>>>>>> Sorry for the tabs, I was probably messing around with a few editors
>>>>>> (some of them not configured properly)
>>>>>>
>>>>>
>>>>>
>>>>> I integrated your test class (renaming it to fit the naming scheme
>>>>> used).
>>>>> Thanks !
>>>>>
>>>>>
>>>>> So far, I found one serious problem that crashes the VM -- see e.g.
>>>>>>>>>> test/test_BinaryDocument.py - when getting the document using:
>>>>>>>>>> reader.document(0)
>>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> test/test_BinaryDocument.py doesn't seem to crash the VM but fails because
>>>>> of some API changes. I suspect the crash is related to using an older jcc.
>>>>>
>>>>> I see a comment saying: "couldn't find any combination with lucene4.0 where
>>>>> it would raise errors". Most of these unit tests are straight ports from the
>>>>> original Java version.
>>>>> If you're stumped about a change, check the original
>>>>> Java test; it may have changed too.
>>>>>
>>>>> Andi..
>>>>>
>>>>>
>>>>
>>>
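The flat-namespace problem running through this thread (several Java classes claiming the same simple name under the one 'lucene' module) can be checked mechanically. Every simple name claimed by more than one package is a name that --rename or a pattern would have to disambiguate. The sketch below assumes a list of fully qualified class names is available, for example collected from the jars being wrapped. `find_flat_collisions` is a hypothetical helper, and the `org.example` names are illustrative, not real Lucene classes.

```python
from collections import defaultdict


def find_flat_collisions(qualified_names):
    """Group fully qualified class names by their simple (unqualified) name.

    Hypothetical helper. Returns {simple_name: sorted qualified names} for
    every simple name that more than one class would claim if all classes
    were installed into a single flat namespace.
    """
    by_simple_name = defaultdict(list)
    for name in qualified_names:
        simple_name = name.rpartition(".")[2]
        by_simple_name[simple_name].append(name)
    return {
        simple: sorted(names)
        for simple, names in by_simple_name.items()
        if len(names) > 1
    }


names = ["org.example.a.Parser", "org.example.b.Parser", "org.example.a.Token"]
collisions = find_flat_collisions(names)
```

An empty result means the flat namespace still works. A result that grows from one release to the next is the naming instability Andi describes, and the nested module layout removes it because each class keeps its package path.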