Hi. There are plenty of bugs though. I just fixed one: cell content has to be saved (copied), because the cells hold char* pointers, not string objects, and Hypertable can (and does) free the memory behind them, which leaves the Python strings with invalid content.
2010/2/14, Mateusz Berezecki <[email protected]>:
> Masha,
>
> I've just completed reviewing your code. I like the new version a lot.
> There are a couple of whitespace problems on lines 91 and 77 of
> HypertableBindings.cc (whitespace after the opening parenthesis, before
> the actual argument list). Other than that it looks very clean, and thanks
> to boost python you can't really go wrong with that. Looks good to me!
>
> Mateusz
>
> On Sun, Feb 14, 2010 at 1:31 AM, Masha <[email protected]> wrote:
>> git repo is here:
>> http://github.com/conferno/hypertable/tree/master/contrib/cc/PythonBinding/
>>
>> I replaced Mateusz's files as they are outdated and cannot be compiled
>> with the current hypertable (see prev thread here
>> http://groups.google.com/group/hypertable-dev/browse_thread/thread/52d88cd9bed771c3
>> )
>>
>> 2010/2/14, Masha <[email protected]>:
>>> Hi
>>>
>>> Of course, your feedback will be appreciated!
>>>
>>> What I failed to do, and would like to ask the other developers: how
>>> to compile all six .so files into one big .so?
>>>
>>> Just adding -static to the linker flags (extra_link_args=["-static"] in
>>> setup.py) did not help.
>>> Linking failed with the error:
>>>
>>> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o:
>>> relocation R_X86_64_32 against `__DTOR_END__' can not be used when
>>> making a shared object; recompile with -fPIC
>>> /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o: could not read
>>> symbols: Bad value
>>>
>>> That would simplify the deployment of mapreduce workers.
>>> Actually, six libraries are not enough: "apt-get install boost log4cpp
>>> libdb4 ..." must be executed on a worker machine in advance (I compile
>>> hypertable against the versions from the standard Ubuntu
>>> repositories).
>>>
>>> With one big .so we can avoid version mismatches in case the worker
>>> machine, for example, already has a too new or too old boost.
>>> To be safe from version hell, many more than 6 libraries must be
>>> deployed
>>> :(
>>>
>>>
>>> 2010/2/13, Mateusz Berezecki <[email protected]>:
>>>> Hi Masha
>>>>
>>>> This is awesome news. I'll check it out and prepare patches if you
>>>> don't mind.
>>>>
>>>> Thanks for a great job!
>>>>
>>>> And yes, thrift does not feel solid at all! ;-)
>>>>
>>>> Mateusz
>>>>
>>>> On Feb 13, 2010, at 0:14, Masha <[email protected]> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I have fixed the Python bindings to reflect the modern Hypertable and
>>>>> boost versions.
>>>>>
>>>>> Using the python bindings, 'select *' over a large dataset is about 20
>>>>> times faster than using Thrift (I tested it on a single Linux-x64
>>>>> server; the Thrift client eats a lot of CPU).
>>>>>
>>>>>
>>>>> Also, the API is slightly improved:
>>>>> 1. TableScanner can act as an iterable object emitting Cell
>>>>>
>>>>> # how it was
>>>>>
>>>>> scanner = table.create_scanner(scan_spec)
>>>>> cell = ht.Cell()
>>>>> while scanner.next(cell):
>>>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value())
>>>>>
>>>>> # how it is
>>>>>
>>>>> for cell in table.create_scanner(scan_spec):
>>>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>>>
>>>>> # or even simpler
>>>>>
>>>>> for cell in client.hql("select * from table"):
>>>>>     print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>>>
>>>>> #--------------------------
>>>>>
>>>>> 2. client.hql("select ...") returns a TableScanner,
>>>>> client.hql("show tables") returns a python list; both of them are
>>>>> iterables
>>>>>
>>>>> 3. cell.value is now a getter, the parentheses are not required.
>>>>>
>>>>> 4. The parameter of the Client constructor is the path to 'hypertable.cfg', not
>>>>> the path to the installation directory.
>>>>> Deep inside, the Hypertable libraries use the path to the executable as a
>>>>> starting point to find 'hypertable.cfg'.
>>>>> That fails if the executable is '/usr/bin/python'.
>>>>>
>>>>> As it is intended to be used on a client, it must work without a full
>>>>> Hypertable installation, and must work with more than one hypertable
>>>>> server.
>>>>>
>>>>> The files that need to be copied from the full installation are: 'ht.so'
>>>>> 'libHyperComm.so' 'libHyperCommon.so' 'libHyperTools.so'
>>>>> 'libHyperspace.so' 'libHypertable.so'
>>>>> And, of course, 'hypertable.cfg'
>>>>>
>>>>> It is my first experience with boost::python and I'm not sure if it is
>>>>> correct to wrap pointers (TablePtr, TableMutatorPtr) instead of
>>>>> the objects.
>>>>> So I suppose there could be some memory leaks; I have not investigated
>>>>> that yet.
>>>>> (I tried to wrap the objects - Table, TableMutator, TableScanner -
>>>>> but then I did not know how to return either a TableScanner or a list from
>>>>> client.hql(). With the pointers it is easy, so I went back to using
>>>>> them.)
>>>>>
>>>>> Compiling the python bindings does not depend on the hypertable
>>>>> compilation process and can be done independently later.
>>>>> Just run 'python setup.py build'.
>>>>> But note that the hypertable libraries must be compiled with
>>>>> -DBUILD_SHARED_LIBS=ON (the precompiled binaries from hypertable.org are
>>>>> not).
>>>>>
>>>>> I put the code here for a while (sorry, I do not know how to use
>>>>> git):
>>>>> http://code.google.com/p/python-hypertable/source/browse/trunk/
>>>>
>>>
>>
>
