Hi.

There are plenty of bugs, though.
I just fixed one: cell content has to be copied, because the cells hold char*
pointers rather than string objects, and Hypertable can (and does) free
them, which invalidates the contents of the Python strings.
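
A pure-Python analogy of that bug, in case it helps anyone hitting the same
thing. It does not use the bindings at all: scan_buffer, borrow() and own()
are made-up names, and a bytearray stands in for memory that Hypertable owns
and may reuse.

```python
# Analogy only: a bytearray plays the role of a buffer owned by the library.

def borrow(buf):
    """Return a view onto buf; it shares memory, like a raw char* pointer."""
    return memoryview(buf)

def own(buf):
    """Return an independent copy, like copying the bytes into a string."""
    return bytes(buf)

scan_buffer = bytearray(b"value-1")
borrowed = borrow(scan_buffer)
owned = own(scan_buffer)

# The "library" reuses its buffer for the next cell (same length, so the
# bytearray is not resized while the view is alive).
scan_buffer[:] = b"value-2"

print(bytes(borrowed))  # reflects the overwritten buffer
print(owned)            # still holds the original content
```

The real fix lives on the C++ side, but the idea is the same: copy the bytes
while they are still valid instead of keeping the pointer.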


2010/2/14, Mateusz Berezecki <[email protected]>:
> Masha,
>
> I've just completed reviewing your code. I like the new version a lot.
> There are a couple of whitespace problems on lines 91 and 77 of
> HypertableBindings.cc (whitespace after the opening parentheses, before the
> actual argument list). Other than that it looks very clean, and thanks
> to boost python you can't really go wrong with that. Looks good to me!
>
> Mateusz
>
> On Sun, Feb 14, 2010 at 1:31 AM, Masha <[email protected]> wrote:
>> git repo is here:
>> http://github.com/conferno/hypertable/tree/master/contrib/cc/PythonBinding/
>>
>> I replaced Mateusz's files as they are outdated and cannot be compiled
>> with the current hypertable (see prev thread here
>> http://groups.google.com/group/hypertable-dev/browse_thread/thread/52d88cd9bed771c3
>> )
>>
>> 2010/2/14, Masha <[email protected]>:
>>> Hi
>>>
>>> Of course, your feedback will be appreciated!
>>>
>>> What I failed to do, and would like to ask the other developers: how
>>> to compile all six .so files into one big .so?
>>>
>>> Just adding -static into linker flags (extra_link_args =["-static"] in
>>> setup.py) did not help.
>>> Linking failed with the error:
>>>
>>> /usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o:
>>> relocation R_X86_64_32 against `__DTOR_END__' can not be used when
>>> making a shared object; recompile with -fPIC
>>> /usr/lib/gcc/x86_64-linux-gnu/4.4.1/crtbeginT.o: could not read
>>> symbols: Bad value
>>>
>>> That would simplify the deployment of mapreduce workers.
>>> Actually, six libraries are not enough: "apt-get install boost log4cpp
>>> libdb4 ..." must be executed on a worker machine in advance (I do the
>>> hypertable compilation against the versions from standard ubuntu
>>> repositories)
>>>
>>> With one big .so we can avoid version mismatches in case the worker
>>> machine, for example, already has a boost that is too new or too old.
>>> To be safe from version hell, many more than six libraries must be
>>> deployed
>>> :(
>>>
>>>
>>> 2010/2/13, Mateusz Berezecki <[email protected]>:
>>>> Hi Masha
>>>>
>>>> This is awesome news. I'll check it out and prepare patches if you
>>>> don't mind.
>>>>
>>>> Thanks for a great job!
>>>>
>>>> And yes, thrift does not feel solid at all!;-)
>>>>
>>>> Mateusz
>>>>
>>>> On Feb 13, 2010, at 0:14, Masha <[email protected]> wrote:
>>>>
>>>>> Hello
>>>>>
>>>>> I have fixed the Python bindings to reflect the modern Hypertable and
>>>>> boost versions.
>>>>>
>>>>> Using the Python bindings, 'select *' over a large dataset is about 20
>>>>> times faster than using Thrift (I tested it on a single Linux-x64
>>>>> server; the Thrift client eats a lot of CPU).
>>>>>
>>>>>
>>>>> Also, the API is slightly improved:
>>>>> 1. TableScanner can act as an iterable object emitting Cell objects
>>>>>
>>>>> # how it was
>>>>>
>>>>> scanner = table.create_scanner(scan_spec)
>>>>> cell = ht.Cell()
>>>>> while scanner.next(cell):
>>>>>  print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value())
>>>>>
>>>>> # how it is
>>>>>
>>>>> for cell in table.create_scanner(scan_spec):
>>>>>  print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>>>
>>>>> # or even simpler
>>>>>
>>>>> for cell in client.hql("select * from table"):
>>>>>  print "%s:%s %s" % (cell.row_key, cell.column_family, cell.value)
>>>>>
>>>>> #--------------------------
>>>>>
>>>>> 2. client.hql("select ...") returns TableScanner
>>>>>   client.hql("show tables") returns python list, both of them are
>>>>> iterables
>>>>>
>>>>> 3. cell.value is now a getter; the parentheses are not required.
>>>>>
>>>>> 4. The parameter of the Client constructor is the path to 'hypertable.cfg',
>>>>> not the path to the installation directory.
>>>>>   Deep inside, the Hypertable libraries use the path to the executable as
>>>>> a starting point to find 'hypertable.cfg'.
>>>>>   That fails if the executable is '/usr/bin/python'.
>>>>>
>>>>>   As it is intended to be used on a client, it must work without a full
>>>>> Hypertable installation, and must work with more than one Hypertable
>>>>> server.
>>>>>
>>>>>   The files that must be copied from the full installation are: 'ht.so'
>>>>> 'libHyperComm.so' 'libHyperCommon.so' 'libHyperTools.so'
>>>>> 'libHyperspace.so' 'libHypertable.so'
>>>>>   And, of course, 'hypertable.cfg'
>>>>>
>>>>> It is my first experience with boost::python and I'm not sure if it is
>>>>> correct to wrap pointers (TablePtr, TableMutatorPtr) instead of the
>>>>> objects.
>>>>> So I suppose there could be some memory leaks; I have not investigated
>>>>> that yet.
>>>>> (I tried to wrap the objects - Table, TableMutator, TableScanner -
>>>>> but then I did not know how to return either a TableScanner or a list
>>>>> from client.hql(). With the pointers it is easy, so I went back to using
>>>>> them).
>>>>>
>>>>> Compiling the Python bindings does not depend on the Hypertable
>>>>> compilation process and can be done independently later.
>>>>> Just run 'python setup.py build'.
>>>>> But note that the Hypertable libraries must be compiled with
>>>>> -DBUILD_SHARED_LIBS=ON (the precompiled binaries from hypertable.org
>>>>> are not).
>>>>>
>>>>> I put the code here for a while (sorry, I do not know how to use
>>>>> git):
>>>>> http://code.google.com/p/python-hypertable/source/browse/trunk/
>>>>
>>>
>>
>
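
On the iterable TableScanner from the quoted announcement: for anyone curious
how an old-style "scanner.next(cell)" loop maps onto a plain Python "for"
loop, here is a rough sketch. It does not touch the real bindings. FakeCell,
FakeScanner and iter_cells are hypothetical stand-ins, and the only thing
assumed about the old API is what the quoted example shows: next(cell) fills
the cell and returns False when the scan is exhausted.

```python
class FakeCell:
    """Hypothetical stand-in for a cell object filled in by the scanner."""
    def __init__(self):
        self.row_key = None
        self.column_family = None
        self.value = None


class FakeScanner:
    """Hypothetical stand-in for the old-style scanner interface."""
    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def next(self, cell):
        # Fill the caller's cell; return False once there is no more data.
        if self._pos >= len(self._rows):
            return False
        cell.row_key, cell.column_family, cell.value = self._rows[self._pos]
        self._pos += 1
        return True


def iter_cells(scanner, cell_factory=FakeCell):
    """Adapt a next(cell)-style scanner to the iterator protocol.

    A fresh cell is created for every step, so cells that the caller keeps
    around are not overwritten by later steps.
    """
    while True:
        cell = cell_factory()
        if not scanner.next(cell):
            return
        yield cell


rows = [("r1", "cf", "a"), ("r2", "cf", "b")]
for cell in iter_cells(FakeScanner(rows)):
    print("%s:%s %s" % (cell.row_key, cell.column_family, cell.value))
```

The actual bindings do this on the C++ side through boost python; the
generator above only illustrates the shape of the adaptation.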

