Hi, Petr,

We were contemplating adding support for BLOBs.  The first question
for you: is a BLOB what you call a byte array?  Or are you simply
talking about 8-bit integers?  The 8-bit integer types are called
ibis::BYTE and ibis::UBYTE in the current source code.  There are
some indications that we intended to support BLOBs, but the support
is totally inadequate for any real use case.  If you have a more
detailed use case that you can share, let us know.  We might be able
to do some work on BLOBs in a month.
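
To make the distinction in the question concrete, here is a minimal,
generic sketch of the two layouts: a fixed-width column holding one
8-bit value per row, and a variable-length column holding one byte
array per row.  ByteColumn and BlobColumn are hypothetical names used
only for illustration; this is not FastBit's API or on-disk format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: one 8-bit value per row (fixed width).
struct ByteColumn {
    std::vector<std::uint8_t> values;

    void append(std::uint8_t v) { values.push_back(v); }
    std::size_t rows() const { return values.size(); }
};

// Hypothetical illustration only: one byte array of arbitrary length per row.
// Row i occupies bytes [offsets[i], offsets[i + 1]) of `data`.
struct BlobColumn {
    std::vector<std::uint8_t> data;
    std::vector<std::size_t> offsets;

    BlobColumn() : offsets(1, 0) {}  // start with the offset of row 0

    void append(const std::uint8_t* bytes, std::size_t n) {
        data.insert(data.end(), bytes, bytes + n);
        offsets.push_back(data.size());
    }
    std::size_t rows() const { return offsets.size() - 1; }
    std::size_t length(std::size_t row) const {
        assert(row + 1 < offsets.size());
        return offsets[row + 1] - offsets[row];
    }
};
```

The fixed-width case needs nothing beyond the values themselves, while
the variable-length case needs the extra offsets array to find where
each row starts and ends.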

John



On 4/13/12 2:05 AM, Thorgrin wrote:
> Hi John,
> 
> my colleague is currently working on the data storage and he decided
> to incorporate the creation of fastbit files directly into our code,
> thus we have no generic storage code to give back. There are no
> problems with our approach so far.
> 
> I hope that I'll be able to test the performance soon to see if the
> difference is really significant.
> 
> Meanwhile, I have another question. How does fastbit work with byte
> arrays? The table API seems to lack this feature, but there are some
> internal types for this. We need to support storing byte arrays along
> with strings and basic types in the end, so I would like to know
> whether this is possible using fastbit, or if we have to come up with
> another solution.
> 
> Thank you,
> Petr
> 
> On 6 April 2012 19:36, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> Have you had a chance to try this?  Any comments or questions?
>>
>> John
>>
>>
>> On 3/22/12 8:14 AM, Thorgrin wrote:
>>> Thanks,
>>>
>>> we will definitely give it a try. Maybe this simple approach could be
>>> made into some C++ class or C code and provided along with the
>>> library, assuming the results are significantly better.
>>>
>>> Petr
>>>
>>> On 22 March 2012 15:58, Dominique Prunier
>>> <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I was doing pretty much the same thing, and here is how I did it:
>>>>  * I handle the import myself (the column file format and the
>>>> -part.txt file are pretty straightforward; I was only using category
>>>> and long. Beware of the endianness, though)
>>>>  * I generate indexes column by column, using the code from the C API
>>>> (which, to answer your question, builds the index for a specific column
>>>> using part::getColumn and then column::loadIndex and column::unloadIndex
>>>> on the selected column)
>>>>  * From time to time, I merge smaller partitions into larger ones and
>>>> reindex them
>>>>
>>>> Hope this helps,
>>>>
>>>> -----Original Message-----
>>>> From: [email protected] 
>>>> [mailto:[email protected]] On Behalf Of Thorgrin
>>>> Sent: Thursday, March 22, 2012 10:12 AM
>>>> To: K. John Wu
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] Howto store data with fatbit library
>>>>
>>>> Hi John,
>>>>
>>>> Thank you for pointing me in the right direction. What we are
>>>> currently using seems quite similar, but we are experiencing some
>>>> performance issues.
>>>>
>>>> We are storing quite a lot of rows into multiple tables at the same
>>>> time. Currently we are using one thread to write into about 7
>>>> different tables, some of which are more heavily used than others. We
>>>> are experiencing high CPU load while the throughput is not as high as
>>>> expected.
>>>> On the test machine we have hit a ceiling of about 35k rows per
>>>> second. This of course includes processing the incoming data as well
>>>> as storing it, but we believe that the storing is the main limiting
>>>> factor. Hard drive performance should not be an issue here; it is
>>>> only about 7-15% busy.
>>>> The data is not written to the hard drive immediately, but only once
>>>> about 200k rows have accumulated (this applies to each table).
>>>>
>>>> Since the fastbit data partition format is quite simple, it might be
>>>> best to store the data directly. This would allow us to create a
>>>> buffer for each data type and partition, which could be written
>>>> directly to the hard drive. The only drawback is that we need to
>>>> generate the -part.txt file ourselves, but that is not too hard. Then
>>>> we can use the fastbit library to generate indexes on the existing data.
>>>> What is your recommendation?
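
A minimal sketch of the per-column buffer idea described above,
assuming each fixed-width column lives in its own file of raw values
in native byte order (hence the endianness warning earlier in the
thread).  ColumnBuffer is a hypothetical helper, not part of the
FastBit library; writing the -part.txt file and building the indexes
are not covered here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: buffers the values of one fixed-width column in
// memory and appends them to a raw binary file when flushed.
template <typename T>
class ColumnBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "only plain fixed-width types can be written as raw bytes");

public:
    explicit ColumnBuffer(std::string path) : path_(std::move(path)) {
        assert(!path_.empty());
    }

    void append(T value) { rows_.push_back(value); }
    std::size_t size() const { return rows_.size(); }

    // Appends all buffered values to the column file in native byte order.
    // The buffer is cleared only if the whole write succeeded.
    bool flush() {
        if (rows_.empty()) return true;
        std::FILE* f = std::fopen(path_.c_str(), "ab");
        if (f == nullptr) return false;
        const std::size_t written =
            std::fwrite(rows_.data(), sizeof(T), rows_.size(), f);
        bool ok = (written == rows_.size());
        ok = (std::fclose(f) == 0) && ok;
        if (ok) rows_.clear();
        return ok;
    }

private:
    std::string path_;
    std::vector<T> rows_;
};
```

One such buffer per column and partition would be filled as rows
arrive and flushed once the chosen row count is reached.  Every column
of a partition must end up with the same number of rows, so the
buffers of one partition should be flushed together.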
>>>>
>>>>
>>>> Regarding the buildIndexes() functions, they are indeed present on
>>>> both parts and tables. But the table also has a buildIndex() function,
>>>> which can be used to generate indexes on specific columns. I cannot
>>>> seem to find an equivalent in the part API. Is there any way to build
>>>> an index on one column only using parts?
>>>>
>>>>
>>>> Thank you,
>>>> Petr
>>>>
>>>> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>>>>> Hi, Petr,
>>>>>
>>>>> You are on the right track.  The file tests/setqgen.cpp is essentially
>>>>> doing what you are talking about.  You can take a look at the file
>>>>> either in the source code directory or online at <http://goo.gl/D1XgX>.
>>>>>
>>>>> The class ibis::table (for a data table) is a container of ibis::part
>>>>> (for a data partition).  A table can have multiple partitions.  All
>>>>> data records written by setqgen.cpp can be regarded as one table, but
>>>>> it might have a number of partitions.  The function
>>>>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each data
>>>>> partition it has.  The actual work is done by
>>>>> ibis::part::buildIndexes.  If you are not using
>>>>> ibis::table::buildIndexes, you will be doing the looping yourself.
>>>>> Either way is fine.  Take the option that is convenient for you.
>>>>>
>>>>> Good luck.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> thank you for your ongoing work on the fastbit library; the
>>>>>> improvements are great.
>>>>>>
>>>>>> I have several questions regarding correct usage. We are currently
>>>>>> creating fastbit data partitions using a tablex object in the
>>>>>> following manner:
>>>>>>
>>>>>> # initialise partition with columns
>>>>>> tablex->addColumns();
>>>>>>
>>>>>> tablex->reserveSpace();
>>>>>> # multiple calls to append data. We are storing the data on the fly as
>>>>>> # it comes, so there are lots of calls to append.
>>>>>> tablex->append();
>>>>>>
>>>>>> # When we fill the reserved space, we write the data to disk
>>>>>> tablex->write();
>>>>>> tablex->clearData();
>>>>>>
>>>>>> # And continue with
>>>>>> tablex->append()
>>>>>> .
>>>>>> .
>>>>>>
>>>>>> Is this the right and efficient way? Or could you recommend a better
>>>>>> approach? We really just need to receive data and store it into the
>>>>>> partitions fast. Currently it seems that this consumes quite a lot of
>>>>>> CPU resources, just for writing things down.
>>>>>>
>>>>>> Additionally, we want to create indexes on the newly created
>>>>>> partitions. Currently we load them as a table using
>>>>>> table = ibis::table::create();
>>>>>> # and then
>>>>>> table->buildIndexes();
>>>>>> delete table;
>>>>>>
>>>>>> I've noticed that there is a buildIndexes() function on part class.
>>>>>> What is the difference? Should we use the other one? Additionally,
>>>>>> table allows building an index on specific columns, part only on all
>>>>>> columns. Is there a reason for this?
>>>>>>
>>>>>> Thank you,
>>>>>> Petr
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
