Well, it is actually Java code, so there is little chance that it gets through
:) To be honest, I didn't do it for performance reasons, more for convenience
(I'm using JNA because I didn't want to write a Java JNI wrapper, so I have to
stick with the C API, whose write functions are limited). There is a lot of
application-specific stuff in my code, but the writing part is quite simple.
Strings are good old C NUL-terminated strings, and longs are the native byte
representation of the numbers. Merging is basically just a "cat" (don't forget
to remove the indexes). That's pretty much it.
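
To give a rough idea of the writing part, here is a minimal sketch, not my
actual code. The class and method names (RawColumnWriter, encodeLongs,
encodeStrings, concat) are made up for illustration, and UTF-8 for the string
bytes is an assumption; it only covers the raw column files, not the -part.txt
or the index files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical helper: names are illustrative, not part of FastBit or JNA.
final class RawColumnWriter {

    // Longs: 8 bytes each, in the byte order of the machine running the code.
    static byte[] encodeLongs(long[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Strings: each value followed by a single NUL byte, back to back.
    // Assumption: UTF-8 bytes, and no value contains an embedded NUL.
    static byte[] encodeStrings(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : values) {
            byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
            out.write(bytes, 0, bytes.length);
            out.write(0);
        }
        return out.toByteArray();
    }

    // Merge: append the source column files to the target in order, like "cat".
    // The target must not be one of the sources. Any index built on the old
    // partitions is stale afterwards and has to be removed and rebuilt.
    static void concat(Path target, List<Path> sources) throws IOException {
        try (OutputStream out = Files.newOutputStream(target,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            for (Path src : sources) {
                Files.copy(src, out);
            }
        }
    }
}
```

In this sketch you would write the result of encodeLongs or encodeStrings into
one file per column with Files.write, and call concat once per column when
merging, passing the same column's file from each of the smaller partitions.
Using the native byte order only works if the files are read on a machine with
the same endianness.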

Thanks,

-----Original Message-----
From: [email protected] 
[mailto:[email protected]] On Behalf Of Thorgrin
Sent: Thursday, March 22, 2012 11:15 AM
To: FastBit Users
Cc: K. John Wu
Subject: Re: [FastBit-users] Howto store data with fatbit library

Thanks,

we will definitely give it a try. Maybe this simple approach could be
made into some C++ class or C code and provided along with the
library, assuming the results are significantly better.

Petr

On 22 March 2012 15:58, Dominique Prunier
<[email protected]> wrote:
> Hi,
>
> I was doing pretty much the same thing, and here is how I did it:
>  * I handle the import myself (the column file format and the -part.txt are
> pretty straightforward; I was only using category and long, but beware of
> the endianness)
>  * I generate indexes column by column, using the code from the C API (which,
> to answer your question, builds the index for a specific column using
> part::getColumn and then column::loadIndex and column::unloadIndex on the
> selected column)
>  * From time to time, I merge smaller partitions into larger ones and reindex
> them
>
> Hope this helps,
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Thorgrin
> Sent: Thursday, March 22, 2012 10:12 AM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Howto store data with fatbit library
>
> Hi John,
>
> Thank you for pointing me in the right direction. What we are
> currently using seems quite similar, but we are experiencing some
> performance issues.
>
> We are storing quite a lot of rows into multiple tables at the same time.
> Currently we are using one thread to write into about 7 different
> tables, some of which are more heavily used than others. We are
> experiencing high CPU load while the throughput is not as high as
> expected.
> On the test machine we have hit the ceiling at about 35k rows per
> second. This of course includes processing the incoming data as well
> as storing it, but we believe that the storing is the most
> limiting factor. The hard drive performance should not be an issue
> here; it is busy at about 7-15%.
> The data is not written to the hard drive immediately, but only once
> about 200k rows have accumulated (this applies to each table).
>
> Since the fastbit data partition format is quite simple, it might be
> best to store the data directly. This would allow us to create a
> buffer for each data type and partition, which could be written
> directly to the hard drive. The only drawback is that we need to generate
> the -part.txt file ourselves, but that is not too hard. Then we
> can use the fastbit library to generate indexes on the existing data.
> What is your recommendation?
>
>
> Regarding the buildIndexes() functions, they are indeed present on both
> parts and tables. But the table also has a buildIndex() function,
> which can be used to generate indexes on specific columns. I cannot
> seem to find an equivalent in the part API. Is there any way to build
> an index on one column only using parts?
>
>
> Thank you,
> Petr
>
> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> You are on the right track.  The file tests/setqgen.cpp is essentially
>> doing what you are talking about.  You can take a look at the file
>> either in the source code directory or online at <http://goo.gl/D1XgX>.
>>
>> The class ibis::table (for a data table) is a container of ibis::part
>> (for a data partition).  A table can have multiple partitions.  All
>> data records written by setqgen.cpp can be regarded as one table, but
>> it might have a number of partitions.  The function
>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each data
>> partition it has.  The actual work is done by
>> ibis::part::buildIndexes.  If you are not using
>> ibis::table::buildIndexes, you will be doing the looping yourself.
>> Either way is fine.  Take the option that is convenient for you.
>>
>> Good luck.
>>
>> John
>>
>>
>>
>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>> Hi John,
>>>
>>> thank you for your ongoing work on the fastbit library, the improvements
>>> are great.
>>>
>>> I have several questions regarding correct usage. We are currently
>>> creating fastbit data partitions using a tablex object in the following
>>> manner:
>>>
>>> // initialise partition with columns
>>> tablex->addColumns();
>>>
>>> tablex->reserveSpace();
>>> // multiple calls to append data. We are storing the data on the fly as
>>> // it comes, so there are lots of calls to append.
>>> tablex->append();
>>>
>>> // When we fill the reserved space, we write the data to disk
>>> tablex->write();
>>> tablex->clearData();
>>>
>>> // And continue with
>>> tablex->append();
>>> .
>>> .
>>>
>>> Is this the right and efficient way? Or could you recommend a better
>>> approach? We really just need to receive data and store it into the
>>> partitions fast. Currently it seems that this consumes quite a lot of
>>> CPU resources, just for writing things down.
>>>
>>> Additionally, we want to create indexes on the newly created
>>> partitions. Currently we load each of them as a table using
>>> table = ibis::table::create();
>>> // and then
>>> table->buildIndexes();
>>> delete table;
>>>
>>> I've noticed that there is a buildIndexes() function on the part class.
>>> What is the difference? Should we use the other one? Additionally,
>>> table allows building an index on specific columns, part only on all
>>> columns. Is there a reason for this?
>>>
>>> Thank you,
>>> Petr
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users