Well, it is actually Java code, so there is little chance that it gets through :) To be honest, I didn't do it for performance reasons, more for convenience (I'm using JNA because I didn't want to write a Java JNI wrapper, so I have to stick with the C API, whose write functions are limited). There is a lot of application-specific stuff in my code, but the writing part is quite simple. Strings are good old C NUL-terminated strings, and longs are stored in the native byte representation of the numbers. Merging is basically just a "cat" (don't forget to remove the indexes). That's pretty much it.
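A minimal sketch of the writing part described above, assuming one raw file per column: strings are appended with a trailing NUL byte, longs are appended as 8 bytes in the machine's native byte order, and merging appends one column file to another. The class and method names (RawColumnWriter, appendStrings, appendLongs, concat) are hypothetical and are not part of FastBit or of the code this message refers to; using UTF-8 for the string bytes is also an assumption.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class RawColumnWriter {

    /** Appends each value followed by a single NUL byte, C-string style. */
    static void appendStrings(Path columnFile, String[] values) throws IOException {
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(
                columnFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            for (String v : values) {
                // Assumption: UTF-8 bytes, and no value contains an embedded NUL.
                out.write(v.getBytes(StandardCharsets.UTF_8));
                out.write(0);
            }
        }
    }

    /** Appends each value as 8 bytes in the native byte order of this machine. */
    static void appendLongs(Path columnFile, long[] values) throws IOException {
        // ByteBuffer is big-endian by default, so the order is set explicitly.
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES)
                .order(ByteOrder.nativeOrder());
        for (long v : values) {
            buf.putLong(v);
        }
        try (OutputStream out = Files.newOutputStream(
                columnFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            out.write(buf.array());
        }
    }

    /** The "cat" step: appends the bytes of source to the end of target. */
    static void concat(Path target, Path source) throws IOException {
        try (OutputStream out = Files.newOutputStream(
                target, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            Files.copy(source, out);
        }
        // Any index built on the old target is stale after this and has to be
        // removed and rebuilt, as noted in the message.
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("raw-columns");
        Path names = dir.resolve("name");
        Path counts = dir.resolve("count");
        appendStrings(names, new String[] {"alpha", "beta"});
        appendLongs(counts, new long[] {1L, 2L});
        System.out.println(Files.size(names) + " bytes of strings, "
                + Files.size(counts) + " bytes of longs");
    }
}
```

Because the byte order is taken from the machine that writes the file, a file written this way is only directly usable on a machine with the same endianness, which is the caveat raised in the quoted reply below.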
Thanks,

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Thorgrin
Sent: Thursday, March 22, 2012 11:15 AM
To: FastBit Users
Cc: K. John Wu
Subject: Re: [FastBit-users] Howto store data with fatbit library

Thanks, we will definitely give it a try. Maybe this simple approach
could be made into some C++ class or C code and provided along with the
library, assuming the results are significantly better.

Petr

On 22 March 2012 15:58, Dominique Prunier <[email protected]> wrote:
> Hi,
>
> I was doing pretty much the same thing and here is how I did it:
> * I handle import myself (the column file format and the -part.txt are
> pretty straightforward, I was only using category and long, beware of the
> endianness though)
> * I generate indexes column by column, using the code from the C API (which,
> to answer your question, builds the index for a specific column using
> part::getColumn and then column::loadIndex and column::unloadIndex on the
> selected column)
> * From time to time, I merge smaller partitions into larger ones and reindex
> them
>
> Hope this helps,
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Thorgrin
> Sent: Thursday, March 22, 2012 10:12 AM
> To: K. John Wu
> Cc: FastBit Users
> Subject: Re: [FastBit-users] Howto store data with fatbit library
>
> Hi John,
>
> Thank you for pointing me in the right direction. What we are
> currently using seems quite similar, but we are experiencing some
> performance issues.
>
> We are storing quite a lot of rows into multiple tables at the same time.
> Currently we are using one thread to write into about 7 different
> tables, some of which are more heavily used than others. We are
> experiencing high CPU load while the throughput is not as big as
> expected.
> On the test machine we have hit the ceiling at about 35k rows per
> second.
> This of course includes processing the incoming data as well
> as storing it, but we believe that the storing is the most
> limiting factor. The hard drive performance should not be an issue
> here, it's busy at about 7-15%.
> The data is not written to the hard drive immediately, but always when
> there are about 200k rows (this applies to each table).
>
> Since the fastbit data partition format is quite simple, it might be
> best to store the data directly. This would allow us to create a
> buffer for each data type and partition, which could be written
> directly to the hard drive. The only drawback is that we need to generate
> the -part.txt file ourselves, but that is not too hard. Then we
> can use the fastbit library to generate indexes on existing data.
> What is your recommendation?
>
>
> Regarding the buildIndexes() functions, they are indeed present both
> at parts and tables. But the table also has a buildIndex() function,
> which can be used to generate indexes on specific columns. I cannot
> seem to find an equivalent in the parts API. Is there any way to build
> an index on one column only using parts?
>
>
> Thank you,
> Petr
>
> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> You are on the right track. The file tests/setqgen.cpp is essentially
>> doing what you are talking about. You can take a look at the file
>> either in the source code directory or online at <http://goo.gl/D1XgX>.
>>
>> The class ibis::table (for a data table) is a container of ibis::part
>> (for a data partition). A table can have multiple partitions. All
>> data records written by setqgen.cpp can be regarded as one table, but
>> it might have a number of partitions. The function
>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each data
>> partition it has. The actual work is done by
>> ibis::part::buildIndexes. If you are not using
>> ibis::table::buildIndexes, you will be doing the looping yourself.
>> Either way is fine.
>> Take the option that is convenient for you.
>>
>> Good luck.
>>
>> John
>>
>>
>>
>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>> Hi John,
>>>
>>> thank you for your ongoing work on the fastbit library, the improvements are
>>> great.
>>>
>>> I have several questions regarding correct usage. We are currently
>>> creating fastbit data partitions using a tablex object in the following
>>> manner:
>>>
>>> # initialise partition with columns
>>> tablex->addColumns();
>>>
>>> tablex->reserveSpace();
>>> # multiple calls to append data. We are storing the data on the fly as
>>> # it comes, so there are lots of calls to append.
>>> tablex->append();
>>>
>>> # When we fill the reserved space, we write the data to disk
>>> tablex->write();
>>> tablex->clearData();
>>>
>>> # And continue with
>>> tablex->append()
>>> .
>>> .
>>>
>>> Is this the right and efficient way? Or could you recommend a better
>>> approach? We really just need to receive data and store it into the
>>> partitions fast. Currently it seems that this consumes quite a lot of
>>> CPU resources, just for writing things down.
>>>
>>> Additionally, we want to create indexes on the newly created
>>> partitions. Currently we load it as a table using
>>> table = ibis::table::create();
>>> # and then
>>> table->buildIndexes();
>>> delete table;
>>>
>>> I've noticed that there is a buildIndexes() function on the part class.
>>> What is the difference? Should we use the other one? Additionally,
>>> table allows building an index on specific columns, part only on all
>>> columns. Is there a reason for this?
>>>
>>> Thank you,
>>> Petr
> _______________________________________________
> FastBit-users mailing list
> [email protected]
> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
