Hi, Petr,

We were contemplating adding support for BLOBs. The first question
for you is: is a BLOB what you call a byte array? Or are you simply
talking about 8-bit integers? The 8-bit integer types are called
ibis::BYTE and ibis::UBYTE in the current source code. There are some
indications that we intended to support BLOBs, but the support is
totally inadequate for any real use case. If you have a more detailed
use case that you can share, let us know. We might be able to do some
work on BLOBs in a month.

John

On 4/13/12 2:05 AM, Thorgrin wrote:
> Hi John,
>
> my colleague is currently working on the data storage and he decided
> to incorporate the creation of fastbit files directly into our code,
> thus we have no generic storage code to give back. There are no
> problems with our approach so far.
>
> I hope that I'll be able to test the performance soon to see if the
> difference is really significant.
>
> Meanwhile, I've another question. How does fastbit work with byte
> arrays? The table API seems to miss this feature, but there are some
> internal types for this. We need to support storing byte arrays along
> with strings and basic types in the end, so I would like to know
> whether this is possible using fastbit, or if we have to come up with
> another solution.
>
> Thank you,
> Petr
>
> On 6 April 2012 19:36, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> Have you got a chance to try this? Any comments or questions?
>>
>> John
>>
>>
>> On 3/22/12 8:14 AM, Thorgrin wrote:
>>> Thanks,
>>>
>>> we will definitely give it a try. Maybe this simple approach could be
>>> made into some C++ class or C code and provided along with the
>>> library, assuming the results are significantly better.
>>>
>>> Petr
>>>
>>> On 22 March 2012 15:58, Dominique Prunier
>>> <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> I was doing pretty much the same thing and here is how I did it:
>>>> * I handle import myself (the column file format and the -part.txt are
>>>> pretty straightforward, I was only using category and long, beware of the
>>>> endianness though)
>>>> * I generate indexes column by column, using the code from the C API
>>>> (which, to answer your question, builds the index for a specific column
>>>> using part::getColumn and then column::loadIndex and column::unloadIndex
>>>> on the selected column)
>>>> * From time to time, I merge smaller partitions into larger ones and
>>>> reindex them
>>>>
>>>> Hope this helps,
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Thorgrin
>>>> Sent: Thursday, March 22, 2012 10:12 AM
>>>> To: K. John Wu
>>>> Cc: FastBit Users
>>>> Subject: Re: [FastBit-users] Howto store data with fatbit library
>>>>
>>>> Hi John,
>>>>
>>>> Thank you for pointing me in the right direction. What we are
>>>> currently using seems quite similar, but we are experiencing some
>>>> performance issues.
>>>>
>>>> We are storing quite a lot of rows into multiple tables at the same time.
>>>> Currently we are using one thread to write into about 7 different
>>>> tables, some of which are more heavily used than others. We are
>>>> experiencing high CPU load while the throughput is not as big as
>>>> expected.
>>>> On the test machine we have hit the ceiling at about 35k rows per
>>>> second. This of course includes processing the incoming data as well
>>>> as storing it, but we believe that the storing is the most
>>>> limiting factor. The harddrive performance should not be an issue
>>>> here, it's busy at about 7-15%.
>>>> The data is not written to harddrive immediately, but always when
>>>> there are about 200k rows (this applies for each table).
>>>>
>>>> Since the fastbit data partition format is quite simple, it might be
>>>> best to store the data directly. This would allow us to create a
>>>> buffer for each data type and partition, which could be written
>>>> directly to harddrive. The only drawback is that we need to generate
>>>> the -part.txt file for ourselves, but that is not too hard. Then we
>>>> can use fastbit library to generate indexes on existing data.
>>>> What is your recommendation?
>>>>
>>>>
>>>> Regarding the buildIndexes() functions, they are indeed present both
>>>> at parts and tables. But the table also has a buildIndex() function,
>>>> which can be used to generate indexes on specific columns. I cannot
>>>> seem to find an equivalent in parts API. Is there any way to build
>>>> index on one column only using parts?
>>>>
>>>>
>>>> Thank you,
>>>> Petr
>>>>
>>>> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>>>>> Hi, Petr,
>>>>>
>>>>> You are on the right track. The file tests/setqgen.cpp is essentially
>>>>> doing what you are talking about. You can take a look at the file
>>>>> either in the source code directory or online at <http://goo.gl/D1XgX>.
>>>>>
>>>>> The class ibis::table (for a data table) is a container of ibis::part
>>>>> (for a data partition). A table can have multiple partitions. All
>>>>> data records written by setqgen.cpp can be regarded as one table, but
>>>>> it might have a number of partitions. The function
>>>>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each data
>>>>> partition it has. The actual work is done by
>>>>> ibis::part::buildIndexes. If you are not using
>>>>> ibis::table::buildIndexes, you will be doing the looping yourself.
>>>>> Either way is fine. Take the option that is convenient for you.
>>>>>
>>>>> Good luck.
>>>>>
>>>>> John
>>>>>
>>>>>
>>>>>
>>>>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> thank you for your ongoing work on fastbit library, the improvements are
>>>>>> great.
>>>>>>
>>>>>> I have several questions regarding correct usage. We are currently
>>>>>> creating fastbit data partitions using tablex object in the following
>>>>>> manner:
>>>>>>
>>>>>> # initialise partition with columns
>>>>>> tablex->addColumns();
>>>>>>
>>>>>> tablex->reserveSpace();
>>>>>> # multiple calls to append data. We are storing the data on the fly as
>>>>>> it comes, so there are lots of calls to append.
>>>>>> tablex->append();
>>>>>>
>>>>>> # When we fill the reserved space, we write the data to disk
>>>>>> tablex->write();
>>>>>> tablex->clearData();
>>>>>>
>>>>>> # And continue with
>>>>>> tablex->append()
>>>>>> .
>>>>>> .
>>>>>>
>>>>>> Is this the right and efficient way? Or could you recommend a better
>>>>>> approach? We really just need to receive data and store it into the
>>>>>> partitions fast. Currently it seems that this consumes quite a lot of
>>>>>> CPU resources, just for writing things down.
>>>>>>
>>>>>> Additionally, we want to create indexes on the newly created
>>>>>> partitions. Currently we load it as a table using
>>>>>> table = ibis::table::create();
>>>>>> # and then
>>>>>> table->buildIndexes();
>>>>>> delete table;
>>>>>>
>>>>>> I've noticed that there is a buildIndexes() function on part class.
>>>>>> What is the difference? Should we use the other one? Additionally,
>>>>>> table allows to build an index on specific columns, part only on all
>>>>>> columns. Is there a reason for this?
>>>>>>
>>>>>> Thank you,
>>>>>> Petr
>>>> _______________________________________________
>>>> FastBit-users mailing list
>>>> [email protected]
>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
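
The direct-storage idea raised in the thread (one in-memory buffer per
column, written to disk in bulk once it fills up) can be sketched roughly
as below. This is only a minimal illustration using the C++ standard
library, not FastBit code: the class ColumnBuffer, the helper rowsInFile
and the file name in the usage note are hypothetical. It assumes that a
fixed-width column file is a plain sequence of values in the machine's
native byte order (hence the endianness warning above), and it does not
generate the -part.txt metadata file, which would still have to be written
separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of FastBit): buffers the values of one
// fixed-width column in memory and appends them to a raw binary file.
template <typename T>
class ColumnBuffer {
public:
    ColumnBuffer(std::string path, std::size_t capacity)
        : path_(std::move(path)) {
        assert(capacity > 0);
        values_.reserve(capacity);  // avoid reallocations while appending
    }

    // Store one value in memory; nothing touches the disk here.
    void append(T v) { values_.push_back(v); }

    // Number of values currently held in memory.
    std::size_t buffered() const { return values_.size(); }

    // Append all buffered values to the file in native byte order and
    // clear the buffer. Returns the number of values written (0 on error,
    // in which case the buffer is left untouched).
    std::size_t flush() {
        if (values_.empty()) return 0;
        std::ofstream out(path_, std::ios::binary | std::ios::app);
        if (!out) return 0;
        out.write(reinterpret_cast<const char*>(values_.data()),
                  static_cast<std::streamsize>(values_.size() * sizeof(T)));
        if (!out) return 0;
        const std::size_t n = values_.size();
        values_.clear();
        return n;
    }

private:
    std::string path_;
    std::vector<T> values_;
};

// Number of values of type T stored in a raw column file
// (file size divided by the width of one value).
template <typename T>
std::size_t rowsInFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return 0;
    const std::streamoff bytes = in.tellg();
    if (bytes < 0) return 0;
    return static_cast<std::size_t>(bytes) / sizeof(T);
}
```

In use, one ColumnBuffer<std::uint32_t> (or another fixed-width type) would
be created per column, append() called for each incoming row, and flush()
called on every column once about 200k rows have accumulated, so that all
column files of a partition keep the same row count.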
