Hi John,

sorry for being unclear. You are right, I meant octetArray, which is what BLOBs are called in the RFC we are working with. The types we need to support are defined here: http://www.iana.org/assignments/ipfix/ipfix.xml#ipfix-information-element-data-types except the last three items (the lists). FastBit already has great support for everything else (we have not tested the strings thoroughly yet), but BLOBs are missing. For starters, it would be great to have BLOB columns that work as placeholders for binary data. No support for searching, filtering or aggregating is necessary at this point.
Petr

On 13 April 2012 17:38, K. John Wu <[email protected]> wrote:
> Hi, Petr,
>
> We were contemplating adding support for BLOBs. The first question
> to you is: is BLOB what you call byte arrays? Or maybe you are
> simply talking about 8-bit integers? The 8-bit integer field is
> called ibis::BYTE and ibis::UBYTE in the current source code. There
> is some indication we intended to support BLOBs, but the support is
> totally inadequate for any real use case. If you have a more
> detailed use case that you can share, let us know. We might be able
> to do some work on BLOBs in month.
>
> John
>
>
>
> On 4/13/12 2:05 AM, Thorgrin wrote:
>> Hi John,
>>
>> my colleague is currently working on the data storage and he decided
>> to incorporate the creation of fastbit files directly into our code,
>> thus we have no generic storage code to give back. There are no
>> problems with our approach so far.
>>
>> I hope that I'll be able to test the performance soon to see if the
>> difference is really significant.
>>
>> Meanwhile, I have another question. How does fastbit work with byte
>> arrays? The table API seems to miss this feature, but there are some
>> internal types for this. We need to support storing byte arrays along
>> with strings and basic types in the end, so I would like to know
>> whether this is possible using fastbit, or if we have to come up with
>> another solution.
>>
>> Thank you,
>> Petr
>>
>> On 6 April 2012 19:36, K. John Wu <[email protected]> wrote:
>>> Hi, Petr,
>>>
>>> Have you had a chance to try this? Any comments or questions?
>>>
>>> John
>>>
>>>
>>> On 3/22/12 8:14 AM, Thorgrin wrote:
>>>> Thanks,
>>>>
>>>> we will definitely give it a try. Maybe this simple approach could be
>>>> made into some C++ class or C code and provided along with the
>>>> library, assuming the results are significantly better.
>>>>
>>>> Petr
>>>>
>>>> On 22 March 2012 15:58, Dominique Prunier
>>>> <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I was doing pretty much the same thing and here is how I did it:
>>>>> * I handle import myself (the column file format and the -part.txt are
>>>>> pretty straightforward; I was only using category and long, beware of the
>>>>> endianness though)
>>>>> * I generate indexes column by column, using the code from the C API
>>>>> (which, to answer your question, builds the index for a specific column
>>>>> using part::getColumn and then column::loadIndex and column::unloadIndex
>>>>> on the selected column)
>>>>> * From time to time, I merge smaller partitions into larger ones and
>>>>> reindex them
>>>>>
>>>>> Hope this helps,
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Thorgrin
>>>>> Sent: Thursday, March 22, 2012 10:12 AM
>>>>> To: K. John Wu
>>>>> Cc: FastBit Users
>>>>> Subject: Re: [FastBit-users] Howto store data with fatbit library
>>>>>
>>>>> Hi John,
>>>>>
>>>>> Thank you for pointing me in the right direction. What we are
>>>>> currently using seems quite similar, but we are experiencing some
>>>>> performance issues.
>>>>>
>>>>> We are storing quite a lot of rows into multiple tables at the same time.
>>>>> Currently we are using one thread to write into about 7 different
>>>>> tables, some of which are more heavily used than others. We are
>>>>> experiencing high CPU load while the throughput is not as high as
>>>>> expected.
>>>>> On the test machine we have hit the ceiling at about 35k rows per
>>>>> second. This of course includes processing the incoming data as well
>>>>> as storing it, but we believe that the storing is the most
>>>>> limiting factor. The hard drive performance should not be an issue
>>>>> here, it is busy at about 7-15%.
>>>>> The data is not written to the hard drive immediately, but only when
>>>>> there are about 200k rows (this applies to each table).
>>>>>
>>>>> Since the fastbit data partition format is quite simple, it might be
>>>>> best to store the data directly. This would allow us to create a
>>>>> buffer for each data type and partition, which could be written
>>>>> directly to the hard drive. The only drawback is that we need to generate
>>>>> the -part.txt file ourselves, but that is not too hard. Then we
>>>>> can use the fastbit library to generate indexes on existing data.
>>>>> What is your recommendation?
>>>>>
>>>>>
>>>>> Regarding the buildIndexes() functions, they are indeed present both
>>>>> in parts and tables. But the table also has a buildIndex() function,
>>>>> which can be used to generate indexes on specific columns. I cannot
>>>>> seem to find an equivalent in the parts API. Is there any way to build
>>>>> an index on one column only using parts?
>>>>>
>>>>>
>>>>> Thank you,
>>>>> Petr
>>>>>
>>>>> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>>>>>> Hi, Petr,
>>>>>>
>>>>>> You are on the right track. The file tests/setqgen.cpp is essentially
>>>>>> doing what you are talking about. You can take a look at the file
>>>>>> either in the source code directory or online at <http://goo.gl/D1XgX>.
>>>>>>
>>>>>> The class ibis::table (for a data table) is a container of ibis::part
>>>>>> (for a data partition). A table can have multiple partitions. All
>>>>>> data records written by setqgen.cpp can be regarded as one table, but
>>>>>> it might have a number of partitions. The function
>>>>>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each data
>>>>>> partition it has. The actual work is done by
>>>>>> ibis::part::buildIndexes. If you are not using
>>>>>> ibis::table::buildIndexes, you will be doing the looping yourself.
>>>>>> Either way is fine. Take the option that is convenient for you.
>>>>>>
>>>>>> Good luck.
>>>>>>
>>>>>> John
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>>>>>> Hi John,
>>>>>>>
>>>>>>> thank you for your ongoing work on the fastbit library, the improvements
>>>>>>> are great.
>>>>>>>
>>>>>>> I have several questions regarding correct usage. We are currently
>>>>>>> creating fastbit data partitions using a tablex object in the following
>>>>>>> manner:
>>>>>>>
>>>>>>> # initialise partition with columns
>>>>>>> tablex->addColumns();
>>>>>>>
>>>>>>> tablex->reserveSpace();
>>>>>>> # multiple calls to append data. We are storing the data on the fly as
>>>>>>> it comes, so there are lots of calls to append.
>>>>>>> tablex->append();
>>>>>>>
>>>>>>> # When we fill the reserved space, we write the data to disk
>>>>>>> tablex->write();
>>>>>>> tablex->clearData();
>>>>>>>
>>>>>>> # And continue with
>>>>>>> tablex->append()
>>>>>>> .
>>>>>>> .
>>>>>>>
>>>>>>> Is this the right and efficient way? Or could you recommend a better
>>>>>>> approach? We really just need to receive data and store it into the
>>>>>>> partitions fast. Currently it seems that this consumes quite a lot of
>>>>>>> CPU resources, just for writing things down.
>>>>>>>
>>>>>>> Additionally, we want to create indexes on the newly created
>>>>>>> partitions. Currently we load it as a table using
>>>>>>> table = ibis::table::create();
>>>>>>> # and then
>>>>>>> table->buildIndexes();
>>>>>>> delete table;
>>>>>>>
>>>>>>> I've noticed that there is a buildIndexes() function on the part class.
>>>>>>> What is the difference? Should we use the other one? Additionally,
>>>>>>> table allows building an index on specific columns, part only on all
>>>>>>> columns. Is there a reason for this?
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Petr
>>>>> _______________________________________________
>>>>> FastBit-users mailing list
>>>>> [email protected]
>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
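
The direct-write approach discussed in this thread (one in-memory buffer per column, flushed as a raw binary file, plus a hand-generated -part.txt) might look roughly like the sketch below. It uses only the C++ standard library and does not call FastBit. The names UIntColumn and writePartition are hypothetical, and the -part.txt keys and the "UINT" type string are assumptions that should be checked against a file produced by tablex->write() before relying on them. Values are written in host byte order, so the endianness caveat mentioned earlier in the thread applies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical in-memory buffer for one unsigned 32-bit column.
struct UIntColumn {
    std::string name;                   // also used as the data file name
    std::vector<std::uint32_t> values;  // one entry per row
};

// Writes every buffer as a raw binary file named after its column and then
// writes a -part.txt metadata file into the same directory.
// ASSUMPTION: the metadata keys below mimic what FastBit itself writes;
// they are not taken from the FastBit documentation.
bool writePartition(const std::string& dir, const std::string& partName,
                    const std::vector<UIntColumn>& cols) {
    if (cols.empty()) return false;
    const std::size_t rows = cols.front().values.size();

    for (const UIntColumn& c : cols) {
        assert(c.values.size() == rows && "all columns need the same row count");
        std::ofstream data(dir + "/" + c.name,
                           std::ios::binary | std::ios::trunc);
        if (!data) return false;
        // Host byte order: the partition is only valid on machines with the
        // same endianness as the writer.
        data.write(reinterpret_cast<const char*>(c.values.data()),
                   static_cast<std::streamsize>(rows * sizeof(std::uint32_t)));
        data.close();
        if (!data) return false;
    }

    std::ofstream meta(dir + "/-part.txt", std::ios::trunc);
    if (!meta) return false;
    meta << "BEGIN HEADER\n"
         << "Name = \"" << partName << "\"\n"
         << "Number_of_rows = " << rows << "\n"
         << "Number_of_columns = " << cols.size() << "\n"
         << "END HEADER\n";
    for (const UIntColumn& c : cols) {
        meta << "\nBegin Column\n"
             << "name = \"" << c.name << "\"\n"
             << "data_type = \"UINT\"\n"
             << "End Column\n";
    }
    meta.close();
    return static_cast<bool>(meta);
}
```

A caller would fill the buffers as rows arrive, call writePartition once a batch (for example the 200k rows mentioned above) is complete, clear the buffers, and then let the FastBit library build indexes on the finished directory.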
