Yes, if you can show the actual code segment, it would be very helpful. Thanks.
John

On 4/28/12 12:01 PM, Thorgrin wrote:
> Hi John,
>
> I described the way we are using the FastBit library to store data in
> my first post to this thread. What information do you need? I could
> copy and paste the relevant portions of our code if it helps, although
> it should boil down to what I described earlier.
>
> Petr
>
> On 28 April 2012 02:18, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> Would you mind telling us a bit more about how you use FastBit to store
>> data? I am interested in spending some time on this to remove the
>> excess operations in FastBit, but will need a concrete example to
>> investigate the issue further.
>>
>> Thanks.
>>
>> John
>>
>>
>> On 4/22/12 9:50 AM, Thorgrin wrote:
>>> Hi John,
>>>
>>> I finally got to test the difference between the two approaches for
>>> storing data. The difference between using the FastBit library and
>>> storing the data directly is quite significant.
>>>
>>> I'm sending the data to the storage program over the network using
>>> UDP, so when it cannot cope with the speed, the network layer drops
>>> some packets. I tried three speeds: 8k, 10k and 12k packets per
>>> second. Each packet contains a number of rows, and the packet sizes
>>> are similar. It takes about 154, 122 and 103 seconds to send the data
>>> at the respective speeds.
>>>
>>> The numbers of stored rows are roughly summed up in the following table:
>>> #speed   using library   direct storage
>>> 8000     7387000         31740000
>>> 10000    5925000         31737000
>>> 12000    5000000         31715000
>>>
>>> The way we are storing the data is simple. We use a buffer of 70k
>>> values for each column, which is just a piece of allocated memory.
>>> When the buffer is full, we flush the memory to the file.
>>>
>>> I do not know whether the results are a result of misusing the library
>>> somehow, but maybe someone else stumbled upon this issue.
>>>
>>> Regards,
>>> Petr
>>>
>>>>>> On 4/13/12 2:05 AM, Thorgrin wrote:
>>>>>>> Hi John,
>>>>>>>
>>>>>>> my colleague is currently working on the data storage and he decided
>>>>>>> to incorporate the creation of fastbit files directly into our code,
>>>>>>> thus we have no generic storage code to give back. There are no
>>>>>>> problems with our approach so far.
>>>>>>>
>>>>>>> I hope that I'll be able to test the performance soon to see if the
>>>>>>> difference is really significant.
>>>>>>>
>>>>>>> Meanwhile, I have another question. How does fastbit work with byte
>>>>>>> arrays? The table API seems to lack this feature, but there are some
>>>>>>> internal types for this. We need to support storing byte arrays along
>>>>>>> with strings and basic types in the end, so I would like to know
>>>>>>> whether this is possible using fastbit, or if we have to come up with
>>>>>>> another solution.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Petr
>>>>>>>
>>>>>>> On 6 April 2012 19:36, K. John Wu <[email protected]> wrote:
>>>>>>>> Hi, Petr,
>>>>>>>>
>>>>>>>> Have you had a chance to try this? Any comments or questions?
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/22/12 8:14 AM, Thorgrin wrote:
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> we will definitely give it a try. Maybe this simple approach could be
>>>>>>>>> made into some C++ class or C code and provided along with the
>>>>>>>>> library, assuming the results are significantly better.
>>>>>>>>>
>>>>>>>>> Petr
>>>>>>>>>
>>>>>>>>> On 22 March 2012 15:58, Dominique Prunier
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I was doing pretty much the same thing and here is how I did it:
>>>>>>>>>> * I handle import myself (the column file format and the -part.txt
>>>>>>>>>> are pretty straightforward, I was only using category and long,
>>>>>>>>>> beware of the endianness though)
>>>>>>>>>> * I generate indexes column by column, using the code from the C
>>>>>>>>>> API (which, to answer your question, builds the index for a specific
>>>>>>>>>> column using part::getColumn and then column::loadIndex and
>>>>>>>>>> column::unloadIndex on the selected column)
>>>>>>>>>> * From time to time, I merge smaller partitions into larger ones and
>>>>>>>>>> reindex them
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: [email protected]
>>>>>>>>>> [mailto:[email protected]] On Behalf Of Thorgrin
>>>>>>>>>> Sent: Thursday, March 22, 2012 10:12 AM
>>>>>>>>>> To: K. John Wu
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] Howto store data with fatbit library
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Thank you for pointing me in the right direction. What we are
>>>>>>>>>> currently using seems quite similar, but we are experiencing some
>>>>>>>>>> performance issues.
>>>>>>>>>>
>>>>>>>>>> We are storing quite a lot of rows into multiple tables at the same
>>>>>>>>>> time. Currently we are using one thread to write into about 7
>>>>>>>>>> different tables, some of which are more heavily used than others.
>>>>>>>>>> We are experiencing high CPU load while the throughput is not as
>>>>>>>>>> high as expected.
>>>>>>>>>> On the test machine we have hit the ceiling at about 35k rows per
>>>>>>>>>> second. This of course includes processing the incoming data as well
>>>>>>>>>> as storing it, but we believe that the storing is the most
>>>>>>>>>> limiting factor.
>>>>>>>>>> The harddrive performance should not be an issue
>>>>>>>>>> here, it's busy at about 7-15%.
>>>>>>>>>> The data is not written to the harddrive immediately, but only when
>>>>>>>>>> there are about 200k rows (this applies to each table).
>>>>>>>>>>
>>>>>>>>>> Since the fastbit data partition format is quite simple, it might be
>>>>>>>>>> best to store the data directly. This would allow us to create a
>>>>>>>>>> buffer for each data type and partition, which could be written
>>>>>>>>>> directly to the harddrive. The only drawback is that we need to
>>>>>>>>>> generate the -part.txt file ourselves, but that is not too hard.
>>>>>>>>>> Then we can use the fastbit library to generate indexes on existing
>>>>>>>>>> data.
>>>>>>>>>> What is your recommendation?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Regarding the buildIndexes() functions, they are indeed present on
>>>>>>>>>> both parts and tables. But the table also has a buildIndex()
>>>>>>>>>> function, which can be used to generate indexes on specific columns.
>>>>>>>>>> I cannot seem to find an equivalent in the part API. Is there any
>>>>>>>>>> way to build an index on one column only using parts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Petr
>>>>>>>>>>
>>>>>>>>>> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>>>>>>>>>>> Hi, Petr,
>>>>>>>>>>>
>>>>>>>>>>> You are on the right track. The file tests/setqgen.cpp is
>>>>>>>>>>> essentially doing what you are talking about. You can take a look
>>>>>>>>>>> at the file either in the source code directory or online at
>>>>>>>>>>> <http://goo.gl/D1XgX>.
>>>>>>>>>>>
>>>>>>>>>>> The class ibis::table (for a data table) is a container of
>>>>>>>>>>> ibis::part (for a data partition). A table can have multiple
>>>>>>>>>>> partitions. All data records written by setqgen.cpp can be
>>>>>>>>>>> regarded as one table, but it might have a number of partitions.
>>>>>>>>>>> The function
>>>>>>>>>>> ibis::table::buildIndexes calls ibis::part::buildIndexes on each
>>>>>>>>>>> data partition it has. The actual work is done by
>>>>>>>>>>> ibis::part::buildIndexes. If you are not using
>>>>>>>>>>> ibis::table::buildIndexes, you will be doing the looping yourself.
>>>>>>>>>>> Either way is fine. Take the option that is convenient for you.
>>>>>>>>>>>
>>>>>>>>>>> Good luck.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
>>>>>>>>>>>> thank you for your ongoing work on the fastbit library, the
>>>>>>>>>>>> improvements are great.
>>>>>>>>>>>>
>>>>>>>>>>>> I have several questions regarding correct usage. We are currently
>>>>>>>>>>>> creating fastbit data partitions using a tablex object in the
>>>>>>>>>>>> following manner:
>>>>>>>>>>>>
>>>>>>>>>>>> # initialise partition with columns
>>>>>>>>>>>> tablex->addColumns();
>>>>>>>>>>>>
>>>>>>>>>>>> tablex->reserveSpace();
>>>>>>>>>>>> # multiple calls to append data. We are storing the data on the
>>>>>>>>>>>> # fly as it comes, so there are lots of calls to append.
>>>>>>>>>>>> tablex->append();
>>>>>>>>>>>>
>>>>>>>>>>>> # When we fill the reserved space, we write the data to disk
>>>>>>>>>>>> tablex->write();
>>>>>>>>>>>> tablex->clearData();
>>>>>>>>>>>>
>>>>>>>>>>>> # And continue with
>>>>>>>>>>>> tablex->append()
>>>>>>>>>>>> .
>>>>>>>>>>>> .
>>>>>>>>>>>>
>>>>>>>>>>>> Is this the right and efficient way? Or could you recommend a
>>>>>>>>>>>> better approach? We really just need to receive data and store it
>>>>>>>>>>>> into the partitions fast. Currently it seems that this consumes
>>>>>>>>>>>> quite a lot of CPU resources, just for writing things down.
>>>>>>>>>>>>
>>>>>>>>>>>> Additionally, we want to create indexes on the newly created
>>>>>>>>>>>> partitions.
>>>>>>>>>>>> Currently we load it as a table using
>>>>>>>>>>>> table = ibis::table::create();
>>>>>>>>>>>> # and then
>>>>>>>>>>>> table->buildIndexes();
>>>>>>>>>>>> delete table;
>>>>>>>>>>>>
>>>>>>>>>>>> I've noticed that there is a buildIndexes() function on the part
>>>>>>>>>>>> class. What is the difference? Should we use the other one?
>>>>>>>>>>>> Additionally, table allows building an index on specific columns,
>>>>>>>>>>>> part only on all columns. Is there a reason for this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Petr
>>>>>>>>>> _______________________________________________
>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users
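For readers following the thread, the "direct storage" approach Petr describes (a fixed-size in-memory buffer per column, flushed to the column's file when full) could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from Petr's project or from FastBit: the class name ColumnBuffer and its methods are hypothetical, and it assumes that a column data file is simply the raw values written back to back in the machine's native byte order (see Dominique's warning about endianness above). The -part.txt metadata file still has to be written separately before FastBit can open the partition and build indexes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of FastBit: buffers the values of one
// column in memory and appends them to that column's data file when full.
template <typename T>
class ColumnBuffer {
public:
    ColumnBuffer(const std::string& path, std::size_t capacity)
        : path_(path), capacity_(capacity) {
        values_.reserve(capacity_);
    }

    // Add one value; once the buffer reaches its capacity it is flushed.
    // Returns false only if a flush was triggered and failed.
    bool append(const T& v) {
        values_.push_back(v);
        return values_.size() < capacity_ || flush();
    }

    // Append all buffered values to the file as raw bytes (native byte
    // order, an assumption about the expected column file layout).
    bool flush() {
        if (values_.empty()) return true;
        std::FILE* f = std::fopen(path_.c_str(), "ab");
        if (f == nullptr) return false;
        const std::size_t n =
            std::fwrite(values_.data(), sizeof(T), values_.size(), f);
        // Always close the file, then report whether everything was written.
        const bool ok = (std::fclose(f) == 0) && (n == values_.size());
        if (ok) values_.clear();
        return ok;
    }

    // Number of values currently held in memory.
    std::size_t buffered() const { return values_.size(); }

private:
    std::string path_;
    std::size_t capacity_;
    std::vector<T> values_;
};
```

One such buffer would be created per column and per partition, for example ColumnBuffer<std::uint32_t> with a capacity of 70000 to match the buffer size mentioned earlier in the thread, with a final flush() before the partition is closed and indexed. Variable-length types such as strings need a different layout and are not covered by this sketch.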
