Yes, if you can show the actual code segment, it would be very
helpful.  Thanks.

John


On 4/28/12 12:01 PM, Thorgrin wrote:
> Hi John,
> 
> I described the way we are using the FastBit library to store data in
> my first post to this thread. What information do you need? I could
> copy and paste the relevant portions of our code if it helps, although it
> should boil down to what I described earlier.
> 
> Petr
> 
> On 28 April 2012 02:18, K. John Wu <[email protected]> wrote:
>> Hi, Petr,
>>
>> Would you mind telling us a bit more about how you use FastBit to store
>> data?  I am interested in spending some time on this to remove the
>> excess operations in FastBit, but will need a concrete example to
>> investigate the issue further.
>>
>> Thanks.
>>
>> John
>>
>>
>> On 4/22/12 9:50 AM, Thorgrin wrote:
>>> Hi John,
>>>
> >>> I finally got to test the difference between the two approaches for
> >>> storing data. The difference between using the FastBit library and
> >>> storing the data directly is quite significant.
> >>>
> >>> I'm sending the data to the storage program over the network using
> >>> UDP, so when the program cannot keep up, the network layer drops some
> >>> packets. I tried three rates: 8k, 10k and 12k packets per second. Each
> >>> packet contains a number of rows, and all packets are of similar size.
> >>> Sending the data takes about 154, 122 and 103 seconds at the
> >>> respective rates.
>>>
> >>> The numbers of stored rows are roughly summarized in the following table:
> >>> # speed [packets/s]   rows stored (library)   rows stored (direct)
> >>>  8000                 7387000                 31740000
> >>> 10000                 5925000                 31737000
> >>> 12000                 5000000                 31715000
>>>
> >>> The way we are storing the data is simple. We use a buffer of 70k
> >>> values for each column, which is just a piece of allocated memory.
> >>> When the buffer is full, we flush its contents to the column's file.
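The per-column buffering Petr describes can be sketched as a small C++ class. This is an illustration, not the code used in the thread: it assumes a column file is simply the raw values written back to back in the machine's native byte order (check this against files written by your FastBit version), and `ColumnBuffer` with its interface is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity buffer for one column of fixed-size values.
// Values accumulate in memory; when the buffer is full, its raw bytes are
// appended to the column file and the memory is reused.
// ASSUMPTION: the column file is a plain array of values in native byte order.
template <typename T>
class ColumnBuffer {
public:
    ColumnBuffer(std::string path, std::size_t capacity)
        : path_(std::move(path)), capacity_(capacity) {
        assert(capacity_ > 0);
        values_.reserve(capacity_);
    }

    // Adds one value and flushes once the buffer is full.
    // Returns false only if a flush failed.
    bool append(const T& value) {
        values_.push_back(value);
        return values_.size() < capacity_ || flush();
    }

    // Appends all buffered values to the file. Call it once more when the
    // input ends so that a partially filled buffer is not lost.
    bool flush() {
        if (values_.empty()) return true;
        std::FILE* f = std::fopen(path_.c_str(), "ab");
        if (f == nullptr) return false;
        const std::size_t written =
            std::fwrite(values_.data(), sizeof(T), values_.size(), f);
        const bool closed = std::fclose(f) == 0;
        if (written != values_.size() || !closed) return false;
        values_.clear();  // keeps the allocated capacity for the next batch
        return true;
    }

private:
    std::string path_;
    std::size_t capacity_;
    std::vector<T> values_;
};
```

For example, `ColumnBuffer<std::int64_t> col("part1/bytes", 70000);` (the directory and column name are made up) matches the 70k values per column from the message above. Reopening the file on every flush keeps the sketch short; with 70k values per flush that cost is likely small, but a long-lived `FILE*` would avoid it.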
>>>
> >>> I do not know whether these results come from misusing the library
> >>> somehow, but maybe someone else has stumbled upon this issue.
>>>
>>> Regards,
>>> Petr
>>>
>>>>>> On 4/13/12 2:05 AM, Thorgrin wrote:
>>>>>>> Hi John,
>>>>>>>
> >>>>>>> My colleague is currently working on the data storage, and he
> >>>>>>> decided to build the creation of FastBit files directly into our
> >>>>>>> code, so we have no generic storage code to give back. There have
> >>>>>>> been no problems with our approach so far.
>>>>>>>
>>>>>>> I hope that I'll be able to test the performance soon to see if the
>>>>>>> difference is really significant.
>>>>>>>
> >>>>>>> Meanwhile, I have another question. How does FastBit work with byte
> >>>>>>> arrays? The table API seems to lack this feature, but there are some
> >>>>>>> internal types for it. In the end we need to support storing byte
> >>>>>>> arrays along with strings and basic types, so I would like to know
> >>>>>>> whether this is possible using FastBit, or if we have to come up
> >>>>>>> with another solution.
>>>>>>>
>>>>>>> Thank you,
>>>>>>> Petr
>>>>>>>
>>>>>>> On 6 April 2012 19:36, K. John Wu <[email protected]> wrote:
>>>>>>>> Hi, Petr,
>>>>>>>>
> >>>>>>>> Have you had a chance to try this?  Any comments or questions?
>>>>>>>>
>>>>>>>> John
>>>>>>>>
>>>>>>>>
>>>>>>>> On 3/22/12 8:14 AM, Thorgrin wrote:
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> we will definitely give it a try. Maybe this simple approach could be
>>>>>>>>> made into some C++ class or C code and provided along with the
>>>>>>>>> library, assuming the results are significantly better.
>>>>>>>>>
>>>>>>>>> Petr
>>>>>>>>>
>>>>>>>>> On 22 March 2012 15:58, Dominique Prunier
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
> >>>>>>>>>> I was doing pretty much the same thing, and here is how I did it:
> >>>>>>>>>>  * I handle the import myself (the column file format and the
> >>>>>>>>>> -part.txt file are pretty straightforward; I was only using
> >>>>>>>>>> category and long columns; beware of endianness, though)
> >>>>>>>>>>  * I generate indexes column by column, using the code from the C
> >>>>>>>>>> API (which, to answer your question, builds the index for a
> >>>>>>>>>> specific column using part::getColumn and then column::loadIndex
> >>>>>>>>>> and column::unloadIndex on the selected column)
> >>>>>>>>>>  * From time to time, I merge smaller partitions into larger ones
> >>>>>>>>>> and reindex them
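Dominique's endianness warning applies when the values being written were not produced in the host's byte order, for example big-endian fields taken from network packets. A minimal, portable C++ sketch for detecting the host byte order and converting a 64-bit big-endian value before it is buffered; the helper names are hypothetical, and nothing here relies on FastBit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if this machine stores integers least-significant byte first.
inline bool hostIsLittleEndian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);  // inspect the lowest-addressed byte
    return first == 1;
}

// Reverses the byte order of a 64-bit value.
inline std::uint64_t byteSwap64(std::uint64_t v) {
    std::uint64_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = (r << 8) | (v & 0xFFu);  // move the lowest byte of v into r
        v >>= 8;
    }
    return r;
}

// Converts a value read as big-endian bytes into host byte order.
inline std::uint64_t bigEndianToHost64(std::uint64_t v) {
    return hostIsLittleEndian() ? byteSwap64(v) : v;
}
```

GCC and Clang also provide `__builtin_bswap64`, which performs the same swap; the loop keeps the sketch independent of the compiler.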
>>>>>>>>>>
>>>>>>>>>> Hope this helps,
>>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: [email protected] 
>>>>>>>>>> [mailto:[email protected]] On Behalf Of Thorgrin
>>>>>>>>>> Sent: Thursday, March 22, 2012 10:12 AM
>>>>>>>>>> To: K. John Wu
>>>>>>>>>> Cc: FastBit Users
>>>>>>>>>> Subject: Re: [FastBit-users] Howto store data with fatbit library
>>>>>>>>>>
>>>>>>>>>> Hi John,
>>>>>>>>>>
>>>>>>>>>> Thank you for pointing me in the right direction. What we are
>>>>>>>>>> currently using seems quite similar, but we are experiencing some
>>>>>>>>>> performance issues.
>>>>>>>>>>
> >>>>>>>>>> We are storing quite a lot of rows into multiple tables at the same
> >>>>>>>>>> time. Currently we are using one thread to write into about 7
> >>>>>>>>>> different tables, some of which are more heavily used than others.
> >>>>>>>>>> We are experiencing high CPU load while the throughput is not as
> >>>>>>>>>> high as expected.
> >>>>>>>>>> On the test machine we have hit the ceiling at about 35k rows per
> >>>>>>>>>> second. This of course includes processing the incoming data as
> >>>>>>>>>> well as storing it, but we believe that storing is the main
> >>>>>>>>>> limiting factor. Hard drive performance should not be an issue
> >>>>>>>>>> here; the drive is only about 7-15% busy.
> >>>>>>>>>> The data is not written to the hard drive immediately, but each
> >>>>>>>>>> time about 200k rows have accumulated (this applies to each table).
>>>>>>>>>>
> >>>>>>>>>> Since the FastBit data partition format is quite simple, it might
> >>>>>>>>>> be best to store the data directly. This would allow us to create
> >>>>>>>>>> a buffer for each data type and partition, which could be written
> >>>>>>>>>> directly to the hard drive. The only drawback is that we need to
> >>>>>>>>>> generate the -part.txt file ourselves, but that is not too hard.
> >>>>>>>>>> Then we can use the FastBit library to generate indexes on the
> >>>>>>>>>> existing data.
> >>>>>>>>>> What is your recommendation?
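Generating the -part.txt file by hand can be sketched as follows. The layout is an assumption: a `BEGIN HEADER` ... `END HEADER` block followed by one `Begin Column` ... `End Column` entry per column. These key names are recalled from metadata files that FastBit writes, not taken from this thread, and FastBit-written files may carry extra keys (such as a description), so compare the output with a -part.txt created by your FastBit build before relying on it. `ColumnSpec` and `makePartTxt` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sstream>
#include <string>
#include <vector>

// One column entry for the metadata file, e.g. {"src", "LONG"} or
// {"proto", "CATEGORY"} for the two types mentioned in the thread.
struct ColumnSpec {
    std::string name;
    std::string dataType;
};

// Builds the text of a -part.txt for one partition.
// ASSUMPTION: key names and layout follow files written by FastBit as we
// recall them; verify against your FastBit version.
std::string makePartTxt(const std::string& partName, std::uint64_t numRows,
                        const std::vector<ColumnSpec>& columns,
                        std::time_t timestamp) {
    std::ostringstream out;
    out << "BEGIN HEADER\n"
        << "Name = \"" << partName << "\"\n"
        << "Number_of_rows = " << numRows << "\n"
        << "Number_of_columns = " << columns.size() << "\n"
        << "Timestamp = " << static_cast<long long>(timestamp) << "\n"
        << "END HEADER\n";
    for (const ColumnSpec& c : columns) {
        assert(!c.name.empty());
        out << "\nBegin Column\n"
            << "name = \"" << c.name << "\"\n"
            << "data_type = \"" << c.dataType << "\"\n"
            << "End Column\n";
    }
    return out.str();
}
```

The resulting text would be written into the partition directory next to the column files; presumably `Number_of_rows` has to match the number of values stored in each fixed-size column file.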
>>>>>>>>>>
>>>>>>>>>>
> >>>>>>>>>> Regarding the buildIndexes() functions, they are indeed present in
> >>>>>>>>>> both parts and tables. But the table also has a buildIndex()
> >>>>>>>>>> function, which can be used to generate indexes on specific
> >>>>>>>>>> columns. I cannot seem to find an equivalent in the part API. Is
> >>>>>>>>>> there any way to build an index on one column only using parts?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thank you,
>>>>>>>>>> Petr
>>>>>>>>>>
>>>>>>>>>> On 8 March 2012 18:46, K. John Wu <[email protected]> wrote:
>>>>>>>>>>> Hi, Petr,
>>>>>>>>>>>
> >>>>>>>>>>> You are on the right track.  The file tests/setqgen.cpp is
> >>>>>>>>>>> essentially doing what you are talking about.  You can take a
> >>>>>>>>>>> look at the file either in the source code directory or online
> >>>>>>>>>>> at <http://goo.gl/D1XgX>.
> >>>>>>>>>>>
> >>>>>>>>>>> The class ibis::table (for a data table) is a container of
> >>>>>>>>>>> ibis::part (for a data partition).  A table can have multiple
> >>>>>>>>>>> partitions.  All data records written by setqgen.cpp can be
> >>>>>>>>>>> regarded as one table, but it might have a number of partitions.
> >>>>>>>>>>> The function ibis::table::buildIndexes calls
> >>>>>>>>>>> ibis::part::buildIndexes on each data partition it has.  The
> >>>>>>>>>>> actual work is done by ibis::part::buildIndexes.  If you are not
> >>>>>>>>>>> using ibis::table::buildIndexes, you will be doing the looping
> >>>>>>>>>>> yourself.  Either way is fine.  Take the option that is
> >>>>>>>>>>> convenient for you.
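When the looping over partitions is done by hand instead of through ibis::table::buildIndexes, the first step is finding the partitions. A minimal C++17 sketch of that FastBit-independent part: it assumes a directory counts as a partition when it contains a -part.txt file, `findPartitionDirs` and `hasPartTxt` are hypothetical helpers, and the FastBit call that would index each directory is deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if `dir` contains a readable regular file named "-part.txt".
bool hasPartTxt(const fs::path& dir) {
    std::error_code ec;
    const fs::path meta = dir / "-part.txt";
    if (!fs::is_regular_file(meta, ec)) return false;
    std::ifstream in(meta);
    return static_cast<bool>(in);
}

// Lists `root` and every directory below it that looks like a data
// partition (i.e. has a -part.txt), sorted by path.
std::vector<std::string> findPartitionDirs(const fs::path& root) {
    std::vector<std::string> dirs;
    if (hasPartTxt(root)) dirs.push_back(root.string());
    std::error_code ec;
    for (fs::recursive_directory_iterator it(root, ec), end;
         !ec && it != end; it.increment(ec)) {
        std::error_code typeEc;
        if (it->is_directory(typeEc) && hasPartTxt(it->path()))
            dirs.push_back(it->path().string());
    }
    std::sort(dirs.begin(), dirs.end());
    return dirs;
}
```

Each returned directory can then be handed to FastBit one at a time; check the ibis::part documentation for how to open a partition from a directory and call buildIndexes on it.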
>>>>>>>>>>>
>>>>>>>>>>> Good luck.
>>>>>>>>>>>
>>>>>>>>>>> John
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 3/8/12 8:10 AM, Thorgrin wrote:
>>>>>>>>>>>> Hi John,
>>>>>>>>>>>>
> >>>>>>>>>>>> Thank you for your ongoing work on the FastBit library; the
> >>>>>>>>>>>> improvements are great.
>>>>>>>>>>>>
> >>>>>>>>>>>> I have several questions regarding correct usage. We are
> >>>>>>>>>>>> currently creating FastBit data partitions using a tablex object
> >>>>>>>>>>>> in the following manner:
>>>>>>>>>>>>
> >>>>>>>>>>>> // initialise partition with columns
> >>>>>>>>>>>> tablex->addColumns();
> >>>>>>>>>>>>
> >>>>>>>>>>>> tablex->reserveSpace();
> >>>>>>>>>>>> // multiple calls to append data. We are storing the data on the
> >>>>>>>>>>>> // fly as it comes, so there are lots of calls to append.
> >>>>>>>>>>>> tablex->append();
> >>>>>>>>>>>>
> >>>>>>>>>>>> // When we fill the reserved space, we write the data to disk
> >>>>>>>>>>>> tablex->write();
> >>>>>>>>>>>> tablex->clearData();
> >>>>>>>>>>>>
> >>>>>>>>>>>> // And continue with
> >>>>>>>>>>>> tablex->append();
> >>>>>>>>>>>> .
> >>>>>>>>>>>> .
>>>>>>>>>>>>
> >>>>>>>>>>>> Is this the right and efficient way? Or could you recommend a
> >>>>>>>>>>>> better approach? We really just need to receive data and store
> >>>>>>>>>>>> it into the partitions fast. Currently it seems that this
> >>>>>>>>>>>> consumes quite a lot of CPU resources, just for writing things
> >>>>>>>>>>>> down.
>>>>>>>>>>>>
> >>>>>>>>>>>> Additionally, we want to create indexes on the newly created
> >>>>>>>>>>>> partitions. Currently we load each partition as a table using
> >>>>>>>>>>>> table = ibis::table::create();
> >>>>>>>>>>>> // and then
> >>>>>>>>>>>> table->buildIndexes();
> >>>>>>>>>>>> delete table;
>>>>>>>>>>>>
> >>>>>>>>>>>> I've noticed that there is a buildIndexes() function in the part
> >>>>>>>>>>>> class. What is the difference? Should we use the other one?
> >>>>>>>>>>>> Additionally, table allows building an index on specific
> >>>>>>>>>>>> columns, but part only on all columns. Is there a reason for
> >>>>>>>>>>>> this?
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> Petr
>>>>>>>>>> _______________________________________________
>>>>>>>>>> FastBit-users mailing list
>>>>>>>>>> [email protected]
>>>>>>>>>> https://hpcrdm.lbl.gov/cgi-bin/mailman/listinfo/fastbit-users