Yeah, I remind of that we should has a pending work for ObjectStore
refactor bp[1]. We need to change KeyValueDB interface to adopt new
improvment


[1]: http://pad.ceph.com/p/hammer-osd_transaction_encoding

On Fri, Feb 20, 2015 at 7:00 AM, Somnath Roy <somnath....@sandisk.com> wrote:
> Thanks Sage !
> Let me understand the K/V code base more in-depth and will come back to you 
> on this.
>
> Regards
> Somnath
>
> -----Original Message-----
> From: Sage Weil [mailto:s...@newdream.net]
> Sent: Thursday, February 19, 2015 11:46 AM
> To: Somnath Roy
> Cc: Haomai Wang; sj...@redhat.com; Gregory Farnum; Ceph Development
> Subject: RE: K/V interface buffer transaction
>
> On Thu, 19 Feb 2015, Somnath Roy wrote:
>> Sage/Haomai,
>> Some more questions.
>>
>> 1. I am not able to figure out why the KeyValueDB interface is so
>> dependent on iter based approach ? If a db supports range queries,
>> can't we get rid of these iterator interfaces ?
>>
>> 2. Also, the function like ::_generic_read() is calling
>> StripObjectMap::get_values_with_header -> GenericObjectMap::scan().
>> Scan is just looping over the keys and still calling
>> iter->lower_bound() , why not calling direct get call ? In case, the
>> db supports range queries , we can handover the db these keys and it
>> will return array of key/value pair itself. Why to bother about that
>> from generic keyvaluestore interface ? If dbs are not supporting range
>> queries, we can implement similar logic in the shim layer like
>> leveldbstore/rocksdbstore, isn't it ?
>
> The KeyValueDB is the interface that seemed necessary when Sam was 
> implementing the original DBObjectMap a couple years ago.  It's based on what 
> leveldb was providing and what was needed at the time.  We are more than 
> happy to change it!
>
> A few things:
>
> 1. Adding a call that returns multiple k/v pairs sounds fine as long as there 
> is a limmit so we don't get an unbounded result size.
>
> 2. I'm concerned (in general) about the efficiency of this interface.
> Right now pretty much everything is fetched and returned in the form of an 
> STL structure and I'm worried that there will be a bunch of data copying on 
> the implementation to conform to that.  On the flip side, lots of callers are 
> currently rejiggering their requests into those maps too.  I'd be very 
> interested in hearing about how you think we can make this fit more 
> efficiently to whatever backend you're currently working with.
> Leveldb and rocksdb will I think be the most common backends, but we want to 
> perform well with others too.
>
> 3. One simple example of this is there are several places where we have an 
> encoded bufferlist of map<string,bufferlist> that we are doing a set on (or 
> are pulling out).  Currently we end up decoding into an STL map and feeding 
> to the interface, but I suspect lots of callers could benefit from a set of 
> calls that go direct to/from such a buffer and skip the map<>.
>
> 4. There a trivial patch in my newstore wip branch that adds a get(prefix, 
> key, *value) so that you don't get to pass in a set<string> for a single 
> fetch.  It's somewhere in the pile at
>
>         https://github.com/liewegas/ceph/commits/wip-newstore
>
> sage
>
>>
>> Let me know if I am missing anything here.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Haomai Wang [mailto:haomaiw...@gmail.com]
>> Sent: Wednesday, February 11, 2015 11:35 PM
>> To: Somnath Roy
>> Cc: sj...@redhat.com; Sage Weil; Gregory Farnum; Ceph Development
>> Subject: Re: K/V interface buffer transaction
>>
>> On Thu, Feb 12, 2015 at 3:26 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>> > Haomai,
>> >
>> > << KeyValueStore will only write one for duplicate entry in ordering
>> >
>> > I saw K/v store (keyvaluestore.cc) itself is not removing the duplicates , 
>> > are you saying the shim layer like leveldbstore/rocksdbstore is removing 
>> > the duplicates or the leveldb/rocksdb ?
>>
>> Oh no, sorry. That's just I want to do in mind. I forget I haven't impl it.
>>
>> Each ObjectStore::Transaction in KeyValueStore has corresponding 
>> BufferTransaction will store all kvs needed to store. We could let 
>> submit_transaction do it at last instead of calling backend each op.
>>
>> Yeah, we could resolve it in KeyValueStore clearly.
>> >
>> > Thanks & Regards
>> > Somnath
>> >
>> > -----Original Message-----
>> > From: Haomai Wang [mailto:haomaiw...@gmail.com]
>> > Sent: Wednesday, February 11, 2015 7:36 PM
>> > To: Somnath Roy
>> > Cc: sj...@redhat.com; Sage Weil; Gregory Farnum; Ceph Development
>> > Subject: Re: K/V interface buffer transaction
>> >
>> > On Thu, Feb 12, 2015 at 6:53 AM, Somnath Roy <somnath....@sandisk.com> 
>> > wrote:
>> >> Yeah, thanks!
>> >> Not sure if level-db is handling duplicate entries within a transaction 
>> >> properly or not, if not, in case of filestore (and also for K/V stores) 
>> >> we are having an extra (redundant) OMAP write in the Write-Path.
>> >
>> > KeyValueStore will only write one for duplicate entry in ordering.
>> >
>> > But FileStore will write redundant omap.
>> >
>> > And from dump log, the duplicate entry looks like from pglog
>> >
>> >>
>> >> Regards
>> >> Somnath
>> >>
>> >> -----Original Message-----
>> >> From: Samuel Just [mailto:sam.j...@inktank.com]
>> >> Sent: Wednesday, February 11, 2015 2:36 PM
>> >> To: Somnath Roy
>> >> Cc: Sage Weil; Gregory Farnum; Haomai Wang (haomaiw...@gmail.com);
>> >> Ceph Development
>> >> Subject: Re: K/V interface buffer transaction
>> >>
>> >> Well, the transaction is atomic, so if the key is set twice, you can 
>> >> certainly ignore the first one.
>> >> -Sam
>> >>
>> >> On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy <somnath....@sandisk.com> 
>> >> wrote:
>> >>> Hi,
>> >>> My code had a bug during printing log. I was using map to store
>> >>> the attribute keys in sorted order and that was discarding the
>> >>> duplicates
>> >>> :-)
>> >>>
>> >>> This is what I found out coming during transaction.
>> >>>
>> >>> 2015-02-05 15:58:12.311738 7f27b5429700  0 queue_transactions ::
>> >>> before _do_transactions
>> >>> 2015-02-05 15:58:12.311754 7f27b5429700  0
>> >>> _do_transactions::before _do_transaction
>> >>> 2015-02-05 15:58:12.311770 7f27b5429700  0
>> >>> Transaction::OP_WRITE::cid = 1.a3_head oid =
>> >>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 offset =
>> >>> 3997696 len = 65536
>> >>> 2015-02-05 15:58:12.311800 7f27b5429700  0
>> >>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>> >>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name
>> >>> = _ attr_value_len = 273
>> >>> 2015-02-05 15:58:12.311822 7f27b5429700  0
>> >>> Transaction::OP_SETATTR::cid = 1.a3_head oid =
>> >>> 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name
>> >>> = snapset attr_value_len = 31
>> >>> 2015-02-05 15:58:12.311840 7f27b5429700  0
>> >>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> >>> 2015-02-05 15:58:12.311845 7f27b5429700  0 OMAP_KEY = 
>> >>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>> >>>         buffer::ptr(0~4 0x3efc21000 in raw 0x3efc21000 len 4096 nref 6),
>> >>>         buffer::ptr(0~170 0x3d74840 in raw 0x3d74840 len 688 nref 3),
>> >>>         buffer::ptr(4~4 0x3efc21004 in raw 0x3efc21000 len 4096
>> >>> nref
>> >>> 6)
>> >>> )
>> >>> 2015-02-05 15:58:12.311931 7f27b5429700  0
>> >>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> >>> 2015-02-05 15:58:12.311938 7f27b5429700  0 OMAP_KEY = _epoch Value = 
>> >>> buffer::list(len=4,
>> >>>         buffer::ptr(0~4 0x3efc1f000 in raw 0x3efc1f000 len 4096
>> >>> nref
>> >>> 3)
>> >>> )
>> >>> 2015-02-05 15:58:12.311943 7f27b5429700  0 OMAP_KEY = _info Value = 
>> >>> buffer::list(len=713,
>> >>>         buffer::ptr(0~713 0x3efc1e000 in raw 0x3efc1e000 len 4096
>> >>> nref
>> >>> 3)
>> >>> )
>> >>> 2015-02-05 15:58:12.311965 7f27b5429700  0
>> >>> Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> >>> 2015-02-05 15:58:12.311969 7f27b5429700  0 OMAP_KEY = 
>> >>> 0000000102.00000000000000001592 Value = buffer::list(len=178,
>> >>>         buffer::ptr(0~4 0x3d75e40 in raw 0x3d75e40 len 688 nref 6),
>> >>>         buffer::ptr(0~170 0x3d75b80 in raw 0x3d75b80 len 688 nref 3),
>> >>>         buffer::ptr(4~4 0x3d75e44 in raw 0x3d75e40 len 688 nref 6)
>> >>> )
>> >>> 2015-02-05 15:58:12.311980 7f27b5429700  0 OMAP_KEY = can_rollback_to 
>> >>> Value = buffer::list(len=12,
>> >>>         buffer::ptr(0~12 0x3efc25000 in raw 0x3efc25000 len 4096
>> >>> nref
>> >>> 3)
>> >>> )
>> >>> 2015-02-05 15:58:12.311985 7f27b5429700  0 OMAP_KEY = 
>> >>> rollback_info_trimmed_to Value = buffer::list(len=12,
>> >>>         buffer::ptr(0~12 0x3efc24000 in raw 0x3efc24000 len 4096
>> >>> nref
>> >>> 3)
>> >>> )
>> >>>
>> >>>
>> >>>
>> >>> So, the OMAP_KEY = 0000000102.00000000000000001592 is coming twice !
>> >>>
>> >>> Is there any reason, why ? What is this attribute by the way ?
>> >>> Can we safely discard the first OP_OMAP_SETKEYS call for the same key ?
>> >>>
>> >>> Thanks & Regards
>> >>> Somnath
>> >>>
>> >>> -----Original Message-----
>> >>> From: Somnath Roy
>> >>> Sent: Tuesday, February 10, 2015 4:36 PM
>> >>> To: 'Sage Weil'; Gregory Farnum
>> >>> Cc: sj...@redhat.com; Haomai Wang (haomaiw...@gmail.com); Ceph
>> >>> Development
>> >>> Subject: RE: K/V interface buffer transaction
>> >>>
>> >>> Thanks Greg/Sam/Sage !
>> >>> For now, we will be doing our testing by sorting the keys and will keep 
>> >>> an eye on the duplicates.
>> >>> Another point, why do we need the K/V store thread pool for processing 
>> >>> transactions anymore ?
>> >>> I got rid of that and calling _do_transaction() directly from the 
>> >>> ::queue_trasaction , this is giving me ~3X performance improvement.
>> >>>
>> >>> Regards
>> >>> Somnath
>> >>>
>> >>> -----Original Message-----
>> >>> From: Sage Weil [mailto:sw...@redhat.com]
>> >>> Sent: Tuesday, February 10, 2015 10:44 AM
>> >>> To: Gregory Farnum
>> >>> Cc: Somnath Roy; sj...@redhat.com; Haomai Wang
>> >>> (haomaiw...@gmail.com); Ceph Development
>> >>> Subject: Re: K/V interface buffer transaction
>> >>>
>> >>> On Tue, 10 Feb 2015, Gregory Farnum wrote:
>> >>>> On Tue, Feb 10, 2015 at 10:26 AM, Sage Weil <sw...@redhat.com> wrote:
>> >>>> > On Tue, 10 Feb 2015, Somnath Roy wrote:
>> >>>> >> Thanks Sam !
>> >>>> >> So, is it safe to do ordering if in a transaction *no* 
>> >>>> >> remove/truncate/create/add call ?
>> >>>> >> For example, do we need to preserve ordering in case of the below 
>> >>>> >> transaction ?
>> >>>> >> It will be helpful if you can give some insight in what scenario 
>> >>>> >> preserving order is *must*.
>> >>>> >
>> >>>> > If I'm not mistaken teh only time ordering would matter at all
>> >>>> > in an transaction is when the same key is updated twice, right?
>> >>>> > The whole thing is committed atomically.  If there *are* dups,
>> >>>> > then the order there obviously should be preserved.
>> >>>> >
>> >>>> > Maybe a first pass would be add an assert or something that
>> >>>> > there are no dup keys and see if anything every falls out of that...
>> >>>> > hopefully there are none!
>> >>>>
>> >>>> I'm pretty sure some of the transaction analysis discussions
>> >>>> people have had say that we do double-updates at times. IIRC it
>> >>>> might have been the pglog head getting set twice in most transactions?
>> >>>
>> >>> Oh yeah, could be.  There was the snapset xattr update, but that was 
>> >>> resetting it to an existing value (not the same value inside the same 
>> >>> txn).  I forget if there were others.
>> >>>
>> >>> sage
>> >>>
>> >>> ________________________________
>> >>>
>> >>> PLEASE NOTE: The information contained in this electronic mail message 
>> >>> is intended only for the use of the designated recipient(s) named above. 
>> >>> If the reader of this message is not the intended recipient, you are 
>> >>> hereby notified that you have received this message in error and that 
>> >>> any review, dissemination, distribution, or copying of this message is 
>> >>> strictly prohibited. If you have received this communication in error, 
>> >>> please notify the sender by telephone or e-mail (as shown above) 
>> >>> immediately and destroy any and all copies of this message in your 
>> >>> possession (whether hard copies or electronically stored copies).
>> >>>
>> >
>> >
>> >
>> > --
>> > Best Regards,
>> >
>> > Wheat
>>
>>
>>
>> --
>> Best Regards,
>>
>> Wheat
>> N?????r??y??????X???v???)?{.n?????z?]z????ay? ????j ??f???h????? ?w???
> ???j:+v???w???????? ????zZ+???????j"????i



-- 
Best Regards,

Wheat
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to