Haomai,

<< KeyValueStore will only write one for duplicate entry in ordering

I saw that the K/V store (keyvaluestore.cc) itself is not removing the duplicates. Are you saying the shim layer (leveldbstore/rocksdbstore) is removing the duplicates, or leveldb/rocksdb itself?

Thanks & Regards
Somnath

-----Original Message-----
From: Haomai Wang [mailto:haomaiw...@gmail.com]
Sent: Wednesday, February 11, 2015 7:36 PM
To: Somnath Roy
Cc: sj...@redhat.com; Sage Weil; Gregory Farnum; Ceph Development
Subject: Re: K/V interface buffer transaction

On Thu, Feb 12, 2015 at 6:53 AM, Somnath Roy <somnath....@sandisk.com> wrote:
> Yeah, thanks!
> Not sure if level-db is handling duplicate entries within a transaction
> properly or not, if not, in case of filestore (and also for K/V stores) we
> are having an extra (redundant) OMAP write in the Write-Path.

KeyValueStore will only write one for duplicate entry in ordering. But
FileStore will write redundant omap. And from dump log, the duplicate entry
looks like from pglog

>
> Regards
> Somnath
>
> -----Original Message-----
> From: Samuel Just [mailto:sam.j...@inktank.com]
> Sent: Wednesday, February 11, 2015 2:36 PM
> To: Somnath Roy
> Cc: Sage Weil; Gregory Farnum; Haomai Wang (haomaiw...@gmail.com);
> Ceph Development
> Subject: Re: K/V interface buffer transaction
>
> Well, the transaction is atomic, so if the key is set twice, you can
> certainly ignore the first one.
> -Sam
>
> On Wed, Feb 11, 2015 at 2:20 PM, Somnath Roy <somnath....@sandisk.com> wrote:
>> Hi,
>> My code had a bug during printing log. I was using map to store the
>> attribute keys in sorted order and that was discarding the duplicates
>> :-)
>>
>> This is what I found out coming during transaction.
>>
>> 2015-02-05 15:58:12.311738 7f27b5429700 0 queue_transactions :: before _do_transactions
>> 2015-02-05 15:58:12.311754 7f27b5429700 0 _do_transactions::before _do_transaction
>> 2015-02-05 15:58:12.311770 7f27b5429700 0 Transaction::OP_WRITE::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 offset = 3997696 len = 65536
>> 2015-02-05 15:58:12.311800 7f27b5429700 0 Transaction::OP_SETATTR::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = _ attr_value_len = 273
>> 2015-02-05 15:58:12.311822 7f27b5429700 0 Transaction::OP_SETATTR::cid = 1.a3_head oid = 680256a3/rbd_data.100974b0dc51.0000000000000631/head//1 attr_name = snapset attr_value_len = 31
>> 2015-02-05 15:58:12.311840 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> 2015-02-05 15:58:12.311845 7f27b5429700 0 OMAP_KEY = 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>   buffer::ptr(0~4 0x3efc21000 in raw 0x3efc21000 len 4096 nref 6),
>>   buffer::ptr(0~170 0x3d74840 in raw 0x3d74840 len 688 nref 3),
>>   buffer::ptr(4~4 0x3efc21004 in raw 0x3efc21000 len 4096 nref 6)
>> )
>> 2015-02-05 15:58:12.311931 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> 2015-02-05 15:58:12.311938 7f27b5429700 0 OMAP_KEY = _epoch Value = buffer::list(len=4,
>>   buffer::ptr(0~4 0x3efc1f000 in raw 0x3efc1f000 len 4096 nref 3)
>> )
>> 2015-02-05 15:58:12.311943 7f27b5429700 0 OMAP_KEY = _info Value = buffer::list(len=713,
>>   buffer::ptr(0~713 0x3efc1e000 in raw 0x3efc1e000 len 4096 nref 3)
>> )
>> 2015-02-05 15:58:12.311965 7f27b5429700 0 Transaction::OP_OMAP_SETKEYS::cid = 1.a3_head oid = a3//head//1
>> 2015-02-05 15:58:12.311969 7f27b5429700 0 OMAP_KEY = 0000000102.00000000000000001592 Value = buffer::list(len=178,
>>   buffer::ptr(0~4 0x3d75e40 in raw 0x3d75e40 len 688 nref 6),
>>   buffer::ptr(0~170 0x3d75b80 in raw 0x3d75b80 len 688 nref 3),
>>   buffer::ptr(4~4 0x3d75e44 in raw 0x3d75e40 len 688 nref 6)
>> )
>> 2015-02-05 15:58:12.311980 7f27b5429700 0 OMAP_KEY = can_rollback_to Value = buffer::list(len=12,
>>   buffer::ptr(0~12 0x3efc25000 in raw 0x3efc25000 len 4096 nref 3)
>> )
>> 2015-02-05 15:58:12.311985 7f27b5429700 0 OMAP_KEY = rollback_info_trimmed_to Value = buffer::list(len=12,
>>   buffer::ptr(0~12 0x3efc24000 in raw 0x3efc24000 len 4096 nref 3)
>> )
>>
>> So, the OMAP_KEY = 0000000102.00000000000000001592 is coming twice !
>>
>> Is there any reason, why ? What is this attribute by the way ?
>> Can we safely discard the first OP_OMAP_SETKEYS call for the same key ?
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Somnath Roy
>> Sent: Tuesday, February 10, 2015 4:36 PM
>> To: 'Sage Weil'; Gregory Farnum
>> Cc: sj...@redhat.com; Haomai Wang (haomaiw...@gmail.com); Ceph
>> Development
>> Subject: RE: K/V interface buffer transaction
>>
>> Thanks Greg/Sam/Sage !
>> For now, we will be doing our testing by sorting the keys and will keep an
>> eye on the duplicates.
>> Another point, why do we need the K/V store thread pool for processing
>> transactions anymore ?
>> I got rid of that and calling _do_transaction() directly from the
>> ::queue_transaction , this is giving me ~3X performance improvement.
>>
>> Regards
>> Somnath
>>
>> -----Original Message-----
>> From: Sage Weil [mailto:sw...@redhat.com]
>> Sent: Tuesday, February 10, 2015 10:44 AM
>> To: Gregory Farnum
>> Cc: Somnath Roy; sj...@redhat.com; Haomai Wang
>> (haomaiw...@gmail.com); Ceph Development
>> Subject: Re: K/V interface buffer transaction
>>
>> On Tue, 10 Feb 2015, Gregory Farnum wrote:
>>> On Tue, Feb 10, 2015 at 10:26 AM, Sage Weil <sw...@redhat.com> wrote:
>>> > On Tue, 10 Feb 2015, Somnath Roy wrote:
>>> >> Thanks Sam !
>>> >> So, is it safe to do ordering if in a transaction *no*
>>> >> remove/truncate/create/add call ?
>>> >> For example, do we need to preserve ordering in case of the below
>>> >> transaction ?
>>> >> It will be helpful if you can give some insight in what scenario
>>> >> preserving order is *must*.
>>> >
>>> > If I'm not mistaken the only time ordering would matter at all in
>>> > a transaction is when the same key is updated twice, right? The
>>> > whole thing is committed atomically. If there *are* dups, then
>>> > the order there obviously should be preserved.
>>> >
>>> > Maybe a first pass would be to add an assert or something that there
>>> > are no dup keys and see if anything ever falls out of that...
>>> > hopefully there are none!
>>>
>>> I'm pretty sure some of the transaction analysis discussions people
>>> have had say that we do double-updates at times. IIRC it might have
>>> been the pglog head getting set twice in most transactions?
>>
>> Oh yeah, could be. There was the snapset xattr update, but that was
>> resetting it to an existing value (not the same value inside the same txn).
>> I forget if there were others.
>>
>> sage

--
Best Regards,

Wheat
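
To illustrate the point made in the thread that, inside an atomic transaction, only the last write to a given key matters, here is a minimal C++ sketch that walks OP_OMAP_SETKEYS-style operations in submission order, keeps the final value per key in a sorted map, and counts how many writes were overwritten. This is not Ceph code: `KeyValues`, `CollapseResult`, `collapse_setkeys` and `assert_no_duplicate_keys` are hypothetical names, values are plain strings rather than bufferlists, and remove/truncate operations (where ordering would also matter) are deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the payload of one OP_OMAP_SETKEYS operation.
using KeyValues = std::vector<std::pair<std::string, std::string>>;

struct CollapseResult {
  std::map<std::string, std::string> merged;  // final value per key, in sorted key order
  std::size_t duplicates = 0;                 // number of earlier writes that were overwritten
};

// Walk the operations in submission order. A later write to the same key
// replaces the earlier one, so submission order is preserved for duplicates.
CollapseResult collapse_setkeys(const std::vector<KeyValues>& ops) {
  CollapseResult result;
  for (const auto& op : ops) {
    for (const auto& kv : op) {
      auto it = result.merged.find(kv.first);
      if (it == result.merged.end()) {
        result.merged.emplace(kv.first, kv.second);
      } else {
        it->second = kv.second;  // last writer wins
        ++result.duplicates;
      }
    }
  }
  return result;
}

// Debug-build check in the spirit of "add an assert that there are no dup keys".
// Compiled out when NDEBUG is defined.
void assert_no_duplicate_keys(const std::vector<KeyValues>& ops) {
  assert(collapse_setkeys(ops).duplicates == 0);
}
```

Applied to the three OP_OMAP_SETKEYS operations in the log above, a pass like this would keep only the second value for key 0000000102.00000000000000001592 and report one duplicate. Whether the backend already does the equivalent is exactly the open question in this thread.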