Re: What happened in hlog if data are deleted caused by ttl?
Sorry for that. I didn't use the right parameter. Now I get the point.

regards!

Yong

On Wed, Aug 22, 2012 at 10:49 AM, Harsh J wrote:
> Hey Yonghu,
>
> You are right that TTL "deletions" (it isn't exactly a delete, it's
> more of a compact-time skip wizardry) do not go to the HLog as
> "events". Know that TTLs aren't applied "per-cell"; they are applied
> on the whole CF globally. So there is no such thing as a TTL-write or
> a TTL-delete event. In fact, the Region-level Coprocessor too has no
> hooks for "TTL-events", as seen at
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html,
> for this doesn't happen on triggers.
>
> What you say about the compaction part is wrong, however. Compaction
> too runs a regular store-file scanner to compact, and so does the
> regular Scan operation, to read (both use the same file scanning
> mechanism/code). So there's no difference in how a compaction or a
> client scan handles TTL-expired row values from a store file when
> reading it.
>
> I also am not able to understand what your sample shell command list
> shows. As I see it, it shows that the HFile did have the entry in it
> after you had flushed it. Note that you mentioned the TTL at the CF
> level when creating the table, not in your "put" statement, and this
> is a vital point in understanding how TTLs work.
>
> --
> Harsh J
Re: What happened in hlog if data are deleted caused by ttl?
Hey Yonghu,

You are right that TTL "deletions" (it isn't exactly a delete, it's more of a compact-time skip wizardry) do not go to the HLog as "events". Know that TTLs aren't applied "per-cell"; they are applied on the whole CF globally. So there is no such thing as a TTL-write or a TTL-delete event. In fact, the Region-level Coprocessor too has no hooks for "TTL-events", as seen at http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/coprocessor/RegionObserver.html, for this doesn't happen on triggers.

What you say about the compaction part is wrong, however. Compaction too runs a regular store-file scanner to compact, and so does the regular Scan operation, to read (both use the same file scanning mechanism/code). So there's no difference in how a compaction or a client scan handles TTL-expired row values from a store file when reading it.

I also am not able to understand what your sample shell command list shows. As I see it, it shows that the HFile did have the entry in it after you had flushed it. Note that you mentioned the TTL at the CF level when creating the table, not in your "put" statement, and this is a vital point in understanding how TTLs work.

On Wed, Aug 22, 2012 at 1:49 PM, yonghu wrote:
> I can fully understand normal deletion. But, in my point of view, ttl
> deletion is different than the normal deletion. The insertion of ttl
> data is recorded in hlog. But the ttl deletion is not recorded by
> hlog. So, if a failure occurs, should the ttl data be reinserted to
> data or can we discard the certain ttl data? Moreover, ttl deletion is
> not executed at data compaction time. Scanner needs to periodically
> scan each Store file to execute deletion.
>
> regards!
>
> Yong

--
Harsh J
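The two points in this reply (the TTL belongs to the column family, and compaction and client scans read store files through the same code) can be pictured with a toy model. This is a sketch in plain Python, not HBase code: the `FAMILY_TTL_SECONDS` table, the `profile` family, and all function names are hypothetical and exist only for illustration.

```python
# Toy model: TTL is looked up per column family, never stored on the cell.
# None stands for "no TTL configured". 'profile' is a made-up second family.
FAMILY_TTL_SECONDS = {"course": 5, "profile": None}


def live_cells(cells, family, now_ms):
    """Single reader shared by scans and compactions: yields non-expired cells."""
    ttl = FAMILY_TTL_SECONDS[family]
    for cell in cells:
        age_ms = now_ms - cell["timestamp_ms"]
        if ttl is None or age_ms <= ttl * 1000:
            yield cell


def client_scan(cells, family, now_ms):
    """What a reader would see: values of the cells that are still alive."""
    return [c["value"] for c in live_cells(cells, family, now_ms)]


def compaction_rewrite(cells, family, now_ms):
    """What a compaction would write out: the very same filtered stream."""
    return list(live_cells(cells, family, now_ms))


# The put carries a timestamp but no TTL of its own.
cells = [{"row": "tom", "timestamp_ms": 1_000, "value": "90"}]

print(client_scan(cells, "course", now_ms=10_000))    # 9 s old, family TTL 5 s
print(client_scan(cells, "profile", now_ms=10_000))   # same cell, no TTL
```

Under these assumptions the same cell is hidden in one family and visible in the other, and neither path produces any kind of delete event.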
Re: What happened in hlog if data are deleted caused by ttl?
And also an interesting point is that the ttl data will not exist in hfile. I have made the following test:

hbase(main):003:0> create 'test',{TTL=>'200',NAME=>'course'}
0 row(s) in 1.1420 seconds

hbase(main):005:0> put 'test','tom','course:english',90
0 row(s) in 0.0320 seconds

hbase(main):006:0> flush 'test'
0 row(s) in 0.1680 seconds

hbase(main):007:0> scan 'test'
ROW                COLUMN+CELL
 tom               column=course:english, timestamp=1345623867082, value=90
1 row(s) in 0.0350 seconds

./hbase org.apache.hadoop.hbase.io.hfile.HFile -v -f /hbase/test/abe4d5adaa650cdd46d26dca0bf85b72/course/8c77fb321f934592869f9852f777b22e
Scanning -> /hbase/test/abe4d5adaa650cdd46d26dca0bf85b72/course/8c77fb321f934592869f9852f777b22e
12/08/22 10:27:39 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
Scanned kv count -> 1

So, I guess the ttl data is only managed in memstore. But the question is: if memstore doesn't have enough space to accept new incoming ttl data, what will happen? Can anybody explain?

Thanks!

Yong

On Wed, Aug 22, 2012 at 10:19 AM, yonghu wrote:
> I can fully understand normal deletion. But, in my point of view, ttl
> deletion is different than the normal deletion. The insertion of ttl
> data is recorded in hlog. But the ttl deletion is not recorded by
> hlog. So, if a failure occurs, should the ttl data be reinserted to
> data or can we discard the certain ttl data? Moreover, ttl deletion is
> not executed at data compaction time. Scanner needs to periodically
> scan each Store file to execute deletion.
>
> regards!
>
> Yong
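The test above can be read as a timeline: the cell is flushed into a store file, stays physically there, and only its visibility changes once the family TTL of 200 seconds has passed. The sketch below is a toy Python model of that reading, not HBase code. It assumes that a raw file inspection counts cells without applying the TTL and that nothing rewrites the file in the meantime; the helper names are hypothetical.

```python
TTL_SECONDS = 200                 # from: create 'test',{TTL=>'200',NAME=>'course'}
PUT_TS_MS = 1345623867082         # timestamp shown by the scan in the message

# Contents of the store file right after `flush 'test'`:
# (row, column, timestamp_ms, value)
hfile = [("tom", "course:english", PUT_TS_MS, "90")]


def raw_kv_count(store_file):
    """Count every cell physically present in the file (assumed: no TTL check)."""
    return len(store_file)


def visible_rows(store_file, now_ms):
    """Cells a TTL-aware read would return at time now_ms."""
    return [kv for kv in store_file if now_ms - kv[2] <= TTL_SECONDS * 1000]


for offset_s in (10, 201):
    now_ms = PUT_TS_MS + offset_s * 1000
    print(offset_s, raw_kv_count(hfile), len(visible_rows(hfile, now_ms)))
```

In this model the raw count stays at 1 at both offsets, while the visible count drops from 1 to 0 once the cell is older than 200 seconds. That matches the "Scanned kv count -> 1" line in the message: the flushed file does contain the entry.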
Re: What happened in hlog if data are deleted caused by ttl?
I can fully understand normal deletion. But, in my point of view, ttl deletion is different than the normal deletion. The insertion of ttl data is recorded in hlog. But the ttl deletion is not recorded by hlog. So, if a failure occurs, should the ttl data be reinserted to data or can we discard the certain ttl data? Moreover, ttl deletion is not executed at data compaction time. Scanner needs to periodically scan each Store file to execute deletion.

regards!

Yong

On Tue, Aug 21, 2012 at 5:29 PM, jmozah wrote:
> This helped me http://hadoop-hbase.blogspot.in/2011/12/deletion-in-hbase.html
>
> ./Zahoor
> HBase Musings
Re: What happened in hlog if data are deleted caused by ttl?
This helped me http://hadoop-hbase.blogspot.in/2011/12/deletion-in-hbase.html

./Zahoor
HBase Musings

On 14-Aug-2012, at 6:54 PM, Harsh J wrote:
> Hi Yonghu,
>
> A timestamp is stored along with each insert. The ttl is maintained at
> the region-store level. Hence, when the log replays, all entries with
> expired TTLs are automatically omitted.
>
> Also, TTL deletions happen during compactions, and hence do not
> carry/need Delete events. When scanning a store file, TTL-expired
> entries are automatically skipped.
>
> --
> Harsh J
Re: What happened in hlog if data are deleted caused by ttl?
Thanks for your response. Can you tell me how the data is deleted due to the ttl? Which module in HBase will trigger the deletion? You mentioned the scanner. Does it mean the scanner will scan the store file periodically and then delete the data which has expired?

regards!

Yong

On Thu, Aug 16, 2012 at 6:16 AM, Ramkrishna.S.Vasudevan wrote:
> Hi
>
> Just to add on, the HLog is just an edit log. Any transaction updates
> (Puts/Deletes) are just added to the HLog. It is the Scanner that takes
> care of the TTL part, which is calculated from the TTL configured at
> the column family (Store) level.
>
> Regards
> Ram
RE: What happened in hlog if data are deleted caused by ttl?
Hi

Just to add on, the HLog is just an edit log. Any transaction updates (Puts/Deletes) are just added to the HLog. It is the Scanner that takes care of the TTL part, which is calculated from the TTL configured at the column family (Store) level.

Regards
Ram

> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Tuesday, August 14, 2012 8:51 PM
> To: user@hbase.apache.org
> Subject: Re: What happened in hlog if data are deleted caused by ttl?
>
> Yes, TTL deletions are done only during compactions. They aren't
> "Deleted" in the sense of what a Delete insert signifies, but are
> rather eliminated in the write process when new
> storefiles are written out - if the value being written to the
> compacted store has already expired.
>
> --
> Harsh J
Re: What happened in hlog if data are deleted caused by ttl?
Yes, TTL deletions are done only during compactions. They aren't "Deleted" in the sense of what a Delete insert signifies, but are rather eliminated in the write process when new storefiles are written out - if the value being written to the compacted store has already expired.

On Tue, Aug 14, 2012 at 8:40 PM, yonghu wrote:
> Hi Harsh,
>
> Thanks for your reply. If I understand you right, it means the ttl
> deletion will not be reflected in the log.

--
Harsh J
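To make "eliminated in the write process when new storefiles are written out" concrete, here is a minimal sketch of a compaction that merges several store files and simply never copies expired cells into the output. It is a toy model in Python under stated assumptions: cells are plain `(row, timestamp_ms, value)` tuples, the `compact` function is hypothetical, and real HBase compaction handles versions, delete markers, and much more.

```python
def compact(store_files, ttl_seconds, now_ms):
    """Merge store files into one new file, leaving out cells older than the TTL.

    No delete marker is produced: an expired cell is just not written again.
    """
    cutoff_ms = now_ms - ttl_seconds * 1000
    merged = [
        cell
        for store_file in store_files
        for cell in store_file
        if cell[1] >= cutoff_ms          # cell[1] is the cell's timestamp
    ]
    return sorted(merged)                # keep the new file ordered by row key


# Two small store files; the first holds a cell written 9 s ago,
# the second a cell written 1 s ago (both rows are made up).
old_file = [("amy", 1_000, "85")]
new_file = [("tom", 9_000, "90")]

result = compact([old_file, new_file], ttl_seconds=5, now_ms=10_000)
print(result)
```

With a 5-second TTL only the recent cell survives into the compacted file; the older one disappears without any entry being added to a log.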
Re: What happened in hlog if data are deleted caused by ttl?
Hi Harsh,

Thanks for your reply. If I understand you right, it means the ttl deletion will not be reflected in the log.

On Tue, Aug 14, 2012 at 3:24 PM, Harsh J wrote:
> Hi Yonghu,
>
> A timestamp is stored along with each insert. The ttl is maintained at
> the region-store level. Hence, when the log replays, all entries with
> expired TTLs are automatically omitted.
>
> Also, TTL deletions happen during compactions, and hence do not
> carry/need Delete events. When scanning a store file, TTL-expired
> entries are automatically skipped.
>
> --
> Harsh J
Re: What happened in hlog if data are deleted caused by ttl?
Hi Yonghu,

A timestamp is stored along with each insert. The ttl is maintained at the region-store level. Hence, when the log replays, all entries with expired TTLs are automatically omitted.

Also, TTL deletions happen during compactions, and hence do not carry/need Delete events. When scanning a store file, TTL-expired entries are automatically skipped.

On Tue, Aug 14, 2012 at 3:34 PM, yonghu wrote:
> My hbase version is 0.92. I tried something as follows:
> 1. Created a table 'test' with 'course' in which ttl=5.
> 2. Inserted one row into the table. 5 seconds later, the row was deleted.
> Later when I checked the log info of 'test' table, I only found the
> inserted information but not deleted information.
>
> Can anyone tell me which information is written into hlog when data is
> deleted by ttl or in this situation, no information is written into
> the hlog. If there is no information of deletion in the log, how can
> we guarantee the data recovered by log are correct?
>
> Thanks!
>
> Yong

--
Harsh J
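The recovery question can be reasoned through with a small model: because every logged insert carries its timestamp and the TTL is known at the store level, expiry can be recomputed at replay time, so the log needs no delete record. The following is a sketch in Python, not HBase code. The `Edit` class, `is_expired`, and `replay` are hypothetical names, and the exact boundary comparison is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    row: str
    column: str
    value: str
    timestamp_ms: int  # stored along with each insert


def is_expired(edit: Edit, ttl_seconds: int, now_ms: int) -> bool:
    """Toy expiry rule: a cell is expired once its age exceeds the store's TTL."""
    return now_ms - edit.timestamp_ms > ttl_seconds * 1000


def replay(log: list[Edit], ttl_seconds: int, now_ms: int) -> list[Edit]:
    """Replay an edit log, omitting entries whose TTL has already run out."""
    return [e for e in log if not is_expired(e, ttl_seconds, now_ms)]


# The log only ever recorded the put; there is no TTL delete entry in it.
log = [Edit("tom", "course:english", "90", timestamp_ms=1_000)]

recovered_early = replay(log, ttl_seconds=5, now_ms=3_000)    # 2 s after the put
recovered_late = replay(log, ttl_seconds=5, now_ms=10_000)    # 9 s after the put
print(len(recovered_early), len(recovered_late))
```

In this model a crash two seconds after the put recovers the row, while a crash nine seconds after it recovers nothing, which is the same answer a live read would have given at those times.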