2011/5/23 "이희승 (Trustin Lee)" <trus...@gmail.com>:
> On 05/23/2011 07:40 PM, Sanne Grinovero wrote:
>> 2011/5/23 Dan Berindei<dan.berin...@gmail.com>:
>>> On Mon, May 23, 2011 at 7:04 AM, "이희승 (Trustin Lee)"<trus...@gmail.com>  
>>> wrote:
>>>> On 05/20/2011 03:54 PM, Manik Surtani wrote:
>>>>> Is spanning rows the only real solution?  As you say it would mandate 
>>>>> using transactions to keep multiple rows coherent, and I'm not sure if
>>>>> everyone would want to enable transactions for this.
>>>>
>>>> There are more hidden overheads.  To update a value, the cache store
>>>> must determine how many chunks already exist in the cache store and
>>>> selectively delete and update them.  To simplify, we could aggressively
>>>> delete all chunks and insert new chunks.  Either way comes at the cost
>>>> of significant overhead.
>>
>> I see no alternative to deleting all values for each key, as we don't
>> know which part of the byte array is dirty.
>> Which overhead are you referring to? We would still store the same
>> amount of data, split or not, but yes, multiple statements might
>> require clever batching.
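
To make the "delete all chunks, then re-insert" idea concrete, here is an
untested sketch; the ISPN_CHUNKS table and its column names are invented
for illustration, and both statements need to share one transaction:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Arrays;

    final class ChunkWriter {
        // Untested sketch: replace every chunk of a value in one batch.
        // Must run inside a single transaction to keep the rows coherent.
        static void replaceChunks(Connection con, String key, byte[] value,
                                  int maxChunkSize) throws SQLException {
            try (PreparedStatement del = con.prepareStatement(
                    "DELETE FROM ISPN_CHUNKS WHERE CACHE_KEY = ?")) {
                del.setString(1, key);
                del.executeUpdate();
            }
            try (PreparedStatement ins = con.prepareStatement(
                    "INSERT INTO ISPN_CHUNKS (CACHE_KEY, CHUNK_ID, DATA)"
                    + " VALUES (?, ?, ?)")) {
                for (int id = 0, off = 0; off < value.length;
                     id++, off += maxChunkSize) {
                    int end = Math.min(off + maxChunkSize, value.length);
                    ins.setString(1, key);
                    ins.setInt(2, id);
                    ins.setBytes(3, Arrays.copyOfRange(value, off, end));
                    ins.addBatch();
                }
                ins.executeBatch(); // one round-trip for all chunks
            }
        }
    }
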
>>
>>>>
>>>> Even MySQL supports a BLOB of up to 4GiB, so I think it's better to
>>>> update the schema?
>>
>> You mean accommodating the column size only, or adding the chunk_id?
>> I'm just asking, but your and Dan's feedback has already persuaded me
>> that my initial idea of providing chunking should be avoided.
>
> I mean the user updating the column type in the schema.
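
(For MySQL that would mean something like the DDL below; the table and
column names are placeholders for whatever the user configured, shown here
through plain JDBC although any SQL console works just as well:)

    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    final class SchemaUpdate {
        // Sketch: widening the data column to LONGBLOB (up to 4 GiB on MySQL).
        // ISPN_ENTRIES and DATA_COLUMN stand in for the configured names.
        static void widenDataColumn(Connection con) throws SQLException {
            try (Statement st = con.createStatement()) {
                st.executeUpdate(
                    "ALTER TABLE ISPN_ENTRIES MODIFY DATA_COLUMN LONGBLOB");
            }
        }
    }
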
>
>>>
>>> +1
>>>
>>> BLOBs are only stored in external storage if the actual data can't fit
>>> in a normal table row, so the only penalty in using a LONGBLOB
>>> compared to a VARBINARY(255) is 3 extra bytes for the length.
>>>
>>> If the user really wants to use a data type with a smaller max length,
>>> we can just report an error when the data column size is too small. We
>>> will need to check the length and throw an exception ourselves though:
>>> with MySQL we can't be sure that it is configured to raise errors when
>>> a value is truncated.
>>
>> +1
>> It might be better to just check that the maximum size of stored values
>> fits in "something"; I'm not sure we can guess the proper size from
>> database metadata: not only is the column's maximum size involved, but
>> MySQL (to keep it as the reference example, though this might apply to
>> others) also has a default maximum packet size for connections which is
>> not very big; when using it with Infinispan I have always had to
>> reconfigure the database server.
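
(For reference, that limit can at least be detected at startup. This is
MySQL-specific; max_allowed_packet is a real server variable, but the
surrounding code is only a sketch:)

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    final class PacketLimit {
        // MySQL-specific sketch: read max_allowed_packet so we can warn when
        // a configured value size could never fit through a single packet.
        static long readMaxAllowedPacket(Connection con) throws SQLException {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "SHOW VARIABLES LIKE 'max_allowed_packet'")) {
                return rs.next() ? rs.getLong(2) : -1; // column 2 is the value
            }
        }
    }
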
>>
>> Also, as BLOBs make very poor primary keys, people might want to use a
>> limited and well-known byte size for their keys.
>>
>> So, shall we just add a check that neither key nor value surpasses a
>> user-defined threshold, with separately configurable sizes for each?
>> Should an exception be raised in that case?
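
(Something along these lines is what I'm imagining; all names and the
exception type are invented, the point being the two independent limits:)

    // Sketch of the proposed check: separate, user-configurable limits for
    // keys and values; throws before the driver can truncate anything.
    final class SizeLimits {
        private final int maxKeyBytes;
        private final int maxValueBytes;

        SizeLimits(int maxKeyBytes, int maxValueBytes) {
            this.maxKeyBytes = maxKeyBytes;
            this.maxValueBytes = maxValueBytes;
        }

        void check(byte[] key, byte[] value) {
            if (key.length > maxKeyBytes)
                throw new IllegalArgumentException("Key is " + key.length
                    + " bytes; the configured limit is " + maxKeyBytes);
            if (value.length > maxValueBytes)
                throw new IllegalArgumentException("Value is " + value.length
                    + " bytes; the configured limit is " + maxValueBytes);
        }
    }
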
>
> An exception will be raised by the JDBC driver if the key doesn't fit
> into the key column, so we could simply wrap it?

If that always happens, then I wouldn't wrap it: entering the business
of wrapping driver-specific exceptions is very tricky ;)
I was more concerned about the fact that some databases might not raise
any exception at all. Not sure if that's the case, and possibly not our
problem.
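
(For what it's worth, the only semi-portable signal is the SQLState:
"22001" is the standard class for "string data, right truncation". But
drivers aren't consistent about reporting it, and some databases just
truncate silently, which is exactly why I'd rather stay out of it:)

    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    final class TruncationCheck {
        // Sketch only: detecting truncation via the standard SQLState class.
        static void executeInsert(PreparedStatement ps) throws SQLException {
            try {
                ps.executeUpdate();
            } catch (SQLException e) {
                if ("22001".equals(e.getSQLState())) {
                    // Likely a too-small column; a clearer message could be
                    // attached here, but mapping driver specifics is fragile.
                }
                throw e; // propagate rather than second-guess the driver
            }
        }
    }
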

Sanne

>
>>
>> Cheers,
>> Sanne
>>
>>>
>>> Cheers
>>> Dan
>>>
>>>
>>>>> On 19 May 2011, at 19:06, Sanne Grinovero wrote:
>>>>>
>>>>>> As mentioned on the user forum [1], people setting up a JDBC
>>>>>> cacheloader need to be able to define the size of columns to be used.
>>>>>> The Lucene Directory has a feature to autonomously chunk the segment
>>>>>> contents at a configurable byte size, and so does the GridFS; still,
>>>>>> there are other metadata objects which Lucene currently doesn't chunk
>>>>>> as they are "fairly small" (but undefined and possibly growing), and
>>>>>> in a more general sense anybody using the JDBC cacheloader would face
>>>>>> the same problem: what column size do I need to use?
>>>>>>
>>>>>> While in most cases the maximum size can be estimated, this is still
>>>>>> not good enough, as when you're wrong the byte array might get
>>>>>> truncated, so I think the CacheLoader should take care of this.
>>>>>>
>>>>>> What would you think of:
>>>>>> - adding a max_chunk_size option to JdbcStringBasedCacheStoreConfig
>>>>>> and JdbcBinaryCacheStore
>>>>>> - have them store in multiple rows the values which would be bigger
>>>>>> than max_chunk_size
>>>>>> - this will need transactions, which are currently not being used by
>>>>>> the cacheloaders
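
(To make the multi-row idea concrete, a rough read-side sketch, untested;
ISPN_CHUNKS and its columns are invented names, matching the write-side
example further up in this thread:)

    import java.io.ByteArrayOutputStream;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    final class ChunkReader {
        // Untested sketch: reassemble a value that was split across rows.
        static byte[] loadChunks(Connection con, String key)
                throws SQLException {
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT DATA FROM ISPN_CHUNKS WHERE CACHE_KEY = ?"
                    + " ORDER BY CHUNK_ID")) {
                ps.setString(1, key);
                try (ResultSet rs = ps.executeQuery()) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    while (rs.next()) {
                        byte[] chunk = rs.getBytes(1);
                        out.write(chunk, 0, chunk.length);
                    }
                    return out.size() == 0 ? null : out.toByteArray();
                }
            }
        }
    }
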
>>>>>>
>>>>>> It seems to me that only the JDBC cacheloader has these issues, as
>>>>>> the other stores I'm aware of are more "blob oriented". Could it be
>>>>>> worth building this abstraction at a higher level instead of in the
>>>>>> JDBC cacheloader?
>>>>>>
>>>>>> Cheers,
>>>>>> Sanne
>>>>>>
>>>>>> [1] - http://community.jboss.org/thread/166760
>>>>>
>>>>> --
>>>>> Manik Surtani
>>>>> ma...@jboss.org
>>>>> twitter.com/maniksurtani
>>>>>
>>>>> Lead, Infinispan
>>>>> http://www.infinispan.org
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Trustin Lee, http://gleamynode.net/
>>>>
>>>
>>
>
>
> --
> Trustin Lee, http://gleamynode.net/

_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
