2011/5/23 Dan Berindei <dan.berin...@gmail.com>:
> On Mon, May 23, 2011 at 7:04 AM, "이희승 (Trustin Lee)" <trus...@gmail.com> 
> wrote:
>> On 05/20/2011 03:54 PM, Manik Surtani wrote:
>>> Is spanning rows the only real solution?  As you say it would mandate using 
>>> transactions to keep multiple rows coherent, and I'm not sure if everyone 
>>> would want to enable transactions for this.
>>
>> There are more hidden overheads.  To update a value, the cache store
>> must determine how many chunks already exist in the cache store and
>> selectively delete and update them.  To simplify aggressively, we could
>> delete all chunks and insert new ones.  Either way comes at the cost of
>> significant overhead.

I see no alternative to deleting all values for each key, as we don't
know which part of the byte array is dirty.
Which overhead are you referring to? We would still store the same
amount of data, split or not, but yes, multiple statements might
require clever batching.
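For illustration, a minimal sketch of the splitting step that would precede such a batch; `ChunkSplitter` and `split` are hypothetical names, not an existing Infinispan API, and the JDBC side (one INSERT per chunk via `PreparedStatement.addBatch()`) is omitted to keep the example self-contained:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a serialized value into fixed-size chunks,
// as a chunking cache store would before issuing a batched INSERT,
// one row per chunk, keyed by (key, chunk_id).
public class ChunkSplitter {
    public static List<byte[]> split(byte[] value, int maxChunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < value.length; offset += maxChunkSize) {
            int len = Math.min(maxChunkSize, value.length - offset);
            byte[] chunk = new byte[len];
            System.arraycopy(value, offset, chunk, 0, len);
            chunks.add(chunk);
        }
        return chunks;
    }
}
```

On update, the store would first DELETE all rows for the key, then batch-insert the new chunks, since there is no way to tell which chunk is dirty.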

>>
>> Even MySQL supports a BLOB of up to 4GiB, so I think it's better to
>> update the schema?

You mean by accommodating the column size only, or by adding the chunk_id?
I'm just asking; your and Dan's feedback has already persuaded me that
my initial idea of providing chunking should be avoided.

>
> +1
>
> BLOBs are only stored in external storage if the actual data can't fit
> in a normal table row, so the only penalty in using a LONGBLOB
> compared to a VARBINARY(255) is 3 extra bytes for the length.
>
> If the user really wants to use a data type with a smaller max length,
> we can just report an error when the data column size is too small. We
> will need to check the length and throw an exception ourselves,
> though: with MySQL we can't be sure that it is configured to raise
> errors when a value is truncated.

+1
it might be better to just check that the maximum size of stored values
fits in "something". I'm not sure we can guess the proper size from
database metadata: not only is the column maximum size involved, but
MySQL (to keep it as the reference example, though this might apply to
others too) also has a default maximum packet size for connections
which is not very big; when using it with Infinispan I always had to
reconfigure the database server.
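For reference, the MySQL setting involved is `max_allowed_packet`; a sketch of raising it in `my.cnf` (the exact default and file location vary by version and distribution):

```ini
# my.cnf - raise the per-connection packet limit so large
# serialized values are not rejected by the server
[mysqld]
max_allowed_packet=64M
```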

Also, as BLOBs make very poor primary keys, people might want to use a
limited and well-known byte size for their keys.

So, shall we just add a method that checks that neither the key nor the
value surpasses a user-defined threshold, with a separately
configurable size for each? Should an exception be raised in that case?
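As a rough sketch of what such a check could look like (all names here, `SizeGuard`, `maxKeySize`, `maxValueSize`, are hypothetical, not an existing Infinispan API):

```java
// Hypothetical guard that a JDBC cache store could call before writing,
// failing fast instead of letting the database silently truncate data.
public class SizeGuard {
    private final int maxKeySize;
    private final int maxValueSize;

    public SizeGuard(int maxKeySize, int maxValueSize) {
        this.maxKeySize = maxKeySize;
        this.maxValueSize = maxValueSize;
    }

    // Throws if either the serialized key or value exceeds its
    // configured threshold; both limits are configured independently.
    public void check(byte[] key, byte[] value) {
        if (key.length > maxKeySize)
            throw new IllegalArgumentException("Key of " + key.length
                + " bytes exceeds configured maximum of " + maxKeySize);
        if (value.length > maxValueSize)
            throw new IllegalArgumentException("Value of " + value.length
                + " bytes exceeds configured maximum of " + maxValueSize);
    }
}
```

This keeps the failure on the Infinispan side, which matters since, as Dan noted, MySQL may be configured to truncate silently rather than raise an error.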

Cheers,
Sanne

>
> Cheers
> Dan
>
>
>>> On 19 May 2011, at 19:06, Sanne Grinovero wrote:
>>>
>>>> As mentioned on the user forum [1], people setting up a JDBC
>>>> cacheloader need to be able to define the size of columns to be used.
>>>> The Lucene Directory has a feature to autonomously chunk the segment
>>>> contents at a configurable specified byte number, and so has the
>>>> GridFS; still there are other metadata objects which Lucene currently
>>>> doesn't chunk as it's "fairly small" (but undefined and possibly
>>>> growing), and in a more general sense anybody using the JDBC
>>>> cacheloader would face the same problem: what's the dimension I need
>>>> to use?
>>>>
>>>> While in most cases the maximum size can be estimated, this is still
>>>> not good enough, as when you're wrong the byte array might get
>>>> truncated, so I think the CacheLoader should take care of this.
>>>>
>>>> what would you think of:
>>>> - adding a max_chunk_size option to JdbcStringBasedCacheStoreConfig
>>>> and JdbcBinaryCacheStore
>>>> - have them store in multiple rows the values which would be bigger
>>>> than max_chunk_size
>>>> - this will need transactions, which are currently not being used by
>>>> the cacheloaders
>>>>
>>>> It looks to me like only the JDBC cacheloader has these issues, as
>>>> the other stores I'm aware of are more "blob oriented". Could it be
>>>> worth building this abstraction at a higher level instead of in the
>>>> JDBC cacheloader?
>>>>
>>>> Cheers,
>>>> Sanne
>>>>
>>>> [1] - http://community.jboss.org/thread/166760
>>>> _______________________________________________
>>>> infinispan-dev mailing list
>>>> infinispan-dev@lists.jboss.org
>>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>> --
>>> Manik Surtani
>>> ma...@jboss.org
>>> twitter.com/maniksurtani
>>>
>>> Lead, Infinispan
>>> http://www.infinispan.org
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Trustin Lee, http://gleamynode.net/
>>
>

