Re: Text or....

2018-04-04 Thread Jon Haddad
Depending on the compression rate, I think it would generate less garbage on 
the Cassandra side if you compressed it client side.  Something to test out.


Re: Text or....

2018-04-04 Thread Jeff Jirsa
Compressing server-side and validating checksums are hugely important in the 
more frequently used versions of Cassandra, so since you probably want to run 
compression on the server anyway, I’m not sure why you’d compress it twice.

-- 
Jeff Jirsa
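
[For reference, both of those are table-level settings. A minimal sketch of what they look like through the Python driver, assuming Cassandra 3.x option names and a hypothetical keyspace/table:]

    from cassandra.cluster import Cluster

    # Hypothetical keyspace and table, for illustration only.
    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Server-side compression (LZ4 is the default in recent versions) plus
    # checksum verification on every read via crc_check_chance = 1.0.
    session.execute("""
        ALTER TABLE documents
        WITH compression = {'class': 'LZ4Compressor', 'chunk_length_in_kb': 64}
        AND crc_check_chance = 1.0
    """)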


Re: Text or....

2018-04-04 Thread DuyHai Doan
Compressing client-side is better because it will save:

1) a lot of bandwidth on the network
2) a lot of Cassandra CPU, because there is no decompression server-side
3) a lot of Cassandra HEAP, because the compressed blob should be relatively
small compared to the raw size (text data compresses very well)
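
[A minimal sketch of that pattern with the Python driver and zlib; table and column names are made up for illustration, and load_document() stands in for however you obtain the payload:]

    import uuid
    import zlib

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Compress the large text on the client and store the result as a blob.
    raw_text = load_document()  # the ~55,000-character payload (hypothetical helper)
    compressed = zlib.compress(raw_text.encode('utf-8'))

    insert = session.prepare("INSERT INTO documents (id, body) VALUES (?, ?)")
    session.execute(insert, (uuid.uuid4(), compressed))

    # On read, decompress on the client; the server only ever sees the blob.
    row = session.execute("SELECT id, body FROM documents LIMIT 1").one()
    text = zlib.decompress(row.body).decode('utf-8')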


Re: Text or....

2018-04-04 Thread Jeronimo de A. Barros
Hi,

We use a pseudo file-system table where the chunks are 64 KB blobs, and we
have never had any performance issue.

The primary-key structure is ((file-uuid), chunk-id).

Jero
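
[A sketch of what that layout can look like in practice: the table definition follows the primary key above, while the surrounding names and session setup are illustrative:]

    import uuid

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Chunked "pseudo file-system" table: one partition per file,
    # one row per chunk.
    session.execute("""
        CREATE TABLE IF NOT EXISTS files (
            file_uuid uuid,
            chunk_id  int,
            data      blob,
            PRIMARY KEY ((file_uuid), chunk_id)
        )
    """)

    CHUNK_SIZE = 64 * 1024  # 64 KB chunks, as described above
    insert = session.prepare(
        "INSERT INTO files (file_uuid, chunk_id, data) VALUES (?, ?, ?)")

    def store_file(content: bytes) -> uuid.UUID:
        """Split the content into 64 KB chunks and write one row per chunk."""
        file_uuid = uuid.uuid4()
        for chunk_id, offset in enumerate(range(0, len(content), CHUNK_SIZE)):
            session.execute(
                insert, (file_uuid, chunk_id, content[offset:offset + CHUNK_SIZE]))
        return file_uuid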


Re: Text or....

2018-04-04 Thread Nicolas Guyomar
Hi Shalom,

You might want to compress on the application side before inserting into
Cassandra, using the algorithm of your choice, based on the compression ratio
and speed you find acceptable for your use case.
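
[For example, a rough way to compare candidates on a representative row, here with three stdlib codecs; swap in whatever libraries and payload you are actually considering:]

    import bz2
    import lzma
    import time
    import zlib

    # A stand-in payload; in practice, read a real ~55 KB row.
    payload = ("some representative text " * 2200).encode('utf-8')

    for name, compress in [('zlib', zlib.compress),
                           ('bz2', bz2.compress),
                           ('lzma', lzma.compress)]:
        start = time.perf_counter()
        out = compress(payload)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"{name}: ratio {len(out) / len(payload):.2f}, {elapsed_ms:.1f} ms")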


Re: Text or....

2018-04-04 Thread shalom sagges
Thanks DuyHai!

I'm using the default table compression. Is there anything else I should
look into?
Regarding the table compression, I understand that for write-heavy tables,
it's best to keep the default and not compress it further. Have I understood
correctly?


Re: Text or....

2018-04-04 Thread DuyHai Doan
Compress it and store it as a blob.
Unless you ever need to index it; but I guess even with SASI, indexing such a
huge text block is not a good idea.


Text or....

2018-04-04 Thread shalom sagges
Hi All,

A certain application is writing ~55,000 characters for a single row. Most
of these characters go into one column with the "text" data type.

This looks insanely large for one row.
Would you suggest changing the data type from "text" to BLOB, or is there any
other option that might fit this scenario?

Thanks!


Store JSON as text or UTF-8 encoded blobs?

2015-08-23 Thread Kevin Burton
Hey.

I’m considering migrating my DB from using multiple columns to just 2
columns, with the second one being a JSON object. Is there going to be any
real difference between TEXT and a UTF-8 encoded BLOB?

I guess it would probably be easier to get tools like Spark to parse the
object as JSON if it’s represented as a BLOB.
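
[From the driver's side the two options look like this; a rough sketch with hypothetical tables. Since CQL text is UTF-8 under the hood, the stored bytes should be the same either way, and the practical difference is whether the driver hands you str or bytes:]

    import json

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')
    doc = {"user": 42, "tags": ["a", "b"]}

    # As TEXT: the driver handles UTF-8, and reads come back as str.
    session.execute("INSERT INTO events_text (id, body) VALUES (%s, %s)",
                    (1, json.dumps(doc)))

    # As BLOB: you encode/decode yourself, and reads come back as bytes.
    session.execute("INSERT INTO events_blob (id, body) VALUES (%s, %s)",
                    (1, json.dumps(doc).encode('utf-8')))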

-- 

Founder/CEO Spinn3r.com
Location: San Francisco, CA
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>