[ 
https://issues.apache.org/jira/browse/OOZIE-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13716603#comment-13716603
 ] 

Virag Kothari commented on OOZIE-1462:
--------------------------------------

Thanks, Robert and Mohammad, for your comments.

bq. As this would require a db change, would we try to do this for 4.0 or wait 
for 5.0?

I believe we have to do this for 5.0, as a db schema change is required.

bq. If a compressed column is a binary lob (byte[]) does that mean you'd no 
longer be able to manually go and look in the database to see the data? Or is 
there some way to decompress it from a query? (Sorry for the noob question; I'm 
not a db guy)

Correct. I don't think there is a plain SQL way to decompress it, but it should 
be doable through a stored procedure.

bq. Have you looked into Derby support for compression? I bet it will have some 
shortcomings compared to the other databases that might make things difficult.

I probably didn't make it clear that we will not be using db compression. Oozie 
will do the compression itself and store the data in binary form, so there 
shouldn't be a problem with any particular db.
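
To make the idea concrete, here is a minimal sketch of application-side 
compression. It assumes gzip from java.util.zip and UTF-8 text; the class and 
method names are hypothetical and are not actual Oozie code, and the codec that 
ends up being used may well be different.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Oozie.
class LobCompressionSketch {

    // Compress text into gzip bytes that could be stored in a binary lob column.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written.
        return buffer.toByteArray();
    }

    // Restore the original text from the bytes read back from the column.
    static String decompress(byte[] stored) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(stored))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample payload; real workflow definitions will differ.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("<action name=\"step\"><ok to=\"end\"/></action>");
        }
        String original = sb.toString();
        byte[] stored = compress(original);
        System.out.println("original bytes: "
            + original.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("stored bytes:   " + stored.length);
        System.out.println("round trip ok:  "
            + decompress(stored).equals(original));
    }
}
```

Because the database only ever sees a byte[], nothing here depends on 
compression features of Derby, MySQL, Oracle or any other db.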

bq. How much performance degradation for compression and decompression(most 
frequent one)?

It would be a CPU-intensive task. I will measure the performance overhead and 
post the results here.

bq. Will this compression/decompression be active for all the time? Or it could 
be configurable using oozie-site.xml.

It will be configurable and can be switched on or off.

bq. What other fields? Or all *lob fields.

Only the lob fields. Usually only lobs are inefficient, since their data is 
normally not stored inline but separately, outside the table row.

bq. Please consider MySQL case as well.

In general, compressing the data will help because less data is stored in the 
db, so fetch time goes down, and there is also less traffic over the network. 
MySQL supports inline storage of the data when the size of the column value is 
less than 768 bytes, so performance will improve greatly if the data comes in 
under 768 bytes after compression.
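
As a rough illustration of the 768-byte point (a sketch, not a measurement), 
the check below deflates a made-up, highly repetitive payload and compares 
both sizes against the limit. The class name, the sample data and the use of 
java.util.zip.Deflater are assumptions for the example; real payloads will 
compress differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical helper for illustration only; not part of Oozie.
class InlineThresholdSketch {

    // The inline limit quoted above for MySQL, taken as given here.
    static final int INLINE_LIMIT_BYTES = 768;

    // Number of bytes the text occupies after deflate compression.
    static int deflatedSize(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        byte[] chunk = new byte[1024];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(chunk);
        }
        deflater.end();
        return total;
    }

    // True when a value of this size is below the assumed inline limit.
    static boolean fitsInline(int sizeInBytes) {
        return sizeInBytes < INLINE_LIMIT_BYTES;
    }

    public static void main(String[] args) {
        // Made-up sample payload; real column values will differ.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) {
            sb.append("<action name=\"step\"><ok to=\"end\"/></action>");
        }
        String payload = sb.toString();
        int rawSize = payload.getBytes(StandardCharsets.UTF_8).length;
        int packedSize = deflatedSize(payload);
        System.out.println("raw fits inline:        " + fitsInline(rawSize));
        System.out.println("compressed fits inline: " + fitsInline(packedSize));
    }
}
```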

> Compress lob columns before storing in database
> -----------------------------------------------
>
>                 Key: OOZIE-1462
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1462
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Virag Kothari
>            Assignee: Virag Kothari
>
> Storing huge data in lobs is very inefficient. Making Oozie compress the data 
> before storing it will reduce the size of the data stored in lobs and help 
> reduce query time. Also, most databases, such as Oracle and MySQL, support 
> storing lob data in the table row (inline) if the data is small enough. Inline 
> storage performs much better than outline storage (storage outside of the 
> table row).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
