[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2019-06-04 Thread Kevin Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16856247#comment-16856247
 ] 

Kevin Yu commented on HDFS-2115:


[~umamaheswararao]

"In our cluster we had implemented the compression support for HDFS 
(HDFS-1640). "

Has this feature been merged into HDFS?

Regards,

Kevin Yu

> Transparent compression in HDFS
> ---
>
> Key: HDFS-2115
> URL: https://issues.apache.org/jira/browse/HDFS-2115
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: datanode, hdfs-client
>Reporter: Todd Lipcon
>Priority: Major
>
> In practice, we find that a lot of users store text data in HDFS without 
> using any compression codec. Improving usability of compressible formats like 
> Avro/RCFile helps with this, but we could also help many users by providing 
> an option to transparently compress data as it is stored.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2015-03-10 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355349#comment-14355349
 ] 

Hari Sekhon commented on HDFS-2115:
---

MapR-FS provides transparent compression at the filesystem level - it's a very 
good idea.

It could be done on a directory basis (like MapR) with specific subdirectory 
and file / file extension exclusions, such as a .ignore_compress file in the 
directory.
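
A rough sketch of how such a per-directory policy could be evaluated, in Python for brevity. Everything here is an assumption for illustration: the function name, the idea that a .ignore_compress file lists one glob pattern or subdirectory name per line, and the in-memory dictionaries standing in for real directory metadata. It is not how MapR-FS or HDFS behaves.

```python
import fnmatch
import posixpath


def should_compress(path, compressed_dirs, ignore_rules):
    """Decide whether the file at absolute `path` would be stored compressed.

    compressed_dirs: set of directories with compression switched on.
    ignore_rules: maps a directory to the patterns assumed to be listed in
    its (hypothetical) .ignore_compress file.
    """
    directory = posixpath.dirname(path)
    enabled = False
    while True:
        if directory in compressed_dirs:
            enabled = True
        rel = posixpath.relpath(path, directory)
        for pattern in ignore_rules.get(directory, ()):
            # A pattern may be a glob (file name or extension) or a subdirectory name.
            excluded_dir = rel.startswith(pattern.rstrip("/") + "/")
            if fnmatch.fnmatchcase(rel, pattern) or excluded_dir:
                return False
        if directory in ("/", ""):
            return enabled
        directory = posixpath.dirname(directory)
```

With compression enabled on /logs and exclusions "*.gz" and "tmp", a file such as /logs/app/2015/a.txt would be compressed, while /logs/app/b.gz and anything under /logs/tmp would be left alone.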

Keeping files in plain text format makes it easier to use different tools on 
them without worrying about codec or container format support, but currently 
one can pay an 8x storage penalty for keeping text uncompressed.

This would solve some real problems for us right now if we had it. It's also 
annoying that many tools always demonstrate reading text files, when keeping 
text is so costly on storage without transparent compression. We are actually 
stuck with a large historical archive of compressed files that we can't work 
with (there is no zip InputFormat) and can't leave uncompressed either, because 
the storage waste would exceed our cluster capacity. Having to reprocess them 
all to convert to a different compression format, and then hope all future 
tools can handle that format, is far less ideal than just having transparent 
compression.

The increasing proliferation of tools and products on Hadoop exacerbates this 
issue as we can never be sure that the next tool will support format X. 
Everything supports text. Please add transparent compression to make working 
with text better.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon



[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2012-02-17 Thread Roy Roye (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210351#comment-13210351
 ] 

Roy Roye commented on HDFS-2115:


This report, http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-36.pdf, says:

We analyzed how compression can improve performance and energy efficiency for 
MapReduce workloads. Our results show that compression provides 35-60% energy 
savings for read heavy jobs as well as jobs with highly compressible data. 
Based on our measurements, we construct an algorithm that examines per-job data 
characteristics and IO patterns, and decides when and where to use compression.





[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2011-11-10 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147825#comment-13147825
 ] 

Suresh Srinivas commented on HDFS-2115:
---

Todd, depending on how this functionality shapes up, it could require a lot of 
changes to HDFS. Please post a design document when the mechanism is in 
reasonable shape.





[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2011-09-19 Thread Michael Schmitz (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108020#comment-13108020
 ] 

Michael Schmitz commented on HDFS-2115:
---

An easier feature might be to select the proper codec automatically from the 
file extension when a file is read as input to a job.  Also, when using 
streaming with compressed input you get the offset as the key, but not when 
you use an uncompressed TSV.  It would be nice if this behavior were uniform.
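
As a minimal sketch of the extension-based idea, written in Python with standard-library codecs rather than Hadoop's codec classes: the `OPENERS` table and both function names are hypothetical. The reader yields the same (offset, line) pairs whether or not the input is compressed, with offsets counted in uncompressed bytes.

```python
import bz2
import gzip
import lzma
import os

# Hypothetical extension table; Hadoop's real codec classes are not used here.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_input(path):
    """Open a file for binary reading, decompressing if the extension is known."""
    ext = os.path.splitext(path)[1].lower()
    return OPENERS.get(ext, open)(path, "rb")


def records(path):
    """Yield (offset, line) pairs; offsets always count uncompressed bytes."""
    offset = 0
    with open_input(path) as f:
        for line in f:
            yield offset, line
            offset += len(line)
```

A job reading data.tsv and data.tsv.gz through `records` would see identical keys and values for identical content, which is the uniform behavior asked for above.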





[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2011-06-29 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057615#comment-13057615
 ] 

Uma Maheswara Rao G commented on HDFS-2115:
---

Hi Todd,

In our cluster we had implemented the compression support for HDFS (HDFS-1640). 
However, we do not store the compressed data in DFS: the data is decompressed 
before it is stored. The main goal of our compression is to save network 
bandwidth. We achieved ~50-70% improvements in read and write operations.
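
To make the distinction concrete, here is a small Python sketch of compression on the wire only. The function names and the dictionary standing in for DataNode storage are assumptions for illustration; this is not the HDFS-1640 code.

```python
import zlib


def send(data):
    """Sender side: compress the payload before it crosses the network."""
    return zlib.compress(data)


def receive_and_store(wire_bytes, storage, name):
    """Receiver side: decompress on arrival, so the stored block is plain data."""
    storage[name] = zlib.decompress(wire_bytes)
```

Only the bytes in transit shrink. The stored block keeps its full size, so this approach saves bandwidth but not disk space.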

> Not sure when I'd have time to work on it
We will be happy to coordinate our efforts in implementing this feature.







[jira] [Commented] (HDFS-2115) Transparent compression in HDFS

2011-06-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057351#comment-13057351
 ] 

Todd Lipcon commented on HDFS-2115:
---

I'm thinking something like the following:
- DFSClient can optionally specify a compression codec when writing a file. If 
specified, each "packet" in the write pipeline will be compressed with that 
codec.
- DataNode uses a special header in the block meta file to indicate that the 
block is compressed with the given codec.
- To facilitate random access, an index file is kept (either separately or as 
part of the block meta file) which contains pairs of (uncompressed offset, 
compressed offset). This allows a binary search to locate each compression 
block.
- DFSClient reader is modified to support decompression on the client side.
- Some handshaking will be necessary in case the set of codecs available on the 
client and server differ.

Any thoughts on this? Not sure when I'd have time to work on it, but worth 
starting some brainstorming.
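
The offset index in the list above can be sketched as follows. This is a toy model in Python under stated assumptions, not DFSClient or DataNode code: the function names, the 64 KB default packet size, and the use of zlib as the codec are all placeholders. Each packet is compressed on its own, the index records one (uncompressed offset, compressed offset) pair per packet, and a read does a binary search on the index and decompresses only the packets it needs.

```python
import bisect
import zlib


def write_compressed(data, packet_size=64 * 1024):
    """Compress data packet by packet; return (stored bytes, offset index)."""
    stored = bytearray()
    index = []  # (uncompressed offset, compressed offset), one pair per packet
    for start in range(0, len(data), packet_size):
        index.append((start, len(stored)))
        stored += zlib.compress(data[start:start + packet_size])
    return bytes(stored), index


def read_range(stored, index, offset, length):
    """Read `length` uncompressed bytes at `offset`, decompressing only the packets needed."""
    if length <= 0:
        return b""
    starts = [u for u, _ in index]
    # Binary search for the last packet whose uncompressed start is <= offset.
    i = bisect.bisect_right(starts, offset) - 1
    if i < 0:
        return b""
    out = bytearray()
    skip = offset - starts[i]
    while i < len(index) and len(out) < skip + length:
        begin = index[i][1]
        end = index[i + 1][1] if i + 1 < len(index) else len(stored)
        out += zlib.decompress(stored[begin:end])
        i += 1
    return bytes(out[skip:skip + length])
```

Smaller packets make random reads cheaper but usually compress worse, so the packet size is a trade-off a real design would need to tune.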
