Re: [julia-users] Compressing .jld files

Robert Feldt Mon, 10 Nov 2014 00:36:12 -0800

Has there been any progress on a (stand-alone) Blosc package for Julia? If
not, I might have time to contribute, since I need a fast compressor for a
project. If there is already any code or a start on it, I'd appreciate a pointer.


Cheers,

Robert Feldt
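
For illustration only, here is a rough sketch of the chunked scheme Kevin Squire describes in the quoted thread below. It is written in Python, with the standard-library `zlib` module standing in for Blosc's own fast codecs. This is not Blosc's format or API: the 32 KiB chunk size, the header layout, and the function names (`shuffle`, `unshuffle`, `compress_chunked`, `decompress_chunked`) are all assumptions made for the example. It shows two ideas. First, a byte-shuffle filter (Blosc applies one by default) groups the same byte position of every fixed-size element together, which tends to make numeric arrays more compressible. Second, each chunk is compressed independently, so it can be decompressed on its own.

```python
import struct
import zlib

# Illustrative chunk size (an assumption): real Blosc chooses its own block
# size, aiming to keep blocks within CPU cache.
CHUNK_SIZE = 32 * 1024


def shuffle(data: bytes, typesize: int) -> bytes:
    """Byte-transpose: byte 0 of every element, then byte 1 of every element, ...
    Trailing bytes that do not fill a whole element are appended unchanged."""
    n = len(data) // typesize
    planes = [data[i:n * typesize:typesize] for i in range(typesize)]
    return b"".join(planes) + data[n * typesize:]


def unshuffle(data: bytes, typesize: int) -> bytes:
    """Inverse of shuffle()."""
    n = len(data) // typesize
    out = bytearray(len(data))
    for i in range(typesize):
        out[i:n * typesize:typesize] = data[i * n:(i + 1) * n]
    out[n * typesize:] = data[n * typesize:]
    return bytes(out)


def compress_chunked(data: bytes, typesize: int = 8, level: int = 1) -> bytes:
    """Shuffle and compress each chunk independently.

    Layout (made up for this sketch): an 8-byte total length and a 4-byte
    typesize, then for each chunk a 4-byte compressed length and its bytes."""
    out = bytearray(struct.pack("<QI", len(data), typesize))
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = shuffle(data[start:start + CHUNK_SIZE], typesize)
        packed = zlib.compress(chunk, level)  # low zlib level favours speed
        out += struct.pack("<I", len(packed)) + packed
    return bytes(out)


def decompress_chunked(blob: bytes) -> bytes:
    """Decompress chunk by chunk. In a C implementation, the point is that
    each chunk is small enough to stay in cache while it is unshuffled."""
    total, typesize = struct.unpack_from("<QI", blob, 0)
    pos = struct.calcsize("<QI")
    out = bytearray()
    while len(out) < total:
        (clen,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        chunk = zlib.decompress(blob[pos:pos + clen])
        pos += clen
        out += unshuffle(chunk, typesize)
    return bytes(out)
```

Because every chunk carries its own length prefix, a reader could walk the prefixes to reach chunk k without decompressing the earlier chunks. This sketch simply decompresses them in order. Pure Python will not be fast; the speed Blosc advertises comes from its C implementation, multithreading, and SIMD.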

On Tuesday, 2 September 2014 at 21:47:33 UTC+2, Douglas Bates wrote:
>
> Would it be reasonable to create a Blosc package, or is it best to
> incorporate it directly into the HDF5 package? If a separate package is
> reasonable, I could start on it, as I was the one who suggested this in the
> first place.
>
> On Tuesday, September 2, 2014 2:43:15 PM UTC-5, Tim Holy wrote:
>>
>> All these testimonials do make it sound promising. Even three-fold
>> compression is a pretty big deal.
>>
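>> One disadvantage to compression is that it makes mmap (memory-mapping the
>> data directly from the file) impossible. But since HDF5 supports
>> hyperslabs (reading just a selected sub-block of a dataset), that's not as
>> big a deal as it would have been.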
>>
>> --Tim 
>>
>> On Tuesday, September 02, 2014 12:11:55 PM Jake Bolewski wrote: 
>> > I've used Blosc in the past with great success. Often it is faster
>> > than working with uncompressed data when I/O is the bottleneck. The
>> > compression ratios are not great, but that is really not the point.
>> > 
>> > On Tuesday, September 2, 2014 2:09:20 PM UTC-4, Stefan Karpinski wrote: 
>> > > That looks pretty sweet. It seems to avoid a lot of the pitfalls of
>> > > naively compressing data files while still getting the benefits. It
>> > > would be great to support that in JLD, maybe even turned on by default.
>> > > 
>> > > 
>> > > On Tue, Sep 2, 2014 at 1:35 PM, Kevin Squire <kevin....@gmail.com> wrote:
>> > >> Just to hype Blosc a little more, see
>> > >>
>> > >> http://www.blosc.org/blosc-in-depth.html
>> > >>
>> > >> The main feature is that data is chunked so that the compressed chunk
>> > >> size fits into L1 cache, and is then decompressed and used there.
>> > >> There are a few more buzzwords (multithreading, SIMD) in the link
>> > >> above. It's worth exploring where this might be useful in Julia.
>> > >> 
>> > >> Cheers, 
>> > >> 
>> > >>   Kevin 
>> > >> 
>> > >> On Tuesday, September 2, 2014, Tim Holy <tim....@gmail.com> wrote:
>> > >>> HDF5/JLD does support compression:
>> > >>>
>> > >>> https://github.com/timholy/HDF5.jl/blob/master/doc/hdf5.md#reading-and-writing-data
>> > >>> 
>> > >>> But it's not turned on by default. Matlab uses compression by default,
>> > >>> and I've found it's a huge performance bottleneck
>> > >>> (http://www.mathworks.com/matlabcentral/fileexchange/39721-save-mat-files-more-quickly).
>> > >>> But perhaps there's a good middle ground. It would take someone doing
>> > >>> a little experimentation to see what the trade-offs are.
>> > >>> 
>> > >>> --Tim 
>> > >>> 
>> > >>> On Tuesday, September 02, 2014 08:30:39 AM Douglas Bates wrote: 
>> > >>> > Now that the JLD format can handle DataFrame objects I would like to
>> > >>> > switch from storing data sets in .RData format to .jld format.
>> > >>> > Datasets stored in .RData format are compressed after they are
>> > >>> > written. The default compression is gzip. Bzip2 and xz compression
>> > >>> > are also available. The compression can make a substantial
>> > >>> > difference in the file size because the data values are often highly
>> > >>> > repetitive.
>> > >>> > 
>> > >>> > JLD is different in scope in that .jld files can be queried using
>> > >>> > external programs like h5ls, and the files can have new data added
>> > >>> > or existing data edited or removed. The .RData format is an archival
>> > >>> > format: once the file is written, it cannot be modified in place.
>> > >>> > 
>> > >>> > Given these differences, I can appreciate that JLD files are not
>> > >>> > compressed. Nevertheless, I think it would be useful to adopt a
>> > >>> > convention in the JLD module for accessing data from files with a
>> > >>> > .jld.xz or .jld.7z extension. It could be as simple as uncompressing
>> > >>> > the files in a temporary directory, reading them, and then removing
>> > >>> > the temporary copies, or it could be more sophisticated. I notice
>> > >>> > that my versions of libjulia.so on an Ubuntu 64-bit system are
>> > >>> > linked against both libz.so and liblzma.so:
>> > >>> > 
>> > >>> > $ ldd /usr/lib/x86_64-linux-gnu/julia/libjulia.so 
>> > >>> > linux-vdso.so.1 =>  (0x00007fff5214f000) 
>> > >>> > libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f62932ee000)
>> > >>> > libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f62930d5000)
>> > >>> > libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f6292dce000)
>> > >>> > librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f6292bc6000)
>> > >>> > libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f62929a8000)
>> > >>> > libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f629278c000)
>> > >>> > libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f6292488000)
>> > >>> > libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f6292272000)
>> > >>> > libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f6291eab000)
>> > >>> > /lib64/ld-linux-x86-64.so.2 (0x00007f62944b3000)
>> > >>> > liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f6291c89000)
>> > >>> > 
>> > >>> > 
>> > >>> > AFAIK the user-level interface to gzip requires the GZip package.
>> > >>> > Unless I have missed something (always a possibility), there is no
>> > >>> > user-level interface to liblzma in Julia. If the library is going to
>> > >>> > be linked anyway, would it make sense to provide a user-level
>> > >>> > interface in Julia?
>>
>>
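
Here is a minimal sketch of the simplest convention Douglas Bates proposes in the quoted thread: decompress to a temporary directory, read, then remove. It is written in Python using the standard-library `lzma`, `tempfile`, and `shutil` modules. `read_compressed` and its `reader` callback are hypothetical names; `reader` stands for whatever actually opens the uncompressed .jld (HDF5) file. Only `.xz` is handled here, since `.7z` is an archive container that Python's standard library does not read.

```python
import lzma
import os
import shutil
import tempfile
from typing import Callable, TypeVar

T = TypeVar("T")


def read_compressed(path: str, reader: Callable[[str], T]) -> T:
    """Call reader(path) on an uncompressed file.

    If `path` ends in ".xz", first decompress it into a fresh temporary
    directory, hand the uncompressed copy to `reader`, and delete the
    directory (and copy) once `reader` returns or raises."""
    if not path.endswith(".xz"):
        return reader(path)  # plain .jld file: read it in place
    with tempfile.TemporaryDirectory() as tmpdir:
        # "data.jld.xz" -> "<tmpdir>/data.jld"
        name = os.path.basename(path)[: -len(".xz")]
        target = os.path.join(tmpdir, name)
        with lzma.open(path, "rb") as src, open(target, "wb") as dst:
            # Stream the data so large files need not fit in memory.
            shutil.copyfileobj(src, dst)
        return reader(target)
```

One caveat of this design is that `reader` must finish reading before it returns, because the temporary copy is deleted as soon as it does. A reader that hands back an open file handle or a lazily loaded dataset would break. Handling that case is part of what a "more sophisticated" approach would need to address.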
