[julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread paul analyst
I have data in a text file, several million rows like this:
0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,2,0,0,0,2,0,0,0,0,1
0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1

Encoding: Windows-1250.

The size of dane.txt is 1.3 GB.

D=readcsv("dane.txt")
k,l=size(D)

using HDF5, JLD
hfi=h5open("D.h5","w")
close(hfi)

fid = h5open("D.h5","r+")
g = fid["/"]
dset1 = d_create(g, "/D", datatype(Int64), dataspace(k,l))
dset1[:,:]=D
close(fid)

After saving to the h5 file, the file is 6.3 GB. Why is the new file 4 times bigger?
Paul


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Stefan Karpinski
In your example data, each value takes two bytes in the text file: one for
the digit and one for the comma or newline. Each Int64 value takes 8 bytes,
which is why the HDF5 file is about 4 times bigger. If all your values are
between 0 and 255, you could store them as UInt8 (one byte each), which
would make the file about half the size of the text file.
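
The arithmetic behind this can be sketched in a few lines of plain Julia. This is only a rough estimate: it assumes every value in dane.txt is a single digit followed by one separator byte, and it takes 1 GB as 1e9 bytes. The variable names are made up for illustration, and the result is not an exact accounting of the reported 6.3 GB.

```julia
# Bytes needed per value in each representation.
text_bytes  = 2               # one digit plus one separator (comma or newline)
int64_bytes = sizeof(Int64)   # 8 bytes
uint8_bytes = sizeof(UInt8)   # 1 byte

txt_gb = 1.3                  # size of dane.txt as reported in the thread

# Estimated number of values, assuming single-digit values throughout.
nvalues = txt_gb * 1e9 / text_bytes

int64_gb = nvalues * int64_bytes / 1e9   # roughly 4x the text file
uint8_gb = nvalues * uint8_bytes / 1e9   # roughly half the text file

println("Int64: ", int64_gb, " GB, UInt8: ", uint8_gb, " GB")
```

Under these assumptions the Int64 dataset comes out at about four times the text size, and the UInt8 dataset at about half of it.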


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Erik Schnetter
HDF5 files support compression. It is enabled via a flag when writing the
file; when reading, the data is automatically decompressed. I assume that
compression would greatly reduce the file size.
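
As a minimal sketch of what that could look like, assuming a recent HDF5.jl release in which datasets accept the `chunk` and `deflate` keywords (the `d_create` call quoted in this thread is from an older version of the package, where the options were spelled differently). The random matrix is a small stand-in for the real data, and the chunk shape and compression level 3 are arbitrary choices, not recommendations.

```julia
using HDF5

# Small stand-in for the real data: values 0..2 in 17 columns,
# stored as UInt8 as suggested earlier in the thread.
D = rand(UInt8(0):UInt8(2), 100_000, 17)

path = joinpath(mktempdir(), "D.h5")

h5open(path, "w") do fid
    # Compression requires a chunked dataset; chunk size and level are assumptions.
    fid["D", chunk=(10_000, 17), deflate=3] = D
end

# Reading needs no extra flags; the data is decompressed transparently.
D2 = h5read(path, "D")

println(filesize(path), " bytes on disk, ", sizeof(D), " bytes in memory")
```

How much smaller the file gets depends on the data, so it is worth comparing `filesize` with and without `deflate` on a sample of the real file.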

-erik


-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Stefan Karpinski
Yes, that could be even more effective.
