[go-nuts] Re: files, readers, byte arrays (slices?), byte buffers and http.requests

2016-08-04 Thread Tamás Gulácsi
If md5 is enough at the end, use an io.TeeReader. If not, you need to buffer 
it, with bytes.Buffer. That can be reused with sync.Pool (don't forget the 
Reset).

For mime, the first 1024 bytes is enough. Read that into a [1024]byte and 
create a Reader with io.MultiReader.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


2016-08-03 Thread Sri G
Doh. Thanks. I did the setup but didn't click "execute".

Revisiting this because it's now a bottleneck: it directly impacts user 
experience (how long a request takes to process) and scalability 
(requests per second a single instance can handle). It wasn't premature 
optimization, rather proper architecture planning :)

In C, the request would come into a ring buffer of Struct of Arrays (read 
SoAs vs AoS on Intel x86) -> a pointer to the post data is kept. This is 
used to check the mime type as well as compute the md5. Then it is passed 
to be written to disk before it is released. No copies are needed.

How can I accomplish this in idiomatic Go? When I say idiomatic, I mean 
efficient in space, time and verbosity depending on the requirements and 
most importantly, not fighting the language. 

I'm having difficulty grokking whether a command copies data or uses a 
reference to the underlying buffer (pointer). Or does everything copy 
because data needs to be on the stack of each goroutine? 

I've read the source code of io.Copy: if the source implements WriteTo or 
the destination implements ReadFrom, the copy uses the existing buffer, 
avoiding an allocation and a copy. However, crypto/md5 does not have either 
of these, so it's not possible to compute the md5 without copying data. Is 
this because the md5 library is written for streaming data vs static data?

Is there a way to accomplish this? i.e. here's a buffer of data, compute 
the md5 on it.
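
For the "here's a buffer, hash it" case, a small sketch: md5.Sum takes a
byte slice directly, and io.Copy from a bytes.Reader hands the slice to the
hash through the reader's WriteTo method, so neither path needs an
intermediate buffer. The function names md5Hex and md5HexViaCopy are
invented for this example.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
)

// md5Hex hashes an in-memory buffer in one call. md5.Sum reads the slice
// that is passed in; no Reader is involved.
func md5Hex(data []byte) string {
	sum := md5.Sum(data)
	return hex.EncodeToString(sum[:])
}

// md5HexViaCopy does the same through the streaming interface. Because
// *bytes.Reader implements io.WriterTo, io.Copy passes the underlying slice
// straight to the hash's Write method.
func md5HexViaCopy(data []byte) (string, error) {
	h := md5.New()
	if _, err := io.Copy(h, bytes.NewReader(data)); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	data := []byte("post data already sitting in memory")
	viaCopy, err := md5HexViaCopy(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(md5Hex(data) == viaCopy)
}
```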

Re: the mimetype, I should be able to create a 1024 byte slice of the file 
and pass it to mimemagic. This should avoid the copy.
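
A sketch of why slicing avoids the copy: a slice expression creates a new
slice header that points at the same backing array. The helper name head is
made up for illustration; the result could then be passed to whatever mime
sniffer is in use.

```go
package main

import "fmt"

// head (hypothetical helper) returns at most the first n bytes of data.
// The result aliases data: no bytes are copied, only a new slice header
// (pointer, length, capacity) is created.
func head(data []byte, n int) []byte {
	if n > len(data) {
		n = len(data)
	}
	return data[:n]
}

func main() {
	data := make([]byte, 4096) // stands in for the uploaded file contents
	h := head(data, 1024)

	// Both slices start at the same address in memory.
	fmt.Println(len(h), &h[0] == &data[0])

	// A write through one slice is visible through the other.
	h[0] = 'x'
	fmt.Println(data[0] == 'x')
}
```

The flip side of sharing is that the 1024-byte view must not be modified or
kept around longer than the buffer it points into.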


On Saturday, July 2, 2016 at 9:27:15 PM UTC-4, Dave Cheney wrote:
>
> The hash is always the same because you ask for the hash value before 
> writing any data through it with io.Copy.




2016-07-02 Thread Dave Cheney
The hash is always the same because you ask for the hash value before writing 
any data through it with io.Copy. 



2016-07-02 Thread Sri G
Update:

Adding file.Seek(0,0) does fix the issue in Version 2. The uploaded file is 
the correct size on disk with the correct md5. Without it, the uploaded 
file which is saved is missing the first 1024 bytes. This makes sense.

There is something wrong with the way the md5 is calculated: it keeps 
giving the same hash. Any ideas?

This version, while most likely not idiomatic, works:

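// Read up to the first 1024 bytes for the mime check.
mimebuf := make([]byte, 1024)
n, err := file.Read(mimebuf)
if err != nil && err != io.EOF {
	// handle the read error
}
mime := mimemagic.Match("", mimebuf[:n])

// Rewind, then hash the whole file.
file.Seek(0, 0)
checksum := md5.New()
io.Copy(checksum, file)
md5hex := hex.EncodeToString(checksum.Sum(nil))
fmt.Println("md5=", md5hex)

// Rewind again and save the file to disk.
file.Seek(0, 0)
io.Copy(f, file)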

It would be much appreciated if someone who understands the idiomatic way 
to do this could explain it.



2016-07-02 Thread Sri G
Thanks for the pointer. I also found this helpful: "Asynchronously Split an 
io.Reader in Go (golang)" (Rodaine), but I'm still missing something.

Version 1: the uploaded file has 1024 extra bytes at the end (too big):

mimebuf := make([]byte, 1024)
_, err = file.Read(mimebuf)

mime := mimemagic.Match("", mimebuf)

fileReader := io.MultiReader(bytes.NewReader(mimebuf), file)

checksum := md5.New()

b := io.TeeReader(fileReader, checksum)

md5hex := hex.EncodeToString(checksum.Sum(nil))

// Save file
io.Copy(f, b)

Version 2: the uploaded file is truncated by 1024 bytes (too small): (this 
makes sense since the first 1024 bytes of the file were consumed)

mimebuf := make([]byte, 1024)
_, err = file.Read(mimebuf)

mime := mimemagic.Match("", mimebuf)

checksum := md5.New()

// Adding file.Seek(0,0) here does not fix this issue

b := io.TeeReader(file, checksum)

md5hex := hex.EncodeToString(checksum.Sum(nil))

// Save file
io.Copy(f, b)


What is incorrect here that is causing this? How do I get the goldilocks 
version that's just right?



2016-07-02 Thread Tamás Gulácsi

On Saturday, July 2, 2016 at 8:15:19 AM UTC+2, Sri G wrote:
>
> I'm working on receiving uploads through a form.
>
> The tricky part is validation.
>
> I attempt to read the first 1024 bytes to check the mime of the file and 
> then if valid read the rest and hash it and also save it to disk. Reading 
> the mime type is successful and I've gotten it to work by chaining 
> TeeReader but it seems very hackish. What's the idiomatic way to do this?
>
> I'm trying something like this: 
>
>
> // Parse my multi part form 
> ...
> // Get file handle
> file, err := fh.Open()
>
> var a bytes.Buffer
>
> io.CopyN(&a, file, 1024)
>
> mime := mimemagic.Match("", a.Bytes())
> // Check mime type (this works fine)
>
> I'm trying to seek a stream so this should be no-op
> file.Seek(0, 0)
>
> The file stored on disk is 1KB larger than the original so it appears to 
> be re-copying the entire file and appending it to bytes.Buffer
> io.Copy(&a, file)
>
> checksum := md5.New()
> b := io.TeeReader(&a, checksum)
>
> md5hex := hex.EncodeToString(checksum.Sum(nil))
> fmt.Println("md5=", md5hex)
>
> //Open file f for writing to disk
> ...
> //Save file
> io.Copy(f, b)
>
>
> Checked the md5 of (1KB of orig + orig), and (original - first 1 KB), 
> neither match the md5 of the file being hashed.
>
> Why can't I append the rest of the stream to the byte buffer to get the 
> complete file in memory and why is the byte buffer being "consumed"? 
>
> I simply need to read the same array of byte multiple times, I don't need 
> to "copy" them. I'm coming from a C background so I'm wondering what is 
> going on behind the scenes as well.
>

If you know you'll have to read the whole file into memory, then do that, 
and use bytes.NewReader to create a reader for that byte slice.

If you read partly, to decide whether to go on, then use fh.Read or 
io.ReadAtLeast with a byte slice.

If you read something, then want to read the whole from the beginning, 
construct a Reader with io.MultiReader(bytes.NewReader(b), fh).

You can combine these approaches, but if the whole file size is less than a 
few KiB, I think it is easier, simpler and more performant (!) to read the 
whole file up into memory, into a bytes.Buffer, and construct the needed 
readers with bytes.NewReader(buf.Bytes()). 
