Re: Protoc and line endings

2009-07-02 Thread Kenton Varda
If you want to write up a patch to recognize a UTF-8 BOM and ignore it, go
ahead.  You can just modify the Tokenizer class to recognize and discard a
BOM appearing at the beginning of the input.
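
A minimal sketch of that idea, written in Python rather than protoc's C++ and
assuming the whole input is available as a byte string. The function name
discard_leading_bom is hypothetical and is not part of the Tokenizer class;
it only illustrates "recognize and discard a BOM appearing at the beginning
of the input".

```python
import codecs

# The UTF-8 encoding of U+FEFF: the three bytes EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def discard_leading_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (hypothetical helper).

    Only a BOM at offset 0 is removed; the same bytes appearing later
    in the input are left alone.
    """
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


if __name__ == "__main__":
    with_bom = UTF8_BOM + b"message Foo {}\n"
    print(discard_leading_bom(with_bom))
```

Input that has no BOM passes through unchanged, so the check would be safe to
apply unconditionally before tokenizing.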

On Thu, Jul 2, 2009 at 1:31 PM, Marc Gravell wrote:

>
> OK... is there any way it /could/ silently ignore the BOM? ;-p
>
> I can try to advise the caller to use files without BOMs, but since
> protoc reads UTF-8 anyway, it seems reasonable to accept a BOM?
>
> Marc

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: Protoc and line endings

2009-07-02 Thread Marc Gravell

OK... is there any way it /could/ silently ignore the BOM? ;-p

I can try to advise the caller to use files without BOMs, but since
protoc reads UTF-8 anyway, it seems reasonable to accept a BOM?

Marc



Re: Protoc and line endings

2009-07-02 Thread Kenton Varda
protoc actually expects its input to be UTF-8 (though non-ASCII characters
are only allowed in default values for string fields).  It just doesn't like
the BOM.

On Thu, Jul 2, 2009 at 12:44 PM, Marc Gravell wrote:

>
> My bad... it isn't the line endings - it is the UTF8 BOM; when I
> switched one it switched the other.
>
> (which is annoying; encoding is much trickier than just cr/lf!)
>
> Marc




Re: Protoc and line endings

2009-07-02 Thread Marc Gravell

My bad... it isn't the line endings - it is the UTF8 BOM; when I
switched one it switched the other.

(which is annoying; encoding is much trickier than just cr/lf!)

Marc





Re: Protoc and line endings

2009-07-02 Thread Kenton Varda
Protoc treats \r as plain whitespace, so it should have no problem with
Windows line endings.  I just tested this and sure enough, protoc works fine
with .proto files that use Windows-style line endings.
Mac pre-OSX line endings (\r with no \n) won't work if the file contains any
comments.

What kind of errors are you seeing?
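
The difference between CRLF and bare-CR input can be illustrated with a toy
model. This is not protoc's tokenizer; it assumes only what is stated above
(\r is plain whitespace) plus the assumption that a // comment runs until the
next \n. The names strip_line_comments and toy_tokens are made up for the
illustration, and string literals containing "//" are ignored for brevity.

```python
def strip_line_comments(text: str) -> str:
    """Drop // comments, each of which ends at the next '\\n' (toy model)."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("//", i):
            # A lone '\r' does not end the comment; only '\n' does.
            end = text.find("\n", i)
            if end == -1:
                break  # the comment swallowed the rest of the input
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def toy_tokens(text: str) -> list:
    """Split on whitespace; str.split() treats '\\r' as whitespace too."""
    return strip_line_comments(text).split()


if __name__ == "__main__":
    for label, eol in (("LF", "\n"), ("CRLF", "\r\n"), ("CR", "\r")):
        source = "// header" + eol + "message Foo {}" + eol
        print(label, toy_tokens(source))
```

With LF or CRLF the comment ends at the \n and the message is seen. With
bare CR there is no \n, so in this model the first comment extends to the end
of the file and nothing after it is tokenized.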

On Wed, Jul 1, 2009 at 11:48 PM, Marc Gravell wrote:

>
> I'm using protoc as the raw .proto parser for protobuf-net (I then
> process the compiled binary for code-generation); at the moment, it is
> very sensitive about line endings - if it isn't LF, it won't work.
>
> This creates a bit of a nag for Windows users, as you have to go out
> of your way to get the right line endings. I could patch the input
> files, but that gets very tricky if the user is importing multiple
> other files (with the wrong line endings) into their proto - and it
> means that their CRLF files wouldn't work fully interoperably.
>
> I have no idea how complex it would be, but is there any chance that
> protoc could be less choosy about line endings?
>
> For now, I've added some code to protobuf-net's "protogen", so that if
> protoc reports failure it checks the *known* files for CR (and advises
> the user) - but this is still tricky for the import scenario.
>
> Thoughts?
>
> Marc




Protoc and line endings

2009-07-01 Thread Marc Gravell

I'm using protoc as the raw .proto parser for protobuf-net (I then
process the compiled binary for code-generation); at the moment, it is
very sensitive about line endings - if it isn't LF, it won't work.

This creates a bit of a nag for Windows users, as you have to go out
of your way to get the right line endings. I could patch the input
files, but that gets very tricky if the user is importing multiple
other files (with the wrong line endings) into their proto - and it
means that their CRLF files wouldn't work fully interoperably.

I have no idea how complex it would be, but is there any chance that
protoc could be less choosy about line endings?

For now, I've added some code to protobuf-net's "protogen", so that if
protoc reports failure it checks the *known* files for CR (and advises
the user) - but this is still tricky for the import scenario.

Thoughts?

Marc
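
The fallback check described above could look roughly like the following
sketch. protogen's real code is not shown in this thread, so every name here
(line_ending_report, advise_on_cr) is hypothetical. It only covers the files
passed in explicitly, which is exactly the limitation mentioned for imported
files.

```python
def line_ending_report(data: bytes) -> dict:
    """Count CRLF pairs, bare CRs and bare LFs in raw file content."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "cr": data.count(b"\r") - crlf,  # CR not followed by LF
        "lf": data.count(b"\n") - crlf,  # LF not preceded by CR
    }


def advise_on_cr(paths):
    """Yield one advisory message per known file that contains any CR."""
    for path in paths:
        with open(path, "rb") as handle:
            report = line_ending_report(handle.read())
        if report["crlf"] or report["cr"]:
            yield "%s: %d CRLF and %d bare CR line endings found" % (
                path, report["crlf"], report["cr"])


if __name__ == "__main__":
    import sys
    for message in advise_on_cr(sys.argv[1:]):
        print(message)
```

Files are opened in binary mode so that the platform's newline translation
cannot hide the CR bytes being looked for.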