Re: Protoc and line endings
If you want to write up a patch to recognize a UTF8 BOM and ignore it, go ahead. You can just modify the Tokenizer class to recognize and discard a BOM appearing at the beginning of the input.

On Thu, Jul 2, 2009 at 1:31 PM, Marc Gravell wrote:
>
> OK... is there any way it /could/ silently ignore the BOM? ;-p
>
> I can try to advise the caller to use files without BOMs, but since
> protoc reads UTF8 anyway, it seems reasonable to accept a BOM?
>
> Marc

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
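Recognizing and discarding a BOM only at the start of the input comes down to checking the first three bytes (EF BB BF, the UTF-8 encoding of U+FEFF). The sketch below is in Python purely for illustration; it is not protoc's C++ Tokenizer code, and the function name `strip_utf8_bom` is hypothetical.

```python
import codecs

# codecs.BOM_UTF8 is b"\xef\xbb\xbf", the UTF-8 encoding of U+FEFF.
def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a UTF-8 BOM only if it is the very first thing in the input."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data  # no leading BOM: pass the input through unchanged
```

A BOM that appears later in the file is deliberately left alone, matching the "beginning of the input" rule above. When decoding to text, Python's built-in `utf-8-sig` codec does the same job: `data.decode("utf-8-sig")` skips a leading BOM if one is present.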
Re: Protoc and line endings
OK... is there any way it /could/ silently ignore the BOM? ;-p

I can try to advise the caller to use files without BOMs, but since protoc reads UTF8 anyway, it seems reasonable to accept a BOM?

Marc
Re: Protoc and line endings
protoc actually expects its input to be UTF-8 (though non-ASCII characters are only allowed in default values for string fields). It just doesn't like the BOM.

On Thu, Jul 2, 2009 at 12:44 PM, Marc Gravell wrote:
>
> My bad... it isn't the line endings - it is the UTF8 BOM; when I
> switched one, it switched the other.
>
> (Which is annoying; encoding is much trickier than just CR/LF!)
>
> Marc
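Since non-ASCII text is only meant to appear in string-field defaults, one way to audit a file is to decode it strictly as UTF-8 and list every non-ASCII character with its position; a leading BOM then shows up as U+FEFF at line 1, column 1. This is a hypothetical Python helper, not part of protoc, and it only reports characters: it does not check whether each one actually sits inside a string default.

```python
from typing import List, Tuple

def non_ascii_positions(data: bytes) -> List[Tuple[int, int, str]]:
    """Decode data strictly as UTF-8 and report (line, column, char) for each
    non-ASCII character, 1-based. A leading BOM is reported as U+FEFF at (1, 1).
    Raises UnicodeDecodeError if data is not valid UTF-8."""
    text = data.decode("utf-8")  # plain "utf-8" keeps a BOM as U+FEFF
    hits = []
    # Split on LF only; str.splitlines() would also split on CR and others.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                hits.append((line_no, col, ch))
    return hits
```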
Re: Protoc and line endings
My bad... it isn't the line endings - it is the UTF8 BOM; when I switched one, it switched the other.

(Which is annoying; encoding is much trickier than just CR/LF!)

Marc
Re: Protoc and line endings
Protoc treats \r as plain whitespace, so it should have no problem with Windows line endings. I just tested this and, sure enough, protoc works fine with .proto files that use Windows-style line endings. Mac pre-OSX line endings (\r with no \n) won't work if the file contains any comments, because a line comment continues until the next \n.

What kind of errors are you seeing?

On Wed, Jul 1, 2009 at 11:48 PM, Marc Gravell wrote:
>
> I'm using protoc as the raw .proto parser for protobuf-net (I then
> process the compiled binary for code-generation); at the moment, it is
> very sensitive about line endings - if it isn't LF, it won't work.
>
> This is a bit of a nag for Windows users, as you have to go out
> of your way to get the right line endings. I could patch the input
> files, but that gets very tricky if the user is importing multiple
> other files (with the wrong line endings) into their proto - and it
> means that their CRLF files wouldn't work fully interoperably.
>
> I have no idea how complex it would be, but is there any chance that
> protoc could be less choosy about line endings?
>
> For now, I've added some code to protobuf-net's "protogen", so that if
> protoc reports failure, it checks the *known* files for CR (and advises
> the user) - but this is still tricky for the import scenario.
>
> Thoughts?
>
> Marc
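To see why CR-only files break only when they contain comments, here is a toy Python model. It is not protoc's tokenizer: a `//` comment simply runs to the next LF or the end of the input, CR gets no special treatment, and string literals and `/* */` comments are ignored for brevity. The function name is hypothetical.

```python
def strip_line_comments(src: str) -> str:
    """Toy model: a '//' comment runs up to the next LF (or end of input),
    and CR gets no special treatment."""
    out = []
    i = 0
    while i < len(src):
        if src.startswith("//", i):
            end = src.find("\n", i)
            if end == -1:
                break  # no LF left: the comment swallows the rest of the input
            i = end  # resume at the LF so the line break is kept
        else:
            out.append(src[i])
            i += 1
    return "".join(out)
```

With CRLF endings the comment stops at the LF and the stray CR falls inside the comment, so the following declaration survives. With CR-only endings there is no LF at all, so everything after the first `//` disappears.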
Protoc and line endings
I'm using protoc as the raw .proto parser for protobuf-net (I then process the compiled binary for code-generation); at the moment, it is very sensitive about line endings - if it isn't LF, it won't work.

This is a bit of a nag for Windows users, as you have to go out of your way to get the right line endings. I could patch the input files, but that gets very tricky if the user is importing multiple other files (with the wrong line endings) into their proto - and it means that their CRLF files wouldn't work fully interoperably.

I have no idea how complex it would be, but is there any chance that protoc could be less choosy about line endings?

For now, I've added some code to protobuf-net's "protogen", so that if protoc reports failure, it checks the *known* files for CR (and advises the user) - but this is still tricky for the import scenario.

Thoughts?

Marc
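A check like the one described for protogen (scanning the known files for CR) could distinguish CRLF pairs from lone CRs, since the reply above reports that CRLF files parse fine and only CR-only endings cause trouble. This is a Python sketch, not protogen's actual code, and `line_ending_counts` is a hypothetical name.

```python
from typing import Dict

def line_ending_counts(data: bytes) -> Dict[str, int]:
    """Count CRLF pairs, lone CRs (old Mac style) and lone LFs in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "CR": data.count(b"\r") - crlf,  # CRs not followed by LF
        "LF": data.count(b"\n") - crlf,  # LFs not preceded by CR
    }
```

A caller could then warn only when `counts["CR"]` is non-zero, instead of flagging every file that contains a CR.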