Doing this efficiently is more complicated than I thought. Can't we simply count 2 bytes per char? ;-)
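For illustration, here is a minimal Java sketch of why a fixed "2 bytes per char" rule only holds for UTF-16 input. The class and method names are made up for this example. It compares what a Reader delivers (Java chars, i.e. UTF-16 code units) with the encoded byte length of the same text in UTF-8 and UTF-16.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only (hypothetical class, not part of JSR 353 or the RI):
// compares the char count a Reader would see with the number of bytes
// the same text occupies in UTF-8 and in UTF-16.
public class ByteVsCharCount {

    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int utf16Bytes(String s) {
        // UTF_16BE instead of UTF_16 so that no 2-byte BOM is prepended
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), e-acute (U+00E9), euro sign (U+20AC),
        // and U+1F600, a supplementary character stored as a surrogate pair
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X chars=%d utf8=%d utf16=%d%n",
                    s.codePointAt(0), s.length(), utf8Bytes(s), utf16Bytes(s));
        }
    }
}
```

Running main prints:

U+0041 chars=1 utf8=1 utf16=2
U+00E9 chars=1 utf8=2 utf16=2
U+20AC chars=1 utf8=3 utf16=2
U+1F600 chars=2 utf8=4 utf16=4

So "2 bytes per char" matches UTF-16 exactly (including surrogate pairs), but for UTF-8 input it would be off by one or more bytes for every non-ASCII character.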
BTW, it seems the JsonLocation column value also leaves room for interpretation: is the leftmost column 0 or 1? Text editors, for example, start with column 1 (there is never a column 0), but the RI starts with 0.

Regards
Hendrik

On Wed, Jul 23, 2014 at 1:49 PM, Hendrik Dev wrote:
> Agreed, will make it so.
>
> On Wed, Jul 23, 2014 at 1:28 PM, Romain Manni-Bucau wrote:
>> Hi
>>
>> I agree the wording is wrong, but IMO it is not ambiguous: we get an InputStream
>> or a Reader (and we *don't* want to check whether it is a file or not), so we just
>> count the chars or bytes we read. Any other implementation would lead to
>> confusion IMO (make default text file reader compliant friendly).
>>
>> We can start this way and, if we have issues, go further, but I really doubt
>> we need it.
>>
>> What's your opinion?
>>
>> Romain Manni-Bucau
>> Twitter: @rmannibucau
>> Blog: http://rmannibucau.wordpress.com/
>> LinkedIn: http://fr.linkedin.com/in/rmannibucau
>> Github: https://github.com/rmannibucau
>>
>>
>> 2014-07-23 13:21 GMT+02:00 Hendrik Dev:
>>
>>> Hi,
>>>
>>> the JSR 353 API says the following about JsonLocation.getStreamOffset():
>>>
>>> "long getStreamOffset()
>>>
>>> Return the stream offset into the input source this location is
>>> pointing to. If the input source is a file or a byte stream then this
>>> is the byte offset into that stream, but if the input source is a
>>> character media then the offset is the character offset. Returns -1 if
>>> there is no offset available."
>>>
>>> There are IMHO two issues here:
>>>
>>> 1) How can we know that the input source is a file (stream)? We can
>>> only know whether the parser reads from an InputStream (= byte stream) or
>>> from a Reader (= character stream). The wording here is unclear/ambiguous.
>>>
>>> 2) Since a UTF-8 character can map to one, two, three or four bytes (and a
>>> UTF-16 character to two or four), the output can be very confusing,
>>> especially if the user doesn't know whether the parser was constructed
>>> from a byte or a character stream and which charset is used.
>>>
>>> It seems that the RI does not implement these distinctions; if I looked
>>> correctly, it always returns character offsets.
>>>
>>> So what do we want to do?
>>>
>>> Thanks
>>> Hendrik
>>>
>>>
>>> --
>>> Hendrik Saly (salyh, hendrikdev22)
>>> @hendrikdev22
>>> PGP: 0x22D7F6EC
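A rough sketch of the "just count the chars or bytes we read" approach, assuming two hypothetical wrapper classes, CountingInputStream and CountingReader (neither is part of JSR 353 or the RI). Each wrapper counts what passes through it, so a parser built on an InputStream would report byte offsets and one built on a Reader would report char offsets.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.FilterReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers, not part of the JSR 353 API or any implementation.
public class OffsetCounting {

    /** Counts bytes handed out by the wrapped InputStream (skip() is not tracked). */
    static class CountingInputStream extends FilterInputStream {
        long bytes;

        CountingInputStream(InputStream in) { super(in); }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b != -1) bytes++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) bytes += n;
            return n;
        }
    }

    /** Counts chars (UTF-16 code units) handed out by the wrapped Reader. */
    static class CountingReader extends FilterReader {
        long chars;

        CountingReader(Reader in) { super(in); }

        @Override
        public int read() throws IOException {
            int c = super.read();
            if (c != -1) chars++;
            return c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) chars += n;
            return n;
        }
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"k\":\"\u20AC\"}"; // 9 chars, 11 bytes in UTF-8
        byte[] utf8 = json.getBytes(StandardCharsets.UTF_8);

        // Reader input: the offset is a char offset
        CountingReader r = new CountingReader(new StringReader(json));
        while (r.read() != -1) { /* drain */ }
        System.out.println("char offset at end: " + r.chars);

        // InputStream input: the offset is a byte offset
        CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(utf8));
        while (in.read() != -1) { /* drain */ }
        System.out.println("byte offset at end: " + in.bytes);

        // Decoding through InputStreamReader: the byte counter reflects read-ahead,
        // not the byte position of the char that was actually consumed.
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(utf8));
        Reader decoded = new InputStreamReader(raw, StandardCharsets.UTF_8);
        decoded.read(); // consumes only '{'
        System.out.println("bytes pulled after 1 char: " + raw.bytes);
    }
}
```

For this input the first two lines report 9 and 11. The last part hints at why doing this efficiently is tricky: the InputStreamReader Javadoc notes that it may read more bytes ahead than the current read needs, so a byte counter placed under the decoder tracks bytes pulled from the stream, not the byte offset of the last char the parser consumed. A parser wanting exact byte offsets would probably have to decode the bytes itself.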
