On Fri, Nov 24, 2017 at 12:24:48PM -0500, Eric Sunshine wrote:
> On Fri, Nov 24, 2017 at 11:14 AM,  <tbo...@web.de> wrote:
> > When a text file had been commited with CRLF and the file is commited
> > again, the CRLF are kept if .gitattributs has "text=auto".
> > This is done by analyzing the content of the blob stored in the index:
> > If a '\r' is found, Git assumes that the blob was commited with CRLF.
> >
> > The simple search for a '\r' does not always work as expected:
> > A file is encoded in UTF-16 with CRLF and commited. Git treats it as binary.
> > Now the content is converted into UTF-8. At the next commit Git treats the
> > file as text, the CRLF should be converted into LF, but isn't.
> >
> > Solution:
> > Replace has_cr_in_index() with has_crlf_in_index(). When no '\r' is found,
> > 0 is returned directly, this is the most common case.
> > If a '\r' is found, the content is analyzed more deeply.
> >
> > Signed-off-by: Torsten Bögershausen <tbo...@web.de>
> > ---
> > diff --git a/convert.c b/convert.c
> > @@ -220,18 +220,27 @@ static void check_safe_crlf(const char *path, enum 
> > crlf_action crlf_action,
> > -static int has_cr_in_index(const struct index_state *istate, const char 
> > *path)
> > +static int has_crlf_in_index(const struct index_state *istate, const char 
> > *path)
> >  {
> >         unsigned long sz;
> >         void *data;
> > -       int has_cr;
> > +       const char *crp;
> > +       int has_crlf = 0;
> >
> >         data = read_blob_data_from_index(istate, path, &sz);
> >         if (!data)
> >                 return 0;
> > -       has_cr = memchr(data, '\r', sz) != NULL;
> > +
> > +       crp = memchr(data, '\r', sz);
> > +       if (crp && (crp[1] == '\n')) {
> 
> If I understand correctly, this isn't a NUL-terminated string and it
> might be a binary blob, so if the lone CR in a file resides at the end
> of the file, won't this try looking for LF out-of-bounds? I would have
> expected the conditional to be:
> 
>     if (crp && crp - data + 1 < sz && crp[1] == '\n') {
> 
> or any equivalent variation.
> 

The read_blob_data_from_index() function should always append a '\0',
regardless if the blob is binary or not.

Reply via email to