OK,

thank you for your reply! In the meantime I figured out why this was
working without errors in my first code!

There I had some regex checks before saving each row into the
database. That means the first row always got skipped, because the
Unicode BOM bytes at its start didn't match the regex.
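
A minimal sketch of that effect, assuming a start-anchored pattern. ROW_PATTERN and the sample rows are made up for illustration; the real checks from the app are not shown in this thread.

```ruby
# Hypothetical row check, not the pattern from the original app.
ROW_PATTERN = /\A\d+;[^;]+\z/

bom        = "\uFEFF"            # U+FEFF, stored in UTF-8 as the bytes EF BB BF
first_row  = bom + "1;kühler"    # first row of a UTF-8 file saved with a BOM
second_row = "2;lüfter"          # later rows carry no BOM

# \A\d+ needs a digit at the very start, but the first row starts with the BOM.
first_row_ok  = ROW_PATTERN.match?(first_row)
second_row_ok = ROW_PATTERN.match?(second_row)

puts first_row_ok    # false
puts second_row_ok   # true
```

So a check like this drops the first row without raising anything, which would explain why the old code seemed to work.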

Now I know where my mistake is, but I don't really know how to solve it.

If the source CSV is in UTF-8, I can of course strip the first three
bytes. But if it is in another encoding, I would strip off characters
that I need. How can I check which encoding the file has? I tried the
following, but it always reports CP850 as the encoding:

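# external_encoding reports Encoding.default_external (taken from the
# locale), not an encoding detected from the bytes in the file.
File.open("my.csv") { |file| puts file.external_encoding.name }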
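
One possible way around the BOM part of the problem, sketched under assumptions: Ruby's "bom|utf-8" open mode looks at the first bytes, skips a BOM if one is present, and otherwise reads the file as UTF-8. The temp file below only stands in for the uploaded CSV; its name and contents are made up.

```ruby
require "tempfile"

# Build a small UTF-8 CSV that starts with a BOM, standing in for the upload.
sample = Tempfile.new(["with_bom", ".csv"])
sample.binmode
sample.write("\xEF\xBB\xBFid;name\n1;k\xC3\xBChler\n".b)
sample.close

# Plain open: the reported encoding is simply Encoding.default_external
# (for example CP850 on a Western European Windows console).
plain_name = File.open(sample.path) { |f| f.external_encoding.name }

# BOM-aware open: a leading BOM is consumed, the rest is read as UTF-8.
first_line = File.open(sample.path, "r:bom|utf-8") { |f| f.gets.chomp }

puts plain_name
puts first_line   # id;name
```

This does not detect other encodings. A file without a BOM is still just assumed to be UTF-8.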

Or is there a way to transform a file before uploading? I use
file.temp for uploading.
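
A rough sketch of transforming the content on the server after the upload rather than before it. Ruby cannot reliably detect a legacy encoding, so this only checks whether the bytes are valid UTF-8 and otherwise assumes a fallback. The helper name to_utf8 and the Windows-1252 fallback are assumptions for illustration, and the raw bytes are assumed to come from reading the uploaded tempfile.

```ruby
# Hypothetical helper: returns the uploaded bytes as a UTF-8 string.
# The fallback encoding is a guess, not something detected from the data.
def to_utf8(raw, fallback: "Windows-1252")
  text = raw.dup.force_encoding("UTF-8")
  if text.valid_encoding?
    text.sub(/\A\uFEFF/, "")   # drop the BOM only when it is actually there
  else
    raw.dup.force_encoding(fallback).encode("UTF-8")
  end
end

utf8_upload   = "\xEF\xBB\xBF1;k\xC3\xBChler".b   # UTF-8 bytes with a BOM
legacy_upload = "1;k\xFChler".b                   # Windows-1252 bytes, no BOM

puts to_utf8(utf8_upload)     # 1;kühler
puts to_utf8(legacy_upload)   # 1;kühler
```

Because the BOM is removed by pattern and not by cutting three bytes, files in other encodings keep all of their characters.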

Cheers,
Sebastian

On 4 Jul., 18:31, Walter Lee Davis <wa...@wdstudio.com> wrote:
> Unicode uses them to indicate to the application reading the text file
> which order the following bytes are in. In UTF-16 and UTF-32 each code
> unit spans two or four bytes, so the order in which the bytes arrive
> is of critical importance. Such text files may be little-endian or
> big-endian, and unless you know what order to expect, you can't really
> know. UTF-8 has a fixed byte order, so there the three bytes EF BB BF
> only mark the file as UTF-8.
>
> Walter
>
> On Jul 4, 2011, at 3:02 AM, Sebastian wrote:
>
> > Thank you for your reply!
>
> > Stripping the first chars is possible of course, but I don't
> > understand why these chars are there.
>
> > It was working before! I could just upload the utf-8 csv and
> > everything was working great. I don't really know what I changed so
> > that these chars are appearing now.
>
> > Sebastian
>
> > On 1 Jul., 15:12, Frederick Cheung <frederick.che...@gmail.com> wrote:
> >> On Jul 1, 11:48 am, Sebastian <sebastian.go...@googlemail.com> wrote:
>
> >>> OK,
>
> >>> it was working perfectly when I just made sure that my csv file is
> >>> in utf-8 encoding format.
>
> >>> I deleted some of my program, so I had to write a lot of stuff
> >>> again.
>
> >>> If I now upload a csv file which is in utf-8 format, then every
> >>> time the first three characters of the first row are:
> >>> \xEF\xBB\xBF
>
> >> That's a UTF BOM: a magic Unicode character that tells whoever is
> >> reading the stream what the endianness is and also lets them tell
> >> UTF-8 apart from UTF-16.
> >> You can safely strip it from the file.
>
> >>> I read that this has something to do with unicode and ordering, but
> >>> I don't know where these hex chars come from.
>
> >>> Also, every German special character is shown as this hex code,
> >>> e.g. "k\xC3\xBChler" should be "kühler"
>
> >> That is probably just an output thing if you are seeing this in a
> >> terminal window: \xC3\xBC is the UTF-8 sequence for ü
>
> >> Fred
>
> >>> If I use files in other encodings, these three chars are not at
> >>> the beginning, but every special char is "?"
>
> >>> Does anyone have an idea where this comes from?
>
> >>> Cheers,
> >>> Sebastian
>
> >>> On 22 Jun., 13:26, Sebastian <sebastian.go...@googlemail.com> wrote:
>
> >>>> file.temp is an object. I have a form where a csv can be uploaded,
> >>>> but it is never stored. That's why I use tempfile. That means that
> >>>> I probably have no path to use in that method.
>
> >>>> BUT, the open and foreach methods of the CSV class work with an
> >>>> object whenever I don't have a German special character in my csv
> >>>> file or when my csv file is already in utf-8 encoding format.
>
> >>>> On 22 Jun., 12:05, Chirag Singhal <chirag.sing...@gmail.com> wrote:
>
> >>>>> What does file.tempfile return?
> >>>>> If it is a file object, then we have a problem: we need to pass in
> >>>>> a file path here.
> >>>>> So call path on the file object and pass that as the first
> >>>>> argument.
>
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Ruby on Rails: Talk" group.
> > To post to this group, send email to rubyonrails-t...@googlegroups.com.
> > To unsubscribe from this group, send email to
> > rubyonrails-talk+unsubscr...@googlegroups.com.
> > For more options, visit this group at
> > http://groups.google.com/group/rubyonrails-talk?hl=en.
