Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Manish Tripathi
It's pipeline data so must have been generated through Siebel and sent as
excel csv.


On Mon, Oct 21, 2013 at 11:32 PM, Danny Yoo  wrote:

> >
> > * Where is this data coming from?
> > * Who or what is generating this file?
>
>
> Just to be more specific about this: I have a very strong suspicion that
> whatever is generating the input that you're trying to read is doing
> something ad-hoc with regard to the CSV file format.  Knowing what generated
> the file, whether it be Excel or some custom script, is very helpful in
> diagnosing where the problem originates.
>
>
> Your suspicion about the quotes around entire rows:
>
> > Does it have to do with the "" marks present before each line in the
> > data?
>
> sounds reasonable.  I expect quotes around individual fields, but not
> around entire rows.  Such a feature sounds anomalous because it doesn't fit
> the description of known CSV formats:
>
> http://en.wikipedia.org/wiki/Comma-separated_values
>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Mark Lawrence

On 21/10/2013 22:42, Danny Yoo wrote:




This question has now been placed on the correct forum here 
http://article.gmane.org/gmane.comp.python.pydata/2294 so I see little 
sense in us attempting to follow it up.


--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Danny Yoo
On Mon, Oct 21, 2013 at 11:57 AM, Manish Tripathi wrote:

> It's pipeline data so must have been generated through Siebel and sent as
> excel csv.
>
>
I am assuming that you are talking about "Siebel Analytics", some kind of
analysis software from Oracle:

http://en.wikipedia.org/wiki/Siebel_Systems

That would be fine, except that knowing it comes out of Siebel is no
guarantee that the output you're consuming is well-formed Excel CSV.  For
example, I see things like this:

http://spendolini.blogspot.com/2006/04/custom-export-to-csv.html

where the generated output is "ad-hoc".



---

Hmmm... but let's assume for the moment that your data is ok.  Could the
problem be in pandas?  Let's follow this line of logic, and see where it
takes us.

Given the structure of the error you're seeing, I have to assume that
pandas is trying to decode the bytes, and runs into an issue, though the
exact position where it's running into an error is in question.  In fact,
looking at:

https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L1357

for example, the library appears to be trying to decode line-by-line under
certain situations.  If it runs into an error, it will report an offset
into a particular line.

Wow.  That can be very bad, if I'm reading that right.  It does not give
that offset from the perspective of the whole file.  But it's worse because
it's unsound.  The code _should_ be doing the decoding from the perspective
of the whole file, not at the level of single lines.  It needs to be using
codecs.open(), and let codecs.open() handle the details of
byte->unicode-string decoding.  Otherwise, by that time, it's way too late:
we've just taken an interpretation of the bytes that's potentially invalid.
 Example: if we're working with UTF-16, and we got into this code path,
it'd be really bad.
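
To illustrate why decoding one line at a time is unsound for an encoding
such as UTF-16, here is a minimal sketch.  It is written for Python 3 and
does not use pandas; the sample text and the variable names are made up for
illustration, and it makes no claim about what pandas actually does on any
given code path.

```python
import io

# Hypothetical two-line sample, encoded as UTF-16 (little-endian, no BOM).
raw = "a,b\nc,d\n".encode("utf-16-le")

# Unsound: split the *bytes* on the newline byte, then decode each piece.
# In UTF-16 the newline is the two bytes 0A 00, so the split cuts a code
# unit in half and every piece after the first is misaligned.
pieces = raw.split(b"\n")
per_line_ok = True
try:
    for piece in pieces:
        piece.decode("utf-16-le")
except UnicodeDecodeError:
    per_line_ok = False

# Sound: let a decoding reader see the whole byte stream, then split lines.
reader = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-16-le")
lines = [line.rstrip("\n") for line in reader]
```

The per-line attempt fails on the second piece, while the stream-level
reader returns the two lines intact.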


It's hard to tell whether or not we're taking that code path.  I'm
following the definition of read_csv from:

https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L409

to:

   https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L282

to:

https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L184

to:

https://github.com/pydata/pandas/blob/master/pandas/io/common.py#L100



Ok, at that point, they appear to try to decode the entire file.  Somewhat
good so far.  Though, technically, pandas should be using codecs.open():


http://docs.python.org/2/howto/unicode.html#reading-and-writing-unicode-data

and because they aren't, they appear to read the entire file into memory
with StringIO.  Yikes.
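
As a sketch of the alternative: in Python 3 the built-in open() accepts an
encoding argument and takes over the role that codecs.open() plays in
Python 2, decoding incrementally as the file is read.  The file name, its
contents and the cp1252 encoding below are assumptions chosen only for the
example.

```python
import os
import tempfile

# Hypothetical sample file; the name, contents and encoding are made up.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "wb") as f:
    f.write("id,name\n1,caf\u00e9\n".encode("cp1252"))

# Decode while reading: the file object turns bytes into text
# incrementally, so rows can be handled one at a time instead of
# loading the whole file into memory first.
rows = []
with open(path, encoding="cp1252", newline="") as f:
    for line in f:
        rows.append(line.rstrip("\n").split(","))
```

The naive split(",") is only there to keep the sketch short; real CSV data
with quoted fields needs a proper CSV parser.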


Now the pandas library must make sure _not_ to decode() again, because
decoding is not an idempotent operation.

As a concrete example:

##
>>> 'foobar'.decode('utf-16')
u'\u6f66\u626f\u7261'
>>> 'foobar'.decode('utf-16').decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
##

This is reminiscent of the kind of error you're encountering, though I'm
not sure if this is the same situation.
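
The same effect can be reproduced in Python 3, where str no longer has a
decode() method.  The sketch below assumes little-endian UTF-16
('utf-16-le') so that the result does not depend on the platform, and it
spells out the implicit ASCII encode that Python 2 performs before the
second decode.

```python
raw = b"foobar"

# First decode: six bytes become three UTF-16 code units.
text = raw.decode("utf-16-le")

# Python 2 allowed text.decode(...), which first encoded the unicode string
# back to bytes with the ASCII codec.  This reproduces that hidden step.
try:
    text.encode("ascii")
    hidden_step_failed = False
except UnicodeEncodeError:
    hidden_step_failed = True
```

So the second "decode" fails with an *encode* error, which is why the
message looks so confusing.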



Unfortunately, I'm running out of time to analyze this further.  If you
could upload your data file somewhere, someone else here may have time to
investigate the error you're seeing in more detail.  From reading the
Pandas code, I'm discouraged by the code quality: I do think that there's a
potential of a bug in the library.  The code is a heck of a lot more
complicated than I think it needs to be.


Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Danny Yoo
>
> * Where is this data coming from?
> * Who or what is generating this file?


Just to be more specific about this: I have a very strong suspicion that
whatever is generating the input that you're trying to read is doing
something ad-hoc with regard to the CSV file format.  Knowing what generated
the file, whether it be Excel or some custom script, is very helpful in
diagnosing where the problem originates.


Your suspicion about the quotes around entire rows:

> Does it have to do with the "" marks present before each line in the data?

sounds reasonable.  I expect quotes around individual fields, but not
around entire rows.  Such a feature sounds anomalous because it doesn't fit
the description of known CSV formats:

http://en.wikipedia.org/wiki/Comma-separated_values
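
A quick way to see what quotes around an entire row do to a CSV parser is
to feed such a row to the standard library's csv module.  This is only a
sketch: the two sample rows are invented, shaped like the posted data, and
it uses Python 3.

```python
import csv
import io

# Hypothetical two-row sample in the same shape as the posted data:
# every physical line is wrapped in one pair of double quotes.
sample = '"id,name,amount"\n"1,Acme Ltd,125.85"\n'

rows = list(csv.reader(io.StringIO(sample)))

# Each row comes back as a single field, because the outer quotes tell the
# parser that the commas inside are part of the field's text.
field_counts = [len(row) for row in rows]
```

Every row has exactly one field, which matches the "one single column"
symptom described in the original question.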


Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Danny Yoo
On Sat, Oct 19, 2013 at 7:29 AM, Manish Tripathi wrote:
>
> I am trying to import a csv file in Pandas but it throws an error. The
format of the data when opened in notepad++ is as follows with first row
being column names:
>
> "End Customer Organization ID,End Customer Organization Name,End Customer
Top Parent Organization ID,End Customer Top Parent Organization
Name,Reseller Top Parent ID,Reseller Top Parent Name,Business,Rev Sum
Division,Rev Sum Category,Product Family,Version,Pricing Level,Summary
Pricing Level,Detail Pricing Level,MS Sales Amount,MS Sales Licenses,Fiscal
Year,Sales Date"
> "11027676,Baroda Western Uttar Pradesh Gramin
Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of
Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,""Hcl
Infosystems Ltd - Partnerdghftrutyhb
frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw"",Server &
CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server &
CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL
CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL
CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open
Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,""Open
Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho"",125.85,1,FY07,12/28/2006"
> "12835756,Uttam Strips Pvt Ltd,12835756,Uttam Strips Pvt
Ltd,12565538,Redington C/O Fortis Financial Services Ltd,MBS,Dynamics
ERP,Dynamics NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS
SA,MBS New Customer Enhanc. Def,0,0,FY09,9/15/2008"
> "12233135,Bhagwan Singh Tondon,12233135,Bhagwan Singh Tondon,2652941,H B
S Systems Pvt Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL
CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA - New,0,0,FY09,9/15/2008"
> "11602305,Maya Academy Of Advanced Cinematics,9750934,Maya Entertainment
Ltd,336146,Embee Software Pvt Ltd,Server & CAL,Windows Server & CAL,Windows
Server HPC,Windows Compute Cluster Server,Non-specific,Open,Open V/MYO -
Rec,OLV Perpet L&SA Recur-Def,0,0,FY09,9/25/2008"
> "13336009,Remiel Softech Solution Pvt Ltd,13336009,Remiel Softech
Solution Pvt Ltd,13335482,Redington C/O Remiel Softech Solutions Pvt
Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics NAV Business
Essentials,Non-specific,Other,MBS SA,MBS New Customer Enhanc.
Def,0,0,FY09,12/23/2008"
> "7872800,Science Application International Corporation,2839760,GOVERNMENT
OF KARNATAKA,10237455,Cubic Computing P.L,Server & CAL,SQL Server & CAL,SQL
Server Standard,SQL Server Standard Edition,Non-specific,Open,Open
SA/UA,Deferred Open SA - Renewal,0,0,FY09,1/15/2009"
> "13096361,Pratham Software Pvt Ltd,13096361,Pratham Software Pvt
Ltd,10133086,Krap Computer,Information Worker,Office,Office Standard /
Basic,Office Standard,2007,Open,Open L,Open Std,7132.44,28,FY09,9/24/2008"
> "12192276,Texmo Precision Castings,12192276,Texmo Precision
Castings,4059430,Quadra Systems. - Partner,Server & CAL,Windows Server &
CAL,Windows Standard Server,Windows Server Standard,Non-specific,Open,Open
L&SA,Deferred Open L&SA - New,0,0,FY09,11/15/2008"
>
> Kindly note that the same file when double clicked in the csv format
opens in excel with comma separated values BUT with NO quotation marks in
each line as shown in notepad++.
>
> I have used encoding as UTF-8 which gives the following error:

Questions:

* Where is this data coming from?
* Who or what is generating this file?
* Is it being automatically generated, or is someone manually typing in the
file's content?


Knowing the answers to these questions may help to isolate what the actual
problem is.

The source of this input, if they are a good, responsible party, should be
saying up front how to interpret its bytes.  Otherwise you are being put
into a position of having to guess the proper interpretation.  Guessing can
be fun sometimes, I suppose, but I personally don't like doing it unless I
have no choice.


Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Mark Lawrence

On 21/10/2013 04:05, Sivaram Neelakantan wrote:


you could try the following newsgroup or mailing list for more
specialised help.

  gmane.org:gmane.comp.python.pydata

  sivaram
  --


Thanks for this, it explains why I couldn't find pandas there :)

--
Python is the second best programming language in the world.
But the best has yet to be invented.  Christian Tismer

Mark Lawrence



Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Sivaram Neelakantan
On Sat, Oct 19 2013,Manish Tripathi wrote:

> I am trying to import a csv file in Pandas but it throws an error. The
> format of the data when opened in notepad++ is as follows with first row
> being column names:


you could try the following newsgroup or mailing list for more
specialised help.

 gmane.org:gmane.comp.python.pydata

[snipped 82 lines]



 sivaram
 -- 



Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Sivaram Neelakantan
On Sun, Oct 20 2013,Mark Lawrence wrote:

> On 19/10/2013 23:40, Alan Gauld wrote:
>
>> This is the second time I've seen pandas mentioned recently. I really
>> must go and look it up to find out what it is...
>
> Just started out myself at http://pandas.pydata.org/ and at first
> glance it seems quite awesome.

The data handling framework is very good; it cuts out large swathes of
code from a normal procedural coding POV.  That is, instead of lists,
tuples etc. you work with a conceptual dataframe, like R.

 sivaram
 -- 



Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Albert-Jan Roskam
On Sun, 10/20/13, Mark Lawrence  wrote:

 Subject: Re: [Tutor] Reading CSV files in Pandas
 To: tutor@python.org
 Date: Sunday, October 20, 2013, 1:16 AM
 
 On 19/10/2013 23:40, Alan Gauld
 wrote:
 
 > This is the second time I've seen pandas mentioned recently. I really
 > must go and look it up to find out what it is...
 
 Just started out myself at http://pandas.pydata.org/ and at first glance it
 seems quite awesome.

Yeah, I've read the Pandas book and I was amazed too. It's R (as in CRAN R) on 
steroids.


Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread John Steedman
I have in front of me an (unread, borrowed) copy of "Python for
Data Analysis".  Well, on page 104, there is the start of an answer.

Pandas: has two useful functions, read_csv and read_table
Numpy: see np.loadtxt and np.genfromtxt

There is an example for using the first numpy function:

arr = np.loadtxt(fileName, delimiter=',')

This information is from Python for Data Analysis by Wes McKinney,
published by O'Reilly - a succinct, useful-looking book.





On Sun, Oct 20, 2013 at 7:05 PM, Alan Gauld  wrote:
> On 20/10/13 11:30, Matthew Ngaha wrote:
>>
>> Does pandas do the same thing numpy does?
>
>
> Pandas is more about analysing large volumes of data than about complex
> calculations, e.g. statistical analysis and data mining.
>
> As such it's closer to R than to numpy in its function, so far as I can
> tell. (Although it uses numpy under the covers).
>
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.flickr.com/photos/alangauldphotos
>


Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Alan Gauld

On 20/10/13 11:30, Matthew Ngaha wrote:

Does pandas do the same thing numpy does?


Pandas is more about analysing large volumes of data than about complex
calculations, e.g. statistical analysis and data mining.


As such it's closer to R than to numpy in its function, so far as I can 
tell. (Although it uses numpy under the covers).


--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Mark Lawrence

On 20/10/2013 11:30, Matthew Ngaha wrote:

Does pandas do the same thing numpy does? I've never used them and am
unsure of what they are about.



Pandas uses numpy, as do many other Python packages.

From http://pandas.pydata.org/ "pandas is an open source, BSD-licensed 
library providing high-performance, easy-to-use data structures and data 
analysis tools for the Python programming language."


From http://www.numpy.org/ "NumPy is the fundamental package for 
scientific computing with Python. It contains among other things:


a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities"

--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence



Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Matthew Ngaha
Does pandas do the same thing numpy does? I've never used them and am
unsure of what they are about.


Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Manish Tripathi
Thanks Mark. I have already asked this question on StackOverflow but to no
avail, so I thought I would ask here.


On Sun, Oct 20, 2013 at 5:47 AM, Mark Lawrence wrote:

> On 19/10/2013 15:29, Manish Tripathi wrote:
>
> You are far more likely to get a response to the identical question that
> you've already asked on stackoverflow than you are here.
>
>
> --
> Roses are red,
> Violets are blue,
> Most poems rhyme,
> But this one doesn't.
>
> Mark Lawrence
>


Re: [Tutor] Reading CSV files in Pandas

2013-10-19 Thread Mark Lawrence

On 19/10/2013 15:29, Manish Tripathi wrote:

You are far more likely to get a response to the identical question that 
you've already asked on stackoverflow than you are here.


--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence



Re: [Tutor] Reading CSV files in Pandas

2013-10-19 Thread Mark Lawrence

On 19/10/2013 23:40, Alan Gauld wrote:


This is the second time I've seen pandas mentioned recently. I really
must go and look it up to find out what it is...


Just started out myself at http://pandas.pydata.org/ and at first glance 
it seems quite awesome.


--
Roses are red,
Violets are blue,
Most poems rhyme,
But this one doesn't.

Mark Lawrence



Re: [Tutor] Reading CSV files in Pandas

2013-10-19 Thread Alan Gauld

On 19/10/13 15:29, Manish Tripathi wrote:

I am trying to import a csv file in Pandas but it throws an error.


This is the second time I've seen pandas mentioned recently. I really
must go and look it up to find out what it is...


Meanwhile can you clarify what you mean by importing the csv file?
Are you using a Pandas facility to do this?
Or are you using Python code? And if the latter, are you using the csv
module? It is strongly recommended for any csv work.




df=pd.read_csv(filename,encoding='cp1252')


I'm assuming from this it is some kind of Pandas feature?
Have you tried asking on a Pandas mailing list/forum?
This group is targeted at the standard library and core
language, so it will be a matter of luck if you find
anyone who uses pandas and can help.

OK, I found the Pandas page, it's for data analysis/modelling.
It has a StackOverflow link for asking questions so if you
don't get help here you should try that next.

I'm assuming you have read the docs on csv input here:

http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table

There are way too many options for me to read through
so I'll leave it to those who know!

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos



[Tutor] Reading CSV files in Pandas

2013-10-19 Thread Manish Tripathi
I am trying to import a csv file in Pandas but it throws an error. The
format of the data when opened in notepad++ is as follows with first row
being column names:

"End Customer Organization ID,End Customer Organization Name,End
Customer Top Parent Organization ID,End Customer Top Parent
Organization Name,Reseller Top Parent ID,Reseller Top Parent
Name,Business,Rev Sum Division,Rev Sum Category,Product
Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing
Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales
Date"
"11027676,Baroda Western Uttar Pradesh Gramin
Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of
Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,""Hcl
Infosystems Ltd - Partnerdghftrutyhb
frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw"",Server &
CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server &
CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL
CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL
CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open
Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,""Open
Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho"",125.85,1,FY07,12/28/2006"
"12835756,Uttam
Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O
Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics
NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer
Enhanc. Def,0,0,FY09,9/15/2008"
"12233135,Bhagwan Singh
Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt
Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL
CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA -
New,0,0,FY09,9/15/2008"
"11602305,Maya Academy Of Advanced
Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt
Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows
Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet
L&SA Recur-Def,0,0,FY09,9/25/2008"
"13336009,Remiel Softech Solution
Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington
C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics
NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New
Customer Enhanc. Def,0,0,FY09,12/23/2008"
"7872800,Science Application
International Corporation,2839760,GOVERNMENT OF
KARNATAKA,10237455,Cubic Computing P.L,Server & CAL,SQL Server &
CAL,SQL Server Standard,SQL Server Standard
Edition,Non-specific,Open,Open SA/UA,Deferred Open SA -
Renewal,0,0,FY09,1/15/2009"
"13096361,Pratham Software Pvt
Ltd,13096361,Pratham Software Pvt Ltd,10133086,Krap
Computer,Information Worker,Office,Office Standard / Basic,Office
Standard,2007,Open,Open L,Open
Std,7132.44,28,FY09,9/24/2008"
"12192276,Texmo Precision
Castings,12192276,Texmo Precision Castings,4059430,Quadra Systems. -
Partner,Server & CAL,Windows Server & CAL,Windows Standard
Server,Windows Server Standard,Non-specific,Open,Open L&SA,Deferred
Open L&SA - New,0,0,FY09,11/15/2008"

*Kindly note that the same file when double clicked in the csv format opens
in excel with comma separated values BUT with NO quotation marks in each
line as shown in notepad++.*

I have used encoding as UTF-8 which gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 13: invalid start byte

I then tried encoding='cp1252' first, and after that latin1:

df=pd.read_csv(filename,encoding='cp1252')
or

df=pd.read_csv(filename,encoding='latin1')

With both encodings there was no error and the data was imported,
but as one single column rather than as separate columns.
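
For reference, the byte named in the error message can be examined on its
own.  The sketch below (Python 3, standard library only) decodes the single
byte 0x91 with the three encodings mentioned; it says nothing certain about
the actual file, only about how each codec treats that byte.

```python
byte = b"\x91"

# UTF-8: 0x91 is a continuation byte, so it cannot start a character.
try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# cp1252 maps 0x91 to U+2018, the left single quotation mark.
as_cp1252 = byte.decode("cp1252")

# latin1 maps every byte to the code point of the same value, so it never
# fails, but here it yields the C1 control character U+0091.
as_latin1 = byte.decode("latin1")
```

A curly opening quote is a plausible thing to find in text exported from a
Windows application, which would be consistent with cp1252, but that
remains a guess without the file itself.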

Does it have to do with the "" marks present before each line in the data?
I had a similar csv file with comma separated values that didn't have
double quotation marks around each line, and that was imported correctly
with both cp1252 and latin1, but not with UTF-8, even though the file was
saved in UTF-8 format in notepad++. In this case UTF-8 fails as before and
the other two encodings import everything as a single column.
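
One way to test the quote hypothesis without pandas is to parse each line
twice with the standard csv module: once to strip the outer quotes, and
once more to split the recovered text into columns.  This is a sketch under
assumptions: the sample rows and the helper name unwrap_rows are
hypothetical, the code targets Python 3, and it assumes no field contains
an embedded newline.

```python
import csv
import io

# Hypothetical sample shaped like the posted data: each line is one quoted
# field, and quotes that belonged to an inner field are doubled.
sample = (
    '"id,reseller,amount"\n'
    '"1,""Hcl Ltd - Partner,extra"",125.85"\n'
)


def unwrap_rows(text):
    """Parse each line once to strip the outer quotes, then parse the
    resulting string again to split it into real columns."""
    for outer in csv.reader(io.StringIO(text)):
        if not outer:
            continue  # skip blank lines
        yield next(csv.reader([outer[0]]))


rows = list(unwrap_rows(sample))
```

If this yields the expected number of columns on the real file, the
unwrapped rows could then be written back out as ordinary CSV and loaded
the usual way.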

Please advise.

Thanks