Re: open text file

2022-06-24 Thread jak

On 24/06/2022 15:10, simone zambonardi wrote:

Good morning, I need to read a text file. How come when I open it (running the
script) it shows this? The text file is of type RTF:

{\rtf1\ansi\ansicpg1252\cocoartf2636
\cocoatextscaling0\cocoaplatform0{\fonttbl\f0\fswiss\fcharset0 Helvetica;}
{\colortbl;\red255\green255\blue255;}
{\*\expandedcolortbl;;}
\paperw11900\paperh16840\margl1440\margr1440\vieww11520\viewh8400\viewkind0
\pard\tx566\tx1133\tx1700\tx2267\tx2834\tx3401\tx3968\tx4535\tx5102\tx5669\tx6236\tx6803\pardirnatural\partightenfactor0

\f0\fs24 \cf0

But even after renaming it to .txt it gives me this long string. I think
it is a formatting problem.
Thank you



I don't see the script you are talking about, but it is normal to find
those strings inside a file of type '.rtf'. RTF documents are not plain
text documents (https://en.wikipedia.org/wiki/Rich_Text_Format), so if
you want to extract the text contained in them you will need to use a
library that can do this
(e.g. https://pypi.org/project/striprtf/)
...or write a parser yourself.
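For the "write a parser yourself" route, here is a deliberately minimal sketch (my own illustration, not from the thread): it assumes only simple, TextEdit-style RTF with no nested groups outside the header tables, and ignores escapes entirely. A real parser, or striprtf's `rtf_to_text()`, handles far more.

```python
import re

# A tiny Cocoa-style RTF snippet like the one quoted above (simplified).
RTF_SAMPLE = r"{\rtf1\ansi {\fonttbl\f0\fswiss Helvetica;} \f0\fs24 Hello, world!\par}"

def rtf_to_text_naive(rtf):
    """Very naive RTF-to-text conversion; ignores escapes and nesting."""
    # drop header groups such as the font and color tables
    rtf = re.sub(r"\{\\(?:\*\\)?(?:fonttbl|colortbl|expandedcolortbl|stylesheet)[^{}]*\}", "", rtf)
    # \par and \line mark line breaks
    rtf = re.sub(r"\\(?:par|line)\b", "\n", rtf)
    # drop any remaining control words (\rtf1, \ansi, \f0, \fs24, ...)
    rtf = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", rtf)
    # drop group braces and surrounding whitespace
    return rtf.replace("{", "").replace("}", "").strip()

print(rtf_to_text_naive(RTF_SAMPLE))  # -> Hello, world!
```

For anything beyond trivial files, a maintained library is the safer choice.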
--
https://mail.python.org/mailman/listinfo/python-list


Re: open text file

2022-06-24 Thread jak

On 24/06/2022 15:44, jak wrote:
> [...]

P.S.
Renaming the file's extension does not change its content, but it helps
the system select an app that knows how to handle it.

--
https://mail.python.org/mailman/listinfo/python-list


Re: python Store text file in MongoDB

2022-06-13 Thread Peter Otten

On 12/06/2022 14:40, Ayesha Tassaduq wrote:

Hi, I am trying to store a text file into MongoDB but I got this error:
"failing because no such method exists." % self.__name.split(".")[-1]
TypeError: 'Collection' object is not callable. If you meant to call the
'insert' method on a 'Collection' object it is failing because no such method
exists.

Can anyone please tell me what is wrong here?
I also tried it with insert_one and insert_many,
but when I try it with insert_many it shows the error:
   raise TypeError("documents must be a non-empty list")
TypeError: documents must be a non-empty list


Read the error messages carefully:

(1) "...If you meant to call the 'insert' method on a 'Collection'
object it is failing because no such method exists."

It may be a bit unfortunate that attributes spring into existence when
you try to access them, but you want an "insert" method, and the error
message warns you that no such method exists. You can run your script in
IDLE and then type

>>> collection.insert

to see what it actually is.

(2) "...documents must be a non-empty list"

The insert_many() method expects a non-empty list of documents. Example:

collection.insert_many([text_file_doc])

You don't provide an error message for insert_one(), and indeed it
should work where you tried insert():

collection.insert(text_file_doc)



from pymongo import MongoClient
client = MongoClient()
db = client.test_database  # use a database called "test_database"
collection = db.files   # and inside that DB, a collection called "files"

f = open('hashes.txt')  # open a file

# build a document to be inserted
text_file_doc = {"file_name": "hashes.txt"}
# insert the contents into the "file" collection
collection.insert(text_file_doc)

The file named hashes.txt has the following data:


You are not yet at the point where you are using the file or its
contents, so the file object and the file's contents could be omitted.
Generally it is a good idea

- to make your script as short as possible while it still produces the
error; in the process you will often be able to fix the problem
yourself;

- to always provide the traceback, using cut-and-paste; that is often
sufficient to diagnose and fix the problem.


Hash 1: 39331a6a2ea1cf31a5014b2a7c9e8dfad82df0b0666e81ce04cf8173cc5aed

Hash 2: 0e0ff63b7e5e872b9ea2f0d604b5d5afd6ba05665e52246fa321ead5b79c00ad

Hash 3: 89241ce841704508be1d0b76c478c9575ec8a7ec8be46742fd5acb0dc72787f3

Hash 4: 80283cb08f91b415aae04bcada0da1ca3e37bbe971ae821116b4d29008970bdb
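The two points above can be combined into a self-contained sketch (the field names and the idea of one document per hash line are my own illustration, not from the original script): split the hash file into separate documents, so that insert_many() receives the non-empty list it requires.

```python
import io

# Lines in the same shape as hashes.txt (values shortened here):
SAMPLE = """\
Hash 1: 39331a6a2ea1cf31
Hash 2: 0e0ff63b7e5e872b
"""

docs = []
for line in io.StringIO(SAMPLE):   # io.StringIO stands in for open("hashes.txt")
    line = line.strip()
    if not line:                   # skip blank separator lines
        continue
    label, _, value = line.partition(": ")
    docs.append({"file_name": "hashes.txt", "label": label, "hash": value})

print(len(docs))  # 2 -- a non-empty list, which is what insert_many() wants
```

With a live collection you would then call `collection.insert_many(docs)`.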


--
https://mail.python.org/mailman/listinfo/python-list


python Store text file in MongoDB

2022-06-12 Thread Ayesha Tassaduq
Hi, I am trying to store a text file into MongoDB but I got this error:
"failing because no such method exists." % self.__name.split(".")[-1]
TypeError: 'Collection' object is not callable. If you meant to call the
'insert' method on a 'Collection' object it is failing because no such method
exists.

Can anyone please tell me what is wrong here?
I also tried it with insert_one and insert_many,
but when I try it with insert_many it shows the error:
  raise TypeError("documents must be a non-empty list")
TypeError: documents must be a non-empty list


from pymongo import MongoClient
client = MongoClient()
db = client.test_database  # use a database called "test_database"
collection = db.files   # and inside that DB, a collection called "files"

f = open('hashes.txt')  # open a file

# build a document to be inserted
text_file_doc = {"file_name": "hashes.txt"}
# insert the contents into the "file" collection
collection.insert(text_file_doc)

The file named hashes.txt has the following data:
Hash 1: 39331a6a2ea1cf31a5014b2a7c9e8dfad82df0b0666e81ce04cf8173cc5aed

Hash 2: 0e0ff63b7e5e872b9ea2f0d604b5d5afd6ba05665e52246fa321ead5b79c00ad

Hash 3: 89241ce841704508be1d0b76c478c9575ec8a7ec8be46742fd5acb0dc72787f3

Hash 4: 80283cb08f91b415aae04bcada0da1ca3e37bbe971ae821116b4d29008970bdb
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Trying to read from a text file to generate a graph

2021-07-29 Thread Anssi Saari
"Steve"  writes:

> I am going through a struggle with this and just don't see where it fails.

It seems to me you're putting your data into strings when you need to
put it into lists. And no, adding brackets and commas to your strings so
that printing out the strings makes them look like lists doesn't make
them into lists.
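A quick demonstration of that distinction (the values are illustrative, borrowed from the hardcoded lists later in the thread):

```python
# A string that merely *looks* like a list is still a single string:
fake = "[150, 132, 182]"
real = [150, 132, 182]

print(len(fake))   # 15 -- the number of characters, not of values
print(len(real))   # 3

# To recover numbers from such text, split it and convert each piece:
parsed = [int(x) for x in fake.strip("[]").split(", ")]
print(parsed == real)  # True
```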

There's a Python tutorial at https://docs.python.org/3/tutorial/index.html
which may help with the basics.

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Trying to read from a text file to generate a graph

2021-07-29 Thread Steve
Thank you, the responses here have been extremely helpful.
Steve



Footnote:
There's 99 bugs in the code, in the code.
99 bugs in the code.
Take one down and patch it all around.
Now there's 117 bugs in the code.





-Original Message-
From: Python-list  On
Behalf Of Stephen Berman
Sent: Wednesday, July 28, 2021 5:36 PM
To: python-list@python.org
Subject: Re: Trying to read from a text file to generate a graph

[Resending to the list only, since I couldn't post it without subscribing.]

On Wed, 28 Jul 2021 11:58:21 -0400 "Steve"  wrote:

> I forgot about the no-file rule...
>
>> On 28Jul2021 02:55, Steve  wrote:
>> I am going through a struggle with this and just don't see where it 
>> fails.  I am using the Dual Bar Graph.py program from 
>> https://matplotlib.org/stable/gallery/index.html website.  The file 
>> from the web site works so that shows that all my installations are 
>> complete.
>>
>> My program, LibreGraphics 05.py program runs but the graph is all 
>> smutched up.  I am pulling data from the EXCEL-FILE.txt into the 
>> program, selecting three values for each line and creating three 
>> variables formatted as is shown in the original demo file.
>>
>> When you run the program, choose 112 when prompted. You will see the 
>> values of the variables I want to pass to the graph section of the 
>> code.  If the values are hardcoded, the graphs look good.  When the 
>> variables generated by my section of the code, it does not.

The problem is due to the values of Sensors, TestStrips and SampleNumber
being strings; what you want is for them to be lists, as in the assignments
you commented out.  And since the data from the file is read in as strings,
you have to cast the elements of the Sensors and TestStrips lists to
integers, since you want the numerical values.  The following code does the
job:

Sensors = []
TestStrips = []
SampleNumber = []

x = 1
SensorNumber = input("Enter sensor number: ")
with open("_EXCEL-FILE.txt", 'r') as infile:
    for lineEQN in infile:
        if lineEQN[0:1] == ".":
            SN = lineEQN[44:48].strip()
            if SensorNumber == SN:
                SN = x
                SampleNumber.append(SN)

                sv = lineEQN[25:29].strip()
                Sensors.append(int(sv))

                tv = lineEQN[32:37].strip()
                TestStrips.append(int(tv))

                x += 1

labels = SampleNumber

Add the rest of your code from the second half to make the desired bar
chart.

Steve Berman
--
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Trying to read from a text file to generate a graph

2021-07-29 Thread Stephen Berman
[Resending to the list only, since I couldn't post it without subscribing.]

On Wed, 28 Jul 2021 11:58:21 -0400 "Steve"  wrote:

> I forgot about the no-file rule...
>
>> On 28Jul2021 02:55, Steve  wrote:
>> I am going through a struggle with this and just don't see where it
>> fails.  I am using the Dual Bar Graph.py program from
>> https://matplotlib.org/stable/gallery/index.html website.  The file
>> from the web site works so that shows that all my installations are
>> complete.
>>
>> My program, LibreGraphics 05.py program runs but the graph is all
>> smutched up.  I am pulling data from the EXCEL-FILE.txt into the
>> program, selecting three values for each line and creating three
>> variables formatted as is shown in the original demo file.
>>
>> When you run the program, choose 112 when prompted. You will see the
>> values of the variables I want to pass to the graph section of the
>> code.  If the values are hardcoded, the graphs look good.  When the
>> variables generated by my section of the code, it does not.

The problem is due to the values of Sensors, TestStrips and SampleNumber
being strings; what you want is for them to be lists, as in the
assignments you commented out.  And since the data from the file is read
in as strings, you have to cast the elements of the Sensors and
TestStrips lists to integers, since you want the numerical values.  The
following code does the job:

Sensors = []
TestStrips = []
SampleNumber = []

x = 1
SensorNumber = input("Enter sensor number: ")
with open("_EXCEL-FILE.txt", 'r') as infile:
    for lineEQN in infile:
        if lineEQN[0:1] == ".":
            SN = lineEQN[44:48].strip()
            if SensorNumber == SN:
                SN = x
                SampleNumber.append(SN)

                sv = lineEQN[25:29].strip()
                Sensors.append(int(sv))

                tv = lineEQN[32:37].strip()
                TestStrips.append(int(tv))

                x += 1

labels = SampleNumber

Add the rest of your code from the second half to make the desired bar
chart.

Steve Berman
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Trying to read from a text file to generate a graph

2021-07-28 Thread Steve
I forgot about the no-file rule...

On 28Jul2021 02:55, Steve  wrote:
>I am going through a struggle with this and just don't see where it fails.
>I am using the Dual Bar Graph.py program from the
>https://matplotlib.org/stable/gallery/index.html website.
>The file from the web site works, so that shows that all my installations
>are complete.
>
>My program, LibreGraphics 05.py, runs but the graph is all smutched
>up.  I am pulling data from the EXCEL-FILE.txt into the program, selecting
>three values for each line and creating three variables formatted as is
>shown in the original demo file.
>
>When you run the program, choose 112 when prompted. You will see the
>values of the variables I want to pass to the graph section of the code.
>If the values are hardcoded, the graphs look good.  When the variables
>are generated by my section of the code, it does not.
>
>I am not sure what more to explain.
>Please help me
>Steve
>
>I am attaching a zip file.  I hope it gets through.

Alas, the python-list is text only, and attachments are discarded.

Here is my code for the main program:
=

#https://matplotlib.org/stable/gallery/index.html

import matplotlib.pyplot as plt
import numpy as np

## In this first half of the program, I am reading lines of data from
## a file and reformatting them to create comma-separated values in
## three variables.

Sensors = ""
TestStrips = ""
SampleNumber = ""

x = 1
SensorNumber = input("Enter sensor number: ")
with open("_EXCEL-FILE.txt", 'r') as infile:
    for lineEQN in infile:  # loop to find each line in the file for that dose
        if lineEQN[0:1] == ".":
            SN = lineEQN[44:48].strip()
            if SensorNumber == SN:
                SN = x
                sn = "'" + str(SN) + "', "
                SampleNumber = SampleNumber + sn

                sv = lineEQN[25:29].strip()
                sv = sv + ", "
                Sensors = Sensors + sv

                tv = lineEQN[32:37].strip()
                tv = tv + ", "
                TestStrips = TestStrips + tv

                x += 1

SnLen = len(SampleNumber) -2
SampleNumber = SampleNumber[0:SnLen]
labels = "[" + SampleNumber + "]"
print("labels = " + labels)

SenLen = len(Sensors) -2
Sensors = Sensors[0:SenLen]
Sensors = "[" + Sensors + "]"
print("Sensors = " + Sensors)

TsLen = len(TestStrips) -2
TestStrips = TestStrips[0:TsLen]
TestStrips = "[" + TestStrips + "]"
print("TestStrips = " + TestStrips)

labels = SampleNumber

## =

## In this second half of the program, I want to use the three
## variables to populate a graph.

## There are problems with this technique.

## =
## With the following 6 lines of code commented-out, the graphing
## program uses the variables from the first half of the program
## and the graph fails

## =

## Uncommented, the following works by overwriting the variables
## from the previous code and generates a proper graph.

#label = ['1', '2', '3', '4', '5']
#Sensor = [150, 132, 182, 75, 117]
#TestStrip = [211, 144, 219, 99, 142]

#labels = label
#Sensors = Sensor
#TestStrips = TestStrip

## ===

## What follows is the original code from the sample program
## with minor variable name and label changes.

x = np.arange(len(labels))  # the label locations
width = 0.35  # the width of the bars

fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, Sensors, width, label='Sensors')
rects2 = ax.bar(x + width/2, TestStrips, width, label='TestStrips')

# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Glucose Readings')
ax.set_title('Sensors VS Test Strip')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()

ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)

fig.tight_layout()

plt.show()

===
And here is a sample of the data file:

.Thu Jul 22, 2021 20:47   250   277    27   111   2

.Fri Jul 23, 2021 00:05   188   194     6   111   3
.Fri Jul 23, 2021 09:08   142   166    24   111   3
.Fri Jul 23, 2021 12:58   138   165    27   111   3
.Fri Jul 23, 2021 22:32   356   391    35   111   3

.Sat Jul 24, 2021 09:44   150   211    61   112   4
.Sat Jul 24, 2021 13:24   132   144    12   112   4
.Sat Jul 24, 2021 16:40   182   213    31   112   4
.Sat Jul 24, 2021 19:52    75    99    24   112   4
.Sat Jul 24, 2021 23:19   117   142    25   112   4

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Trying to read from a text file to generate a graph

2021-07-28 Thread Cameron Simpson
On 28Jul2021 02:55, Steve  wrote:
>I am going through a struggle with this and just don't see where it fails.
>I am using the Dual Bar Graph.py program from 
>https://matplotlib.org/stable/gallery/index.html website.
>The file from the web site works so that shows that all my installations are 
>complete.
>
>My program, LibreGraphics 05.py program runs but the graph is all smutched up. 
> I am pulling data from the EXCEL-FILE.txt into the program, selecting three 
>values for each line and creating three variables formatted as is shown in the 
>original demo file.
>
>When you run the program, choose 112 when prompted. You will see the values of 
>the variables I want to pass to the graph section of the code.  If the values 
>are hardcoded, the graphs look good.  When the variables generated by my 
>section of the code, it does not.
>
>I am not sure what more to explain.
>Please help me
>Steve
>
>I am attaching a zip file.  I hope it gets through.

Alas, the python-list is text only, and attachments are discarded.

Is your programme small enough (one file, not insanely long) to just 
include inline in your next message?

Have you printed the variables generated by your code? _Are_ they the 
same as the hardcoded values? You may want to graph different stuff, but 
start by trying to exactly reproduce the hardcoded value, but extracted 
from the file.

Cheers,
Cameron Simpson 
-- 
https://mail.python.org/mailman/listinfo/python-list


Trying to read from a text file to generate a graph

2021-07-28 Thread Steve
I am going through a struggle with this and just don't see where it fails.
I am using the Dual Bar Graph.py program from 
https://matplotlib.org/stable/gallery/index.html website.
The file from the web site works so that shows that all my installations are 
complete.

My program, LibreGraphics 05.py program runs but the graph is all smutched up.  
I am pulling data from the EXCEL-FILE.txt into the program, selecting three 
values for each line and creating three variables formatted as is shown in the 
original demo file.

When you run the program, choose 112 when prompted. You will see the values of 
the variables I want to pass to the graph section of the code.  If the values 
are hardcoded, the graphs look good.  When the variables are generated by my 
section of the code, it does not. 

I am not sure what more to explain.
Please help me
Steve

I am attaching a zip file.  I hope it gets through.





 George Melly remarked to Mike Jagger on how lined his face was for one so 
young. Jagger replied “They’re laughter lines George” to which Melly countered: 
“Mick, nothing’s that f**king funny!”.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Dan Stromberg
If you want text without tags, sometimes it's easier to use a text-based
web browser, EG:

#!/bin/sh

# for mutt to view html e-mails

#where html2txt is a shell script that performs the conversion, e.g. by
#calling

links -html-numbered-links 1 -html-images 1 -dump "file://$@"

#or
#
#lynx -force_html -dump "$@"
#
#or
#
#w3m -T text/html -F -dump "$@"
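If shelling out to a text browser is not an option, Python's standard library can do a rough tag strip on its own. This is my own sketch (the HTML sample is made up); html.parser discards markup it cannot understand, so for messy real-world pages BeautifulSoup is more robust:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text nodes of a document and discards every tag."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():                 # skip whitespace-only nodes
            self.chunks.append(data.strip())

p = TextExtractor()
p.feed('<html><body><h1>Headline</h1><p>Some body text.</p></body></html>')
print(" ".join(p.chunks))  # -> Headline Some body text.
```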


On Tue, Mar 9, 2021 at 1:26 PM S Monzur  wrote:

> Dear List,
>
> Newbie here. I am trying to loop over a text file to remove html tags,
> punctuation marks, stopwords. I have already used Beautiful Soup (Python v
> 3.8.3) to scrape the text (newspaper articles) from the site. It returns a
> list that I saved as a file. However, I am not sure how to use a loop in
> order to process all the items in the text file.
>
> In the code below I have used listfilereduced.txt (containing data from one
> news article, link to listfilereduced.txt:
> https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing),
> however I would like to run this code on listfile.txt (containing data from
> multiple articles, link to listfile.txt:
> https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing).
>
>
> Any help would be greatly appreciated!
>
> P.S. The text is in a Non-English script, but the tags are all in English.
>
>
> #The code below is for a textfile containing just one item. I am not sure
> #how to tweak this to make it run for listfile.text (which contains raw data
> #from multiple articles)
> with open('listfilereduced.txt', 'r', encoding='utf8') as my_file:
>     rawData = my_file.read()
> print(rawData)
> #Separating body text from other data
> articleStart = rawData.find(" class=\"story-element story-element-text\">")
> articleData = rawData[:articleStart]
> articleBody = rawData[articleStart:]
> print(articleData)
> print("***")
> print(articleBody)
> print("***")
> #First, I define a function to strip tags from the body text
> def stripTags(pageContents):
>     insideTag = 0
>     text = ''
>     for char in pageContents:
>         if char == '<':
>             insideTag = 1
>         elif (insideTag == 1 and char == '>'):
>             insideTag = 0
>         elif insideTag == 1:
>             continue
>         else:
>             text += char
>     return text
> #Calling the function
> articleBodyText = stripTags(articleBody)
> print(articleBodyText)
> ##Isolating article title and publication date
> TitleEndLoc = articleData.find("")
> dateStartLoc = articleData.find(" class=\"storyPageMetaData-m__publish-time__19bdV\">")
> dateEndLoc = articleData.find(" storyPageMetaDataIcons-m__icons__3E4Xg\">")
> titleString = articleData[:TitleEndLoc]
> dateString = articleData[dateStartLoc:dateEndLoc]
> ##Call stripTags to clean
> articleTitle = stripTags(titleString)
> articleDate = stripTags(dateString)
> print(articleTitle)
> print(articleDate)
> #Cleaning the date a bit more
> startLocDate = articleDate.find(":")
> endLocDate = articleDate.find(",")
> articleDateClean = articleDate[startLocDate+2:endLocDate]
> print(articleDateClean)
> #save all this data to a dictionary that saves the title, data and the body text
> PAloTextDict = {"Title": articleTitle, "Date": articleDateClean, "Text": articleBodyText}
> print(PAloTextDict)
> #Normalize text by:
> #1. Splitting paragraphs of text into lists of words
> articleBodyWordList = articleBodyText.split()
> print(articleBodyWordList)
> #2. Removing punctuation and stopwords
> from bnlp.corpus import stopwords, punctuations
> #A. Remove punctuation first
> listNoPunct = []
> for word in articleBodyWordList:
>     for mark in punctuations:
>         word = word.replace(mark, '')
>     listNoPunct.append(word)
> print(listNoPunct)
> #B. removing stopwords
> banglastopwords = stopwords()
> print(banglastopwords)
> cleanList = []
> for word in listNoPunct:
>     if word in banglastopwords:
>         continue
>     else:
>         cleanList.append(word)
> print(cleanList)
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Peter Otten

On 10/03/2021 13:19, S Monzur wrote:

I initially scraped the links using beautiful soup, and from those links
downloaded the specific content of the articles I was interested in
(titles, dates, names of contributor, main texts) and stored that
information in a list. I then saved the list to a text file.
https://pastebin.com/8BMi9qjW . I am now trying to remove the html tags
from this text file, and running into issues as mentioned in the previous
post.


As I said in my previous post, when you process the list entries
separately you will probably avoid the problem.

Unfortunately with the format you chose to store your intermediate data
you cannot reconstruct it reliably.

I recommend that you either

(1) avoid the text file and extract the interesting parts from PASoup
directly or

(2) pick a different file format to store the result sets. For
short-term storage pickle
<https://docs.python.org/3/library/pickle.html#examples> should work.
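For option (2), a minimal sketch of the pickle round trip (the dictionary keys are illustrative, not from the original script; io.BytesIO stands in for a real file):

```python
import io
import pickle

# One record per scraped article (keys are illustrative):
articles = [{"title": "Example headline",
             "date": "2021-03-09",
             "body": "Body text..."}]

buf = io.BytesIO()               # stands in for a file opened with "wb"
pickle.dump(articles, buf)

buf.seek(0)                      # stands in for reopening with "rb"
restored = pickle.load(buf)

print(restored == articles)      # True: the structure round-trips exactly
```

Unlike a flattened text file, the restored object needs no re-parsing.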

--
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread S Monzur
I initially scraped the links using beautiful soup, and from those links
downloaded the specific content of the articles I was interested in
(titles, dates, names of contributor, main texts) and stored that
information in a list. I then saved the list to a text file.
https://pastebin.com/8BMi9qjW . I am now trying to remove the html tags
from this text file, and running into issues as mentioned in the previous
post.



On Wed, Mar 10, 2021 at 3:46 PM Peter Otten <__pete...@web.de> wrote:

> On 10/03/2021 04:35, S Monzur wrote:
> > Thanks! I ended up using beautiful soup to remove the html tags and
> create
> > three lists (titles of article, publications dates, main body) but am
> still
> > facing a problem where the list is not properly storing the main body.
> > There is something wrong with my code for that section, and any comment
> > would be really helpful!
> >
> >   ListFile Text
> > <
> https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing
> >
>
> How did you create that file?
>
>  > BeautifulSoup code for removing tags <https://pastebin.com/qvbVMUGD>
>
> > print(bodytext[0]) # so here, I'm only getting the first paragraph of
> the body of the first article, not all of the first article
> >
> > print(bodytext[1]) # here, I'm getting the second paragraph of the first
> article, and not the second article
>
> It may help if you process the individual articles with beautiful soup,
> not the whole list at once.
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Peter Otten

On 10/03/2021 04:35, S Monzur wrote:

Thanks! I ended up using beautiful soup to remove the html tags and create
three lists (titles of article, publications dates, main body) but am still
facing a problem where the list is not properly storing the main body.
There is something wrong with my code for that section, and any comment
would be really helpful!

  ListFile Text



How did you create that file?

> BeautifulSoup code for removing tags 


print(bodytext[0]) # so here, I'm only getting the first paragraph of the body 
of the first article, not all of the first article

print(bodytext[1]) # here, I'm getting the second paragraph of the first 
article, and not the second article


It may help if you process the individual articles with beautiful soup, 
not the whole list at once.


--
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-10 Thread Joel Goldstick
On Tue, Mar 9, 2021 at 10:36 PM S Monzur  wrote:
>
> Thanks! I ended up using beautiful soup to remove the html tags and create
> three lists (titles of article, publications dates, main body) but am still
> facing a problem where the list is not properly storing the main body.
> There is something wrong with my code for that section, and any comment
> would be really helpful!
>

Can you use a very small file to test?  I think you could edit your
data file to contain maybe two or three articles.  Then you could post
that file in your email (no attachments).  And you could post your
code which is probably not very long.  In that way, people here will
be better able to help you.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-09 Thread S Monzur
Thanks! I ended up using beautiful soup to remove the html tags and create
three lists (titles of article, publications dates, main body) but am still
facing a problem where the list is not properly storing the main body.
There is something wrong with my code for that section, and any comment
would be really helpful!

 ListFile Text
<https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
BeautifulSoup code for removing tags <https://pastebin.com/qvbVMUGD>


On Wed, Mar 10, 2021 at 4:32 AM Dan Ciprus (dciprus) 
wrote:

> No problem, list just converts everything into plain/txt which is GREAT !
> :-)
>
> So without digging deeply into what you need to do: I am assuming that
> your
> input contains html tags. Why don't you utilize lib like:
> https://pypi.org/project/beautifulsoup4/ instead of doing harakiri with
> parsing
> data without using regex ? Just a hint ..
>
> On Wed, Mar 10, 2021 at 04:22:19AM +0600, S Monzur wrote:
> >   Thank you and apologies! I did not realize how jumbled it was at the
> >   receiver's end.
> >   The code is now at this site :  [1]https://pastebin.com/wSi2xzBh
> >   I'm basically trying to do a few things with my code-
> >
> >1. Extract 3 strings from the text- title, date and main text
> >
> >2. Remove all tags afterwards
> >
> >3. Save in a dictionary, with three keys- title, date and bodytext.
> >
> >4. Remove punctuation and stopwords (I've used a user generated
> function
> >   for that).
> >
> >   I've been able to do all of these steps for the file
> [2]ListFileReduced,
> >   as shown in the code (although it's clunky).
> >
> >   But, I would like to be able to do it for the other text file:
> [3]ListFile
> >   which has more articles. I used BeautifulSoup to scrape the data from
> the
> >   website, and then generated a list that I saved as a text file.
> >
> >   Best,
> >   Monzur
> >   On Wed, Mar 10, 2021 at 4:00 AM Dan Ciprus (dciprus)
> >   <[4]dcip...@cisco.com> wrote:
> >
> > If you could utilized pastebin or similar site to show your code, it
> > would help
> > tremendously since it's an unindented mess now and can not be read
> > easily.
> >
> > On Wed, Mar 10, 2021 at 03:07:14AM +0600, S Monzur wrote:
> > >Dear List,
> > >
> > >Newbie here. I am trying to loop over a text file to remove html
> tags,
> > >punctuation marks, stopwords. I have already used Beautiful Soup
> > (Python v
> > >3.8.3) to scrape the text (newspaper articles) from the site. It
> > returns a
> > >list that I saved as a file. However, I am not sure how to use a
> loop
> > in
> > >order to process all the items in the text file.
> > >
> > >In the code below I have used listfilereduced.text(containing data
> from
> > one
> > >news article, link to listfilereduced.txt here
> > ><[5]
> https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing
> >),
> > >however I would like to run this code on listfile.text(containing
> data
> > from
> > >multiple articles, link to listfile.text
> > ><[6]
> https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing
> >
> > >).
> > >
> > >
> > >Any help would be greatly appreciated!
> > >
> > >P.S. The text is in a Non-English script, but the tags are all in
> > English.
> > >
> > >
> > >#The code below is for a textfile containing just one item. I am not
> > sure
> > >how to tweak this to make it run for listfile.text (which contains
> raw
> > data
> > >from multiple articles) with open('listfilereduced.txt', 'r',
> > >encoding='utf8') as my_file: rawData = my_file.read() print(rawData)
> > >#Separating body text from other data articleStart =
> rawData.find(" > >class=\"story-element story-element-text\">") articleData =
> > >rawData[:articleStart] articleBody = rawData[articleStart:]
> > >print(articleData) print("***") print(articleBody)
> print("***")
> > >#First, I define a function to strip tags from the body text def
> > >stripTags(pageContents): insideTag = 0 text = '' for char in
> > pageContents:
> > >if char == '<': insideTag = 1 elif (insideTag == 1 and char == '>'):
> &g

Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-09 Thread Dan Ciprus (dciprus) via Python-list

No problem, the list just converts everything into plain text, which is GREAT ! :-)

So without digging deeply into what you need to do: I am assuming that your 
input contains HTML tags. Why don't you use a library like 
https://pypi.org/project/beautifulsoup4/ instead of committing hara-kiri by 
parsing the data by hand, without even a regex? Just a hint ..
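For the record, even without installing beautifulsoup4, the standard library's html.parser already does the tag-stripping part more robustly than scanning for '<' and '>' by hand. A minimal sketch (the sample markup below is made up):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes only, ignoring tags and their attributes."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def strip_tags(html):
    extractor = TextExtractor()
    extractor.feed(html)
    return "".join(extractor.parts)

print(strip_tags('<div class="story-element">Hello <b>world</b></div>'))
# Hello world
```

With beautifulsoup4 installed, the one-line equivalent is BeautifulSoup(html, "html.parser").get_text().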


On Wed, Mar 10, 2021 at 04:22:19AM +0600, S Monzur wrote:

  Thank you and apologies! I did not realize how jumbled it was at the
  receiver's end. 
  The code is now at this site :  [1]https://pastebin.com/wSi2xzBh 
  I'm basically trying to do a few things with my code-

   1. Extract 3 strings from the text- title, date and main text

   2. Remove all tags afterwards

   3. Save in a dictionary, with three keys- title, date and bodytext.

   4. Remove punctuation and stopwords (I've used a user generated function
  for that).

  I've been able to do all of these steps for the file [2]ListFileReduced,
  as shown in the code (although it's clunky).

  But, I would like to be able to do it for the other text file: [3]ListFile
  which has more articles. I used BeautifulSoup to scrape the data from the
  website, and then generated a list that I saved as a text file. 

  Best,
  Monzur
  On Wed, Mar 10, 2021 at 4:00 AM Dan Ciprus (dciprus)
  <[4]dcip...@cisco.com> wrote:

If you could utilized pastebin or similar site to show your code, it
would help
tremendously since it's an unindented mess now and can not be read
easily.

On Wed, Mar 10, 2021 at 03:07:14AM +0600, S Monzur wrote:
>Dear List,
>
>Newbie here. I am trying to loop over a text file to remove html tags,
>punctuation marks, stopwords. I have already used Beautiful Soup
(Python v
>3.8.3) to scrape the text (newspaper articles) from the site. It
returns a
>list that I saved as a file. However, I am not sure how to use a loop
in
>order to process all the items in the text file.
>
>In the code below I have used listfilereduced.text(containing data from
one
>news article, link to listfilereduced.txt here

><[5]https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing>),
>however I would like to run this code on listfile.text(containing data
from
>multiple articles, link to listfile.text

><[6]https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
>).
>
>
>Any help would be greatly appreciated!
>
>P.S. The text is in a Non-English script, but the tags are all in
English.
>
>
>#The code below is for a textfile containing just one item. I am not
sure
>how to tweak this to make it run for listfile.text (which contains raw
data
>from multiple articles) with open('listfilereduced.txt', 'r',
>encoding='utf8') as my_file: rawData = my_file.read() print(rawData)
>#Separating body text from other data articleStart = rawData.find("class=\"story-element story-element-text\">") articleData =
>rawData[:articleStart] articleBody = rawData[articleStart:]
>print(articleData) print("***") print(articleBody) print("***")
>#First, I define a function to strip tags from the body text def
>stripTags(pageContents): insideTag = 0 text = '' for char in
pageContents:
>if char == '<': insideTag = 1 elif (insideTag == 1 and char == '>'):
>insideTag = 0 elif insideTag == 1: continue else: text += char return
text
>#Calling the function articleBodyText = stripTags(articleBody)
>print(articleBodyText) ##Isolating article title and publication date
>TitleEndLoc = articleData.find("") dateStartLoc =
>articleData.find("class=\"storyPageMetaData-m__publish-time__19bdV\">")
>dateEndLoc=articleData.find("storyPageMetaDataIcons-m__icons__3E4Xg\">") titleString =
>articleData[:TitleEndLoc] dateString =
articleData[dateStartLoc:dateEndLoc]
>##Call stripTags to clean articleTitle= stripTags(titleString)
articleDate
>= stripTags(dateString) print(articleTitle) print(articleDate)
#Cleaning
>the date a bit more startLocDate = articleDate.find(":") endLocDate =
>articleDate.find(",") articleDateClean =
>articleDate[startLocDate+2:endLocDate] print(articleDateClean) #save
all
>this data to a dictionary that saves the title, data and the body text
>PAloTextDict = {"Title": articleTitle, "Date": articleDateClean,
"Text":
>articleBodyText} print(PAloTextDict) #Normalize text by: #1. Splitting
>paragraphs of text into lists of words articleBodyWordList =
>articleBodyText.split() print(articleBodyWordList) #2

Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-09 Thread S Monzur
Thank you and apologies! I did not realize how jumbled it was at the
receiver's end.

The code is now at this site :  https://pastebin.com/wSi2xzBh

I'm basically trying to do a few things with my code-

   1. Extract 3 strings from the text- title, date and main text
   2. Remove all tags afterwards
   3. Save in a dictionary, with three keys- title, date and bodytext.
   4. Remove punctuation and stopwords (I've used a user-generated function
      for that).

I've been able to do all of these steps for the file ListFileReduced
<https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing>,
as shown in the code (although it's clunky).

But, I would like to be able to do it for the other text file: ListFile
<https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
which has more articles. I used BeautifulSoup to scrape the data from the
website, and then generated a list that I saved as a text file.


Best,

Monzur

On Wed, Mar 10, 2021 at 4:00 AM Dan Ciprus (dciprus) 
wrote:

> If you could utilized pastebin or similar site to show your code, it would
> help
> tremendously since it's an unindented mess now and can not be read easily.
>
> On Wed, Mar 10, 2021 at 03:07:14AM +0600, S Monzur wrote:
> >Dear List,
> >
> >Newbie here. I am trying to loop over a text file to remove html tags,
> >punctuation marks, stopwords. I have already used Beautiful Soup (Python v
> >3.8.3) to scrape the text (newspaper articles) from the site. It returns a
> >list that I saved as a file. However, I am not sure how to use a loop in
> >order to process all the items in the text file.
> >
> >In the code below I have used listfilereduced.text(containing data from
> one
> >news article, link to listfilereduced.txt here
> ><
> https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing
> >),
> >however I would like to run this code on listfile.text(containing data
> from
> >multiple articles, link to listfile.text
> ><
> https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing
> >
> >).
> >
> >
> >Any help would be greatly appreciated!
> >
> >P.S. The text is in a Non-English script, but the tags are all in English.
> >
> >
> >#The code below is for a textfile containing just one item. I am not sure
> >how to tweak this to make it run for listfile.text (which contains raw
> data
> >from multiple articles) with open('listfilereduced.txt', 'r',
> >encoding='utf8') as my_file: rawData = my_file.read() print(rawData)
> >#Separating body text from other data articleStart = rawData.find(" >class=\"story-element story-element-text\">") articleData =
> >rawData[:articleStart] articleBody = rawData[articleStart:]
> >print(articleData) print("***") print(articleBody) print("***")
> >#First, I define a function to strip tags from the body text def
> >stripTags(pageContents): insideTag = 0 text = '' for char in pageContents:
> >if char == '<': insideTag = 1 elif (insideTag == 1 and char == '>'):
> >insideTag = 0 elif insideTag == 1: continue else: text += char return text
> >#Calling the function articleBodyText = stripTags(articleBody)
> >print(articleBodyText) ##Isolating article title and publication date
> >TitleEndLoc = articleData.find("") dateStartLoc =
> >articleData.find(" >class=\"storyPageMetaData-m__publish-time__19bdV\">")
> >dateEndLoc=articleData.find(" >storyPageMetaDataIcons-m__icons__3E4Xg\">") titleString =
> >articleData[:TitleEndLoc] dateString =
> articleData[dateStartLoc:dateEndLoc]
> >##Call stripTags to clean articleTitle= stripTags(titleString) articleDate
> >= stripTags(dateString) print(articleTitle) print(articleDate) #Cleaning
> >the date a bit more startLocDate = articleDate.find(":") endLocDate =
> >articleDate.find(",") articleDateClean =
> >articleDate[startLocDate+2:endLocDate] print(articleDateClean) #save all
> >this data to a dictionary that saves the title, data and the body text
> >PAloTextDict = {"Title": articleTitle, "Date": articleDateClean, "Text":
> >articleBodyText} print(PAloTextDict) #Normalize text by: #1. Splitting
> >paragraphs of text into lists of words articleBodyWordList =
> >articleBodyText.split() print(articleBodyWordList) #2.Removing punctuation
> >and stopwords from bnlp.corpus import stopwords, punctuations #A. Remove
> >punctuation first listNoPunct = [] for word in articleBodyWordList: for
> >mark in punctuations: word=word.repl

Re: How to loop over a text file (to remove tags and normalize) using Python

2021-03-09 Thread Dan Ciprus (dciprus) via Python-list
If you could utilize pastebin or a similar site to show your code, it would help 
tremendously, since it's an unindented mess now and cannot be read easily.


On Wed, Mar 10, 2021 at 03:07:14AM +0600, S Monzur wrote:

Dear List,

Newbie here. I am trying to loop over a text file to remove html tags,
punctuation marks, stopwords. I have already used Beautiful Soup (Python v
3.8.3) to scrape the text (newspaper articles) from the site. It returns a
list that I saved as a file. However, I am not sure how to use a loop in
order to process all the items in the text file.

In the code below I have used listfilereduced.text(containing data from one
news article, link to listfilereduced.txt here
<https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing>),
however I would like to run this code on listfile.text(containing data from
multiple articles, link to listfile.text
<https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
).


Any help would be greatly appreciated!

P.S. The text is in a Non-English script, but the tags are all in English.


#The code below is for a textfile containing just one item. I am not sure
how to tweak this to make it run for listfile.text (which contains raw data
from multiple articles) with open('listfilereduced.txt', 'r',
encoding='utf8') as my_file: rawData = my_file.read() print(rawData)
#Separating body text from other data articleStart = rawData.find("") articleData =
rawData[:articleStart] articleBody = rawData[articleStart:]
print(articleData) print("***") print(articleBody) print("***")
#First, I define a function to strip tags from the body text def
stripTags(pageContents): insideTag = 0 text = '' for char in pageContents:
if char == '<': insideTag = 1 elif (insideTag == 1 and char == '>'):
insideTag = 0 elif insideTag == 1: continue else: text += char return text
#Calling the function articleBodyText = stripTags(articleBody)
print(articleBodyText) ##Isolating article title and publication date
TitleEndLoc = articleData.find("") dateStartLoc =
articleData.find("")
dateEndLoc=articleData.find("") titleString =
articleData[:TitleEndLoc] dateString = articleData[dateStartLoc:dateEndLoc]
##Call stripTags to clean articleTitle= stripTags(titleString) articleDate
= stripTags(dateString) print(articleTitle) print(articleDate) #Cleaning
the date a bit more startLocDate = articleDate.find(":") endLocDate =
articleDate.find(",") articleDateClean =
articleDate[startLocDate+2:endLocDate] print(articleDateClean) #save all
this data to a dictionary that saves the title, data and the body text
PAloTextDict = {"Title": articleTitle, "Date": articleDateClean, "Text":
articleBodyText} print(PAloTextDict) #Normalize text by: #1. Splitting
paragraphs of text into lists of words articleBodyWordList =
articleBodyText.split() print(articleBodyWordList) #2.Removing punctuation
and stopwords from bnlp.corpus import stopwords, punctuations #A. Remove
punctuation first listNoPunct = [] for word in articleBodyWordList: for
mark in punctuations: word=word.replace(mark, '') listNoPunct.append(word)
print(listNoPunct) #B. removing stopwords banglastopwords = stopwords()
print(banglastopwords) cleanList=[] for word in listNoPunct: if word in
banglastopwords: continue else: cleanList.append(word) print(cleanList)
--
https://mail.python.org/mailman/listinfo/python-list


--

Daniel Ciprus  .:|:.:|:.
CONSULTING ENGINEER.CUSTOMER DELIVERY   Cisco Systems Inc.

dcip...@cisco.com

tel: +1 703 484 0205
mob: +1 540 223 7098



-- 
https://mail.python.org/mailman/listinfo/python-list


How to loop over a text file (to remove tags and normalize) using Python

2021-03-09 Thread S Monzur
Dear List,

Newbie here. I am trying to loop over a text file to remove html tags,
punctuation marks, stopwords. I have already used Beautiful Soup (Python v
3.8.3) to scrape the text (newspaper articles) from the site. It returns a
list that I saved as a file. However, I am not sure how to use a loop in
order to process all the items in the text file.

In the code below I have used listfilereduced.text(containing data from one
news article, link to listfilereduced.txt here
<https://drive.google.com/file/d/1ojwN4u8cmh_nUoMJpdZ5ObaGW5URYYj3/view?usp=sharing>),
however I would like to run this code on listfile.text(containing data from
multiple articles, link to listfile.text
<https://drive.google.com/file/d/1V3s8w8a3NQvex91EdOhdC9rQtCAOElpm/view?usp=sharing>
).


Any help would be greatly appreciated!

P.S. The text is in a Non-English script, but the tags are all in English.


#The code below is for a textfile containing just one item. I am not sure
#how to tweak this to make it run for listfile.text (which contains raw data
#from multiple articles)
with open('listfilereduced.txt', 'r', encoding='utf8') as my_file:
    rawData = my_file.read()
print(rawData)
#Separating body text from other data
#(NOTE: the HTML search strings were partly eaten by the list archive; the
#class names below are recovered from other copies in this thread.)
articleStart = rawData.find("class=\"story-element story-element-text\">")
articleData = rawData[:articleStart]
articleBody = rawData[articleStart:]
print(articleData); print("***"); print(articleBody); print("***")
#First, I define a function to strip tags from the body text
def stripTags(pageContents):
    insideTag = 0
    text = ''
    for char in pageContents:
        if char == '<':
            insideTag = 1
        elif insideTag == 1 and char == '>':
            insideTag = 0
        elif insideTag == 1:
            continue
        else:
            text += char
    return text
#Calling the function
articleBodyText = stripTags(articleBody)
print(articleBodyText)
##Isolating article title and publication date
TitleEndLoc = articleData.find("")  # search string lost in the archive
dateStartLoc = articleData.find("class=\"storyPageMetaData-m__publish-time__19bdV\">")
dateEndLoc = articleData.find("storyPageMetaDataIcons-m__icons__3E4Xg\">")
titleString = articleData[:TitleEndLoc]
dateString = articleData[dateStartLoc:dateEndLoc]
##Call stripTags to clean
articleTitle = stripTags(titleString)
articleDate = stripTags(dateString)
print(articleTitle); print(articleDate)
#Cleaning the date a bit more
startLocDate = articleDate.find(":")
endLocDate = articleDate.find(",")
articleDateClean = articleDate[startLocDate+2:endLocDate]
print(articleDateClean)
#save all this data to a dictionary that keeps the title, date and the body text
PAloTextDict = {"Title": articleTitle, "Date": articleDateClean,
                "Text": articleBodyText}
print(PAloTextDict)
#Normalize text by:
#1. Splitting paragraphs of text into lists of words
articleBodyWordList = articleBodyText.split()
print(articleBodyWordList)
#2. Removing punctuation and stopwords
from bnlp.corpus import stopwords, punctuations
#A. Remove punctuation first
listNoPunct = []
for word in articleBodyWordList:
    for mark in punctuations:
        word = word.replace(mark, '')
    listNoPunct.append(word)
print(listNoPunct)
#B. removing stopwords
banglastopwords = stopwords()
print(banglastopwords)
cleanList = []
for word in listNoPunct:
    if word in banglastopwords:
        continue
    else:
        cleanList.append(word)
print(cleanList)
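As for the looping question itself: once the raw scrape is read in, splitting it on whatever marker string begins each article yields a list to loop over. A sketch with a made-up marker (in the real data it would be the opening-tag string each article starts with):

```python
def split_articles(raw_data, marker):
    """Split the raw scrape into per-article chunks, dropping any preamble.

    `marker` is assumed to be a string that begins every article.
    """
    chunks = raw_data.split(marker)
    return [marker + chunk for chunk in chunks[1:]]

# Made-up data: three articles separated by a marker.
raw = "preamble MARK article one MARK article two MARK article three"
for article in split_articles(raw, "MARK"):
    print(article.strip())
```

Each chunk can then be fed through the same title/date/body extraction shown above.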
-- 
https://mail.python.org/mailman/listinfo/python-list


[issue37036] Iterating a text file by line should not implicitly disable tell

2019-05-27 Thread Jeffrey Kintscher


Change by Jeffrey Kintscher :


--
nosy: +Jeffrey.Kintscher

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue37036] Iterating a text file by line should not implicitly disable tell

2019-05-24 Thread Josh Rosenberg


Josh Rosenberg  added the comment:

Left a dangling sentence in there:

"I used two arg iter in both cases to keep the code paths as similar as 
possible so the `telling`."

should read:

"I used iter(f.readline, '') in both cases to keep the code paths as similar as 
possible so the `telling` optimization was tested in isolation."

--

___
Python tracker 




[issue37036] Iterating a text file by line should not implicitly disable tell

2019-05-24 Thread Josh Rosenberg

New submission from Josh Rosenberg :

TextIOWrapper explicitly sets the `telling` flag to 0 when .__next__ 
(textiowrapper_iternext) is called 
( https://github.com/python/cpython/blob/3.7/Modules/_io/textio.c#L2974 ), e.g. 
during standard for loops over the file. In loops of this form, trying to call 
tell raises an exception:

with open(filename) as myfile:
for line in myfile:
myfile.tell()

which raises:

OSError: telling position disabled by next() call

while the effectively equivalent:

with open(filename) as myfile:
for line in iter(myfile.readline, ''):
myfile.tell()

works fine.
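The contrast is easy to reproduce end to end (a sketch; the error text is the one CPython 3.x raises):

```python
import os
import tempfile

# Create a small sample file.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Plain `for line in f` flips `telling` off, so tell() raises mid-loop:
tell_failed = False
with open(path) as f:
    try:
        for line in f:
            f.tell()
    except OSError as exc:
        tell_failed = True
        print(exc)  # telling position disabled by next() call

# The two-argument iter() form goes through readline, so tell() keeps working:
with open(path) as f:
    positions = [f.tell() for line in iter(f.readline, "")]

print(tell_failed, positions)
os.remove(path)
```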

The implementation of __next__ and readline is almost identical (__next__ is 
calling readline and handling the EOF sentinel per the iterator protocol, 
that's all). Given they're implemented identically, I see no reason aside from 
nannying (discouraging slow operations like seek/tell during iteration by 
forbidding them on the most convenient means of iterating) to forbid tell after 
beginning iteration, but not after readline. Given the general Python 
philosophy of "we're all adults here", I don't see nannying as a good reason, 
which leaves the performance benefit of avoiding snapshotting as the only 
compelling reason to do this.

But the performance benefit is trivial; in local tests, the savings from 
avoiding that work is barely noticeable, per ipython microbenchmarks (on 3.7.2 
Linux x86-64):

>>> %%timeit -r5 from collections import deque; f = 
open('American-english.txt'); consume = deque(maxlen=0).extend
... f.seek(0)
... consume(iter(f.readline, ''))
...
...
15.8 ms ± 38.4 μs per loop (mean ± std. dev. of 5 runs, 100 loops each)

>>> %%timeit -r5 from collections import deque; f = 
open('American-english.txt'); consume = deque(maxlen=0).extend
... f.seek(0)
... next(f)  # Triggers optimization for all future read_chunk calls
... consume(iter(f.readline, ''))  # Otherwise iterated identically
...
...
15.7 ms ± 98.5 μs per loop (mean ± std. dev. of 5 runs, 100 loops each)

The two blocks are identical except that the second one explicitly advances the 
file one line at the beginning with next(f) to flip `telling` to 0 so future 
calls to readline don't involve the snapshotting code in 
textiowrapper_read_chunk.

Calling consume(f) would drop the time to 9.86 ms, but that's saying more about 
the optimization of the raw iterator protocol over method calls than it is 
about the `telling` optimization; I used iter(f.readline, '') in both cases to 
keep the code paths as similar as possible so the `telling` optimization was 
tested in isolation.

For reference, the input file was 931708 bytes (931467 characters thanks to a 
few scattered non-ASCII UTF-8 characters), 98569 lines long.

Presumably, the speed difference of 0.1 ms can be chalked up to the telling 
optimization, so removing it would increase the cost of normal iteration from 
9.86 ms to 9.96 ms. That seems de minimis to my mind, in the context of text 
oriented I/O.

Given that, it seems like triggering this optimization via __next__ should be 
dropped; it's a microoptimization at best, that's mostly irrelevant compared to 
the overhead of text-oriented I/O, and it introduces undocumented limitations 
on the use of TextIOWrapper.

The changes would be to remove all use of the `telling` variable, and (if we 
want to keep the optimization for unseekable files, where at least no 
functionality is lost by having it), change the two tests in 
textiowrapper_read_chunk to test `seekable` in its place. Or we drop the 
optimization entirely and save 50+ lines of code that provide a fairly tiny 
benefit in any event.

--
components: IO, Library (Lib)
messages: 343402
nosy: josh.r
priority: normal
severity: normal
status: open
title: Iterating a text file by line should not implicitly disable tell
versions: Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue37036>



Re: Python read text file columnwise

2019-01-15 Thread Neil Cerutti
On 2019-01-15, Juris __  wrote:
> Hi!
>
> On 15/01/2019 17:04, Neil Cerutti wrote:
>> On 2019-01-11, shibashib...@gmail.com  wrote:
>>> Hello

 I'm very new in python. I have a file in the format:

 2018-05-31   16:00:0028.90   81.77   4.3
 2018-05-31   20:32:0028.17   84.89   4.1
 2018-06-20   04:09:0027.36   88.01   4.8
 2018-06-20   04:15:0027.31   87.09   4.7
 2018-06-28   04.07:0027.87   84.91   5.0
 2018-06-29   00.42:0032.20   104.61  4.8
>>>
>>> I would like to read this file in python column-wise.
>>>
>>> I tried this way but not working 
>>>event_list = open('seismicity_R023E.txt',"r")
>>>  info_event = read(event_list,'%s %s %f %f %f %f\n');
>> 
>> If it's really tabular data in fixed-width columns you can read
>> it that way with Python.
>> 
>> records = []
>> for line in file:
>>  record = []
>>  i = 0
>>  for width in (30, 8, 7, 5): # approximations
>>  item = line[i:i+width]
>>  record.append(item)
>>  i += width
>>  records.append(record)
>> 
>> This leaves them all strings, which in my experience is more
>> convenient in practice. You can convert as you go if you
>> want,though it won't look nice and simple any longer.
>>
>
> Perhaps even better approach is to use csv module from standard library:
>
> import csv
>
> csv_reader = csv.reader(file, dialect="excel-tab")
> for row in csv_reader:
>  # do something with record data which is conveniently parsed to list
>  print(row)
>
> ['2018-05-31', '16:00:00', '28.90', '81.77', '4.3']
> ...
> ['2018-06-29', '00.42:00', '32.20', '104.61', '4.8']

Yes, if applicable it is awesome!

-- 
Neil Cerutti
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-15 Thread Juris __
Hi!

On 15/01/2019 17:04, Neil Cerutti wrote:
> On 2019-01-11, shibashib...@gmail.com  wrote:
>> Hello
>>>
>>> I'm very new in python. I have a file in the format:
>>>
>>> 2018-05-31   16:00:0028.90   81.77   4.3
>>> 2018-05-31   20:32:0028.17   84.89   4.1
>>> 2018-06-20   04:09:0027.36   88.01   4.8
>>> 2018-06-20   04:15:0027.31   87.09   4.7
>>> 2018-06-28   04.07:0027.87   84.91   5.0
>>> 2018-06-29   00.42:0032.20   104.61  4.8
>>
>> I would like to read this file in python column-wise.
>>
>> I tried this way but not working 
>>event_list = open('seismicity_R023E.txt',"r")
>>  info_event = read(event_list,'%s %s %f %f %f %f\n');
> 
> If it's really tabular data in fixed-width columns you can read
> it that way with Python.
> 
> records = []
> for line in file:
>  record = []
>  i = 0
>  for width in (30, 8, 7, 5): # approximations
>  item = line[i:i+width]
>  record.append(item)
>  i += width
>  records.append(record)
> 
> This leaves them all strings, which in my experience is more
> convenient in practice. You can convert as you go if you
> want,though it won't look nice and simple any longer.
>

Perhaps even better approach is to use csv module from standard library:

import csv

csv_reader = csv.reader(file, dialect="excel-tab")
for row in csv_reader:
 # do something with record data which is conveniently parsed to list
 print(row)

['2018-05-31', '16:00:00', '28.90', '81.77', '4.3']
...
['2018-06-29', '00.42:00', '32.20', '104.61', '4.8']


BR, Juris
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-15 Thread Neil Cerutti
On 2019-01-11, shibashib...@gmail.com  wrote:
> Hello
>> 
>> I'm very new in python. I have a file in the format:
>> 
>> 2018-05-31   16:00:0028.90   81.77   4.3
>> 2018-05-31   20:32:0028.17   84.89   4.1
>> 2018-06-20   04:09:0027.36   88.01   4.8
>> 2018-06-20   04:15:0027.31   87.09   4.7
>> 2018-06-28   04.07:0027.87   84.91   5.0
>> 2018-06-29   00.42:0032.20   104.61  4.8
>
> I would like to read this file in python column-wise.  
>
> I tried this way but not working 
>   event_list = open('seismicity_R023E.txt',"r")
> info_event = read(event_list,'%s %s %f %f %f %f\n');

If it's really tabular data in fixed-width columns you can read
it that way with Python.

records = []
for line in file:
record = []
i = 0
for width in (30, 8, 7, 5): # approximations
item = line[i:i+width]
record.append(item)
i += width
records.append(record)

This leaves them all strings, which in my experience is more
convenient in practice. You can convert as you go if you
want, though it won't look nice and simple any longer.
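Converting as you go could be kept in one small function, e.g. (a sketch assuming five whitespace-trimmed columns; the lat/lon/magnitude meanings are guesses from the sample, and the time column is left as a string because some rows use '04.07:00'):

```python
from datetime import date

def convert_record(fields):
    """Turn the stripped column strings into typed values."""
    d, t, lat, lon, mag = (field.strip() for field in fields)
    year, month, day = map(int, d.split("-"))
    return date(year, month, day), t, float(lat), float(lon), float(mag)

print(convert_record(["2018-05-31", " 16:00:00", "28.90 ", " 81.77", "4.3"]))
```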

-- 
Neil Cerutti
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Python read text file columnwise

2019-01-14 Thread Schachner, Joseph
About the original question:   If I were you, I would put the 3 numbers into a 
list (or a tuple, if you don't need to modify them) and put this into a 
dictionary.  The key would be the date & time string.

Then, if you need to find a particular entry you can look it up by date and 
time.  But I suspect, since you want column access, you won't need to do that.  
You can iterate through the entries in the dictionary easily and extract the 
data from a column, or from all the columns, if that’s what you want.

for key, datalist in mydict.items():
    value = datalist[0]   # the first of the three numbers


Now, what you do with value is up to you.  I think personally rather than 
building a list I would make a generator function.  A generator uses a "yield" 
statement to return a value, and it waits in that state.  The next time you 
call it, it continues and returns the next value.  Kind of useful when the 
alternative is making and passing around huge lists.
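A column generator along those lines might look like this (a sketch; the dictionary layout is the one proposed above, with made-up entries):

```python
def column(records, index):
    """Lazily yield one column's value from each record's data list."""
    for values in records.values():
        yield values[index]

# Made-up entries keyed by the date & time string, as suggested above.
records = {
    "2018-05-31 16:00:00": ("28.90", "81.77", "4.3"),
    "2018-05-31 20:32:00": ("28.17", "84.89", "4.1"),
}
magnitudes = list(column(records, 2))
print(magnitudes)  # ['4.3', '4.1']
```

Each value is produced on demand, so no intermediate column list is built unless you ask for one.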

--- Joseph S.

-Original Message-
From: DL Neil  
Sent: Saturday, January 12, 2019 4:48 PM
To: python-list@python.org
Subject: Re: Python read text file columnwise

On 12/01/19 1:03 PM, Piet van Oostrum wrote:
> shibashib...@gmail.com writes:
> 
>> Hello
>>>
>>> I'm very new in python. I have a file in the format:
>>>
>>> 2018-05-31  16:00:0028.90   81.77   4.3
>>> 2018-05-31  20:32:0028.17   84.89   4.1
>>> 2018-06-20  04:09:0027.36   88.01   4.8
>>> 2018-06-20  04:15:0027.31   87.09   4.7
>>> 2018-06-28  04.07:0027.87   84.91   5.0
>>> 2018-06-29  00.42:0032.20   104.61  4.8
>>
>> I would like to read this file in python column-wise.
>>
>> I tried this way but not working 
>>event_list = open('seismicity_R023E.txt',"r")
>>  info_event = read(event_list,'%s %s %f %f %f %f\n');


To the OP:

Python's standard I/O is based around data "streams". Whilst there is a concept 
of "lines" and thus an end-of-line character, there is not the idea of a 
record, in the sense of fixed-length fields and thus a defining and distinction 
between data items based upon position.

Accordingly, whilst the formatting specification of strings and floats might 
work for output, there is no equivalent for accepting input data. 
Please re-read refs on file, read, readline, etc.


> Why would you think that this would work?

To the PO:

Because in languages/libraries built around fixed-length files this is 
how one specifies the composition of fields making up a record - a data 
structure which dates back to FORTRAN and Assembler on mainframes and 
other magtape-era machines.

Whilst fixed-length records/files are, by definition, less flexible than 
the more free-form data input Python accepts, they are more efficient 
and faster in situations where the data (format) is entirely consistent 
- such as the OP is describing!


-- 
Regards =dn

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Python read text file columnwise

2019-01-12 Thread Avi Gross



-Original Message-
From: Avi Gross  
Sent: Saturday, January 12, 2019 8:26 PM
To: 'DL Neil' 
Subject: RE: Python read text file columnwise

I am not sure what the big deal is here. If the data is consistently
formatted you can read in a string per line and use offsets as in line[0:8]
and so on, then call the right transformations to convert them to dates and
so on. If it is delimited by something consistent like spaces or tabs or
commas, we have all kinds of solutions, ranging from splitting the line on
the delimiter to using the kind of functionality that reads such files
into a pandas DataFrame.

In the latter case, you get the columns already. In the former, there are
well known ways to extract the info such as:

[row[0] for row in listofrows]

And repeat for additional items.

Or am I missing something and there is no end of line and you need to read
in the entire file and split it into size N chunks first? Still fairly
straightforward.
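The first suggestion above, sketched end to end on the sample rows (the column meanings are assumed from the thread):

```python
# Sample rows copied from the thread; in practice these come from the file.
lines = [
    "2018-05-31   16:00:00    28.90   81.77   4.3",
    "2018-05-31   20:32:00    28.17   84.89   4.1",
]
rows = [line.split() for line in lines if line.strip()]
dates = [row[0] for row in rows]              # first column
magnitudes = [float(row[4]) for row in rows]  # last column, converted
print(dates, magnitudes)
```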

-Original Message-
From: Python-list  On
Behalf Of DL Neil
Sent: Saturday, January 12, 2019 4:48 PM
To: python-list@python.org
Subject: Re: Python read text file columnwise

On 12/01/19 1:03 PM, Piet van Oostrum wrote:
> shibashib...@gmail.com writes:
> 
>> Hello
>>>
>>> I'm very new in python. I have a file in the format:
>>>
>>> 2018-05-31  16:00:0028.90   81.77   4.3
>>> 2018-05-31  20:32:0028.17   84.89   4.1
>>> 2018-06-20  04:09:0027.36   88.01   4.8
>>> 2018-06-20  04:15:0027.31   87.09   4.7
>>> 2018-06-28  04.07:0027.87   84.91   5.0
>>> 2018-06-29  00.42:0032.20   104.61  4.8
>>
>> I would like to read this file in python column-wise.
>>
>> I tried this way but not working 
>>event_list = open('seismicity_R023E.txt',"r")
>>  info_event = read(event_list,'%s %s %f %f %f %f\n');


To the OP:

Python's standard I/O is based around data "streams". Whilst there is a
concept of "lines" and thus an end-of-line character, there is not the idea
of a record, in the sense of fixed-length fields and thus a defining and
distinction between data items based upon position.

Accordingly, whilst the formatting specification of strings and floats might
work for output, there is no equivalent for accepting input data. 
Please re-read refs on file, read, readline, etc.


> Why would you think that this would work?

To the PO:

Because in languages/libraries built around fixed-length files this is how
one specifies the composition of fields making up a record - a data
structure which dates back to FORTRAN and Assembler on mainframes and other
magtape-era machines.

Whilst fixed-length records/files are, by definition, less flexible than the
more free-form data input Python accepts, they are more efficient and faster
in situations where the data (format) is entirely consistent
- such as the OP is describing!


--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list



Re: Python read text file columnwise

2019-01-12 Thread DL Neil

On 12/01/19 1:03 PM, Piet van Oostrum wrote:

shibashib...@gmail.com writes:


Hello


I'm very new in python. I have a file in the format:

2018-05-31  16:00:0028.90   81.77   4.3
2018-05-31  20:32:0028.17   84.89   4.1
2018-06-20  04:09:0027.36   88.01   4.8
2018-06-20  04:15:0027.31   87.09   4.7
2018-06-28  04.07:0027.87   84.91   5.0
2018-06-29  00.42:0032.20   104.61  4.8


I would like to read this file in python column-wise.

I tried this way but not working 
   event_list = open('seismicity_R023E.txt',"r")
 info_event = read(event_list,'%s %s %f %f %f %f\n');



To the OP:

Python's standard I/O is based around data "streams". Whilst there is a 
concept of "lines" and thus an end-of-line character, there is no notion of a 
record in the sense of fixed-length fields, and thus no definition of, or 
distinction between, data items based upon position.


Accordingly, whilst the formatting specification of strings and floats 
might work for output, there is no equivalent for accepting input data. 
Please re-read refs on file, read, readline, etc.




Why would you think that this would work?


To the OP:

Because in languages/libraries built around fixed-length files this is 
how one specifies the composition of fields making up a record - a data 
structure which dates back to FORTRAN and Assembler on mainframes and 
other magtape-era machines.


Whilst fixed-length records/files are, by definition, less flexible than 
the more free-form data input Python accepts, they are more efficient 
and faster in situations where the data (format) is entirely consistent 
- such as the OP is describing!
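For the curious, fixed-width parsing in modern Python is usually done with plain string slicing; a minimal sketch (the column positions below are read off the sample data and are illustrative only, not from any file spec):

```python
# Slice a fixed-width record at known column positions.
# Positions are inferred from the sample data, purely for illustration.
record = "2018-05-31 16:00:00  28.90  81.77  4.3"

date = record[0:10]           # date field
time_ = record[11:19]         # time field
lat = float(record[19:26])    # latitude
lon = float(record[26:33])    # longitude
mag = float(record[33:38])    # magnitude
```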



--
Regards =dn
--
https://mail.python.org/mailman/listinfo/python-list


Silent data corruption in pandas, was Re: Python read text file columnwise

2019-01-12 Thread Peter Otten
Peter Otten wrote:

> shibashib...@gmail.com wrote:
> 
>> Hello
>>> 
>>> I'm very new in python. I have a file in the format:
>>> 
>>> 2018-05-31  16:00:00    28.90   81.77   4.3
>>> 2018-05-31  20:32:00    28.17   84.89   4.1
>>> 2018-06-20  04:09:00    27.36   88.01   4.8
>>> 2018-06-20  04:15:00    27.31   87.09   4.7
>>> 2018-06-28  04.07:00    27.87   84.91   5.0
>>> 2018-06-29  00.42:00    32.20  104.61   4.8
>> 
>> I would like to read this file in python column-wise.

> However, in the long term you may be better off with a tool like pandas:
> 
> >>> import pandas
> >>> pandas.read_table(
> ... "seismicity_R023E.txt", sep=r"\s+",
> ... names=["date", "time", "foo", "bar", "baz"],
> ... parse_dates=[["date", "time"]]
> ... )
>             date_time    foo     bar  baz
> 0 2018-05-31 16:00:00  28.90   81.77  4.3
> 1 2018-05-31 20:32:00  28.17   84.89  4.1
> 2 2018-06-20 04:09:00  27.36   88.01  4.8
> 3 2018-06-20 04:15:00  27.31   87.09  4.7
> 4 2018-06-28 04:00:00  27.87   84.91  5.0
> 5 2018-06-29 00:00:00  32.20  104.61  4.8
> 
> [6 rows x 4 columns]

> 
> It will be harder in the beginning, but if you work with tabular data
> regularly it will pay off.

After posting the above I noted that the malformed time in the last two rows 
was silently botched. So I just spent an insane amount of time to try and 
fix this from within pandas:

import datetime

import numpy
import pandas


def parse_datetime(dt):
    return datetime.datetime.strptime(
        dt.replace(".", ":"), "%Y-%m-%d %H:%M:%S"
    )


def date_parser(dates, times):
    return numpy.array([
        parse_datetime(date + " " + time)
        for date, time in zip(dates, times)
    ])


df = pandas.read_table(
    "seismicity_R023E.txt", sep=r"\s+",
    names=["date", "time", "foo", "bar", "baz"],
    parse_dates=[["date", "time"]], date_parser=date_parser
)


print(df)

There's probably a better way as I am only a determined amateur...

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-12 Thread Peter Otten
shibashib...@gmail.com wrote:

> Hello
>> 
>> I'm very new in python. I have a file in the format:
>> 
>> 2018-05-31   16:00:00    28.90   81.77   4.3
>> 2018-05-31   20:32:00    28.17   84.89   4.1
>> 2018-06-20   04:09:00    27.36   88.01   4.8
>> 2018-06-20   04:15:00    27.31   87.09   4.7
>> 2018-06-28   04.07:00    27.87   84.91   5.0
>> 2018-06-29   00.42:00    32.20  104.61   4.8
> 
> I would like to read this file in python column-wise.
> 
> I tried this way but not working 
>   event_list = open('seismicity_R023E.txt',"r")
> info_event = read(event_list,'%s %s %f %f %f %f\n');

There is actually a library that implements a C-like scanf. You can install 
it with

$ pip install scanf

After that:

$ cat read_table.py
from scanf import scanf

with open("seismicity_R023E.txt") as f:
    for line in f:
        print(
            scanf("%s %s %f %f %f\n", line)
        )
$ cat seismicity_R023E.txt 
2018-05-31  16:00:00    28.90   81.77   4.3
2018-05-31  20:32:00    28.17   84.89   4.1
2018-06-20  04:09:00    27.36   88.01   4.8
2018-06-20  04:15:00    27.31   87.09   4.7
2018-06-28  04.07:00    27.87   84.91   5.0
2018-06-29  00.42:00    32.20  104.61   4.8
$ python read_table.py 
('2018-05-31', '16:00:00', 28.9, 81.77, 4.3)
('2018-05-31', '20:32:00', 28.17, 84.89, 4.1)
('2018-06-20', '04:09:00', 27.36, 88.01, 4.8)
('2018-06-20', '04:15:00', 27.31, 87.09, 4.7)
('2018-06-28', '04.07:00', 27.87, 84.91, 5.0)
('2018-06-29', '00.42:00', 32.2, 104.61, 4.8)
$

However, in the long term you may be better off with a tool like pandas:

>>> import pandas
>>> pandas.read_table(
... "seismicity_R023E.txt", sep=r"\s+",
... names=["date", "time", "foo", "bar", "baz"],
... parse_dates=[["date", "time"]]
... )
            date_time    foo     bar  baz
0 2018-05-31 16:00:00  28.90   81.77  4.3
1 2018-05-31 20:32:00  28.17   84.89  4.1
2 2018-06-20 04:09:00  27.36   88.01  4.8
3 2018-06-20 04:15:00  27.31   87.09  4.7
4 2018-06-28 04:00:00  27.87   84.91  5.0
5 2018-06-29 00:00:00  32.20  104.61  4.8

[6 rows x 4 columns]
>>>

It will be harder in the beginning, but if you work with tabular data 
regularly it will pay off.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-11 Thread Cameron Simpson

On 11Jan2019 12:43, shibashib...@gmail.com  wrote:

I'm very new in python. I have a file in the format:

2018-05-31  16:00:00    28.90   81.77   4.3
2018-05-31  20:32:00    28.17   84.89   4.1
2018-06-20  04:09:00    27.36   88.01   4.8
2018-06-20  04:15:00    27.31   87.09   4.7
2018-06-28  04.07:00    27.87   84.91   5.0
2018-06-29  00.42:00    32.20  104.61   4.8


It is unclear what delimits the columns, but it looks like whitespace: 
tabs and/or spaces.


You could read the file a line at a time and call .split() on each line 
to get the nonwhitespace fields, which would seem to correspond to the 
columns above.


That gets you an list of strings; then you could convert them for 
processing as required.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-11 Thread Piet van Oostrum
shibashib...@gmail.com writes:

> Hello
>> 
>> I'm very new in python. I have a file in the format:
>> 
>> 2018-05-31   16:00:00    28.90   81.77   4.3
>> 2018-05-31   20:32:00    28.17   84.89   4.1
>> 2018-06-20   04:09:00    27.36   88.01   4.8
>> 2018-06-20   04:15:00    27.31   87.09   4.7
>> 2018-06-28   04.07:00    27.87   84.91   5.0
>> 2018-06-29   00.42:00    32.20  104.61   4.8
>
> I would like to read this file in python column-wise.  
>
> I tried this way but not working 
>   event_list = open('seismicity_R023E.txt',"r")
> info_event = read(event_list,'%s %s %f %f %f %f\n');

Why would you think that this would work?

See https://docs.python.org/3/library/csv.html

Something like:

#!/usr/bin/env python3

import csv

with open('testcsv.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    for row in reader:
        for i in range(2, 5):
            row[i] = float(row[i])
        print(row)

You could convert the first two columns to datetime format if you wish.
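That conversion might look like this (a sketch assuming the row layout produced by the csv loop above):

```python
from datetime import datetime

# A row as produced by the csv loop above: date, time, then floats.
row = ["2018-05-31", "16:00:00", 28.9, 81.77, 4.3]

# Combine the first two columns and parse them together.
dt = datetime.strptime(row[0] + " " + row[1], "%Y-%m-%d %H:%M:%S")
```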
-- 
Piet van Oostrum 
WWW: http://piet.vanoostrum.org/
PGP key: [8DAE142BE17999C4]
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-11 Thread Rich Shepard

On Fri, 11 Jan 2019, shibashib...@gmail.com wrote:


I'm very new in python. I have a file in the format:

2018-05-31  16:00:00    28.90   81.77   4.3
2018-05-31  20:32:00    28.17   84.89   4.1
2018-06-20  04:09:00    27.36   88.01   4.8
2018-06-20  04:15:00    27.31   87.09   4.7
2018-06-28  04.07:00    27.87   84.91   5.0
2018-06-29  00.42:00    32.20  104.61   4.8


  So? What do you want to do with it? Are the fields fixed length?
Tab-separated? Space-separated? Regardless of what you want to do, replace
all whitespace with commas.

Rich
--
https://mail.python.org/mailman/listinfo/python-list


Re: Python read text file columnwise

2019-01-11 Thread shibashibani
Hello
> 
> I'm very new in python. I have a file in the format:
> 
> 2018-05-31    16:00:00    28.90   81.77   4.3
> 2018-05-31    20:32:00    28.17   84.89   4.1
> 2018-06-20    04:09:00    27.36   88.01   4.8
> 2018-06-20    04:15:00    27.31   87.09   4.7
> 2018-06-28    04.07:00    27.87   84.91   5.0
> 2018-06-29    00.42:00    32.20  104.61   4.8

I would like to read this file in python column-wise.  

I tried this way but not working 
  event_list = open('seismicity_R023E.txt',"r")
info_event = read(event_list,'%s %s %f %f %f %f\n');
-- 
https://mail.python.org/mailman/listinfo/python-list


Python read text file columnwise

2019-01-11 Thread shibashibani
Hello,

I'm very new in python. I have a file in the format:

2018-05-31  16:00:00    28.90   81.77   4.3
2018-05-31  20:32:00    28.17   84.89   4.1
2018-06-20  04:09:00    27.36   88.01   4.8
2018-06-20  04:15:00    27.31   87.09   4.7
2018-06-28  04.07:00    27.87   84.91   5.0
2018-06-29  00.42:00    32.20  104.61   4.8
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: I try to edit and find data from a text file with python 2.7 (data from stock market)

2018-09-07 Thread Peter Otten
alon.naj...@gmail.com wrote:

> hi,
> 
> I try to edit a text file with python 2.7:
> 
> * AAPL *
> Date: September 07 2018
> Time: 14:10:52
> Price: ,068,407
> Ask: None
> High: None
> Low: None
> Previous Close: ,068,407
> Volume: $ 227.35 / $ 221.30
> Market Cap: 20.23

It looks like the author of the nasdaq_stock module was interested in a 
quick and dirty tool for personal needs rather than a clean library for 
general use.

The text quoted above is printed implicitly by the stock() function. To 
suppress it you have to modify that function or to redirect sys.stdout.

Another problem with the code is that it may swallow arbitrary exceptions. 
Therefore my error message below has to be vague.

> but when I write it to a file I get:
> {'previous_close': ',068,407', 'volume': u'$\xa0227.35\xa0/\xa0$\xa0221.30',
> 'market_cap': '20.23', 'price': ',068,407', 'high': 'None', 'ask': 'None',
> 'low': 'None', 'time': '14:15:45', 'date': 'September 07 2018', 'ticker':
> 'AAPL'}
> 
> why is that? 

If all goes well the stock function returns the above dict.

> and how do I get the price value only? so I will have only
> that in a variable? for example the variable: Price.

import sys
from nasdaq_stock import nasdaq_stock
 
stock_info = nasdaq_stock.stock('AAPL')

print
if stock_info:
    price = stock_info["price"]
    ticker = stock_info["ticker"]
    print "Ticker:", ticker, "Price:", price
else:
    print >> sys.stderr, "Something went wrong"


-- 
https://mail.python.org/mailman/listinfo/python-list


I try to edit and find data from a text file with python 2.7 (data from stock market)

2018-09-07 Thread alon . najman
hi,

I try to edit a text file with python 2.7:

* AAPL *
Date: September 07 2018
Time: 14:10:52
Price: ,068,407
Ask: None
High: None
Low: None
Previous Close: ,068,407
Volume: $ 227.35 / $ 221.30
Market Cap: 20.23

but when I write it to a file I get:
{'previous_close': ',068,407', 'volume': u'$\xa0227.35\xa0/\xa0$\xa0221.30', 
'market_cap': '20.23', 'price': ',068,407', 'high': 'None', 'ask': 'None', 
'low': 'None', 'time': '14:15:45', 'date': 'September 07 2018', 'ticker': 
'AAPL'}

why is that? and how do I get the price value only? so I will have only that in 
a variable? for example the variable: Price.

this is my code..
import csv
import itertools
from nasdaq_stock import nasdaq_stock

x=str(nasdaq_stock.stock('AAPL'))

with open("log.txt", "w") as text_file:
    text_file.write(format(x))



with open('log.txt', 'r') as in_file:
    lines = in_file.read().splitlines()
    stripped = [line.replace(","," ").split() for line in lines]
    grouped = itertools.izip(*[stripped]*1)
    with open('log.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('title', 'intro', 'tagline'))
        for group in grouped:
            writer.writerows(group)

#locate cell
import csv

def read_cell(x, y):
    with open('log.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1
#I try to find the value of Price..
print (read_cell(2,3)) 


-- 
https://mail.python.org/mailman/listinfo/python-list


[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-22 Thread Serhiy Storchaka

Change by Serhiy Storchaka :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-22 Thread Andrés Delfino

Andrés Delfino  added the comment:

This one can be closed, right?

--

___
Python tracker 




[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-20 Thread Serhiy Storchaka

Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:


New changeset 4ecdc1110df211686a4406ba666a7f8106e0f618 by Serhiy Storchaka 
(Miss Islington (bot)) in branch '3.7':
bpo-33580: Make binary/text file glossary entries follow most common "see also" 
style. (GH-6991) (GH-7012)
https://github.com/python/cpython/commit/4ecdc1110df211686a4406ba666a7f8106e0f618


--

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33580>



[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-20 Thread miss-islington

miss-islington <mariatta.wijaya+miss-isling...@gmail.com> added the comment:


New changeset 983e9653e0584b65a6ec66543ce1631f888aa285 by Miss Islington (bot) 
in branch '3.6':
bpo-33580: Make binary/text file glossary entries follow most common "see also" 
style. (GH-6991)
https://github.com/python/cpython/commit/983e9653e0584b65a6ec66543ce1631f888aa285


--
nosy: +miss-islington

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33580>



[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-20 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6663

___
Python tracker 




[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-20 Thread miss-islington

Change by miss-islington :


--
pull_requests: +6662

___
Python tracker 




[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-20 Thread Serhiy Storchaka

Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:


New changeset 0c4be82890858f874ff2158b0fcfdb8f261569c0 by Serhiy Storchaka 
(Andrés Delfino) in branch 'master':
bpo-33580: Make binary/text file glossary entries follow most common "see also" 
style. (GH-6991)
https://github.com/python/cpython/commit/0c4be82890858f874ff2158b0fcfdb8f261569c0


--
nosy: +serhiy.storchaka

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33580>



[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-19 Thread Andrés Delfino

Change by Andrés Delfino :


--
keywords: +patch
pull_requests: +6643
stage:  -> patch review

___
Python tracker 




[issue33580] Make binary/text file glossary entries follow most common "see also" style

2018-05-19 Thread Andrés Delfino

New submission from Andrés Delfino <adelf...@gmail.com>:

While most entries don't show "see also" as a separate block, binary/text file 
entries do.

I'm proposing to change this.

--
assignee: docs@python
components: Documentation
messages: 317135
nosy: adelfino, docs@python
priority: normal
severity: normal
status: open
title: Make binary/text file glossary entries follow most common "see also" 
style
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue33580>



[issue32358] json.dump: fp must be a text file object

2018-03-23 Thread Berker Peksag

Berker Peksag  added the comment:

This is already documented in the json.dump() documentation:

The json module always produces str objects, not bytes objects.
Therefore, fp.write() must support str input.

Note that the traceback you've posted doesn't have anything to do with the json 
module and it's expected:

>>> f = open('/tmp/t.json', 'wb')
>>> f.write('foo')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: a bytes-like object is required, not 'str'
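The fix is to give json.dump a text-mode file object; a sketch using an in-memory text stream in place of a real file:

```python
import io
import json

buf = io.StringIO()  # any text-mode file object whose write() accepts str
json.dump(123, buf)  # works: json.dump produces str, buf accepts str
```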

--
nosy: +berker.peksag
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 




[issue32358] json.dump: fp must be a text file object

2017-12-17 Thread TaoQingyun

New submission from TaoQingyun <845767...@qq.com>:

```
>>> import json
>>> f = open('/tmp/t.json', 'wb')  
>>> json.dump(123, f)  
Traceback (most recent call last): 
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/json/__init__.py", line 180, in dump
    fp.write(chunk)
TypeError: a bytes-like object is required, not 'str'
```

This may not be a bug, but it should be mentioned in the docs:
https://docs.python.org/3/library/json.html#json.dump

--
components: Library (Lib)
messages: 308517
nosy: qingyunha
priority: normal
severity: normal
status: open
title: json.dump: fp must be a text file object
versions: Python 3.6

___
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32358>



Re: program that search string in text file and do something

2017-08-06 Thread Rick Johnson
Grant Edwards wrote:
> Peter Otten <__pete...@web.de> wrote:
> 
> > What we won't do is write a program for you ready to
> > present to your teacher.
> 
> Or if we do, it will be subtly sabotaged in a manner that
> will make it obvious to an experienced Python programmer
> that you didn't write it.  My favorite is to use some
> combination of particularly obscure and obtuse mechanisms
that will actually produce the correct results but do so in
> a way that nobody (including myself a few days later) will
> be able to explain without some intense study.

I would caution the OP against blindly soliciting for free
homework answers on the internet, most especially when the
request includes the deletion of files. Perhaps in this
forum the worst case scenario would be the return of a
scornful rebuke or a heavily obfuscated chunk of abysmal
code. However, in some of the darker, more devious corners
of the web, a naive solicitation such as this could end with
the student's file-system suddenly being relieved of the
burden of carrying around all those extra files. A most
unforgiving lesson in the power and conciseness of the
recursive algorithm.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread Grant Edwards
On 2017-08-04, Peter Otten <__pete...@web.de> wrote:

> What we won't do is write a program for you ready to present to your 
> teacher.

Or if we do, it will be subtly sabotaged in a manner that will make it
obvious to an experienced Python programmer that you didn't write it.

My favorite is to use some combination of particularly obscure and
obtuse mechanisms that will actually produce the correct results but
do so in a way that nobody (including myself a few days later) will be
able to explain without some intense study.

-- 
Grant Edwards   grant.b.edwards        Yow! It don't mean a
                  at                   THING if you ain't got
                  gmail.com            that SWING!!

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread ast


 wrote in message 
news:f705c092-de18-4c37-bde1-42316e8de...@googlegroups.com...

On Friday, August 4, 2017 at 12:27:02 PM UTC+3, ast wrote:

 wrote in message
news:b6cc4ee5-71be-4550-be3e-59ebeee7a...@googlegroups.com...



thanks man! that works


I hope it is not school homework.


--
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread alon . najman
On Friday, August 4, 2017 at 12:27:02 PM UTC+3, ast wrote:
> <alon.naj...@gmail.com> wrote in message 
> news:b6cc4ee5-71be-4550-be3e-59ebeee7a...@googlegroups.com...
> > Hi, I'm new to this forum and to programming in Python!
> >
> > Can someone help me and show me how to write a program that:
> > - searches for a string in a certain text file and, if it finds the
> > string, deletes the file and prints something?
> >
> > thanks.
> 
> import os
> pattern = "azerty"
> found = False
> 
> with open("foo.txt", "r") as f:
>     for line in f:
>         if pattern in line:
>             found = True
>             break
> 
> if found:
>     os.remove("foo.txt")

thanks man! that works
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread Peter Otten
Peter Otten wrote:

> What we won't do is write a program for you ready to present to your
> teacher.

I should have known better :( 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread Peter Otten
alon.naj...@gmail.com wrote:

> Hi, I'm new to this forum and to programming in Python!
> 
> Can someone help me and show me how to write a program that:
> - searches for a string in a certain text file and, if it finds the
> string, deletes the file and prints something?

Programming is mostly about splitting a complex task into baby steps.

Do you know how to open a file?
Do you know how to iterate over the lines of that file?
Do you know how to search for a string in that line (i. e. another string)?
Do you know how to delete a file?

Work through a Python tutorial or the first chapters of beginner's textbook, 
then try your hand at each of the subtasks outlined above. Consult the 
documentation for missing parts. 

Once you have some code we will help you improve or even fix it.

What we won't do is write a program for you ready to present to your 
teacher.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: program that search string in text file and do something

2017-08-04 Thread ast


<alon.naj...@gmail.com> wrote in message 
news:b6cc4ee5-71be-4550-be3e-59ebeee7a...@googlegroups.com...

Hi, I'm new to this forum and to programming in Python!

Can someone help me and show me how to write a program that:
- searches for a string in a certain text file and, if it finds the
string, deletes the file and prints something?


thanks.


import os
pattern = "azerty"
found = False

with open("foo.txt", "r") as f:
    for line in f:
        if pattern in line:
            found = True
            break

if found:
    os.remove("foo.txt")




--
https://mail.python.org/mailman/listinfo/python-list


program that search string in text file and do something

2017-08-04 Thread alon . najman
Hi, I'm new to this forum and to programming in Python!

Can someone help me and show me how to write a program that:
- searches for a string in a certain text file and, if it finds the
string, deletes the file and prints something?

thanks.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Convert text file data into RDF format through the Python

2017-05-01 Thread Grant Edwards
On 2017-05-01, Peter Pearson <pkpearson@nowhere.invalid> wrote:
> On Sat, 29 Apr 2017 10:06:12 -0700 (PDT), marsh <shashwatu...@gmail.com> 
> wrote:
>> Hi, 
>>
>> I would like to ask how I can convert text file data into RDF format.
> [snip]
>
> What is RDF format?

https://en.wikipedia.org/wiki/Resource_Description_Framework
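For illustration, a hand-rolled sketch that turns one CoNLL-U token line (like the OP's sample) into RDF N-Triples; the example.org namespace and predicate names are invented for this sketch, not a standard vocabulary:

```python
# Emit N-Triples for a single tab-separated CoNLL-U token line.
line = "1\tAl\tAl\tPROPN\tNNP\tNumber=Sing\t0\troot\t_\tSpaceAfter=No"
cols = line.split("\t")

# Subject IRI built from the token ID (hypothetical naming scheme).
subj = "<http://example.org/sent1/token/%s>" % cols[0]
triples = [
    '%s <http://example.org/form> "%s" .' % (subj, cols[1]),
    '%s <http://example.org/upos> "%s" .' % (subj, cols[3]),
]
```

In practice a library such as rdflib would handle namespaces and serialization properly; this only shows the shape of the mapping.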


-- 
Grant Edwards   grant.b.edwards        Yow! The FALAFEL SANDWICH
                  at                   lands on my HEAD and I
                  gmail.com            become a VEGETARIAN ...

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Convert text file data into RDF format through the Python

2017-05-01 Thread Peter Pearson
On Sat, 29 Apr 2017 10:06:12 -0700 (PDT), marsh <shashwatu...@gmail.com> wrote:
> Hi, 
>
> I would like to ask how I can convert text file data into RDF format.
[snip]

What is RDF format?

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


Convert text file data into RDF format through the Python

2017-04-29 Thread marsh
Hi, 

I would like to ask how I can convert text file data into RDF format. The data 
looks like this: 
# sent_id = weblog-juancole.com_juancole_20051126063000_ENG_20051126_063000-0001
# text = Al-Zaman : American forces killed Shaikh Abdullah al-Ani, the preacher 
at the mosque in the town of Qaim, near the Syrian border.
1   Al  Al  PROPN   NNP Number=Sing 0   root_   
SpaceAfter=No
2   -   -   PUNCT   HYPH_   1   punct   _   
SpaceAfter=No
3   Zaman   Zaman   PROPN   NNP Number=Sing 1   flat_   
_
4   :   :   PUNCT   :   _   1   punct   _   _
5   AmericanamericanADJ JJ  Degree=Pos  6   
amod_   _

Please suggest how to do this.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Reading structured text file (non-CSV) into Pandas Dataframe

2017-04-13 Thread breamoreboy
On Thursday, April 13, 2017 at 11:09:23 AM UTC+1, David Shi wrote:
> http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22geo_circ(-0.587,-90.5713,170)%22=sequence_release=text
> The above is a web link to a structured text file.  It is not a CSV.
> How can this text file be read into a Pandas Dataframe, so that further 
> processing can be made?
> Looking forward to hearing from you.
> Regards.
> David

http://pandas.pydata.org/pandas-docs/stable/io.html
-- 
https://mail.python.org/mailman/listinfo/python-list


Reading structured text file (non-CSV) into Pandas Dataframe

2017-04-13 Thread David Shi via Python-list
http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22geo_circ(-0.587,-90.5713,170)%22=sequence_release=text
The above is a web link to a structured text file.  It is not a CSV.
How can this text file be read into a Pandas Dataframe, so that further 
processing can be made?
Looking forward to hearing from you.
Regards.
David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Read a text file into a Pandas DataFrame Table

2017-04-13 Thread breamoreboy
On Thursday, April 13, 2017 at 9:15:18 AM UTC+1, David Shi wrote:
> Dear All,
> Can anyone help to read a text file into a Pandas DataFrame Table?
> Please see the link below.
> http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22geo_circ(-0.587,-90.5713,170)%22=sequence_release=text
> 
> Regards.
> David

http://pandas.pydata.org/pandas-docs/stable/io.html#io-read-csv-table
-- 
https://mail.python.org/mailman/listinfo/python-list


Read a text file into a Pandas DataFrame Table

2017-04-13 Thread David Shi via Python-list
Dear All,
Can anyone help to read a text file into a Pandas DataFrame Table?
Please see the link below.
http://www.ebi.ac.uk/ena/data/warehouse/search?query=%22geo_circ(-0.587,-90.5713,170)%22=sequence_release=text

Regards.
David
-- 
https://mail.python.org/mailman/listinfo/python-list


[issue28246] Unable to read simple text file

2016-09-22 Thread Eryk Sun

Eryk Sun added the comment:

Codepage 1251 is a single-byte encoding and a superset of ASCII (i.e. ordinals 
0-127). UTF-8 is also a superset of ASCII, so there's no problem as long as the 
encoded text is strictly ASCII. But decoding non-ASCII UTF-8 as codepage 1251 
produces nonsense, otherwise known as mojibake. It happens that codepage 1251 
maps every one of the 256 possible byte values, except for 0x98 (152). The 
exception can't be made any clearer.

--

___
Python tracker 




[issue28246] Unable to read simple text file

2016-09-22 Thread AndreyTomsk

AndreyTomsk added the comment:

Thanks for the quick reply. I'm new to Python; I just used the tutorial docs and 
didn't read carefully enough to notice the encoding info.

Still, IMHO the behaviour is not consistent. Of three sequential symbols in the 
Russian alphabet - З, И, К - it crashes on И and displays the others in two-byte form.

--

___
Python tracker 




[issue28246] Unable to read simple text file

2016-09-22 Thread SilentGhost

SilentGhost added the comment:

It would be good to add a FAQ / HowTo entry for this question.

--
nosy: +SilentGhost

___
Python tracker 




[issue28246] Unable to read simple text file

2016-09-22 Thread Eryk Sun

Eryk Sun added the comment:

The default encoding on your system is Windows codepage 1251. However, your 
file is encoded using UTF-8:

>>> lines = open('ResourceStrings.rc', 'rb').read().splitlines()
>>> print(*lines, sep='\n')
b'\xef\xbb\xbf\xd0\x90 (cyrillic A)'
b'\xd0\x98 (cyrillic I) <<< line read fails'
b'\xd0\x91 (cyrillic B)'

It even has a UTF-8 BOM (i.e. b'\xef\xbb\xbf'). You need to pass the encoding 
to built-in open():

>>> print(open('ResourceStrings.rc', encoding='utf-8').read())
А (cyrillic A)
И (cyrillic I) <<< line read fails
Б (cyrillic B)

--
nosy: +eryksun
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 




[issue28246] Unable to read simple text file

2016-09-22 Thread AndreyTomsk

New submission from AndreyTomsk:

The file read operation fails when it hits a specific Cyrillic symbol. Tested with 
this script:

testFile = open('ResourceStrings.rc', 'r')
for line in testFile:
    print(line)


Exception message:
Traceback (most recent call last):
  File "min_test.py", line 6, in <module>
for line in testFile:
  File 
"C:\Users\afi\AppData\Local\Programs\Python\Python36\lib\encodings\cp1251.py", 
line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 24: 
character maps to <undefined>

--
components: IO, Unicode, Windows
files: ResourceStrings.rc
messages: 277206
nosy: AndreyTomsk, ezio.melotti, haypo, paul.moore, steve.dower, tim.golden, 
zach.ware
priority: normal
severity: normal
status: open
title: Unable to read simple text file
type: behavior
versions: Python 3.5, Python 3.6
Added file: http://bugs.python.org/file44783/ResourceStrings.rc

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue28246>



Re: Python text file fetch specific part of line

2016-08-02 Thread honeygne
On Thursday, July 28, 2016 at 1:00:17 PM UTC+5:30, c...@zip.com.au wrote:
> On 27Jul2016 22:12, Arshpreet Singh  wrote:
> >I am writing Imdb scrapper, and getting available list of titles from IMDB 
> >website which provide txt file in very raw format, Here is the one part of 
> >file(http://pastebin.com/fpMgBAjc) as the file provides tags like 
> >Distribution  
> >Votes,Rank,Title I want to parse title names, I tried with readlines() 
> >method 
> >but it returns only list which is quite heterogeneous, is it possible that I 
> >can parse each value comes under title section?
> 
> Just for etiquette: please just post text snippets like that inline in your 
> text. Some people don't like fetching random URLs, and some of us are not 
> always online when reading and replying to email. Either way, having the text 
> in the message, especially when it is small, is preferable.
> 
> To your question:
> 
> Your sample text looks like this:
> 
> New  Distribution  Votes  Rank  Title
>   000125  1680661   9.2  The Shawshank Redemption (1994)
>   000125  1149871   9.2  The Godfather (1972)
>   000124  786433   9.0  The Godfather: Part II (1974)
>   000124  1665643   8.9  The Dark Knight (2008)
>   000133  860145   8.9  Schindler's List (1993)
>   000133  444718   8.9  12 Angry Men (1957)
>   000123  1317267   8.9  Pulp Fiction (1994)
>   000124  1209275   8.9  The Lord of the Rings: The Return of the 
> King 
> (2003)
>   000123  500803   8.9  Il buono, il brutto, il cattivo (1966)
>   000133  1339500   8.8  Fight Club (1999)
>   000123  1232468   8.8  The Lord of the Rings: The Fellowship of the 
> Ring (2001)
>   000223  832726   8.7  Star Wars: Episode V - The Empire Strikes 
> Back 
> (1980)
>   000233  1243066   8.7  Forrest Gump (1994)
>   000123  1459168   8.7  Inception (2010)
>   000223  1094504   8.7  The Lord of the Rings: The Two Towers (2002)
>   000232  676479   8.7  One Flew Over the Cuckoo's Nest (1975)
>   000232  724590   8.7  Goodfellas (1990)
>   000233  1211152   8.7  The Matrix (1999)
> 
> Firstly, I would suggest you not use readlines(), it pulls all the text into 
> memory. For small text like this is it ok, but some things can be arbitrarily 
> large, so it is something to avoid if convenient. Normally you can just 
> iterate 
> over a file and get lines.
> 
> You want "text under the Title." Looking at it, I would be inclined to say 
> that 
> the first line is a header and the rest consist of 4 columns: a number 
> (distribution?), a vote count, a rank and the rest (title plus year).
> 
> You can parse data like that like this (untested):
> 
>   # presumes `fp` is reading from the text
>   for n, line in enumerate(fp):
> if n == 0:
>   # heading, skip it
>   continue
> distnum, nvotes, rank, etc = line.split(None, 3)
> ... do stuff with the various fields ...
> 
> I hope that gets you going. If not, return with what code you have, what 
> happened, and what you actually wanted to happen and we may help further.
Thanks I am able to do it with following:
https://github.com/alberanid/imdbpy/blob/master/bin/imdbpy2sql.py (it was very 
helpful)

python imdbpy2sql.py -d <.txt files downloaded from IMDB> -u 
sqlite:/where/to/save/db --sqlite-transactions
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python text file fetch specific part of line

2016-07-29 Thread cs

On 29Jul2016 18:42, Gordon Levi  wrote:
>c...@zip.com.au wrote:
>>On 28Jul2016 19:28, Gordon Levi  wrote:
>>>Arshpreet Singh  wrote:
>>>>I am writing Imdb scrapper, and getting available list of titles from IMDB
>>>>website which provide txt file in very raw format, Here is the one part of
>>>>file(http://pastebin.com/fpMgBAjc) as the file provides tags like
>>>>Distribution  Votes,Rank,Title I want to parse title names, I tried with
>>>>readlines() method but it returns only list which is quite heterogeneous, is
>>>>it possible that I can parse each value comes under title section?
>>>
>>>Beautiful Soup will make your task much easier
>>>.
>>
>>Did you look at his sample data?
>
>No. I read he was "writing an IMDB scraper, and getting the available
>list of titles from the IMDB web site". It's here
>.
>
>>Plain text, not HTML or XML. Beautiful Soup is
>>not what he needs here.
>
>Fortunately the OP told us his application rather than just telling us
>his current problem. His life would be much easier if he ignored the
>plain text he has obtained so far and started again using a Beautiful
>Soup tutorial.

Or bypass IMDB's computer unfriendliness and go straight to http://omdbapi.com/

You can have JSON directly from it, and avoid BS entirely. BS is an amazing 
library, but is essentially a workaround for computer-hostile websites: those 
not providing clean machine readable data, and only unstable mutable HTML 
output.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Python text file fetch specific part of line

2016-07-29 Thread Gordon Levi
c...@zip.com.au wrote:

>On 28Jul2016 19:28, Gordon Levi  wrote:
>>Arshpreet Singh  wrote:
>>>I am writing Imdb scrapper, and getting available list of titles from IMDB 
>>>website which provide txt file in very raw format, Here is the one part of 
>>>file(http://pastebin.com/fpMgBAjc) as the file provides tags like 
>>>Distribution  Votes,Rank,Title I want to parse title names, I tried with 
>>>readlines() method but it returns only list which is quite heterogeneous, is 
>>>it possible that I can parse each value comes under title section?
>>
>>Beautiful Soup will make your task much easier
>>.
>
>Did you look at his sample data?

No. I read he was "writing an IMDB scraper, and getting the available
list of titles from the IMDB web site". It's here
.  
> Plain text, not HTML or XML. Beautiful Soup is 
>not what he needs here.

Fortunately the OP told us his application rather than just telling us
his current problem. His life would be much easier if he ignored the
plain text he has obtained so far and started again using a Beautiful
Soup tutorial. 
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python text file fetch specific part of line

2016-07-28 Thread cs

On 28Jul2016 19:28, Gordon Levi  wrote:
>Arshpreet Singh  wrote:
>>I am writing Imdb scrapper, and getting available list of titles from IMDB 
>>website which provide txt file in very raw format, Here is the one part of 
>>file(http://pastebin.com/fpMgBAjc) as the file provides tags like 
>>Distribution  Votes,Rank,Title I want to parse title names, I tried with 
>>readlines() method but it returns only list which is quite heterogeneous, is 
>>it possible that I can parse each value comes under title section?
>
>Beautiful Soup will make your task much easier
>.

Did you look at his sample data? Plain text, not HTML or XML. Beautiful Soup is 
not what he needs here.


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Re: Python text file fetch specific part of line

2016-07-28 Thread Gordon Levi
Arshpreet Singh  wrote:

>I am writing Imdb scrapper, and getting available list of titles from IMDB 
>website which provide txt file in very raw format, Here is the one part of 
>file(http://pastebin.com/fpMgBAjc) as the file provides tags like Distribution 
> Votes,Rank,Title I want to parse title names, I tried with readlines() method 
>but it returns only list which is quite heterogeneous, is it possible that I 
>can parse each value comes under title section?

Beautiful Soup will make your task much easier
.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python text file fetch specific part of line

2016-07-28 Thread cs

On 27Jul2016 22:12, Arshpreet Singh  wrote:
>I am writing Imdb scrapper, and getting available list of titles from IMDB 
>website which provide txt file in very raw format, Here is the one part of 
>file(http://pastebin.com/fpMgBAjc) as the file provides tags like Distribution  
>Votes,Rank,Title I want to parse title names, I tried with readlines() method 
>but it returns only list which is quite heterogeneous, is it possible that I 
>can parse each value comes under title section?


Just for etiquette: please just post text snippets like that inline in your 
text. Some people don't like fetching random URLs, and some of us are not 
always online when reading and replying to email. Either way, having the text 
in the message, especially when it is small, is preferable.


To your question:

Your sample text looks like this:

   New  Distribution  Votes  Rank  Title
 000125  1680661   9.2  The Shawshank Redemption (1994)
 000125  1149871   9.2  The Godfather (1972)
 000124  786433   9.0  The Godfather: Part II (1974)
 000124  1665643   8.9  The Dark Knight (2008)
 000133  860145   8.9  Schindler's List (1993)
 000133  444718   8.9  12 Angry Men (1957)
 000123  1317267   8.9  Pulp Fiction (1994)
 000124  1209275   8.9  The Lord of the Rings: The Return of the King 
(2003)

 000123  500803   8.9  Il buono, il brutto, il cattivo (1966)
 000133  1339500   8.8  Fight Club (1999)
 000123  1232468   8.8  The Lord of the Rings: The Fellowship of the 
Ring (2001)
 000223  832726   8.7  Star Wars: Episode V - The Empire Strikes Back 
(1980)

 000233  1243066   8.7  Forrest Gump (1994)
 000123  1459168   8.7  Inception (2010)
 000223  1094504   8.7  The Lord of the Rings: The Two Towers (2002)
 000232  676479   8.7  One Flew Over the Cuckoo's Nest (1975)
 000232  724590   8.7  Goodfellas (1990)
 000233  1211152   8.7  The Matrix (1999)

Firstly, I would suggest you not use readlines(), it pulls all the text into 
memory. For small text like this is it ok, but some things can be arbitrarily 
large, so it is something to avoid if convenient. Normally you can just iterate 
over a file and get lines.


You want "text under the Title." Looking at it, I would be inclined to say that 
the first line is a header and the rest consist of 4 columns: a number 
(distribution?), a vote count, a rank and the rest (title plus year).


You can parse data like that like this (untested):

 # presumes `fp` is reading from the text
 for n, line in enumerate(fp):
   if n == 0:
 # heading, skip it
 continue
   distnum, nvotes, rank, etc = line.split(None, 3)
   ... do stuff with the various fields ...

I hope that gets you going. If not, return with what code you have, what 
happened, and what you actually wanted to happen and we may help further.
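
A runnable version of that sketch, with the split spelled out; the two sample rows are taken from the quoted data, and the list stands in for the open file `fp`:

```python
sample = [
    "New  Distribution  Votes  Rank  Title\n",
    "  000125  1680661   9.2  The Shawshank Redemption (1994)\n",
    "  000125  1149871   9.2  The Godfather (1972)\n",
]

titles = []
for n, line in enumerate(sample):
    if n == 0:
        continue  # heading, skip it
    # Split on whitespace into at most 4 fields; the 4th field keeps
    # the internal spaces of the title.
    distnum, nvotes, rank, title = line.split(None, 3)
    titles.append(title.rstrip())

print(titles)  # -> ['The Shawshank Redemption (1994)', 'The Godfather (1972)']
```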


Cheers,
Cameron Simpson 
--
https://mail.python.org/mailman/listinfo/python-list


Python text file fetch specific part of line

2016-07-27 Thread Arshpreet Singh
I am writing an IMDb scraper and getting the available list of titles from the 
IMDB website, which provides a txt file in a very raw format. Here is one part 
of the file (http://pastebin.com/fpMgBAjc). The file provides tags like 
Distribution, Votes, Rank, Title. I want to parse the title names. I tried the 
readlines() method, but it returns only a flat list which is quite 
heterogeneous. Is it possible to parse each value that comes under the title 
section?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: fastest way to read a text file in to a numpy array

2016-06-30 Thread Christian Gollwitzer

On 30.06.16 at 17:49, Heli wrote:
>Dear all,
>
>After a few tests, I think I will need to correct a bit my question. I will 
>give an example here.
>
>I have file 1 with 250 lines:
>X1,Y1,Z1
>X2,Y2,Z2
>
>Then I have file 2 with 3M lines:
>X1,Y1,Z1,value11,value12, value13,...
>X2,Y2,Z2,value21,value22, value23,...
>
>I will need to interpolate values for the coordinates on file 1 from file 2 
>(using nearest).
>I am using scipy.griddata for this:
>
>scipy.interpolate.griddata(points, values, xi, method='linear', 
>fill_value=nan, rescale=False)

This constructs a Delaunay triangulation and no wonder takes some time 
if you run it over 3M datapoints. You can probably save a factor of 
three, because:

>I need to repeat the griddata above to get interpolation for each of the 
>columns of values.

I think this is wrong. It should, according to the docs, happily 
interpolate from a 2D array of values. BTW, you stated you want nearest 
interpolation, but you chose "linear". I think it won't make a big 
difference on runtime, though. (nearest uses a KDTree, linear uses QHull.)

>I was wondering if there are any ways to improve the time spent in 
>interpolation.

Are you sure you need the full generality of this algorithm? That is, are 
your values given on a scattered cloud of points in 3D space, or maybe the 
X,Y,Z in file2 are in fact on a rectangular grid? In the former case, there 
is probably nothing you can really do. In the latter, there should be a 
more efficient algorithm that looks up the nearest index from X,Y,Z by 
index arithmetic, or maybe even reshapes the data into a 3D array.
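
If the coordinates really do form a regular grid, the index-arithmetic idea reduces nearest-neighbour lookup to a rounding operation; the origin, spacing, and toy value grid below are hypothetical stand-ins:

```python
import numpy as np

# Hypothetical regular grid: origin at (0,0,0), spacing 0.5 on each
# axis, values stored in a 3-D array indexed by grid position.
origin = np.array([0.0, 0.0, 0.0])
spacing = np.array([0.5, 0.5, 0.5])
vals = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)

q = np.array([0.26, 1.0, 1.49])  # one query point from "file 1"
# Nearest grid index by rounding: no triangulation, no search.
i, j, k = np.rint((q - origin) / spacing).astype(int)
print(vals[i, j, k])  # -> 27.0
```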


Christian

--
https://mail.python.org/mailman/listinfo/python-list


Re: fastest way to read a text file in to a numpy array

2016-06-30 Thread Heli
Dear all, 

After a few tests, I think I will need to correct a bit my question. I will 
give an example here. 

I have file 1 with 250 lines:
X1,Y1,Z1
X2,Y2,Z2


Then I have file 2 with 3M lines:
X1,Y1,Z1,value11,value12, value13,
X2,Y2,Z2,value21,value22, value23,...


I will need to interpolate values for the coordinates on file 1 from file 2. 
(using nearest) 
I am using the scipy.griddata for this.  

scipy.interpolate.griddata(points, values, xi, method='linear', fill_value=nan, 
rescale=False)

When slicing the code, reading files in to numpy is not the culprit, but the 
griddata is. 

time to read file2= 2 min
time to interpolate= 48 min

I need to repeat the griddata above to get interpolation for each of the column 
of values. I was wondering if there are any ways to improve the time spent in 
interpolation. 


Thank you very much in advance for your help, 


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: fastest way to read a text file in to a numpy array

2016-06-28 Thread Cody Piersall
On Tue, Jun 28, 2016 at 8:45 AM, Heli  wrote:
> Hi,
>
> I need to read a file in to a 2d numpy array containing many number of lines.
> I was wondering what is the fastest way to do this?
>
> Is even reading the file in to numpy array the best method or there are 
> better approaches?
>

numpy.genfromtxt[1] is a pretty robust function for reading text files.

If you're generating the file from a numpy array already, you should
use numpy.save()[2] and numpy.load()[3].

[1]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
[2]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html
[3]: http://docs.scipy.org/doc/numpy/reference/generated/numpy.load.html
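
A minimal genfromtxt call, reading comma-separated rows from an in-memory buffer instead of a real file:

```python
import io
import numpy as np

# Two comma-delimited rows straight into a 2-D float array.
buf = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
arr = np.genfromtxt(buf, delimiter=",")
print(arr.shape)  # -> (2, 3)
```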

Cody
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: fastest way to read a text file in to a numpy array

2016-06-28 Thread Michael Selik
On Tue, Jun 28, 2016 at 10:08 AM Hedieh Ebrahimi  wrote:

> File 1 has :
> x1,y1,z1
> x2,y2,z2
> 
>
> and file2 has :
> x1,y1,z1,value1
> x2,y2,z2,value2
> x3,y3,z3,value3
> ...
>
> I need to read the coordinates from file 1 and then interpolate a value
> for these coordinates on file 2 to the closest coordinate possible. The
> problem is file 2 is has around 5M lines. So I was wondering what would be
> the fastest approach?
>

Is this a one-time task, or something you'll need to repeat frequently?
How many points need to be interpolated?
How do you define distance? Euclidean 3d distance? K-nearest?

5 million can probably fit into memory, so it's not so bad.

NumPy is a good option for broadcasting the distance function across all 5
million labeled points for each unlabeled point. Given that file format,
NumPy can probably read from file directly into an array.

http://stackoverflow.com/questions/3518778/how-to-read-csv-into-record-array-in-numpy
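
A sketch of that broadcasting approach on toy data; the two arrays stand in for the contents of the two files:

```python
import numpy as np

# Stand-ins for the two files: labeled points carry (x, y, z, value),
# query points carry (x, y, z).
labeled = np.array([[0.0, 0.0, 0.0, 10.0],
                    [1.0, 0.0, 0.0, 20.0],
                    [0.0, 2.0, 0.0, 30.0]])
queries = np.array([[0.9, 0.1, 0.0],
                    [0.1, 1.8, 0.0]])

coords, values = labeled[:, :3], labeled[:, 3]
# Broadcast to an (n_queries, n_labeled) matrix of squared Euclidean
# distances, then take the nearest labeled point for each query.
d2 = ((queries[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
nearest = d2.argmin(axis=1)
print(values[nearest])  # -> [20. 30.]
```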
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: fastest way to read a text file in to a numpy array

2016-06-28 Thread Michael Selik
On Tue, Jun 28, 2016 at 9:51 AM Heli  wrote:

> Is even reading the file in to numpy array the best method or there are
> better approaches?
>

What are you trying to accomplish?
Summary statistics, data transformation, analysis...?
-- 
https://mail.python.org/mailman/listinfo/python-list


fastest way to read a text file in to a numpy array

2016-06-28 Thread Heli
Hi, 

I need to read a file in to a 2d numpy array containing many number of lines. 
I was wondering what is the fastest way to do this?

Is even reading the file in to numpy array the best method or there are better 
approaches?

Thanks for your suggestions, 
-- 
https://mail.python.org/mailman/listinfo/python-list


[issue26737] csv.DictReader throws generic error when fieldnames is accessed for non-text file

2016-04-12 Thread Berker Peksag

Berker Peksag added the comment:

> The scenario is a web application allowing people to upload csv files, but 
> they can upload any files they like.

This looks like a potential security flaw in the application. The application 
should reject any non-CSV files from being uploaded (instead of relying on the 
CSV module).

Thanks for the report.

--
nosy: +berker.peksag
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 




[issue26737] csv.DictReader throws generic error when fieldnames is accessed for non-text file

2016-04-12 Thread Bayo Opadeyi

Bayo Opadeyi added the comment:

Yes, the problem is that the file is not csv. The scenario is a web application 
allowing people to upload csv files, but they can upload any files they like.

--
status: pending -> open

___
Python tracker 




[issue26737] csv.DictReader throws generic error when fieldnames is accessed for non-text file

2016-04-12 Thread Serhiy Storchaka

Changes by Serhiy Storchaka :


--
status: open -> pending

___
Python tracker 




[issue26737] csv.DictReader throws generic error when fieldnames is accessed for non-text file

2016-04-11 Thread Josh Rosenberg

Josh Rosenberg added the comment:

This already behaves usefully in 3.5 where reading fieldnames from a DictReader 
wrapping a file opened in binary mode gets you:

_csv.Error: iterator should return strings, not bytes (did you open the file in 
text mode?)
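
That Python 3 behaviour is easy to reproduce without a real upload; an in-memory bytes buffer stands in for the uploaded file:

```python
import csv
import io

# A binary stream yields bytes lines; Python 3's csv module rejects
# these as soon as .fieldnames forces it to read the first row.
reader = csv.DictReader(io.BytesIO(b"a,b\n1,2\n"))
raised = False
try:
    reader.fieldnames
except csv.Error as exc:
    raised = True
    print(exc)  # e.g. "iterator should return strings, not bytes ..."
```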

And 2.7 is highly unlikely to make fit and finish fixes at this stage in the 
game.

That said, not sure what you'd expect in 2.7; standard open in binary mode is 
correct there, and you'd get str either way. Is the problem that it's not a CSV 
file in the first place? Because Python 2's csv isn't encoding aware; as long 
as it doesn't have embedded NULs, anything could be legitimate data (csv 
doesn't have the context to say that it should be latin-1, EBCDIC, or whatever).

--
nosy: +josh.r

___
Python tracker 




[issue26737] csv.DictReader throws generic error when fieldnames is accessed for non-text file

2016-04-11 Thread Bayo Opadeyi

Changes by Bayo Opadeyi <bayokra...@gmail.com>:


--
title: csv.DictReader throws generic error when fieldnames is accessed on 
non-text file -> csv.DictReader throws generic error when fieldnames is 
accessed for non-text file

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26737>



[issue26737] csv.DictReader throws generic error when fieldnames is accessed on non-text file

2016-04-11 Thread Bayo Opadeyi

New submission from Bayo Opadeyi:

If you use the csv.DictReader to open a non-text file and try to access 
fieldnames on it, it crashes with a generic error instead of something specific.

--
messages: 263199
nosy: boyombo
priority: normal
severity: normal
status: open
title: csv.DictReader throws generic error when fieldnames is accessed on 
non-text file
versions: Python 2.7

___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26737>



Re: How can I export data from a website and write the contents to a text file?

2015-11-20 Thread Michael Torrie
On 11/19/2015 12:17 PM, Patrick Hess wrote:
> ryguy7272 wrote:
>> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
>> [...]
>> It doesn't seem like the '\n' is doing anything useful.  All the text is 
>> jumbled together.
>> [...]
>> I finally got it working.  It's like this:
>> "\r\n"
> 
> The better solution would be to open text files in actual text mode:
> 
> open("filename", "wb")   # binary mode
> open("filename", "w")# text mode
> 
> In text mode, the correct line-ending characters, which will vary
> depending on the operating system, are chosen automatically.

It's not just a matter of line endings. It's a matter of text encoding
also.  This is critical in Python3 where everything is unicode and
encoding is essential.  You have to to use the text mode when writing
files here, and it's also a good idea to specify what encoding you wish
to write with (UTF-8 is a good default).
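
A small sketch of both points together, using a temporary file so it runs anywhere:

```python
import os
import tempfile

text = "caf\u00e9\nline 2\n"  # non-ASCII content needs a declared encoding
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Text mode with an explicit encoding: '\n' is translated to the
# platform line ending on the way out and back again on the way in.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, encoding="utf-8") as f:
    round_tripped = f.read()
print(round_tripped == text)  # -> True
```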

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-19 Thread Patrick Hess
ryguy7272 wrote:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> [...]
> It doesn't seem like the '\n' is doing anything useful.  All the text is 
> jumbled together.
> [...]
> I finally got it working.  It's like this:
> "\r\n"

The better solution would be to open text files in actual text mode:

open("filename", "wb")   # binary mode
open("filename", "w")# text mode

In text mode, the correct line-ending characters, which will vary
depending on the operating system, are chosen automatically.

with open("test.txt", "w") as textfile:
textfile.write("line 1\n")
textfile.write("line 2")

This produces "line 1\nline 2" on Unix systems and "line 1\r\nline 2"
on Windows.

Also involves less typing this way. ;-)

Patrick
-- 
https://mail.python.org/mailman/listinfo/python-list


How can I export data from a website and write the contents to a text file?

2015-11-18 Thread ryguy7272
I'm trying the script below, and it simply writes the last line to a text 
file.  I want to add a '\n' after each line is written, so I don't overwrite 
all the lines.


from bs4 import BeautifulSoup
import urllib2

var_file = urllib2.urlopen("http://www.imdb.com/chart/top")

var_html  = var_file.read()

var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
print(link)
text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
z = str(link)
text_file.write(z + "\n")
text_file.write("\n")
text_file.close()


Can someone please help me get this working?
Thanks!!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread Chris Angelico
On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272  wrote:
>   text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> z = str(link)
> text_file.write(z + "\n")
> text_file.write("\n")
> text_file.close()

You're opening the file every time you go through the loop,
overwriting each time. Instead, open the file once, then start the
loop, and then close it at the end. You can use a 'with' statement to
do the closing for you, or you can do it the way you are here.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread ryguy7272
On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272  wrote:
> >   text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> > z = str(link)
> > text_file.write(z + "\n")
> > text_file.write("\n")
> > text_file.close()
> 
> You're opening the file every time you go through the loop,
> overwriting each time. Instead, open the file once, then start the
> loop, and then close it at the end. You can use a 'with' statement to
> do the closing for you, or you can do it the way you are here.
> 
> ChrisA



Thanks.  What would the code look like?  I tried the code below, and got the 
same results.


for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
#print(link)
z = str(link)
text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
text_file.write(z + "\n")
text_file.close()



-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread ryguy7272
On Wednesday, November 18, 2015 at 12:41:19 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> > On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> > 
> > > I'm trying the script below...
> > 
> > The problem isn't that you're over-writing the lines (although it may 
> > seem that way to you), the problem is that you're overwriting the whole 
> > file every time you write a link to it. This is because you open and 
> > close the file for every link you write, and you do so in file mode "wb" 
> > which restarts writing at the first byte of the file every time.
> > 
> > You only need to open and close the text file once, instead of for every 
> > link you output. Try moving the lines to open and close the file outside 
> > the outer for loop to change the loop from:
> > 
> > for item in soup.find_all(class_='lister-list'):
> > for link in item.find_all('a'):
> > # open file
> > # write link to file
> > # close file
> > 
> > to:
> > 
> > # open file
> > for item in soup.find_all(class_='lister-list'):
> > for link in item.find_all('a'):
> > # write link to file
> > # close file
> > 
> > Alternatively, use the with form:
> > 
> > with open("blah","wb") as text_file:
> > for item in soup.find_all(class_='lister-list'):
> > for link in item.find_all('a'):
> > # write link to file
> > 
> > -- 
> > Denis McMahon, 
> 
> 
> Yes, I just figured it out.  Thanks.  
> 
> It doesn't seem like the '\n' is doing anything useful.  All the text is 
> jumbled together.  When I open the file in Excel, or Notepad++, it is easy to 
> read.  However, when I open it in as a regular text file, everything is 
> jumbled together.  Is there an easy way to fix this?

I finally got it working.  It's like this:
"\r\n"

Thanks everyone!!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread Denis McMahon
On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:

> I'm trying the script below...

The problem isn't that you're over-writing the lines (although it may 
seem that way to you), the problem is that you're overwriting the whole 
file every time you write a link to it. This is because you open and 
close the file for every link you write, and you do so in file mode "wb" 
which restarts writing at the first byte of the file every time.

You only need to open and close the text file once, instead of for every 
link you output. Try moving the lines to open and close the file outside 
the outer for loop to change the loop from:

for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
# open file
# write link to file
# close file

to:

# open file
for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
# write link to file
# close file

Alternatively, use the with form:

with open("blah","wb") as text_file:
for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
# write link to file
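
Filled in, the with-form looks like this; a plain list of strings stands in for the BeautifulSoup links so the sketch runs on its own, and it uses text mode "w" (per the other replies) so '\n' is translated correctly:

```python
import os
import tempfile

# Hypothetical stand-ins for the links BeautifulSoup would yield.
links = ["<a href='/title/tt0111161/'>link 1</a>",
         "<a href='/title/tt0068646/'>link 2</a>"]

path = os.path.join(tempfile.mkdtemp(), "Text1.txt")
# Open once, before the loop, and let the with-block close the file.
with open(path, "w") as text_file:
    for link in links:
        text_file.write(link + "\n")

written = open(path).read()
print(written)
```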

-- 
Denis McMahon, denismfmcma...@gmail.com
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread ryguy7272
On Wednesday, November 18, 2015 at 12:21:47 PM UTC-5, Denis McMahon wrote:
> On Wed, 18 Nov 2015 08:37:47 -0800, ryguy7272 wrote:
> 
> > I'm trying the script below...
> 
> The problem isn't that you're over-writing the lines (although it may 
> seem that way to you), the problem is that you're overwriting the whole 
> file every time you write a link to it. This is because you open and 
> close the file for every link you write, and you do so in file mode "wb" 
> which restarts writing at the first byte of the file every time.
> 
> You only need to open and close the text file once, instead of for every 
> link you output. Try moving the lines to open and close the file outside 
> the outer for loop to change the loop from:
> 
> for item in soup.find_all(class_='lister-list'):
> for link in item.find_all('a'):
> # open file
> # write link to file
> # close file
> 
> to:
> 
> # open file
> for item in soup.find_all(class_='lister-list'):
> for link in item.find_all('a'):
> # write link to file
> # close file
> 
> Alternatively, use the with form:
> 
> with open("blah","wb") as text_file:
> for item in soup.find_all(class_='lister-list'):
> for link in item.find_all('a'):
> # write link to file
> 
> -- 
> Denis McMahon, 


Yes, I just figured it out.  Thanks.  

It doesn't seem like the '\n' is doing anything useful.  All the text is 
jumbled together.  When I open the file in Excel, or Notepad++, it is easy to 
read.  However, when I open it in as a regular text file, everything is jumbled 
together.  Is there an easy way to fix this?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread ryguy7272
On Wednesday, November 18, 2015 at 12:04:16 PM UTC-5, ryguy7272 wrote:
> On Wednesday, November 18, 2015 at 11:58:17 AM UTC-5, Chris Angelico wrote:
> > On Thu, Nov 19, 2015 at 3:37 AM, ryguy7272 <> wrote:
> > >   text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", 
> > > "wb")
> > > z = str(link)
> > > text_file.write(z + "\n")
> > > text_file.write("\n")
> > > text_file.close()
> > 
> > You're opening the file every time you go through the loop,
> > overwriting each time. Instead, open the file once, then start the
> > loop, and then close it at the end. You can use a 'with' statement to
> > do the closing for you, or you can do it the way you are here.
> > 
> > ChrisA
> 
> 
> 
> Thanks.  What would the code look like?  I tried the code below, and got the 
> same results.
> 
> 
> for item in soup.find_all(class_='lister-list'):
> for link in item.find_all('a'):
> #print(link)
> z = str(link)
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
> text_file.write(z + "\n")
> text_file.close()


Oh, I see, it's like this:

text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")
var_file.close()
soup = BeautifulSoup(var_html)
for item in soup.find_all(class_='lister-list'):
for link in item.find_all('a'):
#print(link)
z = str(link)
text_file.write(z + "\n")
text_file.close()


However, it's not organized very well, and it's hard to read.  I thought the 
'\n' would create a new line after one line was written.  Now, it seems like 
everything is jumbled together.  Kind of weird.  Am I missing something?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread Rob Gaddi
On Wed, 18 Nov 2015 09:40:58 -0800, ryguy7272 wrote:
> 
> It doesn't seem like the '\n' is doing anything useful.  All the text is
> jumbled together.  When I open the file in Excel, or Notepad++, it is
> easy to read.  However, when I open it in as a regular text file,
> everything is jumbled together.  Is there an easy way to fix this?

You're suffering cause-effect inversion.  Windows default Notepad is a 
fundamentally crippled text editor that only knows how to handle Windows/
DOS style text files, where the line ending is '\r\n'.  Notepad++, along 
with many other excellent editors available for Windows, is smart enough 
to figure out from the file whether it's Windows style or UNIX style, 
where line endings are just a bare '\n'.

So the problem wasn't with what you were writing, it's with how you 
define "open it as a regular text file".  On my Windows machine I long 
ago switched the default editor to Notepad++ for everything and was far 
happier for it.
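
A quick Python 3 illustration (not from the thread) of the two
conventions; the 'newline' argument to open() forces DOS-style endings
here so the raw bytes on disk can be inspected:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "endings_demo.txt")

# Write with each '\n' translated to the Windows/DOS ending '\r\n'.
with open(path, "w", newline="\r\n") as f:
    f.write("first\nsecond\n")

# Read the raw bytes back to see the actual line endings on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'first\r\nsecond\r\n'
```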

-- 
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I export data from a website and write the contents to a text file?

2015-11-18 Thread Random832
ryguy7272  writes:
> text_file = open("C:/Users/rshuell001/Desktop/excel/Text1.txt", "wb")

Remove the "b" from this line. This is causing it to omit the
platform-specific translation of "\n", which means some Windows
applications will not recognize the line endings.
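
(That is the Python 2 behaviour; on Python 3 the mismatch is a hard
error, since a binary-mode file only accepts bytes.  A small sketch of
the difference, with a made-up path and link string:)

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "mode_demo.txt")
z = '<a href="/title/tt1/">Movie 1</a>'

# Binary mode: Python 3 refuses str data outright.
with open(path, "wb") as f:
    try:
        f.write(z + "\n")
    except TypeError:
        pass  # "a bytes-like object is required, not 'str'"

# Text mode: strings are accepted and '\n' gets the platform translation.
with open(path, "w") as f:
    f.write(z + "\n")

with open(path) as f:
    print(f.read() == z + "\n")  # True
```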

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Irregular last line in a text file, was Re: Regular expressions

2015-11-04 Thread Tim Chase
On 2015-11-04 14:39, Steven D'Aprano wrote:
> On Wednesday 04 November 2015 03:56, Tim Chase wrote:
>> Or even more valuable to me:
>> 
>>   with open(..., newline="strip") as f:
>> assert all(not line.endswith(("\n", "\r")) for line in f)
> 
> # Works only on Windows text files.
> def chomp(lines):
> for line in lines:
> yield line.rstrip('\r\n')

.rstrip() treats its string argument as a set of characters, so it
will remove any trailing \r or \n (which means it handles both
Windows and *nix line endings), whereas calling .rstrip() without
a parameter can throw away data you might want:

  >>> "hello \r\n\r\r\n\n\n".rstrip("\r\n")
  'hello '
  >>> "hello \r\n\r\r\n\n\n".rstrip()
  'hello'

-tkc




-- 
https://mail.python.org/mailman/listinfo/python-list

