Re: How can I verify if the regex exist in a file without reading ?

2018-06-15 Thread francois . rabanel
On Friday, 15 June 2018 at 12:36:40 UTC+2, Steven D'Aprano wrote:
> On Fri, 15 Jun 2018 01:01:03 -0700, francois.rabanel wrote:
> 
> > I work with a file which contains millions of lines; a simple file.read()
> > and I run out of memory
> 
> Assuming each line is on average a hundred characters long, a million 
> lines is (approximately) 100 MB. Even on a computer with only 2GB of 
> memory, you should be able to read 100 MB.
> 
> But you shouldn't: it is much better to process the file line by line.
> 
> 
> # Don't do this:
> with open(pathname) as f:
>     text = f.read()  # Slurp the entire file into memory at once.
>     ...
> 
> # Do this instead
> with open(pathname) as f:
>     for line in f:
>         ...  # process one line at a time
> 
> 
> You said you are running out of memory, earlier you said the computer was 
> crashing... please describe exactly what happens. If you get a Traceback, 
> copy and paste the entire message.
> 
> (Not just the last line.)
> 
> 
> 
> 
> -- 
> Steven D'Aprano
> "Ever since I learned about confirmation bias, I've been seeing
> it everywhere." -- Jon Ronson

I solved my problem, and when I look at my solution I don't understand why I
didn't do it earlier :)


import re
import sys

with open(path) as file:
  result = []
  for line in file:
    find_regex = re.search(regex, line)
    if find_regex:
      result.append(find_regex.group())
  if len(result) == 0:
    sys.exit('Regex not found')
  elif result[0] == '':
    sys.exit("Whitespace as regex doesn't work")


I was looking for a way to check whether the user's regex was correct or not.
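
A minimal sketch of one way to check the user's regex before touching the file at all, assuming Python 3. The helper name `check_regex` is hypothetical and not part of the script above. `re.compile` raises `re.error` for an invalid pattern, and a pattern that can match the empty string (for example `\s*`) matches every line, which is the case the `result[0] == ''` test is guarding against.

```python
import re


def check_regex(pattern):
    """Return a compiled pattern, or raise ValueError with a readable message.

    Hypothetical helper for illustration; not part of the original script.
    """
    try:
        compiled = re.compile(pattern)
    except re.error as e:
        raise ValueError("Invalid regex %r: %s" % (pattern, e))
    # A pattern that matches the empty string matches every line,
    # so it cannot tell us anything useful about the file.
    if compiled.search("") is not None:
        raise ValueError("Regex %r matches the empty string" % pattern)
    return compiled
```

Calling `check_regex(regex)` once, before opening the file, reports a bad pattern without reading a single line.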
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How can I verify if the regex exist in a file without reading ?

2018-06-15 Thread Steven D'Aprano
On Fri, 15 Jun 2018 01:01:03 -0700, francois.rabanel wrote:

> I work with a file which contains millions of lines; a simple file.read()
> and I run out of memory

Assuming each line is on average a hundred characters long, a million 
lines is (approximately) 100 MB. Even on a computer with only 2GB of 
memory, you should be able to read 100 MB.

But you shouldn't: it is much better to process the file line by line.


# Don't do this:
with open(pathname) as f:
    text = f.read()  # Slurp the entire file into memory at once.
    ...

# Do this instead
with open(pathname) as f:
    for line in f:
        ...  # process one line at a time
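
As an illustration of the line-by-line approach, here is a small sketch, assuming Python 3 and a text file. The function name `iter_matches` is made up for this example. Because it is a generator, only the current line is held in memory, however large the file is.

```python
import re


def iter_matches(pathname, pattern):
    """Yield (line_number, matched_text) for each line containing a match."""
    compiled = re.compile(pattern)
    with open(pathname) as f:
        for line_number, line in enumerate(f, start=1):
            match = compiled.search(line)
            if match:
                yield line_number, match.group()
```

A caller can loop over `iter_matches(pathname, regex)` and stop early, or collect the results in a list if the number of matches is known to be small.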


You said you are running out of memory, earlier you said the computer was 
crashing... please describe exactly what happens. If you get a Traceback, 
copy and paste the entire message.

(Not just the last line.)




-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: How can I verify if the regex exist in a file without reading ?

2018-06-15 Thread francois . rabanel
On Friday, 15 June 2018 at 02:42:12 UTC+2, Cameron Simpson wrote:
> On 15Jun2018 00:24, Steven D'Aprano wrote:
> >On Fri, 15 Jun 2018 10:00:59 +1000, Cameron Simpson wrote:
> >> Francois, unless your regex can cross multiple lines it is better to
> >> search files like this:
> >>
> >>   with open(the_filename) as f:
> >>     for line in f:
> >>       ... search the line for the regexp ...
> >>
> >> That way you only need to keep one line at a time in memory.
> >
> >That's what François is doing.
> 
> Urr, so he is. Then like you, I don't know why he's concerned about running
> out of memory. Unless it hasn't been made clear that Python will free up
> unused memory on its own.
> 
> Cheers,
> Cameron Simpson 


Thank you very much for all your tips! They help me a lot :)
I'm a beginner in Python; this is an exercise I'm trying to complete.

I'm going to read more about exceptions and os.path.splitext()!

I work with a file which contains millions of lines; a simple file.read() and
I run out of memory.


François


Re: How can I verify if the regex exist in a file without reading ?

2018-06-14 Thread Cameron Simpson

On 15Jun2018 00:24, Steven D'Aprano wrote:
> On Fri, 15 Jun 2018 10:00:59 +1000, Cameron Simpson wrote:
>> Francois, unless your regex can cross multiple lines it is better to
>> search files like this:
>>
>>   with open(the_filename) as f:
>>     for line in f:
>>       ... search the line for the regexp ...
>>
>> That way you only need to keep one line at a time in memory.
>
> That's what François is doing.


Urr, so he is. Then like you, I don't know why he's concerned about running out
of memory. Unless it hasn't been made clear that Python will free up unused
memory on its own.
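
For the case where the regex does need to cross line boundaries, one option that nobody in this thread proposed (so treat it as an assumption, not a recommendation from the thread) is to memory-map the file and search it with a bytes pattern. This is a rough sketch assuming Python 3. The function name `search_whole_file` is hypothetical. The operating system pages the file in as needed, so Python does not build one huge string.

```python
import mmap
import re


def search_whole_file(pathname, pattern):
    """Search a file for a bytes pattern that may span several lines.

    Returns the first match as bytes, or None if there is no match.
    """
    with open(pathname, "rb") as f:
        # Length 0 maps the whole file; an empty file cannot be mapped.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            match = re.search(pattern, mm)
            return match.group() if match else None
```

The pattern has to be a bytes pattern such as `rb"foo\nbar"`, because the mapped file is bytes rather than text.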


Cheers,
Cameron Simpson 


Re: How can I verify if the regex exist in a file without reading ?

2018-06-14 Thread Steven D'Aprano
On Fri, 15 Jun 2018 10:00:59 +1000, Cameron Simpson wrote:

> Francois, unless your regex can cross multiple lines it is better to
> search files like this:
> 
>   with open(the_filename) as f:
>     for line in f:
>       ... search the line for the regexp ...
> 
> That way you only need to keep one line at a time in memory.

That's what François is doing.


> Importantly:
> 
>   os.rename(path, new_filename)
> 
> The old name comes first, then the new name.

Oops! I forgot about that.


-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson



Re: How can I verify if the regex exist in a file without reading ?

2018-06-14 Thread Cameron Simpson

On 14Jun2018 16:54, Steven D'Aprano wrote:
> On Thu, 14 Jun 2018 09:26:44 -0700, francois.rabanel wrote:
>> My problem is that when I work on a huge file, I want to avoid reading the
>> file because it will crash my computer :)
>
> How does reading a file crash your computer?


Likely because he tried to read the whole file into memory and match against 
it. Guessing:


 text = open(the_filename).read()
 ... search the text for the regexp ...

Francois, unless your regex can cross multiple lines it is better to search 
files like this:


 with open(the_filename) as f:
   for line in f:
     ... search the line for the regexp ...

That way you only need to keep one line at a time in memory.


>> except OSError:
>>   print("Permission denied")
>
> That's not what OSError means. OSError can mean many different things.
> That's why it isn't called "PermissionDeniedError".
>
> You need to look at the exception to see what caused it, not just assume
> it was a permissions error.
>
>> except IOError:
>>   print("This file doesn't exist")
>
> That's not what IOError means either. That is why it isn't called
> FileDoesntExistError. Again, you need to look at the exception to see
> what the error actually is.


In particular, you should always _either_ inspect the exception to see what 
went wrong and handle it, _or_ include the exception text in your error 
message, for example:


 except IOError as e:
   print("IO Error on file:", e)

That way an unhandled exception gets reported.
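
A sketch of both halves of that advice, assuming Python 3.3 or later, where IOError is an alias of OSError and the specific subclasses `FileNotFoundError` and `PermissionError` exist. The helper name `describe_open_error` is hypothetical. The specific cases are handled by name, and anything else is reported with its real cause.

```python
import errno


def describe_open_error(path):
    """Try to open path and return a message describing what went wrong.

    Returns None when the file opens successfully. Hypothetical helper.
    """
    try:
        with open(path):
            return None
    except FileNotFoundError as e:
        return "File not found: %s" % e
    except PermissionError as e:
        return "Permission denied: %s" % e
    except OSError as e:
        # Anything else: report the real cause rather than guessing.
        return "OS error (%s): %s" % (errno.errorcode.get(e.errno, e.errno), e)
```

The order of the `except` clauses matters: the subclasses must come before the general `OSError`, otherwise they would never be reached.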


>> else:
>>   os.rename(new_filename, filename + 'txt')
>
> os.rename(new_filename, path)


Importantly:

 os.rename(path, new_filename)

The old name comes first, then the new name.

Also, you might want to ensure that new_filename doesn't already exist...
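
A minimal sketch of those two points, assuming Python 3. The function name `rename_without_clobbering` and its parameter names are made up for this example.

```python
import os


def rename_without_clobbering(old_name, new_name):
    """Rename old_name to new_name, refusing to overwrite an existing file."""
    if os.path.exists(new_name):
        raise FileExistsError("Refusing to overwrite %r" % new_name)
    os.rename(old_name, new_name)  # existing name first, then the new name
```

The check and the rename are two separate steps, so another process could still create the file in between. When overwriting the destination is actually intended, `os.replace(old_name, new_name)` does that on every platform.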

Cheers,
Cameron Simpson 


Re: How can I verify if the regex exist in a file without reading ?

2018-06-14 Thread Steven D'Aprano
On Thu, 14 Jun 2018 09:26:44 -0700, francois.rabanel wrote:

> Hi,
> 
> Here is my script:
> 
> It replaces some words in a file using a regular expression. It
> creates a copy to write to, and if there is no error, it replaces the
> original with the copy using "os.rename" at the end.
> 
> My problem is that when I work on a huge file, I want to avoid reading the
> file because it will crash my computer :)

How does reading a file crash your computer?


> and I would like to verify that the regex entered by the user exists in the file.

The only way to know if a regex matches the file is to try to match the 
regex and see if it matches.
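
Trying the match does not have to mean reading the whole file, though. A short sketch, assuming Python 3 and a regex that does not span lines. The function name `file_contains` is hypothetical.

```python
import re


def file_contains(pathname, pattern):
    """Return True as soon as any line of the file matches pattern."""
    compiled = re.compile(pattern)
    with open(pathname) as f:
        # any() stops at the first matching line, so the rest of the
        # file is never read once a match has been found.
        return any(compiled.search(line) for line in f)
```

In the worst case (no match at all) every line is still examined, but only one line is in memory at any time.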


[...]

> import re
> import os
> 
> try:
> 
>   path = raw_input('Please enter the path of your file that you want to correct : \n')
>   print("")
>   print('Which regex ? \n')
>   regex = raw_input('- : ')
>   print('By what ? \n')
>   new_word = raw_input('- : ')
> 

Don't do this:

>   # Creating copy file
>   filenames_regex = re.findall(r'[a-zA-Z0-9]+\.', path)
>   filename = filenames_regex[len(filenames_regex)-1] 
>   new_filename = filename + 'copy.txt'

Do this instead:

filename, extension = os.path.splitext(path)
new_filename = filename + '.copy' + extension
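
To show what `os.path.splitext` returns, here is a small example with a made-up path. It splits at the last dot only, and the directory part is kept, whereas the regex version above would keep only the last `name.` fragment and lose the directory.

```python
import os.path

path = "/tmp/report.final.txt"  # hypothetical example path
filename, extension = os.path.splitext(path)
new_filename = filename + ".copy" + extension
print(filename)      # /tmp/report.final
print(extension)     # .txt
print(new_filename)  # /tmp/report.final.copy.txt
```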


>   # Replace regex by new word line by line on copy file
>   with open(path) as rf, open(new_filename, 'w') as wf:
>     for line in rf:
>       wf.write(re.sub(regex, new_word, line))
> 
> except OSError:
>   print("Permission denied")

That's not what OSError means. OSError can mean many different things. 
That's why it isn't called "PermissionDeniedError".

You need to look at the exception to see what caused it, not just assume 
it was a permissions error.

> except IOError:
>   print("This file doesn't exist")

That's not what IOError means either. That is why it isn't called 
FileDoesntExistError. Again, you need to look at the exception to see 
what the error actually is.

> else:
>   os.rename(new_filename, filename + 'txt')

os.rename(new_filename, path)
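
Putting the suggestions above together, the script could look roughly like this. It is a sketch under assumptions, not the original poster's code: it assumes Python 3.3 or later, wraps the logic in a hypothetical function `replace_in_file` instead of prompting with raw_input, and uses `os.replace` so that overwriting the original also works on Windows. Returning the replacement count gives a way to tell whether the regex matched anything.

```python
import os
import re


def replace_in_file(path, regex, new_word):
    """Rewrite path with every match of regex replaced by new_word.

    Returns the total number of replacements made. The copy is only
    renamed over the original if the whole pass succeeded.
    """
    filename, extension = os.path.splitext(path)
    new_filename = filename + ".copy" + extension
    compiled = re.compile(regex)
    total = 0
    try:
        with open(path) as rf, open(new_filename, "w") as wf:
            for line in rf:
                new_line, count = compiled.subn(new_word, line)
                wf.write(new_line)
                total += count
    except OSError:
        # Clean up the partial copy, then let the caller see the real error.
        if os.path.exists(new_filename):
            os.remove(new_filename)
        raise
    # The copy is the existing file here, so it is the first argument.
    os.replace(new_filename, path)
    return total
```

A result of 0 means the regex was not found anywhere in the file, which the caller can report however it likes.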





-- 
Steven D'Aprano
"Ever since I learned about confirmation bias, I've been seeing
it everywhere." -- Jon Ronson
