Re: [Tutor] Finding line number by offset value.

2016-02-22 Thread Peter Otten
Steven D'Aprano wrote:

> On Mon, Feb 22, 2016 at 01:41:42AM +, Alan Gauld wrote:
>> On 21/02/16 19:32, Cody West wrote:
> 
>> > I'm trying to take 48L, which I believe is the character number, and
>> > get the line number from that.

The documentation isn't explicit, but

"""
with open('/foo/bar/my_file', 'rb') as f:
  matches = rules.match(data=f.read())
"""

suggests that the library operates on bytes, not characters.

>> I'm not totally clear what you mean but, if it is that 48L
>> is the character count from the start of the file and you
>> want to know the line number then you need to count the
>> number of \n characters between the first and 48th
>> characters.
>> 
>> But that's depending on your line-end system of course,
>> there may be two characters on each EOL...
> 
> Provided your version of Python is built with "universal newline
> support", and nearly every Python is, then if you open the file in text
> mode, all end-of-lines are automatically converted to \n on reading.

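Be careful: *if* the numbers are byte offsets and you open the file in
universal newlines mode or text mode, your results will be unreliable,
because newline translation and decoding change the distance between the
start of the file and the match.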
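
To make the pitfall concrete, here is a minimal sketch in Python 3 syntax. The sample data with Windows line endings and the variable names are invented for the illustration; nothing here comes from yara. It counts newlines before the same numeric offset twice, once in the raw bytes and once in the text that universal-newlines decoding produces:

```python
import io

# Made-up sample: four lines with Windows (\r\n) line endings.
data = b"one\r\ntwo\r\nthree\r\nfour"

# Byte offset of the last "e" in "three", as a byte-oriented scanner
# would report it.
offset = data.index(b"three") + 4

# Counting on the raw bytes gives the 0-based line number.
byte_line = data[:offset].count(b"\n")

# Text mode with universal newlines turns every \r\n into a single \n,
# so the same number now points further into the text.
text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None).read()
text_line = text[:offset].count("\n")

print(byte_line, text_line)
```

The byte count says line 2 (the "three" line, counting from 0), while the text count says line 3. The drift grows with every translated \r\n, and a multi-byte encoding would shift things further.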

> If the file is small enough to read all at once, you can do this:

> offset = 48
> text = the_file.read(offset)
> print text.count('\n')

It's the offset that matters, not the file size; the first 48 bytes of a 
terabyte file will easily fit into the memory of your Apple II ;)
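
A sketch of that idea, in Python 3 syntax and assuming the offsets really are byte offsets: open the file in binary mode and read only the first `offset` bytes. The function name `line_number_at` and the sample file contents are invented for the demonstration.

```python
import os
import tempfile

def line_number_at(path, offset):
    """Return the 0-based line number of the byte at `offset` in the file."""
    with open(path, "rb") as f:      # binary mode: no newline translation
        head = f.read(offset)        # only `offset` bytes are held in memory
    return head.count(b"\n")

# Demonstration with a throwaway file (contents are invented).
sample = b"first\nsecond\neval(base64_decode\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)

try:
    offset = sample.index(b"eval")
    line = line_number_at(tmp.name, offset)
    print(line + 1)  # editors usually count lines from 1
finally:
    os.remove(tmp.name)
```

However large the file is, the memory needed depends only on `offset`.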


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Finding line number by offset value.

2016-02-21 Thread Steven D'Aprano
On Mon, Feb 22, 2016 at 01:41:42AM +, Alan Gauld wrote:
> On 21/02/16 19:32, Cody West wrote:

> > I'm trying to take 48L, which I believe is the character number, and get
> > the line number from that.
> 
> I'm not totally clear what you mean but, if it is that 48L
> is the character count from the start of the file and you
> want to know the line number then you need to count the
> number of \n characters between the first and 48th
> characters.
> 
> But that's depending on your line-end system of course,
> there may be two characters on each EOL... 

Provided your version of Python is built with "universal newline 
support", and nearly every Python is, then if you open the file in text 
mode, all end-of-lines are automatically converted to \n on reading.

If the file is small enough to read all at once, you can do this:

offset = 48
text = the_file.read(offset)
print text.count('\n')


to print a line number starting from 0.

If the file is too big to read all at once, you can do this:

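# untested
running_total = 0
line_num = -1
offset = 4800  # say
for line in the_file:
    running_total += len(line)
    line_num += 1
    # strictly greater: an offset equal to the running total is the
    # first character of the *next* line
    if running_total > offset:
        print line_num
        break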
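
If several offsets have to be looked up in the same data, which seems likely when a scanner reports many matches, an alternative to rescanning for each one is to record where every line starts and then bisect. This is only a sketch in Python 3 syntax, assuming byte offsets and data that fits in memory; `line_starts` and `line_of` are names invented for the example.

```python
from bisect import bisect_right

def line_starts(data):
    """Return the byte offset at which each line of `data` begins."""
    starts = [0]
    pos = data.find(b"\n")
    while pos != -1:
        starts.append(pos + 1)   # the next line begins right after the newline
        pos = data.find(b"\n", pos + 1)
    return starts

def line_of(starts, offset):
    """Return the 0-based line number containing the byte at `offset`."""
    return bisect_right(starts, offset) - 1

data = b"first\nsecond\nthird\n"
starts = line_starts(data)
lines = [line_of(starts, off) for off in (0, 5, 6, 13)]
print(lines)
```

A newline byte is treated as the end of the line it terminates, which agrees with counting '\n' in the first `offset` bytes.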


-- 
Steve


Re: [Tutor] Finding line number by offset value.

2016-02-21 Thread Alan Gauld
On 21/02/16 19:32, Cody West wrote:
> I'm using yara-python for some file scanning and I'm trying to take the
> offset in the 'strings' field and turn it into a line number.

I know nothing about yara except that it's some kind of
pattern matching engine. However...

> (48L, '$execution', 'eval(base64_decode')
> 
> I'm trying to take 48L, which I believe is the character number, and get
> the line number from that.

I'm not totally clear what you mean but, if it is that 48L
is the character count from the start of the file and you
want to know the line number then you need to count the
number of \n characters between the first and 48th
characters.

But that's depending on your line-end system of course,
there may be two characters on each EOL... It depends
on your OS/version and possibly the character encoding
too. And if it's a binary file, who knows, as there
won't really be any line endings. And, as I understand
it, yara is targeted at reading byte patterns from
binary files?

Are you sure you really need the line number?


-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

